schemes usually requires a large and diverse image dataset (e.g., a dataset with 128,175 retinal images), which is often unavailable in the cancer imaging field.
To overcome these limitations, we draw on our recent studies, which indicate that quantitative image markers extracted from the whole breast area of mammograms (namely, the global features) can be used to predict short-term breast cancer risk with significantly higher prediction power [16, 17]. Thus, we hypothesized in this study that it is possible to identify and fuse global image features computed from the whole breast area depicted on mammograms, without lesion segmentation, to generate a new quantitative imaging marker for predicting the likelihood of a testing case being malignant. This new global mammographic image feature-based approach can not only avoid lesion segmentation but also reduce the need for the large training dataset that the conventional deep learning approach requires. The objective of this study is therefore to develop a new global mammographic image feature analysis-based CAD scheme and validate our study hypothesis. The experimental details are presented as follows.
2. Materials and Methods
2.1 Image Dataset
Under IRB-approved retrospective study protocols, we have assembled a digital mammography image database, as reported in our previous studies (e.g., [4, 10, 11]). From this database, we selected an image dataset for this study, which consists of fully anonymized digital mammograms acquired from 275 women who participated in breast cancer screening. Each case has one suspicious mass-type lesion detected by the radiologists during the original mammogram reading and interpretation. All suspicious lesions were biopsied and confirmed by histopathology examinations. Among these cases, 134 were confirmed to be malignant, while the other 141 were benign. In addition, cancer was detected in only one breast of each malignant case in this dataset.
All digital mammography screening examinations were performed using Hologic Selenia (Hologic Inc) full-field digital mammography (FFDM) systems. Each mammography screening case has four images: two cranio-caudal (CC) and two medio-lateral oblique (MLO) view images of the left and right breasts. Since this study focused only on cases depicting soft tissue mass-type lesions, the mammograms were subsampled to reduce the image size, which is a common practice in the CAD research field, including in commercialized CAD schemes. Specifically, as reported for our previous computerized scheme, the original FFDM images with a pixel size of 0.07 mm were subsampled using the average pixel value computed from a 5 × 5 scanning window. Thus, the pixel size of the subsampled image is 0.35 mm.
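The subsampling step can be illustrated with a minimal Python sketch (the study itself used MATLAB). It assumes the 5 × 5 scanning window moves in non-overlapping steps and that edge pixels not filling a whole window are cropped; neither detail is stated in the text, and the function name `block_average` is hypothetical.

```python
import numpy as np

def block_average(image: np.ndarray, factor: int = 5) -> np.ndarray:
    """Downsample a 2-D image by averaging non-overlapping factor x factor blocks."""
    rows, cols = image.shape
    # Assumption: crop so both dimensions are multiples of the factor.
    rows -= rows % factor
    cols -= cols % factor
    cropped = image[:rows, :cols].astype(np.float64)
    # Split each axis into (number of blocks, pixels per block), then average
    # over the two "pixels per block" axes.
    blocks = cropped.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))

# Pixel size after subsampling: 0.07 mm * 5 = 0.35 mm.
original_pixel_mm = 0.07
factor = 5
subsampled_pixel_mm = original_pixel_mm * factor

# Tiny synthetic 10 x 10 "image" standing in for a mammogram.
demo = np.arange(100, dtype=np.float64).reshape(10, 10)
small = block_average(demo, factor)
print(small.shape, subsampled_pixel_mm)
```

Each output pixel is the mean of 25 input pixels, so the image shrinks by a factor of 5 along each axis while the physical area covered by one pixel grows accordingly.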
As summarized in Table 1, radiologists rated the mammographic density of the malignant and benign cases according to BIRADS guidelines; the ratings show no significant difference between the two classes. Figure 1 shows example images of one malignant case and one benign case, each of which contains four CC and MLO view images of the left and right breasts. The lesions in these example cases exhibit low conspicuity and high fuzziness, which makes lesion segmentation and risk prediction difficult.
Table 1. Distribution of mammographic density (BIRADS ratings) for the two groups of cases in the dataset.
Characteristic Malignant cases Benign cases
Figure 1. Examples of malignant and benign cases in four-view display. a) Malignant case. b) Benign case.
Our proposed CAD scheme was developed in three steps: feature computation, feature selection, and case classification. We first built an initial feature pool containing four different groups of features. Next, a particle swarm optimization (PSO) algorithm was applied to select optimal features, so that redundant features could be removed from the feature pool. Finally, a support vector machine (SVM), a widely used machine learning classifier, was used to predict the risk or likelihood of a case being malignant. The CAD scheme was implemented in the MATLAB software environment.
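The feature selection step can be sketched as a binary PSO in Python. This is an illustration under stated assumptions, not the scheme's actual MATLAB implementation: the sigmoid-based binary PSO variant, the constants `w`, `c1`, `c2`, the subset-size penalty, and the synthetic data are all assumed here. To keep the sketch dependency-free, a nearest-centroid classifier stands in for the SVM inside the fitness function.

```python
import numpy as np

def centroid_accuracy(X_train, y_train, X_test, y_test):
    """Stand-in classifier (NOT the paper's SVM): nearest class centroid."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_test - c0, axis=1)
    d1 = np.linalg.norm(X_test - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_test).mean())

def fitness(mask, X_train, y_train, X_val, y_val):
    """Validation accuracy of the selected features, minus a small size penalty."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    acc = centroid_accuracy(X_train[:, mask], y_train, X_val[:, mask], y_val)
    return acc - 0.01 * mask.sum() / mask.size  # assumed penalty weight

def binary_pso(X_train, y_train, X_val, y_val, n_particles=20, n_iter=30, seed=0):
    """Search for a boolean feature mask that maximizes the fitness."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    pos = rng.random((n_particles, n_features)) < 0.5   # random initial masks
    vel = rng.normal(0.0, 1.0, (n_particles, n_features))
    evaluate = lambda p: fitness(p, X_train, y_train, X_val, y_val)
    pbest = pos.copy()
    pbest_fit = np.array([evaluate(p) for p in pos])
    g = int(pbest_fit.argmax())
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # assumed inertia and acceleration constants
    for _ in range(n_iter):
        r1, r2 = rng.random(vel.shape), rng.random(vel.shape)
        x = pos.astype(float)
        vel = (w * vel
               + c1 * r1 * (pbest.astype(float) - x)
               + c2 * r2 * (gbest.astype(float) - x))
        vel = np.clip(vel, -6.0, 6.0)
        # The sigmoid of the velocity gives the probability that a bit is set.
        pos = rng.random(vel.shape) < 1.0 / (1.0 + np.exp(-vel))
        fit = np.array([evaluate(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = int(pbest_fit.argmax())
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]
    return gbest, float(gbest_fit)

# Synthetic demo data: 10 features, only the first two carry class signal.
data_rng = np.random.default_rng(42)
y = data_rng.integers(0, 2, 200)
X = data_rng.normal(size=(200, 10))
X[:, :2] += 2.0 * y[:, None]
best_mask, best_fit = binary_pso(X[:100], y[:100], X[100:], y[100:])
print(best_mask.astype(int), round(best_fit, 3))
```

In a pipeline closer to the one described, the fitness function would train and evaluate the SVM on each candidate feature subset, and the final SVM would be trained on the features kept by the best mask.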