Rationale and Objectives
We conducted an observer study to investigate whether radiologists can judge similarities in pairs of breast masses and lung nodules consistently and reproducibly.
Materials and Methods
Institutional review board approval and informed observer consent were obtained. This study was compliant with the Health Insurance Portability and Accountability Act. We used eight pairs of breast masses on mammograms and eight pairs of lung nodules on computed tomographic images, for which subjective similarity ratings ranging from 0 to 1 were determined in our previous studies. From these, four sets of image pairs were created (ie, a set of eight mass pairs, a set of eight nodule pairs, and two mixed sets of four mass and four nodule pairs). Eight radiologists, including four breast radiologists and four chest radiologists, compared all combinations of the eight pairs in each set using a two-alternative forced-choice (2AFC) method to determine the similarity ranking scores by identifying which pair was more similar than the other pair based on the overall impression for diagnosis.
Results
In the mass set and nodule set, the relationship between the average subjective similarity ratings and the average similarity ranking scores by 2AFC indicated very high correlations (r = 0.91 and 0.88). Moreover, in the two mixed sets, the correlations between the average subjective similarity ratings and the average similarity ranking scores were also very high (r = 0.90 and 0.98). Thus, radiologists were able to compare the similarities for pairs of lesions consistently, even in the unusual comparison of pairs of completely different types of lesions.
Conclusion
The subjective similarity of a pair of lesions in medical images can be quantified consistently by a group of radiologists. The concept of similarity of lesions in medical images can be subjected to rigorous scientific research and investigation in the future.
In radiologic diagnosis, the term similar images generally implies that one lesion in an image pair closely resembles the other lesion in the pair according to radiologists’ diagnostic criteria ( ). In many clinical situations, the availability of similar images could be very useful for radiologists ( ). For diagnosis of a new unknown lesion, for example, radiologists routinely make use of similar images with known pathology from textbooks, teaching files, and past clinical cases. Less experienced radiologists commonly view teaching files containing many images of similar lesions to improve their interpretive skills as part of their training. To retrieve similar images automatically from a database, it is necessary to quantify radiologists’ impression of the similarity of lesions when one lesion is compared with another in medical images. In previous studies, investigators have attempted to measure the subjective similarities of image pairs, as judged by groups of radiologists in observer performance studies, for masses ( ) and clustered microcalcifications ( ) on mammograms and for lung nodules on computed tomographic (CT) images ( ). However, if radiologists’ subjective similarities of image pairs were to differ from modality to modality and/or from lesion to lesion, it would be very difficult to quantify the similarities of image pairs of lesions. It is thus crucially important to understand whether the concept of similarity in medical images can be generalized independent of the types of lesions and/or imaging modalities. Therefore, we consider it necessary to further investigate the consistency and reproducibility of radiologists’ subjective judgments regarding the similarities of many pairs of lesions, to establish a method for reliably searching for similar images to assist radiologists’ image interpretation.
In this study, to investigate whether radiologists can judge the similarities of image pairs consistently and reproducibly, we conducted an observer study comparing image pairs of lesions (ie, mass pairs and nodule pairs) using a two-alternative forced-choice (2AFC) method, in which the more similar image pair was selected based on the overall impression for diagnosis. We decided to employ the 2AFC method, also known as a paired comparison method ( ), because it is highly sensitive in distinguishing small differences when two patterns are placed side by side. In this study, we obtained subjective similarity ranking scores from radiologists, which indicate the relative rankings of similarities among the many image pairs examined.
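As a rough illustration of how a paired comparison design can yield ranking scores, the following Python sketch compares all combinations of eight image pairs (28 comparisons) and counts how often each pair is chosen as the more similar one. The scoring rule (number of wins), the names `ranking_scores` and `pearson_r`, the simulated observer, and the rating values are all assumptions made for illustration; they are not the study's data, and the text available here does not state how the ranking scores were actually computed.

```python
from itertools import combinations
from math import sqrt


def ranking_scores(items, choose_more_similar):
    """Run a full 2AFC design: every item meets every other item once.

    Returns a dict mapping each item to the number of comparisons it won.
    Counting wins is one simple scoring rule, assumed here for illustration.
    """
    wins = {item: 0 for item in items}
    for a, b in combinations(items, 2):
        wins[choose_more_similar(a, b)] += 1
    return wins


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity ratings on a 0-to-1 scale (NOT data from the study).
ratings = {
    "pair_A": 0.05, "pair_B": 0.20, "pair_C": 0.30, "pair_D": 0.45,
    "pair_E": 0.55, "pair_F": 0.70, "pair_G": 0.80, "pair_H": 0.95,
}


def simulated_observer(a, b):
    """Idealized observer: always picks the image pair with the higher rating."""
    return a if ratings[a] > ratings[b] else b


scores = ranking_scores(list(ratings), simulated_observer)
names = list(ratings)
r = pearson_r([ratings[n] for n in names], [scores[n] for n in names])

print(scores)
print(f"correlation between ratings and ranking scores: r = {r:.3f}")
```

With real observers, the choice function would be replaced by the recorded selections, and the scores would typically be averaged over observers before being correlated with the average ratings.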
Materials and Methods
Databases
Observer Study
Statistical Analysis
Results
Discussion
Acknowledgments
References
1. Doi K.: Current status and future potential of computer-aided diagnosis in medical imaging. Br J Radiol 2005; 78: pp. s3-s19.
2. Doi K.: Computer-aided diagnosis in medical imaging: historical review, current status and future potential. Comput Med Imaging Graph 2007; 31: pp. 198-211.
3. Doi K.: Computer-aided diagnosis moves from breast to other systems. Diagnostic Imaging 2007; pp. 37-40.
4. Mizuno K., Sasaki Y., Ishikawa I., et al.: Evaluation of clinical usefulness of web-based results on distribution of brain MRI. Nippon Igaku Hoshasen Gakkai Zasshi 2003; 63: pp. 585-587. (article in Japanese)
5. Muramatsu C., Li Q., Suzuki K., et al.: Investigation of psychophysical measure for evaluation of similar images for mammographic masses: preliminary results. Med Phys 2005; 32: pp. 2295-2304.
6. Zheng B., Lu A., Hardesty L.A., et al.: A method to improve visual similarity of breast masses for an interactive computer-aided diagnosis environment. Med Phys 2006; 33: pp. 111-117.
7. Muramatsu C., Li Q., Schmidt R.A., et al.: Experimental determination of subjective similarity for pairs of clustered microcalcifications on mammograms: observer study results. Med Phys 2006; 33: pp. 3460-3468.
8. Nishikawa R.M., Yang Y., Huo D., et al.: Observers’ ability to judge the similarity of clustered calcifications on mammograms. Proc SPIE 2004; 5372: pp. 192-198.
9. Li Q., Li F., Shiraishi J., et al.: Investigation of new psychophysical measures for evaluation of similar images on thoracic computed tomography for distinction between benign and malignant nodules. Med Phys 2003; 30: pp. 2584-2593.
10. Kendall M., Gibbons J.D.: Rank correlation methods. 5th ed. Oxford University Press, New York, 1990.
11. Heath M., Bowyer K., Kopans D., et al.: Current status of the digital database for screening mammography. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1998; pp. 457-460.
12. Muramatsu C., Li Q., Schmidt R.A., et al.: Determination of subjective similarity for pairs of masses and pairs of clustered microcalcifications on mammograms: comparison of similarity ranking scores and absolute similarity ratings. Med Phys 2007; 34: pp. 2890-2895.
13. Sone S., Takashima S., Li F., et al.: Mass screening for lung cancer with mobile spiral computed tomography scanner. Lancet 1998; 351: pp. 1242-1245.
14. Li F., Sone S., Abe H., et al.: Lung cancers missed at low-dose helical CT screening in a general population: comparison of clinical, histopathologic, and imaging findings. Radiology 2002; 225: pp. 673-683.
15. Li F., Aoyama M., Shiraishi J., et al.: Radiologists’ performance for differentiating benign from malignant lung nodules on high-resolution CT using computer-estimated likelihood of malignancy. AJR Am J Roentgenol 2004; 183: pp. 1209-1215.
16. Li F., Arimura H., Suzuki K., et al.: Computer-aided detection of peripheral lung cancers missed at CT: ROC analyses without and with localization. Radiology 2005; 237: pp. 684-690.
17. Tagare H.D., Jaffe C.C., Duncan J.: Medical image databases: a content-based retrieval approach. J Am Med Inform Assoc 1997; 4: pp. 184-198.
18. Petrakis E.G.M., Faloutsos A.: Similarity searching in medical image databases. IEEE Trans Knowl Data Engin 1997; 9: pp. 435-447.
19. Aisen A.M., Broderick L.S., Winer-Muram H., et al.: Automated storage and retrieval of thin-section CT images to assist diagnosis: system description and preliminary assessment. Radiology 2003; 228: pp. 265-270.
20. Qi H., Snyder W.E.: Content-based image retrieval in picture archiving and communications systems. J Digit Imaging 1999; 12: pp. 81-83.
21. Dy J.G., Brodley C.E., Kak A., et al.: Unsupervised feature selection applied to content-based retrieval of lung images. IEEE Trans Pattern Anal Machine Intell 2003; 25: pp. 373-378.
22. Lehmann T.M., Plodowski B., Spitzer K., et al.: Extended query refinement for content-based access to large medical-image databases. Proc SPIE 2004; 5371: pp. 90-98.
23. El-Naqa I., Yang Y., Galatsanos N.P., et al.: A similarity learning approach to content-based image retrieval: application to digital mammography. IEEE Trans Med Imaging 2004; 23: pp. 1233-1244.
24. Tourassi G.D., Harrawood B., Singh S., et al.: Evaluation of information-theoretic similarity measures for content-based retrieval and detection of masses in mammograms. Med Phys 2007; 34: pp. 140-150.
25. Perner P.: Image mining: issues, framework, a generic tool and its application to medical-image diagnosis. Engin Appl Artif Intell 2002; 15: pp. 205-216.
26. Chen W., Meer P., Georgescu B., et al.: Image mining for investigative pathology using optimized feature extraction and data fusion. Comput Methods Programs Biomed 2005; 79: pp. 59-72.
27. Müller H., Clough P., Hersh W., et al.: Evaluation axes for medical image retrieval systems: the image CLEF experience. Proc ACM Multimedia 2005; pp. 1014-1022.