
Objectively Measuring and Comparing Performance Levels of Diagnostic Imaging Systems and Practices

In this issue of Academic Radiology there is an important and comprehensive review/tutorial article entitled “Assessment of Medical Imaging Systems and Computer Aids: A Tutorial Review,” authored by Drs. Wagner, Metz, and Campbell (1). The article focuses on an important paradigm that results in a receiver operating characteristic (ROC) curve, as well as other estimated performance curves ( ). The ROC curve is one of the more commonly used analytical tools for assessing and comparing the performance of diagnostic systems, with or without the diagnostician (observer) incorporated as an integral part of the system. Diagnostic medicine in general, and diagnostic imaging in particular, has been evolving gradually over the last few decades from being primarily an art to becoming, at least partially and in many instances largely, a science. As an integral part of this transition, there has been increasing interest in defining performance measures (summary indices) that enable relevant assessments and comparisons of imaging systems and clinical practices ( ).
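
To make the ROC paradigm concrete, the short sketch below (Python; the ratings and case counts are invented for illustration and are not drawn from the article) traces an empirical ROC curve from rating data and reports the trapezoidal area under the curve (AUC), the most commonly reported summary index:

```python
import numpy as np

def empirical_roc(scores, labels):
    """Trace the empirical ROC curve from confidence ratings.

    scores: reader (or CAD) confidence ratings, higher = more suspicious
    labels: ground truth, 1 = abnormal, 0 = normal
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos, n_neg = (labels == 1).sum(), (labels == 0).sum()
    fpr, tpr = [0.0], [0.0]
    # Sweep the decision threshold down through every observed rating.
    for t in np.unique(scores)[::-1]:
        called_positive = scores >= t
        tpr.append((called_positive & (labels == 1)).sum() / n_pos)
        fpr.append((called_positive & (labels == 0)).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

# Hypothetical 5-point ratings for six abnormal and six normal cases.
ratings = [5, 4, 4, 3, 5, 2, 1, 2, 3, 1, 2, 1]
truth   = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
fpr, tpr = empirical_roc(ratings, truth)
auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoid rule
print(f"AUC = {auc:.3f}")
```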

One of the more profound advancements in this field is the “fully crossed, multiple-reader, multiple-case” (MRMC) approach to executing and analyzing performance assessment studies. These studies are frequently performed under the ROC paradigm, with the underlying recognition that a multivariate approach is needed to account for the different variance components (uncertainties) associated with, and generated by, the cases and readers included in these studies ( ).
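
As an illustration of why both variance sources matter, the following toy simulation (a sketch under invented effect sizes, not the Dorfman-Berbaum-Metz or Obuchowski-Rockette analysis itself) builds a fully crossed layout and bootstraps over cases while averaging over readers:

```python
import numpy as np

rng = np.random.default_rng(0)
n_readers, n_pos, n_neg = 4, 30, 30

def wilcoxon_auc(pos, neg):
    """Empirical AUC as the Wilcoxon statistic: P(pos > neg) + 0.5 P(tie)."""
    diff = pos[:, None] - neg[None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()

# Simulate fully crossed ratings: every reader rates every case under both
# modalities; modality "B" is given a slightly larger signal than "A".
case_effect = rng.normal(0.0, 0.5, n_pos)  # shared case difficulty
scores = {}
for modality, separation in (("A", 1.0), ("B", 1.3)):
    for reader in range(n_readers):
        skill = rng.normal(0.0, 0.2)       # reader-specific offset
        pos = separation + case_effect + skill + rng.normal(0, 1, n_pos)
        neg = rng.normal(0, 1, n_neg)
        scores[modality, reader] = (pos, neg)

# Reader-averaged AUC difference; uncertainty from bootstrapping CASES,
# drawing the SAME resampled cases for every reader.
diffs = []
for _ in range(500):
    ip = rng.integers(0, n_pos, n_pos)
    ing = rng.integers(0, n_neg, n_neg)
    per_reader = [
        wilcoxon_auc(scores["B", r][0][ip], scores["B", r][1][ing])
        - wilcoxon_auc(scores["A", r][0][ip], scores["A", r][1][ing])
        for r in range(n_readers)
    ]
    diffs.append(np.mean(per_reader))
print(f"AUC(B) - AUC(A) = {np.mean(diffs):.3f} +/- {np.std(diffs):.3f} (case bootstrap)")
```

Resampling the same cases for every reader preserves the correlations that the fully crossed design induces; the formal MRMC methods go further and separate reader, case, and reader-by-case variance components.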

The authors provide a comprehensive discussion of comparisons of performance between diagnostic systems using the well-established ROC paradigm, and later expand the discussion to related approaches such as the free-response ROC (FROC) and localization ROC (LROC) paradigms ( ).
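
For readers unfamiliar with the free-response paradigm, the minimal sketch below (with invented mark-rating data) shows how FROC operating points, lesion localization fraction versus non-lesion marks per image, are traced by sweeping the mark-rating threshold:

```python
n_images, n_lesions = 20, 15   # illustrative study size

# One (rating, is_lesion_localization) pair per reader mark: True marks hit a
# true lesion (LL), False marks fall on normal tissue (NL). Values invented.
marks = [(5, True), (5, True), (4, True), (4, False), (3, True),
         (3, False), (2, True), (2, False), (1, True), (1, False)]

# Sweep the mark-rating threshold to trace FROC operating points.
for t in sorted({rating for rating, _ in marks}, reverse=True):
    kept = [is_ll for rating, is_ll in marks if rating >= t]
    llf = sum(kept) / n_lesions            # lesion localization fraction
    nlf = kept.count(False) / n_images     # non-lesion marks per image
    print(f"threshold >= {t}: LLF = {llf:.2f}, NLs per image = {nlf:.2f}")
```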

Most relevant to the experimentalist in this field is the article's detailed discussion of several practical issues and concerns. These include, but are not limited to, how to handle a study when the clinical truth is not known, how to estimate the sample size needed for a particular study, when partial areas under the curve may be of greater interest than total areas, and issues associated with sequential readings, in particular as related to the assessment of computer-aided diagnosis (CAD) ( ).
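
As one concrete example of sample-size estimation, the sketch below uses the classic Hanley-McNeil variance approximation for a simple unpaired, single-reader comparison of two AUCs; the target AUCs and operating characteristics are illustrative assumptions, and a paired MRMC design would need the correlation and reader-variance corrections discussed in the tutorial (e.g., reference 14):

```python
from math import sqrt
from statistics import NormalDist

def hanley_mcneil_var(auc, n_pos, n_neg):
    """Approximate variance of an empirical AUC (Hanley & McNeil, 1982)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc**2 / (1.0 + auc)
    return (auc * (1 - auc)
            + (n_pos - 1) * (q1 - auc**2)
            + (n_neg - 1) * (q2 - auc**2)) / (n_pos * n_neg)

def cases_needed(auc1, auc2, alpha=0.05, power=0.80):
    """Smallest n (abnormal cases, with n normals) separating the two AUCs."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    delta = abs(auc1 - auc2)
    for n in range(10, 100_000):
        v0 = 2 * hanley_mcneil_var((auc1 + auc2) / 2, n, n)  # null variance
        v1 = hanley_mcneil_var(auc1, n, n) + hanley_mcneil_var(auc2, n, n)
        if z_a * sqrt(v0) + z_b * sqrt(v1) <= delta:
            return n
    return None

# e.g., how many cases per group to detect an AUC gain from 0.85 to 0.90?
print(cases_needed(0.85, 0.90))
```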

The rapid development in recent years of analytical tools to assess performance stemmed in large part from the seminal work of the article's authors. In addition, the MRMC approach, whose development and validation are so eloquently described in the article, remains one of the most important and most frequently used analytical tools for assessing and comparing the overall performance of imaging systems and practices in the laboratory environment. However, when conducting a performance assessment study, whether under the ROC, LROC, FROC, or any other paradigm, and in particular when observers are included in the study, one has to remember that the associated practical issues, rather than the specific analytical tool used to analyze the acquired data, often become the decisive factors in assuring a successful study. These practical issues include, but are not limited to, the selection of cases and controls, the definition of the abnormalities in question, the selection and training of observers, the actual adherence (or not) of observers to specific instructions, the appropriate handling of positive and negative cases with more than one abnormality in the case mix, and the well-controlled, adequately quality-assured execution of the study ( ).

It is important to emphasize that, as much as possible, the clinical problem at hand should drive (determine) the specific study design to be implemented, rather than the availability of a very specific analytical tool. As we progress in this field and gain experience in executing ROC-type studies, we frequently realize that these studies fit many clinical paradigms, but not all. It is only prudent for the investigator to assess the best available methodology for the problem being investigated and not fall into the mental trap that ROC is the optimal way to address all observer-, CAD-, or practice-related performance questions. There are many scenarios where FROC or even sensitivity/specificity-type studies may be more natural to the observer and perhaps more relevant to the clinical problem being investigated. In addition, one has to remember that all laboratory experiments are just that: there are very limited data showing that inferences generated from these laboratory studies generalize directly to the clinic, and the majority of reports on this question suggest, to the contrary, that in many situations they do not ( ).
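
Where a binary, yes/no reading task is the more natural fit, the summary indices are simply sensitivity and specificity with confidence intervals. The sketch below uses invented counts and the standard Wilson score interval for a binomial proportion:

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(successes, trials, alpha=0.05):
    """Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2))
    return center - half, center + half

# Illustrative counts: 42/50 abnormal cases called positive (sensitivity),
# 88/100 normal cases called negative (specificity).
lo, hi = wilson_interval(42, 50)
print(f"sensitivity = {42/50:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
lo, hi = wilson_interval(88, 100)
print(f"specificity = {88/100:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```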


References

  • 1. Wagner R.F., Metz C.E., Campbell G.: Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 2007; 14: pp. 723-748.

  • 2. Dorfman D.D., Berbaum K.S., Metz C.E.: Receiver operating characteristic rating analysis. Invest Radiol 1992; 27: pp. 723-731.

  • 3. Obuchowski N.A., Beiden S.V., Berbaum K.S., et. al.: Multireader, multicase receiver operating characteristic analysis: an empirical comparison of five methods. Acad Radiol 2004; 11: pp. 980-995.

  • 4. Chakraborty D.P., Berbaum K.S.: Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 2004; 31: pp. 2313-2330.

  • 5. Swensson R.G.: Unified measurement of observer performance in detecting and localizing target objects on images. Med Phys 1996; 23: pp. 1709-1725.

  • 6. Wagner R.F., Beiden S.V., Campbell G., Metz C.E., Sacks W.M.: Assessment of medical imaging and computer-assist systems: lessons from recent experience. Acad Radiol 2002; 9: pp. 1264-1277.

  • 7. Jiang Y., Nishikawa R.M., Schmidt R.A., Metz C.E., Giger M.L., Doi K.: Improving breast cancer diagnosis with computer-aided diagnosis. Acad Radiol 1999; 6: pp. 22-33.

  • 8. Pisano E.D., Gatsonis C., Hendrick E., et. al.: Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005; 353: pp. 1773-1783.

  • 9. Obuchowski N.A., Rockette H.E.: Hypothesis testing of the diagnostic accuracy for multiple diagnostic tests: an ANOVA approach with dependent observations. Commun Stat Simul Comput 1995; 24: pp. 285-308.

  • 10. Toledano A.Y., Gatsonis C.: Ordinal regression methodology for ROC curves derived from correlated data. Stat Med 1996; 15: pp. 1807-1826.

  • 11. Chakraborty D.P.: A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys Med Biol 2006; 51: pp. 3449-3462.

  • 12. Revesz G., Kundel H.L., Bonitatibus M.: The effect of verification on the assessment of imaging techniques. Invest Radiol 1983; 18: pp. 194-198.

  • 13. Begg C.B., Metz C.E.: Consensus diagnoses and “Gold Standards”. Med Decis Making 1990; 10: pp. 29-30.

  • 14. Hillis S.L., Berbaum K.S.: Power estimation for the Dorfman-Berbaum-Metz method. Acad Radiol 2004; 11: pp. 1260-1273.

  • 15. Jiang Y., Metz C.E., Nishikawa R.M.: A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201: pp. 745-750.

  • 16. McClish D.K.: Analyzing a portion of the ROC curve. Med Decis Making 1989; 9: pp. 190-195.

  • 17. Metz C.E.: Some practical issues of experimental design and data analysis in radiological ROC studies. Invest Radiol 1989; 24: pp. 234-245.

  • 18. Gur D., King J.L., Rockette H.E., Britton C.A., Thaete F.L., Hoy R.J.: Practical issues of experimental ROC: selection of controls. Invest Radiol 1990; 25: pp. 583-586.

  • 19. Rutter C.M., Taplin S.: Assessing mammographers’ accuracy. J Clin Epidemiol 2000; 53: pp. 443-450.
