Home Comparative Statistical Properties of Expected Utility and Area Under the ROC Curve for Laboratory Studies of Observer Performance in Screening Mammography
Post
Cancel

Comparative Statistical Properties of Expected Utility and Area Under the ROC Curve for Laboratory Studies of Observer Performance in Screening Mammography

Rationale and Objectives

Our objective is to determine whether expected utility (EU) and the area under the receiver operator characteristic (AUC) are consistent with one another as endpoints of observer performance studies in mammography. These two measures characterize receiver operator characteristic performance somewhat differently. We compare these two study endpoints at the level of individual reader effects, statistical inference, and components of variance across readers and cases.

Materials and Methods

We reanalyze three previously published laboratory observer performance studies that investigate various x-ray breast imaging modalities using EU and AUC. The EU measure is based on recent estimates of relative utility for screening mammography.

Results

The AUC and EU measures are correlated across readers for individual modalities ( r = 0.93) and differences in modalities ( r = 0.94 to 0.98). Statistical inference for modality effects based on multi-reader multi-case analysis is very similar, with significant results ( P < .05) in exactly the same conditions. Power analyses show mixed results across studies, with a small increase in power on average for EU that corresponds to approximately a 7% reduction in the number of readers. Despite a large number of crossing receiver operator characteristic curves (59% of readers), modality effects only rarely have opposite signs for EU and AUC (6%).

Conclusions

We do not find any evidence of systematic differences between EU and AUC in screening mammography observer studies. Thus, when utility approaches are viable (i.e., an appropriate value of relative utility exists), practical effects such as statistical efficiency may be used to choose study endpoints.

Laboratory observer performance studies using receiver operating characteristic (ROC) methodology have become a mainstay for demonstrating improvements in technology or methodology for radiological imaging . These studies are now found widely in the radiological literature and are often used as evidence in submissions to regulatory agencies such as the US Food and Drug Administration . Most commonly, ROC studies use a rating of suspicion (e.g., probability of malignancy) to determine the tradeoff between sensitivity and false-positive fraction in a diagnostic task. To generalize results to the population of patients and readers, a substantial effort is required to collect a sample of relevant cases and to evaluate a sample of readers from which inferences regarding the imaging modality are obtained .

These inferences are based on an index, or figure of merit, extracted from the ROC curve. The figure of merit summarizes a ROC curve with a single number representing overall performance that can be compared across readers, cases, and modalities. The predominant figure of merit for ROC studies of observer performance has been the area under the ROC curve (AUC), which is the diagnostic sensitivity averaged over all possible false-positive fractions . The AUC is independent of disease prevalence and can also be interpreted as a measure of class separability because it represents the probability that a case from the abnormal population will considered more suspicious than a case from the normal population.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Materials and methods

Figures of Merit for Observer Performance in ROC Studies

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Figure 1, This diagram shows how area under the receiver operating characteristic (ROC) curve (AUC) and expected utility (EU) are determined for a given ROC curve. A smooth ROC curve is fitted to observed (hypothetical) data using the contaminated binormal model and maximum likelihood fitting. The AUC is depicted in gray. Under the assumption that task utilities result in iso-utility lines with a given slope, the y -intercept of the highest iso-utility line that intersects the ROC curve defines the EU measure. Note that the iso-utility line is tangent to the ROC curve at the optimal operating point.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

EU=Max(TPF,FPF)∈R(TPF−βFPF). EU

=

Max

(

TPF,

FPF

)

R

(

TPF

β

FPF

)

.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Reader Data

Get Radiology Tree app to read full this article<

Table 1

Reader Studies

Data Study Identification M1 M2 M3 Readers Cases (P, N) DMIST D1 SFM: GE Soft-copy DM: GE Hard-copy DM: GE 120 (48, 72) D2 SFM: Fuji Soft-copy DM: Fuji Hard-copy DM: Fuji 12 98 (27, 71) D3 SFM: Fisher Soft-copy DM: Fisher Hard-copy DM: Fisher 6 115 (42, 73) University of Michigan UM No CAD Pre-CAD With CAD 10 253 (138, 115) Hologic H1 DM alone DM with 2-view DBT None 12 312 (48, 264) H2 DM alone DM with 1-view DBT DM with 2-view DBT 15 310 (51, 259)

CAD, computer-aided diagnosis; DBT, digital breast tomosynthesis; DM, digital mammography; M1, modality 1; M2, modality 2; M3, modality 3; N, negative; P, positive; SFM, screen-film mammography.

Get Radiology Tree app to read full this article<

DMIST reader studies

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

University of Michigan CAD study

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Hologic tomosynthesis studies

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

ROC Analysis and Inference

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Modality Effects and Crossing ROC Curves

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Results

Get Radiology Tree app to read full this article<

Table 2

DBM Inference

Study Identification FOM M1-Ave. M2-Ave. M3-Ave.P : All_P_ : M1-M2P : M1-M3P : M2-M3 D1 AUC 0.82 0.78 0.79 .1476 .1109 .2241 .4914 EU 0.51 0.45 0.45 .2598 .2408 .2062 .8624 D2 AUC 0.78 0.74 0.76 .2573 .1262 .5739 .1862 EU 0.43 0.37 0.41 .2995 .1405 .5820 .2664 D3 AUC 0.75 0.72 0.69 .4132 .5615 .1448 .5046 EU 0.41 0.35 0.33 .4012 .4095 .1612 .7125 UM AUC 0.79 0.81 0.84 .0015 ∗ .1184 .0043 ∗ .0022 ∗ EU 0.44 0.48 0.52 .0056 ∗ .1672 .0105 ∗ .0094 ∗ H1 AUC 0.81 0.89 NA .0029 ∗ NA NA NA EU 0.51 0.68 NA <.0001 ∗ NA NA NA H2 AUC 0.83 0.86 0.89 <.0001 ∗ .0057 ∗ .0004 ∗ .0262 ∗ EU 0.54 0.60 0.67 <.0001 ∗ .0031 ∗ <.0001 ∗ .0043 ∗

AUC, area under the receiver operating characteristic curve; EU, expected utility; FOM, figure of merit; M1-Ave, modality 1 average; M2-Ave, modality 2 average; M3-Ave, modality 3 average; NA, not available.

For each study considered, the table gives the AUC or EU FOMs averaged across readers in each modality as well as the P values from tests for modality differences in the whole study (all) and each paired modality comparison (e.g., M1-M2).

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Figure 2, Modality effects. The area under the receiver operating characteristic curve (AUC) and expected utility (EU) figures of merit are shown for each reader and modality in the scatterplot of performance measures (a) . Pairwise differences between modalities for each reader are shown for each of the three studies considered (b–d) with differences arranged so that the average difference across readers for any comparison is positive. In each study, the equation of the least-squares fitted line relating effect sizes is given. DMIST, Digital Mammographic Imaging Screening Trial; UM, University of Michigan.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Table 3

DBM Components of Variance for AUC and EU

Study Endpoint R C TR TC RC TRC D1 AUC 0.0012 0.0954 0 0.0221 0.0370 0.3222 EU 0.0040 0.2521 0 0.0685 0.0762 0.3634 D2 AUC 0.0009 0.1470 0 0.0229 0.0351 0.2036 EU 0.0029 0.5120 0.0008 0.0465 0.0844 0.3371 D3 AUC 0.0016 0.0962 0 0.0364 0.0701 0.6160 EU 0.0071 0.2369 0.0012 0.0813 0.1108 0.5799 UM AUC 0.0010 0.0741 0 0.0036 0.0538 0.1836 EU 0.0032 0.2625 0 0.0159 0.1687 0.9523 H1 AUC 0 0.1841 0.0011 0.0591 0.0841 0.2770 EU 0.0012 0.6267 0.0015 0.1524 0.1940 0.4981 H2 AUC 0 0.1932 0 0.0251 0.1215 0.2157 EU 0.0010 0.6263 0 0.0843 0.2397 0.7469

AUC, area under the receiver operating characteristic curve; C, case variance; EU, expected utility; R, reader variance; RC, reader by case interaction; TC, treatment by case interaction; TR, treatment by read interaction; TRC, residual error.

Table entries of 0 occur when the variance component estimate is less than 0.0001.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Figure 3, Crossing receiver operating characteristic (ROC) curves. The plot shows the fraction readers with crossing ROC curves in each study as well as the fraction of readers with modality differences in area under the ROC curve and expected utility that have different (opposite) signs. Note that both of these are elevated in the Digital Mammographic Imaging Screening Trial studies, in which there is less of a modality effect. UM, University of Michigan. See text for D1–D3, H1, and H2 study definitions.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Discussion

Get Radiology Tree app to read full this article<

Effect Sizes and Components of Variance

Get Radiology Tree app to read full this article<

Figure 4, The relative size of components of variance. Ratios of the elements of Table 2 are shown for each component of variance. The ratio is only shown for variance components greater than 0.0001. *The three components directly related to modality comparisons in the multireader, multicase design. AUC, area under the receiver operating characteristic curve; EU, expected utility; UM, University of Michigan. See text for D1–D3, H1, and H2 study definitions.

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Power Calculation

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Figure 5, Power analysis. Plots of statistical power based on the Hillis and Berbaum method (35) are plotted as a function of the number of readers for area under the receiver operating characteristic curve (AUC) (a) and expected utility (EU) (b) figures of merit in each study; legend in (a) applies to both plots. The number of cases used in the power calculation is the same as the number in the actual study. The effect size was set to the largest difference in reader averaged performance across modalities, except in the Digital Mammographic Imaging Screening Trial studies, in which a default of 0.1 was used for AUC and 0.136 was used for EU based on the regression line in Figure 2 b. The number of readers needed to get 80% power (c) varies considerably from study to study. On average, EU results in a 7% reduction in the number of readers needed to achieve 80% power. See text for D1–D3, H1, and H2 study definitions.

Get Radiology Tree app to read full this article<

Crossing ROC Curves

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Summary and conclusions

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Acknowledgments

Get Radiology Tree app to read full this article<

Appendix

Get Radiology Tree app to read full this article<

Get Radiology Tree app to read full this article<

Table A1

Additional Study Statistics

Study Endpoint SE-Mod. SE-Dif. CC D1 AUC 0.036 0.029 0.69 EU 0.058 0.041 0.75 D2 AUC 0.045 0.029 0.80 EU 0.080 0.041 0.87 D3 AUC 0.049 0.049 0.50 EU 0.072 0.059 0.66 UM AUC 0.022 0.013 0.83 EU 0.043 0.030 0.76 H1 AUC 0.031 0.027 0.63 EU 0.054 0.039 0.74 H2 AUC 0.028 0.016 0.84 EU 0.051 0.029 0.83

AUC, area under the receiver operating characteristic curve; CC, Pearson correlation coefficient between modalities; EU, expected utility; SE-Dif, standard error estimates for modality differences; SE-Mod, standard error estimates for performance in a given modality.

Get Radiology Tree app to read full this article<

References

  • 1. Goodenough D.J., Rossmann K., Lusted L.B.: Radiographic applications of receiver operating characteristic (ROC) curves. Radiology 1974; 110: pp. 89-95.

  • 2. Metz C.E.: ROC methodology in radiologic imaging. Invest Radiol 1986; 21: pp. 720-733.

  • 3. Metz C.E.: ROC analysis in medical imaging: a tutorial review of the literature. Radiol Phys Technol 2008; 1: pp. 2-12.

  • 4. Obuchowski N.A.: Receiver operating characteristic curves and their use in radiology. Radiology 2003; 229: pp. 3-8.

  • 5. Shiraishi J., Pesce L.L., Metz C.E., et. al.: Experimental design and data analysis in receiver operating characteristic studies: lessons learned from reports in radiology from 1997 to 2006. Radiology 2009; 253: pp. 822-830.

  • 6. Gallas B.D., Chan H.P., D’Orsi C.J., et. al.: Evaluating imaging and computer-aided detection and diagnosis devices at the FDA. Acad Radiol 2012; 19: pp. 463-477.

  • 7. Beiden S.V., Wagner R.F., Campbell G., et. al.: Components-of-variance models for random-effects ROC analysis: the case of unequal variance structures across modalities. Acad Radiol 2001; 8: pp. 605-615.

  • 8. Dorfman D.D., Berbaum K.S., Metz C.E.: Receiver operating characteristic rating analysis. Generalization to the population of readers and patients with the jackknife method. Invest Radiol 1992; 27: pp. 723-731.

  • 9. Gallas B.D.: One-shot estimate of MRMC variance: AUC. Acad Radiol 2006; 13: pp. 353-362.

  • 10. Hillis S.L., Berbaum K.S., Metz C.E.: Recent developments in the Dorfman-Berbaum-Metz procedure for multireader ROC study analysis. Acad Radiol 2008; 15: pp. 647-661.

  • 11. Hanley J.A., McNeil B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982; 143: pp. 29-36.

  • 12. Metz C.E.: Basic principles of ROC analysis. Semin Nucl Med 1978; 8: pp. 283-298.

  • 13. Hilden J.: The area under the ROC curve and its competitors. Med Decis Making 1991; 11: pp. 95-101.

  • 14. Hilden J. Evaluation of diagnostic tests - the schism. Society for Medical Decision Making Newsletter. 2004; (4):5–6.

  • 15. Jiang Y., Metz C.E., Nishikawa R.M.: A receiver operating characteristic partial area index for highly sensitive diagnostic tests. Radiology 1996; 201: pp. 745-750.

  • 16. Obuchowski N.A., McClish D.K.: Sample size determination for diagnostic accuracy studies involving binormal ROC curve indices. Stat Med 1997; 16: pp. 1529-1542.

  • 17. McClish D.K.: Analyzing a portion of the ROC curve. Med Decis Making 1989; 9: pp. 190-195.

  • 18. Hand D.J.: Measuring classifier performance: a coherent alternative to the area under the ROC curve. Machine Learning 2009; 77: pp. 103-123.

  • 19. Halpern E.J., Albert M., Krieger A.M., et. al.: Comparison of receiver operating characteristic curves on the basis of optimal operating points. Acad Radiol 1996; 3: pp. 245-253.

  • 20. Sunshine J.: Contributed comment. Acad Radiol 1995; 2: S72–S74

  • 21. Wagner R.F., Beam C.A., Beiden S.V.: Reader variability in mammography and its implications for expected utility over the population of readers and cases. Med Decis Making 2004; 24: pp. 561-572.

  • 22. Swets J.A., Pickett R.M.: Evaluation of diagnostic systems: methods from signal detection theory.1982.Academic PressNew York

  • 23. Abbey C.K., Eckstein M.P., Boone J.M.: An equivalent relative utility metric for evaluating screening mammography. Med Decis Making 2010; 30: pp. 113-122.

  • 24. Lusted L.B.: Introduction to medical decision making.1968.ThomasSpringfield, IL

  • 25. Pisano E.D., Gatsonis C., Hendrick E., et. al.: Diagnostic performance of digital versus film mammography for breast-cancer screening. N Engl J Med 2005; 353: pp. 1773-1783.

  • 26. Barlow W.E., Chi C., Carney P.A., et. al.: Accuracy of screening mammography interpretation by characteristics of radiologists. J Natl Cancer Inst 2004; 96: pp. 1840-1850.

  • 27. Abbey C.K., Samuelson F.W., Gallas B.D.: Statistical power considerations for a utility endpoint in observer performance studies. Acad Radiol 2013; 20: pp. 798-806.

  • 28. Abbey C.K., Eckstein M.P., Boone J.M.: Estimating the relative utility of screening mammography. Med Decis Making 2013; 33: pp. 510-520.

  • 29. Hendrick R.E., Cole E.B., Pisano E.D., et. al.: Accuracy of soft-copy digital mammography versus that of screen-film mammography according to digital manufacturer: ACRIN DMIST retrospective multireader study. Radiology 2008; 247: pp. 38-48.

  • 30. Nishikawa R.M., Acharyya S., Gatsonis C., et. al.: Comparison of soft-copy and hard-copy reading for full-field digital mammography. Radiology 2009; 251: pp. 41-49.

  • 31. Pisano E.D., Gatsonis C.A., Yaffe M.J., et. al.: American College of Radiology Imaging Network digital mammographic imaging screening trial: objectives and methodology. Radiology 2005; 236: pp. 404-412.

  • 32. Hadjiiski L., Chan H.P., Sahiner B., et. al.: Improvement in radiologists’ characterization of malignant and benign breast masses on serial mammograms with computer-aided diagnosis: an ROC study. Radiology 2004; 233: pp. 255-265.

  • 33. Rafferty E.A., Park J.M., Philpotts L.E., et. al.: Assessing radiologist performance using combined digital mammography and breast tomosynthesis compared with digital mammography alone: results of a multicenter, multireader trial. Radiology 2013; 266: pp. 104-113.

  • 34. Abbey C.K., Samuelson F.W., Gallas B.D., et. al.: Statistical properties of a utility measure of observer performance compared to area under the ROC curve. Book Statistical properties of a utility measure of observer performance compared to area under the ROC curve.2013.International Society for Optics and PhotonicsBellingham, WA 86730D-D-9

  • 35. Dorfman D.D., Berbaum K.S.: A contaminated binormal model for ROC data: Part III. Initial evaluation with detection ROC data. Acad Radiol 2000; 7: pp. 438-447.

  • 36. Dorfman D.D., Berbaum K.S.: A contaminated binormal model for ROC data: Part II. A formal model. Acad Radiol 2000; 7: pp. 427-437.

  • 37. Dorfman D.D., Berbaum K.S., Brandser E.A.: A contaminated binormal model for ROC data: Part I. Some interesting examples of binormal degeneracy. Acad Radiol 2000; 7: pp. 420-426.

  • 38. Hillis S.L., Obuchowski N.A., Schartz K.M., et. al.: A comparison of the Dorfman-Berbaum-Metz and Obuchowski-Rockette methods for receiver operating characteristic (ROC) data. Stat Med 2005; 24: pp. 1579-1607.

  • 39. Hillis S.L., Obuchowski N.A., Berbaum K.S.: Power estimation for multireader ROC methods an updated and unified approach. Acad Radiol 2011; 18: pp. 129-142.

This post is licensed under CC BY 4.0 by the author.