A Reader Study Comparing Prospective Tomosynthesis Interpretations with Retrospective Readings of the Corresponding FFDM Examinations

Rationale and Objectives

To compare the performance of prospective interpretations of clinical digital breast tomosynthesis (DBT) plus full-field digital mammography (FFDM) examinations with retrospective readings of the corresponding FFDM examinations alone.

Methods and Materials

Seven Mammography Quality Standards Act (MQSA)–qualified radiologists retrospectively interpreted 10,878 FFDM examinations that had been interpreted by other radiologists during prospective clinical interpretations of DBT plus FFDM. The radiologists were blinded to the Breast Imaging Reporting and Data System (BI-RADS) category assigned during the clinical interpretations and to the outcome verified by follow-up and/or any subsequent diagnostic workup. Ratings (BI-RADS 0, 1, or 2) were recorded. Group performance levels, in terms of recall rates and attributable cancer detection rates, were compared with those of the prospective clinical interpretations of the same examinations (DBT plus FFDM) using the two-sided McNemar test at a significance level of .05.
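Because every examination was read under both modes, the recall decisions form a paired design, and McNemar's test uses only the discordant pairs. This excerpt reports neither the discordant counts nor whether the asymptotic or the exact form of the test was used, so the Python sketch below shows both on hypothetical ratings. It assumes, as the Table 1 footnote indicates, that a BI-RADS 0 rating counts as a recall and ratings of 1 or 2 do not; all function names are illustrative, not the authors' code.

```python
from math import comb, erfc, sqrt


def is_recall(birads: int) -> bool:
    """Assumed mapping: BI-RADS 0 (additional imaging needed) is a recall; 1 and 2 are not."""
    return birads == 0


def discordant_counts(prospective: list[int], retrospective: list[int]) -> tuple[int, int]:
    """Count paired examinations recalled under only one reading mode.

    b: recalled with DBT plus FFDM but not with FFDM alone
    c: recalled with FFDM alone but not with DBT plus FFDM
    """
    b = c = 0
    for p, r in zip(prospective, retrospective):
        if is_recall(p) and not is_recall(r):
            b += 1
        elif is_recall(r) and not is_recall(p):
            c += 1
    return b, c


def mcnemar_chi2(b: int, c: int) -> tuple[float, float]:
    """Continuity-corrected McNemar statistic and its two-sided P value (chi-square, 1 df)."""
    if b + c == 0:
        return 0.0, 1.0
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, erfc(sqrt(stat / 2))


def mcnemar_exact(b: int, c: int) -> float:
    """Exact two-sided McNemar P value: binomial test of min(b, c) out of b + c with p = 0.5."""
    n, k = b + c, min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Hypothetical BI-RADS ratings for eight paired examinations (not study data).
prospective = [0, 1, 2, 1, 0, 1, 2, 1]    # DBT plus FFDM
retrospective = [0, 0, 1, 0, 1, 0, 2, 1]  # FFDM alone
b, c = discordant_counts(prospective, retrospective)
stat, p_asymptotic = mcnemar_chi2(b, c)
p_exact = mcnemar_exact(b, c)
print(f"b={b}, c={c}, chi2={stat:.3f}, P={p_asymptotic:.3f}, exact P={p_exact:.3f}")
```

Cases recalled under both modes, or under neither, do not enter either statistic; with thousands of paired examinations and hundreds of discordant pairs, the asymptotic and exact P values would be expected to agree closely.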

Results

During the prospective clinical interpretations of DBT plus FFDM, 588 cases were recalled (588 of 10,878, 5.41%), compared with 888 cases (888 of 10,878, 8.16%) during the retrospective FFDM-alone interpretations (relative reduction, 34%; P < .0001). Fifty-nine and 38 suspicious abnormalities later verified as cancers were detected during the DBT plus FFDM and the FFDM-alone interpretations, respectively (relative increase, 55%; P < .0001). Invasive cancer detections were 48 and 29, respectively (relative increase, 66%; P < .0001).
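The rates and percentage changes in these Results follow directly from the reported counts over the 10,878 paired examinations. The Python sketch below recomputes them; the helper names are hypothetical, and "relative change" is taken as (DBT plus FFDM minus FFDM alone) divided by FFDM alone, the definition that reproduces the reported 55% and 66% increases.

```python
N_EXAMS = 10_878  # paired examinations read in both modes

# (DBT plus FFDM prospective, FFDM alone retrospective) counts reported in the Results
COUNTS = {
    "recalls": (588, 888),
    "cancers detected": (59, 38),
    "invasive cancers detected": (48, 29),
}


def rate_percent(events: int, n: int = N_EXAMS) -> float:
    """Events as a percentage of examinations (e.g., recall rate)."""
    return 100 * events / n


def per_1000(events: int, n: int = N_EXAMS) -> float:
    """Events per 1000 examinations (e.g., cancer detection rate)."""
    return 1000 * events / n


def relative_change(new: int, reference: int) -> float:
    """Relative change (%) of DBT plus FFDM against FFDM alone as the reference."""
    return 100 * (new - reference) / reference


for metric, (dbt, ffdm) in COUNTS.items():
    print(f"{metric}: {dbt} vs {ffdm}, relative change {relative_change(dbt, ffdm):+.1f}%")
print(f"recall rate: {rate_percent(588):.2f}% vs {rate_percent(888):.2f}%")
print(f"cancers per 1000: {per_1000(59):.2f} vs {per_1000(38):.2f}")
```

Under this definition the recall reduction is about 34% in relative terms, which corresponds to an absolute difference of 2.76 percentage points (8.16% vs 5.41%). The cancer detection rates work out to roughly 5.4 vs 3.5 per 1000 examinations.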

Conclusions

The combination of DBT plus FFDM for screening asymptomatic women resulted in a significant reduction in recall rates and a simultaneous increase in cancer detection rates when compared to retrospective interpretations of corresponding FFDM examinations alone.

Although routine full-field digital mammography (FFDM) is widely accepted as a screening tool in the United States, its limitations are well known. Of particular concern are the high recall rates and the low positive predictive values of recalls, reported as 10% (range, 6%–13%) and ∼5% (range, 4.4%–16.8%), respectively. This means that ∼95% of abnormalities recalled from screening mammography are ultimately found to be negative. In addition, mammography screening has a relatively low cancer detection rate, particularly after the first 2 years of screening and in women with dense breast tissue. Variability in performance among radiologists who interpret mammograms is large and is attributed in part to the difficulty of distinguishing suspicious lesions from adjacent overlapping tissue.
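The step from a ∼5% positive predictive value to ∼95% negative recalls is a complement: PPV is cancers divided by recalls, so a fraction 1 − PPV of recalled women turn out not to have cancer. The sketch below makes this concrete per 1000 screens, using the paragraph's representative 10% recall rate and 5% PPV purely as illustrative inputs; the function name is hypothetical.

```python
def recall_breakdown(screened: int, recall_rate: float, ppv: float) -> tuple[float, float, float]:
    """Split screening recalls into cancers and false positives.

    recall_rate and ppv are fractions; PPV = cancers / recalls.
    """
    recalls = screened * recall_rate
    cancers = recalls * ppv              # recalls ultimately diagnosed as cancer
    false_positives = recalls - cancers  # recalls ultimately found negative
    return recalls, cancers, false_positives


# Illustrative inputs: the representative 10% recall rate and ~5% PPV cited above.
recalls, cancers, negatives = recall_breakdown(1000, 0.10, 0.05)
print(f"{recalls:.0f} recalls, {cancers:.0f} cancers, "
      f"{negatives:.0f} negative ({negatives / recalls:.0%} of recalls)")
```

With these inputs, 100 of every 1000 screened women are recalled, about 5 of them have cancer, and the remaining 95 recalls are negative.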

In recent years, considerable effort has been devoted to developing new and better approaches to screening for the early detection of breast cancer, one of which is digital breast tomosynthesis (DBT). DBT enables the reconstruction of cross-sectional images intended to assist radiologists in the interpretation process through better visualization of breast tissue patterns, by minimizing problems associated with overlapping tissue. The goal of this approach is to improve screening performance by lowering recall rates while increasing sensitivity, particularly for invasive cancers. Several reports have described the performance of radiologists interpreting DBT examinations, which to date have been performed primarily in combination with FFDM. These studies included both retrospective and prospective interpretations. Their results were quite consistent in showing combined gains in recall and cancer detection rates when radiologists interpret DBT-based screening examinations. Although the magnitude of the improvements varied widely, all studies reported a combined improvement in the range of 30%–60%, despite differences in the types of practices. A difficulty in comparing FFDM plus DBT with FFDM alone within DBT-based clinical practices is that differences in clinical decision making under the two approaches could affect patient care. In Europe, where double reading followed by an arbitration or consensus decision is often routine, this is a substantially easier task: with the arbitration process, one can design a prospective study in which the workflow closely resembles traditional practice. However, because of limited professional resources in the United States, combined with the predominantly single-reader approach to interpreting mammograms, a prospective study of the process that leads to the decision of whether to recall a patient is more difficult to perform.

Methods

Overview

Outcome Verification

The Retrospective Reading Experiment

Recall Rates

Biopsy and Cancer Detection

Statistical Analyses

Results

Table 1

Performance of the Radiologists during the Retrospective Interpretation of FFDM Alone and the Prospective Readings of FFDM Plus DBT

Reader  Cases Read  Recalled Cases  Recall Rate (%)  Cancers Detected  Cancers per 1000 Cases

FFDM (retrospective)
1       1561        130             8.3              4                 2.6
2       1667        155             9.3              12                7.2
3       1445        76              5.3              6                 4.2
4       1614        130             8.1              4                 2.5
5       1478        101             6.8              3                 2.0
6       1445        127             8.8              6                 4.2
7       1668        169             10.1             3                 1.8
8       —           —               —                —                 —
9       —           —               —                —                 —
10      —           —               —                —                 —
11      —           —               —                —                 —
12      —           —               —                —                 —
13      —           —               —                —                 —
14      —           —               —                —                 —
Total   10,878      888             8.2              38                3.5

FFDM + DBT (prospective)
1       —           —               —                —                 —
2       —           —               —                —                 —
3       —           —               —                —                 —
4       78          6               7.7              1                 12.8
5       —           —               —                —                 —
6       8           0               0.0              0                 0.0
7       1161        59              5.1              6                 5.2
8       647         52              8.0              3                 4.6
9       2701        188             7.0              18                6.7
10      520         35              6.7              1                 1.9
11      132         5               3.8              1                 7.6
12      1293        48              3.7              6                 4.6
13      1535        88              5.7              12                7.8
14      2803        107             3.8              11                3.9
Total   10,878      588             5.4              59                5.4

DBT, digital breast tomosynthesis; FFDM, full-field digital mammography.

Note that different radiologists read different cases under each of the two modes. Totals for FFDM alone include two cases that were “recalled” (Breast Imaging Reporting and Data System 0) during the retrospective interpretation, were actually recalled as a result of the secondary (arbitration) review, and were found to have cancers; hence, they constitute false-negative cases for DBT.
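Because the radiologists read very different numbers of cases, the Total rows in Table 1 are case-weighted pooled rates rather than averages of the per-reader percentages. The Python sketch below recomputes the per-reader and pooled rates from the transcribed counts as a consistency check; it is an illustration, not part of the authors' analysis, and the names are hypothetical.

```python
# (cases read, recalled cases, cancers detected) per reader, transcribed from Table 1
ffdm_retrospective = {
    1: (1561, 130, 4), 2: (1667, 155, 12), 3: (1445, 76, 6), 4: (1614, 130, 4),
    5: (1478, 101, 3), 6: (1445, 127, 6), 7: (1668, 169, 3),
}
dbt_prospective = {
    4: (78, 6, 1), 6: (8, 0, 0), 7: (1161, 59, 6), 8: (647, 52, 3),
    9: (2701, 188, 18), 10: (520, 35, 1), 11: (132, 5, 1), 12: (1293, 48, 6),
    13: (1535, 88, 12), 14: (2803, 107, 11),
}


def summarize(mode: dict[int, tuple[int, int, int]]):
    """Return per-reader (recall %, cancers per 1000) and case-weighted pooled totals."""
    rows = {
        reader: (100 * recalled / read, 1000 * cancers / read)
        for reader, (read, recalled, cancers) in mode.items()
    }
    # Pool by summing counts across readers before dividing (case-weighted rates).
    read, recalled, cancers = (sum(col) for col in zip(*mode.values()))
    pooled = (read, recalled, cancers, 100 * recalled / read, 1000 * cancers / read)
    return rows, pooled


for name, mode in [("FFDM (retrospective)", ffdm_retrospective),
                   ("FFDM + DBT (prospective)", dbt_prospective)]:
    rows, (read, recalled, cancers, recall_pct, cancers_per_1000) = summarize(mode)
    for reader, (r_pct, c_rate) in rows.items():
        print(f"{name} reader {reader}: recall {r_pct:.1f}%, {c_rate:.1f} cancers/1000")
    print(f"{name} total: {read} read, {recalled} recalled ({recall_pct:.1f}%), "
          f"{cancers} cancers ({cancers_per_1000:.1f}/1000)")
```

Per-reader rates based on few cases are unstable: reader 4's 12.8 cancers per 1000 in the prospective mode reflects a single cancer among 78 examinations, and reader 6 read only 8 examinations in that mode.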

Discussion

Conclusions

References

1. Rosenberg R.D., Yankaskas B.C., Abraham L.A., et al.: Performance benchmarks for screening mammography. Radiology 2006; 241: pp. 55-66.
2. Ng E.H., Ng F.C., Tan P.H., et al.: Results of intermediate measures from a population-based, randomized trial of mammographic screening prevalence and detection of breast carcinoma among Asian women: the Singapore Breast Screening Project. Cancer 1998; 82: pp. 1521-1528.
3. Sickles E.A., Wolverton D.E., Dee K.E.: Performance parameters for screening and diagnostic mammography: specialist and general radiologists. Radiology 2002; 224: pp. 861-869.
4. Jiang Y., Miglioretti D.L., Metz C.E., et al.: Breast cancer detection rate: designing imaging trials to demonstrate improvements. Radiology 2007; 243: pp. 360-367.
5. Elmore J.G., Wells C.K., Lee C.H., et al.: Variability in radiologists’ interpretations of mammograms. N Engl J Med 1994; 331: pp. 1493-1499.
6. Beam C.A., Conant E.F., Sickles E.A., et al.: Evaluation of proscriptive health care policy implementation in screening mammography. Radiology 2003; 229: pp. 534-540.
7. Bird R.E., Wallace T.W., Yankaskas B.C.: Analysis of cancers missed at screening mammography. Radiology 1992; 184: pp. 613-617.
8. Carney P.A., Miglioretti D.L., Yankaskas B.C., et al.: Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann Intern Med 2003; 138: pp. 168-175.
9. Niklason L.T., Christian B.T., Niklason L.E., et al.: Digital tomosynthesis in breast imaging. Radiology 1997; 205: pp. 399-406.
10. Rafferty E.A.: Digital mammography: novel applications. Radiol Clin North Am 2007; 45: pp. 831-843.
11. Baker J.A., Lo J.Y.: Breast tomosynthesis: state-of-the-art and review of the literature. Acad Radiol 2011; 18: pp. 1298-1310.
12. Poplack S.P., Tosteson T.D., Kogel C.A., et al.: Digital breast tomosynthesis: initial experience in 98 women with abnormal digital screening mammography. AJR Am J Roentgenol 2007; 189: pp. 616-623.
13. Gur D., Abrams G.S., Chough D.M., et al.: Digital breast tomosynthesis: observer performance study. AJR Am J Roentgenol 2009; 193: pp. 586-591.
14. Gennaro G., Toledano A., di Maggio C., et al.: Digital breast tomosynthesis versus digital mammography: a clinical performance study. Eur Radiol 2010; 20: pp. 1545-1553.
15. Skaane P., Gullien R., Bjørndal H., et al.: Digital breast tomosynthesis (DBT): initial experience in a clinical setting. Acta Radiol 2012; 53: pp. 524-529.
16. Bernardi D., Ciatto S., Pellegrini M., et al.: Prospective study of breast tomosynthesis as a triage to assessment in screening. Breast Cancer Res Treat 2012; 133: pp. 267-271.
17. Rafferty E.A., Park J.M., Philpotts L.E., et al.: Assessing radiologist performance using combined digital mammography and breast tomosynthesis compared with digital mammography alone: results of a multicenter, multireader trial. Radiology 2013; 266: pp. 104-113.
18. Skaane P., Bandos A.I., Gullien R., et al.: Comparison of digital mammography alone and digital mammography plus tomosynthesis in a population-based screening program. Radiology 2013; 267: pp. 47-56.
19. Rose S.L., Tidwell A.L., Bujnoch L.J., et al.: Implementation of breast tomosynthesis into a routine screening practice: an observational study. AJR Am J Roentgenol 2013; 200: pp. 1401-1408.
20. Ciatto S., Houssami N., Bernardi D., et al.: Integration of 3D digital mammography with tomosynthesis for population breast-cancer screening (STORM): a prospective comparison study. Lancet Oncol 2013; 14: pp. 583-589.
21. Haas B.M., Kalra V., Geisel J., et al.: Comparison of tomosynthesis plus digital mammography and digital mammography alone for breast cancer screening. Radiology 2013; 269: pp. 694-700.
22. Skaane P., Bandos A.I., Gullien R., et al.: Prospective trial comparing full-field digital mammography (FFDM) versus combined FFDM and tomosynthesis in a population-based screening program using independent double reading with arbitration. Eur Radiol 2013; 23: pp. 2061-2071.
23. Metz C.E., Shen J.H.: Gains in accuracy from replicated readings of diagnostic images: prediction and assessment in terms of ROC analysis. Med Decis Making 1992; 12: pp. 60-75.
24. Swensson R.G., King J.L., Good W.F., et al.: Observer variation and the performance accuracy gained by averaging ratings of abnormality. Med Phys 2000; 27: pp. 1920-1933.
25. Gur D., Bandos A.I., Cohen C.S., et al.: The “laboratory” effect: comparing radiologists’ performance and variability during prospective clinical and laboratory mammography interpretations. Radiology 2008; 249: pp. 47-53.