Verification Bias

Rationale and Objectives

The sensitivity and specificity of magnetic resonance imaging (MRI) for diagnosis of meniscal tears have been studied extensively, with tears usually verified by surgery. However, surgically unverified cases are often not considered in these studies, leading to verification bias, which can falsely increase the sensitivity and decrease the specificity estimates. Our study suggests that such bias may be very common in the meniscal MRI literature and illustrates techniques to detect and correct for it.

Materials and Methods

PubMed was searched for articles estimating the sensitivity and specificity of MRI for meniscal tears. These articles were assessed for verification bias, which was deemed potentially present if a study included any patients whose MRI findings were not surgically verified. Retrospective global sensitivity analysis (GSA) was performed when the published data allowed it.

Results

Thirty-nine of the 314 studies retrieved from PubMed specifically dealt with meniscal tears. All 39 included unverified patients and were therefore subject to potential verification bias. Only seven articles included sufficient information to perform GSA. Of these, one showed definite verification bias, two showed no bias, and four showed bias within certain ranges of disease prevalence. Only 9 of the 39 articles acknowledged the possibility of verification bias.

Conclusion

Verification bias is underrecognized and potentially common in published estimates of the sensitivity and specificity of MRI for the diagnosis of meniscal tears. When possible, it should be avoided by proper study design. If unavoidable, it should be acknowledged. Investigators should tabulate unverified as well as verified data. Finally, verification bias should be estimated; if present, corrected estimates of sensitivity and specificity should be used. Our online web-based calculator makes this process relatively easy.

The efficacy of diagnostic tests is assessed by comparing them against a reference standard test (“gold standard”). Sensitivity and specificity are key indices of the efficacy of a test. However, the reference test may not be applied to all patients when it is expensive, painful, invasive, dangerous, or refused by patients. This can result in biased estimates of the sensitivity and specificity of the diagnostic test. This type of bias is called “verification bias,” “workup bias,” or “posttest referral bias.” Verification bias is a common problem in imaging research, particularly in retrospective studies. It is introduced if patients receiving the test of interest are not equally likely to undergo the reference standard to verify their diagnosis, and only those who receive the reference standard are included in the statistical analysis. Verification bias also occurs when an imperfect reference standard is used or when patients are verified using different reference standards in the same study.
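
To make the mechanism concrete, the following Python sketch simulates a hypothetical study in which MR-positive patients are sent to arthroscopy far more often than MR-negative patients, and then computes sensitivity and specificity from the verified patients alone. The prevalence (0.2), true sensitivity (0.8), true specificity (0.9), and verification probabilities (0.9 for MR-positive, 0.1 for MR-negative) are invented for illustration only.

```python
import random


def simulate_verification_bias(n=100_000, prevalence=0.2, sens=0.8, spec=0.9,
                               p_verify_pos=0.9, p_verify_neg=0.1, seed=1):
    """Simulate selective verification and return naive (verified-only)
    estimates of sensitivity and specificity. All parameters are hypothetical."""
    rng = random.Random(seed)
    tp = fn = fp = tn = 0  # counts among verified patients only
    for _ in range(n):
        diseased = rng.random() < prevalence
        if diseased:
            test_pos = rng.random() < sens        # true positive with prob sens
        else:
            test_pos = rng.random() >= spec       # false positive with prob 1 - spec
        p_verify = p_verify_pos if test_pos else p_verify_neg
        if rng.random() >= p_verify:
            continue  # unverified: silently dropped, as in a complete-case analysis
        if diseased and test_pos:
            tp += 1
        elif diseased:
            fn += 1
        elif test_pos:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


naive_sens, naive_spec = simulate_verification_bias()
print(f"naive sensitivity {naive_sens:.3f} (true value 0.800)")
print(f"naive specificity {naive_spec:.3f} (true value 0.900)")
```

Under these assumptions the analytic expectations for the naive estimates are about 0.97 for sensitivity (0.144 / 0.148) and 0.50 for specificity (0.072 / 0.144), so the simulated values should land close to those: sensitivity is inflated and specificity deflated, exactly the direction described above.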

The prevalence of verification bias in the medical literature has previously been estimated by several investigators. Greenes and Begg surveyed the medical literature between 1976 and 1980 and found that at least 26% of diagnostic efficacy studies had potential verification bias. Bates et al reviewed verification bias in the pediatric literature between 1987 and 1989 and found that 36% of pediatric studies evaluating diagnostic tests were subject to it. In a review of all diagnostic test studies published between 1978 and 1993, Cronin found that correction for verification bias was performed in 46%. The same study also found that the proportion of studies correcting for verification bias increased significantly over time, from 29% between 1978 and 1981 to 62% between 1990 and 1993. In a review of studies examining diagnostic tests for cancer published between 1990 and 2003, 40% at least mentioned verification bias as a potential source of bias. In our own recent review of four radiology journals, we found evidence of potential verification bias in 13%–36% (average 27%) of articles. This potential bias was acknowledged in only 17.1% of these articles.

Materials and methods

Results

Table 1

Data for “Any Free Fragment Sign” from Dorsay et al

                         Arthroscopy (+)   Arthroscopy (-)   Unverified   Total
Magnetic resonance (+)   39                6                 91           136
Magnetic resonance (-)   4                 22                2094         2120
Total                    43                28                2185         2256

Table 2

Uncorrected and Bias-corrected Estimates of Sensitivity and Specificity for Data from Dorsay et al

              Uncorrected   SD      Bias-corrected   SD
Sensitivity   0.907         0.044   0.265            0.465
Specificity   0.786         0.078   0.990            0.004

SD, standard deviation.
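
The bias-corrected point estimates in Table 2 can be reproduced with the correction of Begg and Greenes (reference 56), which assumes that the decision to perform arthroscopy depended only on the MR result. Under that assumption, P(tear | MR+) and P(tear | MR-) estimated from verified patients are applied to all MR-positive and MR-negative patients, verified or not. The Python sketch below is illustrative: the function names are hypothetical, whether the authors' web calculator uses exactly this computation is an assumption, and the sketch does not attempt the bias-corrected SDs.

```python
import math


def binomial_sd(p, n):
    """Standard deviation of a proportion p estimated from n patients."""
    return math.sqrt(p * (1 - p) / n)


def naive(tp, fp, fn, tn):
    """Uncorrected estimates using verified patients only."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, binomial_sd(sens, tp + fn), spec, binomial_sd(spec, tn + fp)


def begg_greenes(tp, fp, fn, tn, unverified_pos, unverified_neg):
    """Bias-corrected sensitivity and specificity, assuming verification
    depends only on the MR result (Begg-Greenes correction)."""
    n_pos = tp + fp + unverified_pos           # all MR-positive patients
    n_neg = fn + tn + unverified_neg           # all MR-negative patients
    p_tear_pos = tp / (tp + fp)                # P(tear | MR+), verified MR+ only
    p_tear_neg = fn / (fn + tn)                # P(tear | MR-), verified MR- only
    # Expected tears and non-tears in the full MR+ and MR- groups
    tears_pos, tears_neg = n_pos * p_tear_pos, n_neg * p_tear_neg
    clear_pos, clear_neg = n_pos * (1 - p_tear_pos), n_neg * (1 - p_tear_neg)
    sensitivity = tears_pos / (tears_pos + tears_neg)
    specificity = clear_neg / (clear_neg + clear_pos)
    return sensitivity, specificity


# Dorsay et al, Table 1: MR+ row (39, 6, 91 unverified), MR- row (4, 22, 2094 unverified)
sens, sd_sens, spec, sd_spec = naive(39, 6, 4, 22)
print(f"uncorrected: sensitivity {sens:.3f} (SD {sd_sens:.3f}), "
      f"specificity {spec:.3f} (SD {sd_spec:.3f})")
corr_sens, corr_spec = begg_greenes(39, 6, 4, 22, 91, 2094)
print(f"corrected:   sensitivity {corr_sens:.3f}, specificity {corr_spec:.3f}")
```

Run on the Table 1 counts, this gives 0.907 (SD 0.044) and 0.786 (SD 0.078) uncorrected and 0.265 and 0.990 corrected, matching Table 2. Applied to the Zobel et al counts in Table 3, begg_greenes(4, 0, 1, 25, 15, 59) returns a sensitivity of 0.853 and a specificity of 1.00, matching Table 4.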

Table 3

Data for Lateral Meniscus from Zobel et al

                         Arthroscopy (+)   Arthroscopy (-)   Unverified   Total
Magnetic resonance (+)   4                 0                 15           19
Magnetic resonance (-)   1                 25                59           85
Total                    5                 25                74           104

Table 4

Uncorrected and Bias-corrected Estimates of Sensitivity and Specificity for Data from Zobel et al

              Uncorrected   SD      Bias-corrected   SD
Sensitivity   0.800         0.179   0.853            0.985
Specificity   1.00          0.000   1.00             Undefined

SD, standard deviation.
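
In Table 3, most patients are unverified, so any estimate depends heavily on what is assumed about them. The following Python sketch is a deliberately simplified bounding exercise. It is not the formal global sensitivity analysis of Kosinski and Barnhart (reference 5), which models the verification process itself. The sketch assumes that a fraction a of the unverified MR-positive patients and a fraction b of the unverified MR-negative patients actually have a tear, and recomputes sensitivity and specificity over a grid of (a, b) values. The function name and grid spacing are hypothetical.

```python
def sweep(tp, fp, fn, tn, unverified_pos, unverified_neg, steps=5):
    """Hypothetical bounding analysis: for each assumed fraction a (unverified
    MR+ with a tear) and b (unverified MR- with a tear), recompute
    sensitivity and specificity over all patients."""
    results = {}
    for i in range(steps + 1):
        a = i / steps
        for j in range(steps + 1):
            b = j / steps
            tears_pos = tp + a * unverified_pos
            tears_neg = fn + b * unverified_neg
            clear_pos = fp + (1 - a) * unverified_pos
            clear_neg = tn + (1 - b) * unverified_neg
            sens = tears_pos / (tears_pos + tears_neg)
            spec = clear_neg / (clear_neg + clear_pos)
            results[(a, b)] = (sens, spec)
    return results


grid = sweep(4, 0, 1, 25, 15, 59)  # Zobel et al, Table 3
sens_values = [s for s, _ in grid.values()]
spec_values = [p for _, p in grid.values()]
print(f"sensitivity: {min(sens_values):.4f} to {max(sens_values):.4f}")
print(f"specificity: {min(spec_values):.4f} to {max(spec_values):.4f}")
```

For the Zobel et al counts, the extreme corners of the grid give a sensitivity from 0.0625 (a = 0, b = 1) to 0.95 (a = 1, b = 0) and a specificity from 0.625 to 1.00. The Begg–Greenes assumption corresponds to a single point in this space (a = 4/4, b = 1/26), which gives the bias-corrected sensitivity of 0.853 in Table 4.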

Discussion

Report the number of participants satisfying the criteria for inclusion that did or did not undergo the index tests and/or the reference standard; describe why participants failed to receive either test (a flow diagram is strongly recommended).
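
In line with this reporting recommendation and the conclusion that unverified as well as verified data should be tabulated, per-patient data can be reduced to the verified/unverified layout used in Tables 1 and 3. The Python sketch below assumes a hypothetical record format with one (MR positive, arthroscopy result) pair per patient, where the arthroscopy result is True, False, or None for unverified patients. The record format and function name are illustrative only.

```python
from collections import Counter

COLUMNS = ["Arthroscopy (+)", "Arthroscopy (-)", "Unverified"]


def tabulate(records):
    """Build a 2 x 3 table (plus totals) from (mr_positive, arthroscopy) pairs,
    where arthroscopy is True (tear), False (no tear) or None (not verified)."""
    counts = Counter()
    for mr_positive, arthroscopy in records:
        row = "MR (+)" if mr_positive else "MR (-)"
        col = {True: COLUMNS[0], False: COLUMNS[1], None: COLUMNS[2]}[arthroscopy]
        counts[(row, col)] += 1
    table = {}
    for row in ("MR (+)", "MR (-)"):
        cells = [counts[(row, c)] for c in COLUMNS]
        table[row] = cells + [sum(cells)]  # append the row total
    table["Total"] = [p + n for p, n in zip(table["MR (+)"], table["MR (-)"])]
    return table


# Hypothetical per-patient records consistent with the Zobel et al counts (Table 3)
records = ([(True, True)] * 4 + [(True, None)] * 15
           + [(False, True)] * 1 + [(False, False)] * 25 + [(False, None)] * 59)
table3 = tabulate(records)
for row, cells in table3.items():
    print(row, cells)
```

Keeping the "Unverified" column, rather than discarding those patients, is what makes bias correction or a sensitivity analysis possible later.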

References

1. Zhou X.H., Obuchowski N.A., McClish D.K.: Statistical methods in diagnostic medicine. New York: Wiley and Sons 2002.
2. Sica G.T.: Bias in research studies. Radiology 2006; 238: pp. 780-789.
3. Begg C.B.: Biases in the assessment of diagnostic tests. Stat Med 1987; 6: pp. 411-423.
4. Revesz G., Kundel H.L., Bonitatibus M.: The effect of verification on the assessment of imaging techniques. Invest Radiol 1983; 18: pp. 194-198.
5. Kosinski A.S., Barnhart H.X.: A global sensitivity analysis of performance of a medical diagnostic test when verification bias is present. Stat Med 2003; 22: pp. 2711-2721.
6. Whiting P., Rutjes A.W.S., Reitsma J.B., et al.: Sources of variation and bias in studies of diagnostic accuracy: a systematic review. Ann Intern Med 2004; 140: pp. 189-202.
7. Greenes R.A., Begg C.B.: Assessment of diagnostic technologies. Methodology for unbiased estimation from samples of selectively verified patients. Invest Radiol 1985; 20: pp. 751-756.
8. Bates A.S., Margolis P.A., Evans A.T.: Verification bias in pediatric studies evaluating diagnostic tests. J Pediatr 1993; 122: pp. 585-590.
9. Cronin A.M., Vickers A.J.: Statistical methods to correct for verification bias in diagnostic studies are inadequate when there are few false negatives: a simulation study. BMC Med Res Methodol 2008; 8: pp. 75.
10. Mallett S., Deeks J.J., Halligan S., et al.: Systematic reviews of diagnostic tests in cancer: review of methods and reporting. BMJ 2006; 333: pp. 413.
11. Petscavage J.M., Richardson M.L., Carr R.B.: Verification bias: an underrecognized source of error in assessing the efficacy of medical imaging. Acad Radiol 2011; 18: pp. 343-346.
12. Richardson M.L., Petscavage J.M.: An interactive web-based tool for detecting verification (work-up) bias in studies of the efficacy of diagnostic imaging. Acad Radiol 2010; 17: pp. 1580-1583.
13. Diamond G.A.: Reverend Bayes’ silent majority. An alternative factor affecting sensitivity and specificity of exercise electrocardiography. Am J Cardiol 1986; 57: pp. 1175-1180.
14. R Development Core Team: R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing 2010. http://www.R-project.org/
15. Solomon S.L., Totty W.G., Lee J.K.: MR imaging of the knee: comparison of three-dimensional FISP and two-dimensional spin-echo pulse sequences. Radiology 1989; 173: pp. 739-742.
16. Quinn S.F., Brown T.R., Szumowski J.: Menisci of the knee: radial MR imaging correlated with arthroscopy in 259 patients. Radiology 1992; 185: pp. 577-580.
17. Barnett M.J.: MR diagnosis of internal derangements of the knee: effect of field strength on efficacy. AJR Am J Roentgenol 1993; 161: pp. 115-118.
18. De Smet A.A., Norris M.A., Yandow D.R., et al.: MR diagnosis of meniscal tears of the knee: importance of high signal in the meniscus that extends to the surface. AJR Am J Roentgenol 1993; 161: pp. 101-107.
19. Applegate G.R., Flannigan B.D., Tolin B.S., et al.: MR diagnosis of recurrent tears in the knee: value of intraarticular contrast material. AJR Am J Roentgenol 1993; 161: pp. 821-825.
20. Zobel M.S., Borrello J.A., Siegel M.J., et al.: Pediatric knee MR imaging: pattern of injuries in the immature skeleton. Radiology 1994; 190: pp. 397-401.
21. De Smet A.A., Graf B.K.: Meniscal tears missed on MR imaging: relationship to meniscal tear patterns and anterior cruciate ligament tears. AJR Am J Roentgenol 1994; 162: pp. 905-911.
22. Justice W.W., Quinn S.F.: Error patterns in the MR imaging evaluation of menisci of the knee. Radiology 1995; 196: pp. 617-621.
23. Wright D.H., De Smet A.A., Norris M.: Bucket-handle tears of the medial and lateral menisci of the knee: value of MR imaging in detecting displaced fragments. AJR Am J Roentgenol 1995; 165: pp. 621-625.
24. King S.J., Carty H.M., Brady O.: Magnetic resonance imaging of knee injuries in children. Pediatr Radiol 1996; 26: pp. 287-290.
25. Cheung L.P., Li K.C., Hollett M.D., et al.: Meniscal tears of the knee: accuracy of detection with fast spin-echo MR imaging and arthroscopic correlation in 293 patients. Radiology 1997; 203: pp. 508-512.
26. Weinstabl R., Muellner T., Vécsei V., et al.: Economic considerations for the diagnosis and therapy of meniscal lesions: can magnetic resonance imaging help reduce the expense? World J Surg 1997; 21: pp. 363-368.
27. Araki Y., Ashikaga R., Fujii K., et al.: MR imaging of meniscal tears with discoid lateral meniscus. Eur J Radiol 1998; 27: pp. 153-160.
28. Rubin D.A., Kettering J.M., Towers J.D., et al.: MR imaging of knees having isolated and combined ligament injuries. AJR Am J Roentgenol 1998; 170: pp. 1207-1213.
29. Kreitner K.F., Hansen M., Schadmand-Fischer S., et al.: [Low-field MRI of the knee joint: results of a prospective, arthroscopically controlled study]. ROFO 1999; 170: pp. 35-40.
30. Elvenes J., Jerome C.P., Reikerås O., et al.: Magnetic resonance imaging as a screening procedure to avoid arthroscopy for meniscal tears. Arch Orthop Trauma Surg 2000; 120: pp. 14-16.
31. Cotten A., Delfaut E., Demondion X., et al.: MR imaging of the knee at 0.2 and 1.5 T: correlation with surgery. AJR Am J Roentgenol 2000; 174: pp. 1093-1097.
32. Runkel M., Kreitner K.F., Regentrop H.J., et al.: [Sensitivity of magnetic resonance tomography in detecting meniscus tears]. Der Unfallchirurg 2000; 103: pp. 1079-1085.
33. McCauley T.R., Jee W.H., Galloway M.T., et al.: Grade 2C signal in the meniscus on MR imaging of the knee. AJR Am J Roentgenol 2002; 179: pp. 645-648.
34. Sparacia G., Barbiera F., Bartolotta T.V., et al.: Pitfalls and limitations of magnetic resonance imaging in bucket-handle tears of knee menisci. Radiol Med 2002; 104: pp. 150-156.
35. Magee T., Shapiro M., Williams D.: MR accuracy and arthroscopic incidence of meniscal radial tears. Skeletal Radiol 2002; 31: pp. 686-689.
36. Major N.M., Beard L.N., Helms C.A.: Accuracy of MR imaging of the knee in adolescents. AJR Am J Roentgenol 2003; 180: pp. 17-19.
37. Dorsay T.A., Helms C.A.: Bucket-handle meniscal tears of the knee: sensitivity and specificity of MRI signs. Skeletal Radiol 2003; 32: pp. 266-272.
38. Jee W.H., McCauley T.R., Kim J.M.: Magnetic resonance diagnosis of meniscal tears in patients with acute anterior cruciate ligament tears. J Comput Assist Tomogr 2004; 28: pp. 402-406.
39. Tarhan N.C., Chung C.B., Mohana-Borges A.V.R., et al.: Meniscal tears: role of axial MRI alone and in combination with other imaging planes. AJR Am J Roentgenol 2004; 183: pp. 9-15.
40. Çevikol C., Karaali K., Esen G., et al.: [MR imaging of meniscal tears at low-field (0.35 T) and high-field (1.5 T) MR units]. Tanısal ve Girişimsel Radyoloji 2004; 10: pp. 316-319.
41. Vande Berg B.C., Malghem J., Poilvache P., et al.: Meniscal tears with fragments displaced in notch and recesses of knee: MR imaging with arthroscopic comparison. Radiology 2005; 234: pp. 842-850.
42. Craig J.G., Go L., Blechinger J., et al.: Three-tesla imaging of the knee: initial experience. Skeletal Radiol 2005; 34: pp. 453-461.
43. Vaz C.E.S., de Camargo O.P., de Santana P.J., et al.: Accuracy of magnetic resonance in identifying traumatic intraarticular knee lesions. Clinics (Sao Paulo) 2005; 60: pp. 445-450.
44. Ververidis A.N., Verettas D.A., Kazakos K.J., et al.: Meniscal bucket handle tears: a retrospective study of arthroscopy and the relation to MRI. Knee Surg Sports Traumatol Arthrosc 2006; 14: pp. 343-349.
45. De Smet A.A., Tuite M.J.: Use of the “two-slice-touch” rule for the MRI diagnosis of meniscal tears. AJR Am J Roentgenol 2006; 187: pp. 911-914.
46. Thomas S., Pullagura M., Robinson E., et al.: The value of magnetic resonance imaging in our current management of ACL and meniscal injuries. Knee Surg Sports Traumatol Arthrosc 2007; 15: pp. 533-536.
47. De Smet A.A., Mukherjee R.: Clinical, MRI, and arthroscopic findings associated with failure to diagnose a lateral meniscal tear on knee MRI. AJR Am J Roentgenol 2008; 190: pp. 22-26.
48. Nourissat G., Beaufils P., Charrois O., et al., French Society of Arthroscopy: Magnetic resonance imaging as a tool to predict reparability of longitudinal full-thickness meniscus lesions. Knee Surg Sports Traumatol Arthrosc 2008; 16: pp. 482-486.
49. von Engelhardt L.V., Schmitz A., Pennekamp P.H., et al.: Diagnostics of degenerative meniscal tears at 3-Tesla MRI compared to arthroscopy as reference standard. Arch Orthop Trauma Surg 2008; 128: pp. 451-456.
50. Nemec S.F., Marlovits S., Trattnig S., et al.: High-resolution magnetic resonance imaging and conventional magnetic resonance imaging on a standard field-strength magnetic resonance system compared to arthroscopy in patients with suspected meniscal tears. Acad Radiol 2008; 15: pp. 928-933.
51. Naranje S., Mittal R., Nag H., et al.: Arthroscopic and magnetic resonance imaging evaluation of meniscus lesions in the chronic anterior cruciate ligament-deficient knee. Arthroscopy 2008; 24: pp. 1045-1051.
52. Sampson M.J., Jackson M.P., Moran C.J., et al.: Three Tesla MRI for the diagnosis of meniscal and anterior cruciate ligament pathology: a comparison to arthroscopic findings. Clin Radiol 2008; 63: pp. 1106-1111.
53. Behairy N.H., Dorgham M.A., Khaled S.A.: Accuracy of routine magnetic resonance imaging in meniscal and ligamentous injuries of the knee: comparison with arthroscopy. Int Orthop 2009; 33: pp. 961-967.
54. Bossuyt P.M., Reitsma J.B., Bruns D.E., et al., for the STARD group: Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. AJR Am J Roentgenol 2003; 181: pp. 51-55.
55. Bossuyt P.M., Reitsma J.B., Bruns D.E., et al., for the STARD group: Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD Initiative. Radiology 2003; 226: pp. 24-28.
56. Begg C.B., Greenes R.A.: Assessment of diagnostic tests when disease verification is subject to selection bias. Biometrics 1983; 39: pp. 207-215.
57. Punglia R.S., D’Amico A.V., Catalona W.J., et al.: Effect of verification bias on screening for prostate cancer by measurement of prostate-specific antigen. N Engl J Med 2003; 349: pp. 335-342.
58. Rubin D., Schenker N.: Logit-based interval estimation for binomial data using the Jeffreys prior. Sociol Methodol 1987; 17: pp. 131-144.
59. Harel O., Zhou X.H.: Multiple imputation for correcting verification bias. Stat Med 2006; 25: pp. 3769-3786.
60. Nishikawa H., Imanaka Y., Sekimoto M., et al.: Influence of verification bias on the assessment of MRI in the diagnosis of meniscal tear. AJR Am J Roentgenol 2009; 193: pp. 1596-1602.
61. Reitsma J.B., Rutjes A.W.S., Khan K.S., et al.: A review of solutions for diagnostic accuracy studies with an imperfect or missing reference standard. J Clin Epidemiol 2009; 62: pp. 797-806.
62. Brenner H.: Correcting for exposure misclassification using an alloyed gold standard. Epidemiology 1996; 7: pp. 406-410.
63. Hadgu A., Dendukuri N., Hilden J.: Evaluation of nucleic acid amplification tests in the absence of a perfect gold-standard test: a review of the statistical and epidemiologic issues. Epidemiology 2005; 16: pp. 604-612.