
Peer Review

Does any radiologist actually like peer review (PR)? Do radiologists look forward to retrospectively checking a specified number of cases to determine whether they agree with a colleague's previous interpretation or, if not, to assess whether an apparent error was understandable or should not have been made? For those who have attended periodic section conferences reviewing misses, can anyone honestly deny that your first concern on seeing each new case is whether you made the error, or that you breathe a silent sigh of relief whenever the prior interpretation is expressed in a style that is not your own? How many times have you seen a subtle finding that was missed and simply been thankful that the case had been assigned to someone else?

Retrospective PR using RADPEER (a trademark of the American College of Radiology, Reston, Virginia) or a similar system has become the general standard among radiologists. But are you convinced that participating in this exercise makes you a better radiologist? Although the ultimate goal of PR is to improve the performance of individual radiologists, and thereby improve patient care, there is little if any evidence that this is the case.

Have you ever wondered about the efficiency of retrospective PR in identifying significant misses? At our large, tertiary care, academic medical center, we have used a RADPEER-like PR system since 2007. Analyzing the results for the first 6 years in our chest section of four full-time equivalent (FTE) radiologists interpreting approximately 385,000 studies over that time, we peer reviewed 9441 cases, of which 8757 (92.8%) were category 1 and 444 (4.7%) were category 2. Only 197 (2.1%) category 3 cases and 47 (0.5%) category 4 cases met the threshold to be presented to the entire chest section at a monthly quality assurance (QA) conference. This means that, on average, 39 cases (9441/244) had to be peer reviewed to detect a single category 3 or 4 error. For specific diagnoses, this ranged from 197 cases for pulmonary vascular congestion to 1574 for a rib or other skeletal lesion (Table 1). Based on the results of a short study indicating that it took the average chest radiologist about 1 minute to peer review a study, we calculated that completion of the 9441 PR cases over the 6-year period required a total radiologist time expenditure of 157 hours (almost 20 full workdays). The estimated average amount of radiologist time required to detect each type of category 3 or 4 error is also presented in Table 1.

Table 1

Category 3 and 4 Misses

| Type of Error | Total | % of 3/4 Errors (n = 244) | Cases to Find One Error | Time to Detect One Error |
| --- | --- | --- | --- | --- |
| Pulmonary vascular congestion | 48 | 19.7 | 197 | 3 h 6 min |
| Atelectasis/collapse | 36 | 14.8 | 262 | 4 h 22 min |
| Misplaced tubes/catheters, wires | 28 | 11.5 | 337 | 5 h 37 min |
| Pneumonia | 23 | 9.4 | 411 | 6 h 50 min |
| Effusion | 19 | 7.8 | 497 | 8 h 17 min |
| Enlargement of cardiac silhouette | 14 | 5.7 | 674 | 11 h 14 min |
| Pneumothorax | 14 | 5.7 | 674 | 11 h 14 min |
| Mass/nodule | 11 | 4.5 | 858 | 14 h 18 min |
| Mediastinal mass | 9 | 3.7 | 1049 | 17 h 29 min |
| Rib/skeletal lesion | 6 | 2.5 | 1574 | 26 h 14 min |
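
The last two columns of Table 1 follow directly from the 9441 peer-reviewed cases and the roughly 1-minute-per-case figure cited above. The short Python sketch below reproduces that arithmetic; the script and its variable names are illustrative rather than part of the original analysis, and minor differences from the published figures may reflect rounding in the source data.

```python
# Sketch of the Table 1 arithmetic, assuming (as stated in the text)
# roughly 1 minute of radiologist time per peer-reviewed case and
# 9441 peer-reviewed cases over the 6-year period.
# The per-diagnosis error counts are copied from Table 1; everything else is derived.

TOTAL_PR_CASES = 9441
MINUTES_PER_PR_CASE = 1  # approximate figure from the short timing study

errors_by_type = {
    "Pulmonary vascular congestion": 48,
    "Atelectasis/collapse": 36,
    "Misplaced tubes/catheters, wires": 28,
    "Pneumonia": 23,
    "Effusion": 19,
    "Enlargement of cardiac silhouette": 14,
    "Pneumothorax": 14,
    "Mass/nodule": 11,
    "Mediastinal mass": 9,
    "Rib/skeletal lesion": 6,
}

total_cat34_errors = 244  # all category 3 and 4 cases over the period

# Overall yield: peer-reviewed cases needed to surface one category 3/4 error (~39).
cases_per_error = TOTAL_PR_CASES / total_cat34_errors
print(f"Cases reviewed per category 3/4 error: {cases_per_error:.0f}")

# Total radiologist time spent on peer review over the 6 years (~157 h, ~20 workdays).
total_hours = TOTAL_PR_CASES * MINUTES_PER_PR_CASE / 60
print(f"Total PR time: {total_hours:.0f} h (~{total_hours / 8:.0f} workdays)")

# Per-diagnosis yield and the review time spent to find one such error.
for diagnosis, n_errors in errors_by_type.items():
    cases_to_find_one = TOTAL_PR_CASES / n_errors
    minutes_to_find_one = cases_to_find_one * MINUTES_PER_PR_CASE
    hours, minutes = divmod(round(minutes_to_find_one), 60)
    print(f"{diagnosis}: 1 error per {cases_to_find_one:.0f} cases "
          f"(~{hours} h {minutes} min of review)")
```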

We also employ an active online QA database, in which radiologists and referring physicians can electronically enter apparent errors and complications. During the same 6-year period, 361 cases with discrepancies attributed to the radiologist were entered into the QA system, roughly 50% more than via the PR system. Although not random, this approach captured a far higher proportion of clinically relevant cases and provided valuable feedback from referring physicians without requiring large amounts of radiologist time.

A major justification for PR is that it provides data on individual performance that permit the detection of outliers who require remediation or even termination from a radiology group. Conversely, if the performance of a radiologist is being investigated by a regulatory body, demonstration that the radiologist has an error rate similar to that of his or her peers is often considered an indication of satisfactory performance. However, anecdotal evidence and numerous articles have questioned the validity of such a PR system. We have encountered several instances in which random PR revealed the failure of a radiologist to detect an important abnormality that had also been missed on earlier studies interpreted by various members of the section. This raises the error rate of the radiologist subjected to PR but is effectively not counted against the other radiologists who missed the identical abnormality. Borgstede et al concluded that “RADPEER is a less than perfect measuring system,” based on the facts that its gold standard “is expert consensus, not verification by pathology or clinical follow-up” and that “radiologists may minimize their reports of disparities knowing that the reviewed radiologists are colleagues.” Donnelly noted the lack of an absolute standard or threshold defining the limits of acceptable performance, and such absolute criteria still do not exist. Moreover, all currently applied PR methods assess interpretive disagreement between readers. In the absence of a definitive diagnosis (such as surgery or pathology), it may be impossible to differentiate between an error and a genuine difference of opinion regarding the correct interpretation of an image or the appropriate recommendation for follow-up. As Bender et al concluded from their study, “A ratings-based peer review system [like RADPEER] is unreliable and subjective for the evaluation of discrepant interpretations.”


References

  • 1. Mahgerefteh S., Kruskal J.B., Yam C.S., et al.: Peer review in diagnostic radiology: current state and a vision for the future. Radiographics 2009; 29: pp. 1221-1231.

  • 2. Borgstede J.P., Lewis R.S., Bhargavan M., et al.: Radpeer quality assurance program: a multi-facility study of interpretive disagreement rates. J Am Coll Radiol 2004; 1: pp. 59-65.

  • 3. Donnelly L.F.: Performance-based assessment of radiology practitioners: promoting improvement in accordance with the 2007 Joint Commission Standard. J Am Coll Radiol 2007; 4: pp. 699-703.

  • 4. Alpert H.R., Hillman B.J.: Quality and variability in diagnostic radiology. J Am Coll Radiol 2004; 1: pp. 127-132.

  • 5. Bender L.C., Linnau K.F., Meier E.N., et al.: Interrater agreement in the evaluation of discrepant imaging findings with the Radpeer system. AJR Am J Roentgenol 2012; 199: pp. 320-327.

This post is licensed under CC BY 4.0 by the author.