Automated vs. Manual Pattern Recognition of 3D 1H MRSI Data of Patients with Prostate Cancer

Rationale and Objectives

The aim of this study was to assess (1) automated analysis methods versus manual evaluation by human experts of three-dimensional proton magnetic resonance spectroscopic imaging (MRSI) data from patients with prostate cancer and (2) the contribution of spatial information to decision making.

Materials and Methods

Three-dimensional proton MRSI was applied at 1.5 T. MRSI data from 10 patients with histologically proven prostate adenocarcinoma, scheduled for either prostatectomy or intensity-modulated radiation therapy, were evaluated. First, two readers manually labeled spectra, using spatial information to identify the location of each spectrum and information from neighboring spectra; these labels established the reference set of this study. Then, spectra were labeled again manually in a blinded and randomized manner and evaluated automatically using software that applied spectral line fitting as well as pattern recognition routines. Statistical analysis of the results of the different approaches was performed.


Results

Altogether, 1018 spectra were evaluable by all methods. Numbers of evaluable spectra differed significantly depending on patient and evaluation method. Compared to automated analysis, the readers made rather binary decisions, using information from neighboring spectra in ambiguous cases, when evaluating MRSI data as a whole. Differences between anatomically blinded and unblinded evaluation were larger than differences between evaluations using blinded data and automated techniques.


Conclusions

An automated approach, which evaluates each spectrum individually, can be as good as an anatomy-blinded human reader. Spatial information is routinely used by human experts to support their final decisions. Automated procedures that consider anatomic information for spectral evaluation will enhance the diagnostic impact of MRSI of the human prostate.

The large-scale measurement of serum prostate-specific antigen in recent years has resulted in the detection of an immense number of prostate carcinomas. In particular, when initial biopsy results are negative, magnetic resonance (MR) imaging (MRI) is applied to visualize the zonal anatomy of the prostate and localize a possible tumor. High-resolution T2-weighted (T2w) MRI performed with pelvic array coils provides good specificity (up to 90%) but low sensitivity (27%–61%) for tumor detection and localization. The use of an endorectal coil for signal reception raises the sensitivity for tumors >1 cm. However, depending on the size of the examined tumors, the sensitivity reported in the literature ranges from 27% to 100% and the specificity from 32% to 99%. Moreover, false-positive results of T2w MRI remain a problem. They are often caused by local signal reduction due to postbiopsy hemorrhage, prostatitis, or previous treatment.

These limitations fostered the inclusion of functional imaging techniques such as diffusion-weighted MRI, dynamic contrast-enhanced MRI, and proton MR spectroscopic imaging (1H MRSI) in diagnostic imaging protocols. Using MRSI, prostate cancer is characterized by increases in cholines (ie, free choline and choline-containing compounds; Cho) and a decrease in citrate levels. Single-center studies have shown that with MRSI supplementing T2w MRI, prostate cancer can be better differentiated from normal glandular tissue than with conventional MRI alone. Recently published results have demonstrated that most tumors in the prostate were missed because of their small sizes. Thus, MRSI with higher spatial resolution is needed. However, higher resolution increases the number of spectra obtained. The large set of spectra resulting from a single examination, together with the extensive postprocessing and expertise required to interpret the data, has hampered the broader application of MRSI.

Spectroscopic imaging

Figure 1, Color-coded tumor probability map of the prostate of patient e (aged 58 years) with adenocarcinoma, calculated and displayed by CLARET (21) software. Tumor voxels are marked in red and areas without pathologic findings in green. Representative spectra are displayed from voxels indicated by blue arrows. Magnetic resonance spectroscopic imaging (MRSI) measurement technique: three-dimensional proton MRSI point-resolved spectroscopy with water and lipid signal suppression (repetition time, 650 ms; echo time, 120 ms; nominal voxel size, 6 × 6 × 6 mm³; matrix size, 16 × 16 × 16). Cho, cholines; Ci, citrate; Cr, creatine.

Figure 2, Color maps showing tissue classification obtained by both readers consensually (left) (an) compared to the evaluation of one reader (right) (e1) without spatial context of in vivo prostate proton magnetic resonance spectroscopic imaging (MRSI). The maps show the tissue classification for the 12 central slices (1–12) of 10 different MRSI data sets (a–j). Spectra were labeled according to a five-point scale using the relative intensities of choline plus creatine and citrate resonances. Voxels of spectra that identify healthy prostate tissue are marked in yellow (class 1), while dark red voxels label tumor (class 5). Voxels that could not be evaluated because of localization outside the prostate or poor spectral quality are white. Note that with spatial context (an), more spectra of patients c and h were deemed evaluable.

Postprocessing of MRSI data

Evaluation Procedures

Step 1: Visual evaluation of MR spectroscopic and anatomic data (reference data)

Step 2: Visual evaluation of randomized spectra (blinded reference)

Figure 3, Results of different evaluation methods (an, pr, e1, e2, f1, f2, and fl) of two exemplary magnetic resonance spectroscopic imaging data sets (from patients b and e; see Fig 2). Data show high variability between the outcomes of the different methods, motivating a more detailed quantitative analysis. The central slices (1–12) were evaluated by consensus reading with spatial context (an), automated pattern recognition (pr), expert 1 (e1), and expert 2 (e2) without anatomic information and by fitting metabolite signal templates (f1 and f2) as well as resonance line models (fl). Healthy prostate tissue is marked in yellow, and dark red voxels correspond to tumor. White pixels correspond to spectra that were not evaluable.

Step 3: Automated evaluation using spectral fits

Step 4: Automated evaluation using pattern recognition

Statistical Evaluation

Table 1

Comparison of Class Predictions between Readers and Fitting Methods

Comparison    Class 1  Class 2  Class 3  Class 4  Class 5
e1 vs e2
  Class 1         315       86       15        0        0
  Class 2          28      317        7        2        0
  Class 3           1       11       83        0        1
  Class 4           0        2       13       44        6
  Class 5           0        0        0        7       80
f1 vs f2
  Class 1         392        3        7        0        0
  Class 2           9      228        9        9        0
  Class 3           1        8      144        7        0
  Class 4           3        5       21      109        1
  Class 5           0        0        2       16       44

e1, reader 1; e2, reader 2; f1, fitting method 1; f2, fitting method 2.

Comparison of class predictions by two “blinded” readers evaluating the spectra in a single-voxel fashion shows high agreement. Few evaluations differed by more than one class, and overall variation was less than the differences between algorithms of the fitting metabolite signal templates.
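The agreement described in this note can be checked directly from the counts in Table 1. The following is a minimal sketch in Python; the function name `agreement_summary` and the matrix layout (rows for the first method, columns for the second) are assumptions made for illustration, and both summary counts are unaffected if rows and columns are swapped.

```python
# Counts transcribed from Table 1 (5 x 5, classes 1 to 5).
# Assumed layout: rows = first method, columns = second method.
E1_VS_E2 = [
    [315, 86, 15, 0, 0],
    [28, 317, 7, 2, 0],
    [1, 11, 83, 0, 1],
    [0, 2, 13, 44, 6],
    [0, 0, 0, 7, 80],
]
F1_VS_F2 = [
    [392, 3, 7, 0, 0],
    [9, 228, 9, 9, 0],
    [1, 8, 144, 7, 0],
    [3, 5, 21, 109, 1],
    [0, 0, 2, 16, 44],
]


def agreement_summary(matrix):
    """Count all spectra, exact agreements, and disagreements of more than one class."""
    total = sum(sum(row) for row in matrix)
    exact = sum(matrix[i][i] for i in range(len(matrix)))
    far_off = sum(
        count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if abs(i - j) > 1
    )
    return {"total": total, "exact": exact, "off_by_more_than_one": far_off}


if __name__ == "__main__":
    for name, matrix in (("e1 vs e2", E1_VS_E2), ("f1 vs f2", F1_VS_F2)):
        print(name, agreement_summary(matrix))
```

Both matrices sum to the 1018 spectra that were evaluable by all methods, which is a quick consistency check on the transcription.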

Table 2

Similarity of Performance of the Different Processing Methods, as Measured by Kendall’s τ

Method                     an           pr           e1           e2           ea           f1           f2           fl
Expert anatomic (an)       1            0.73 (0.11)  0.72 (0.09)  0.62 (0.11)  0.67 (0.10)  0.73 (0.06)  0.68 (0.06)  0.58 (0.06)
Pattern recognition (pr)   —            1            0.83 (0.06)  0.68 (0.08)  0.77 (0.07)  0.81 (0.04)  0.75 (0.05)  0.64 (0.05)
Expert 1 (e1)              —            —            1            0.84 (0.05)  0.93 (0.03)  0.74 (0.06)  0.68 (0.04)  0.59 (0.06)
Expert 2 (e2)              —            —            —            1            0.93 (0.03)  0.63 (0.08)  0.58 (0.06)  0.51 (0.07)
Expert average (ea)        —            —            —            —            1            0.69 (0.07)  0.64 (0.06)  0.54 (0.05)
Fitting metabolite 1 (f1)  —            —            —            —            —            1            0.95 (0.01)  0.81 (0.04)
Fitting metabolite 2 (f2)  —            —            —            —            —            —            1            0.85 (0.04)
Fitting lines (fl)         —            —            —            —            —            —            —            1

Values in parentheses are the standard deviations from bootstrapping. The first line, for example, shows that the expert anatomic method had the highest correlations with pr and f1, both with τ = 0.73. Here, a τ value of 1 indicates perfect correlation and a τ value of 0 complete randomness between two methods. Data are visualized in Figure 6.
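To illustrate how a rank correlation with a bootstrapped standard deviation can be obtained from paired class labels, the sketch below computes Kendall's τ from a contingency table and resamples the labeled spectra with replacement. It rests on several assumptions: the tie-corrected τ-b variant is used, individual spectra are the resampling unit, and the function names `kendall_tau_b` and `bootstrap_sd` are hypothetical. The source does not state which τ variant or resampling scheme was applied, so the result need not reproduce the values in Table 2.

```python
import math
import random
import statistics

# e1 vs e2 counts from Table 1 (assumed: rows = e1, columns = e2).
E1_VS_E2 = [
    [315, 86, 15, 0, 0],
    [28, 317, 7, 2, 0],
    [1, 11, 83, 0, 1],
    [0, 2, 13, 44, 6],
    [0, 0, 0, 7, 80],
]


def kendall_tau_b(table):
    """Kendall's tau-b (tie-corrected) for an ordinal contingency table."""
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):
                for m in range(cols):
                    if m > j:
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        discordant += table[i][j] * table[k][m]
    pairs = n * (n - 1) // 2
    row_ties = sum(t * (t - 1) // 2 for t in (sum(row) for row in table))
    col_ties = sum(t * (t - 1) // 2 for t in (sum(col) for col in zip(*table)))
    denominator = math.sqrt((pairs - row_ties) * (pairs - col_ties))
    return (concordant - discordant) / denominator if denominator else float("nan")


def bootstrap_sd(table, replicates=200, seed=0):
    """Standard deviation of tau-b over resampled tables (spectra drawn with replacement)."""
    rng = random.Random(seed)
    rows, cols = len(table), len(table[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    weights = [table[i][j] for i, j in cells]
    n = sum(weights)
    taus = []
    for _ in range(replicates):
        resampled = [[0] * cols for _ in range(rows)]
        for i, j in rng.choices(cells, weights=weights, k=n):
            resampled[i][j] += 1
        taus.append(kendall_tau_b(resampled))
    return statistics.stdev(taus)


if __name__ == "__main__":
    print("tau-b:", round(kendall_tau_b(E1_VS_E2), 3))
    print("bootstrap SD:", round(bootstrap_sd(E1_VS_E2), 3))
```

Resampling individual spectra treats them as independent, which ignores the grouping of spectra within patients; resampling whole patients would be an alternative design and would generally give larger standard deviations.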

Figure 4, Visualization of data shown in Tables 1 and 3. Multidimensional scaling (MDS) 1 yields the differences between the “single-voxel” methods, whereas MDS 2 shows the differences between anatomic and single-voxel methods. The distances encode the similarity or dissimilarity of the different methods. Scaling of both axes is arbitrary. Although the results of fitting metabolite signal templates (f1 and f2) are at nearly identical positions, the anatomic evaluations (an) separate from the other postprocessing methods, evidently because of the anatomic information. Automated pattern recognition (pr) is located between visual inspection (e1, e2, and ea) and spectral fitting (fl, f1, and f2). This indicates that pattern recognition is the closest method to the readers e1 and e2.

Evaluation times

Evaluable data

Table 3

Numbers of Spectra Deemed Evaluable in the Different Approaches and Overlap between the Different Evaluation Methods

Method                                         an            pr            e1            e2            f1            f2            fl
Expert anatomic (an)                           4516 (100%)   2093 (46.3%)  2108 (46.7%)  1786 (39.6%)  2259 (50.0%)  2306 (51.1%)  2014 (44.6%)
Pattern recognition (pr)                       2093 (84.0%)  2493 (100%)   1897 (76.1%)  1589 (63.7%)  1785 (71.6%)  1785 (71.6%)  1633 (65.5%)
Blinded expert 1 (e1)                          2108 (78.8%)  1897 (71.0%)  2674 (100%)   2252 (84.2%)  1897 (70.9%)  1906 (71.3%)  1715 (64.1%)
Blinded expert 2 (e2)                          1786 (77.5%)  1589 (68.9%)  2252 (97.7%)  2305 (100%)   1588 (68.9%)  1599 (69.4%)  1432 (62.1%)
Metabolite spectral fitting (jMRUI) 1 (f1)     2259 (47.3%)  1785 (37.3%)  1897 (39.7%)  1588 (33.2%)  4777 (100%)   4723 (98.9%)  4007 (83.9%)
Metabolite spectral fitting (CSItools) 2 (f2)  2306 (47.1%)  1785 (36.4%)  1906 (38.9%)  1599 (33.6%)  4723 (96.4%)  4900 (100%)   4010 (81.8%)
Line functions spectral fitting (AMARES) (fl)  2014 (49.7%)  1633 (40.3%)  1715 (42.3%)  1432 (35.3%)  4007 (98.8%)  4010 (98.9%)  4055 (100%)

Percentages (in parentheses) indicate the amount of overlap between the methods in the respective row. As an example, among the 4516 spectra evaluated in the anatomic inspection of the data (an, first row), a subset of 44.6% (2014 spectra) could be evaluated by fitting resonance line models (fl). Expert 1 and expert 2 labeled 2674 and 2305 spectra, respectively, and 2252 spectra could be labeled by both.
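The row-wise percentages follow from dividing the number of spectra shared by two methods by the number of spectra evaluable with the row's method. A minimal sketch, assuming a hypothetical helper named `overlap_percent` and rounding to one decimal place as in the table:

```python
def overlap_percent(shared, row_total):
    """Share of the row method's evaluable spectra that the other method could also evaluate."""
    if row_total <= 0:
        raise ValueError("row_total must be positive")
    return round(100.0 * shared / row_total, 1)


if __name__ == "__main__":
    # Counts taken from the note to Table 3.
    print(overlap_percent(2014, 4516))  # an row, fl column
    print(overlap_percent(2252, 2674))  # e1 row, e2 column
    print(overlap_percent(2252, 2305))  # e2 row, e1 column
```

Because each percentage is normalized by its own row total, the same shared count yields different percentages in the two mirrored cells, as with the 2252 spectra labeled by both experts.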

Automated Versus Rater

Experts’ single-voxel consensus

Line fitting

Figure 5, Spatial information emphasizes binary decisions, as demonstrated by the scatter plot (a), which shows the evaluation with anatomic knowledge as a function of the average evaluation by both readers without anatomic knowledge. Whereas the evaluation without anatomic information makes use of all five classes, the one with anatomic details is clustered in the extreme classes 1 and 5. The box plots (b) show results from the evaluation with anatomic information (y axis), grouped by the decisions from visual inspection without this information (x axis). Median (thick black line), quartiles (box extensions), and outliers (whiskers and circles) of the distributions are also shown. The curved black line indicates the trend approximated by a local polynomial regression.

Figure 6, CC/C value ([I_choline + I_creatine]/I_citrate) as a function of the average expert's label of spectra according to the five-point scale (1 = definitely healthy, 2 = possibly healthy, 3 = undecided, 4 = possibly tumor, and 5 = definitely tumor). The horizontal axis shows the average label assigned to the spectrum from the visual inspections by the two readers. The vertical axis is the calculated CC/C ratio using method fl. Box plots show median (thick black lines), quartiles (box extensions), and outliers (whiskers and points) for the CC/C values of spectra. Dotted horizontal lines (at y = 0.89, 1.29, 1.96, and 5.34) indicate cutoff values for the assignments to classes 1 to 5. The average scores from human readers and the calculated CC/C values show a linear trend, also indicating good performance of the visual inspection of the two readers.
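The mapping from a CC/C ratio to the five-point scale can be sketched as a threshold lookup on the four cutoff values given in this caption. This is an illustration under stated assumptions: the function names `cc_c_ratio` and `classify_cc_c` are hypothetical, and a ratio exactly equal to a cutoff is assigned to the higher class, a boundary convention the source does not specify.

```python
from bisect import bisect_right

# Cutoff values from the caption of Figure 6, separating classes 1 to 5.
CUTOFFS = (0.89, 1.29, 1.96, 5.34)


def cc_c_ratio(choline, creatine, citrate):
    """(I_choline + I_creatine) / I_citrate from fitted signal intensities."""
    if citrate <= 0:
        raise ValueError("citrate intensity must be positive")
    return (choline + creatine) / citrate


def classify_cc_c(ratio):
    """Map a CC/C ratio to class 1 (healthy) through class 5 (tumor).

    Assumption: a ratio equal to a cutoff falls into the higher class.
    """
    return bisect_right(CUTOFFS, ratio) + 1


if __name__ == "__main__":
    ratio = cc_c_ratio(choline=1.0, creatine=0.5, citrate=3.0)
    print(ratio, classify_cc_c(ratio))
```

A lookup of this kind evaluates each spectrum in isolation, which corresponds to the single-voxel evaluation rather than to the reading with spatial context.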

Pattern recognition

Single-Voxel Versus Spatial Analysis

Experts’ labels

Experts’ labels versus automated methods

Evaluation time

Evaluable data

Performance of Automated Methods

Evaluation of single-voxel spectra

Visual inspection

Need for a Spatial Analysis

Visual inspection of the MRSI data

