Design Considerations for using PET as a Response Measure in Single Site and Multicenter Clinical Trials

Rationale and Objectives

Positron emission tomography (PET) is used to evaluate response to therapy, and there is increasing interest in having PET provide endpoints for clinical trials. Here we demonstrate the impact of PET measurement error and the choice of quantification method on clinical trial design.

Materials and Methods

Sample size was calculated for two-arm randomized trials with percent change in 18F-fluorodeoxyglucose (FDG) PET uptake as an efficacy endpoint. Two methods of uptake quantification were considered: standardized uptake values (SUVs) and kinetic measures from dynamic imaging. Calculations assumed a 20 percentage point difference in treatment groups’ average percent change, and yielded 80% power at α = 0.05. The range of precision (10%–40%) in PET uptake measures was based on review of the literature. The range of SUV sensitivities (50%–100%) relative to kinetic analyses was based on a study of 75 locally advanced breast cancer patients.

Results

Sample sizes increased from 8 to 126 as PET precision worsened from 10% to 40% at full measurement sensitivity to true change. In a subgroup with low initial FDG uptake, a sample size of 126 was required under 20% standard deviation using clinical SUVs. More sophisticated imaging quantification could reduce this sample size to 32.

Conclusions

The dependence of sample size on measurement precision and the sensitivity of imaging measures to true change should be considered in single site and multicenter PET trials to avoid underpowered studies with inconclusive results. Sophisticated PET imaging methods that are more sensitive to changes in uptake may be advantageous in early studies with limited patient numbers.

Multicenter trials are the gold standard for establishing new standards for clinical practice in oncology. There is increasing interest in using positron emission tomography (PET) measures to evaluate response to therapy and provide early and robust endpoints for these clinical trials. Compared with anatomical radiographic measures, early response endpoints from functional imaging modalities such as PET could allow trial patients to cross over more quickly to an expanding number of salvage therapies following progression. Standardization to reduce measurement error has been suggested to address some known challenges in the application of PET in clinical trials; however, the combined impact of these standards on clinical trial design has not been evaluated. There has also been relatively little study of how the choice of PET imaging methodology and analysis affects study design, especially sample size or study power estimation.

Sample size estimation is an important aspect of study design because it is typically a key driver of trial costs and study duration, especially when patient accrual rates are limited. Sample size requirements depend on the selected power, significance level, effect size, and measurement error, as discussed later in this article. Additional design features such as different classification schemes for response (e.g., European Organization for Research and Treatment of Cancer [EORTC] or PET Response Criteria in Solid Tumors [PERCIST]), group randomization, stratification, or expected attrition may also affect sample size, but these aspects were beyond the scope of this study. An example plot of sample size versus effect size in Figure 1 shows the impacts of measurement error and expected effect size on sample size. The required sample size of a study will increase with an increase in measurement error, with a decrease in expected effect size (e.g., selecting a 20% response threshold versus 30% in PERCIST, compared to no change on average in a control group), or with a decrease in the ability to measure the entire range of change associated with the effect size.

Figure 1, Impact of measurement error and effect size on required sample sizes from the two-sample t-test (80% power, type I error rate [α] = 0.05).

Materials and Methods

Trial Design and Trial Parameters

$$
n = \frac{4\,(Z_\alpha + Z_\beta)^2\,\sigma^2}{\Delta^2}
$$

The traditional sample size formula based on the two-sample t-test has a 2 in the numerator and calculates only the number of patients required in one arm. Our sample size equation has a 4 in the numerator to calculate the total sample size required for both arms of the study. The significance level (α, also called the false-positive or type I error rate) is the acceptable probability of incorrectly rejecting the null hypothesis (declaring a treatment group difference exists when in fact it does not). The power (1 − β, or probability of not committing a type II error) is the probability of finding a treatment group difference under the condition of a specific effect size. The effect size (Δ) is the expected treatment group difference, and the response variance (σ²) is the common variance of the response variable in the two groups. Z_α is the standard normal distribution critical value for a two-sided test of size α, and Z_β is the critical value for power of 1 − β. Common parameters for randomized Phase II trials are an α of 0.05 and 80% power, which correspond to Z_α = 1.96 and Z_β = 0.84. Under the null hypothesis, the treatments will have equal average change in FDG uptake and Δ = 0. An estimate of variance (σ²) can be determined from review of the literature or may require additional early-phase imaging trials.
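As a rough numerical check of the total sample size formula, the following Python sketch evaluates it for a given σ and Δ. The function name `total_sample_size`, the use of exact normal quantiles instead of the rounded 1.96 and 0.84, and rounding up to the next whole patient are assumptions made for illustration, not details taken from the article.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(sigma, delta, alpha=0.05, power=0.80):
    """Total patients across both arms: n = 4 (Z_alpha + Z_beta)^2 sigma^2 / delta^2.

    sigma and delta must share the same units (here, percentage points).
    """
    if delta == 0:
        raise ValueError("delta must be nonzero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    n = 4 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)  # assumption: round up to a whole number of patients


if __name__ == "__main__":
    # Effect size of 20 percentage points at three levels of measurement precision
    for sigma in (10, 20, 40):
        print(sigma, total_sample_size(sigma, delta=20))
```

Under these assumptions, Δ = 20 with σ = 10, 20, and 40 gives totals of 8, 32, and 126, which agrees with the range reported in the abstract for full measurement sensitivity.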

Measurement Error Parameters

Sensitivity of Different Methods of PET Image Analysis to Underlying True Change

Table 1

Selected Sample Size Calculations for Randomized Trial to Detect a True Effect of 20% Difference in Average Percent FDG Uptake Change between Two Treatments (80% Power, Type I Error Rate [α] = 0.05)

| Trial Scenario | σ ∗ | Sample size: First Tertile (SUV baseline ≤3), 50% Sensitivity † | Sample size: Second Tertile (3≤ SUV baseline ≤5.2), 70% Sensitivity † | Sample size: Third Tertile (SUV baseline >5.2), 90% Sensitivity † | Sample size: Kinetic Modeling, 100% Sensitivity † |
|---|---|---|---|---|---|
| Single site | 10% | 32 | 17 | 10 | 8 |
| Multicenter (good calibration) | 20% | 126 | 65 | 39 | 32 |
| Multicenter (poor calibration) | 40% | 503 | 257 | 156 | 126 |

FDG, 18F-fluorodeoxyglucose; SUV, standardized uptake value.
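The entries in Table 1 appear consistent with treating a measure of sensitivity s as observing only s × 20 percentage points of the true 20-point effect. The sketch below applies that reading to the total sample size formula; the scaling rule is an assumption inferred from the table values, and the helper name `sample_size_with_sensitivity`, the exact normal quantiles, and the rounding up are likewise illustrative choices.

```python
from math import ceil
from statistics import NormalDist

# Critical values for a two-sided alpha of 0.05 and 80% power
Z_ALPHA = NormalDist().inv_cdf(0.975)
Z_BETA = NormalDist().inv_cdf(0.80)


def sample_size_with_sensitivity(sigma, true_delta, sensitivity):
    """Total sample size when only a fraction of the true change is measured."""
    # Assumption: the measured effect is the true effect scaled by sensitivity
    observed_delta = sensitivity * true_delta
    return ceil(4 * (Z_ALPHA + Z_BETA) ** 2 * sigma ** 2 / observed_delta ** 2)


# Scenario labels and sigma values (percentage points) as listed in Table 1
scenarios = {
    "Single site": 10,
    "Multicenter (good calibration)": 20,
    "Multicenter (poor calibration)": 40,
}
sensitivities = (0.5, 0.7, 0.9, 1.0)

for name, sigma in scenarios.items():
    row = [sample_size_with_sensitivity(sigma, 20, s) for s in sensitivities]
    print(f"{name}: {row}")
```

Because sample size scales with 1/s², halving the sensitivity roughly quadruples the required number of patients, which is the pattern seen between the kinetic modeling and first tertile columns.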

Results

Figure 2, Standardized uptake value (SUV) sensitivity to change in 18F-fluorodeoxyglucose (FDG) uptake measured by FDG flux (Ki) in a cohort of 75 locally advanced breast cancer patients undergoing neoadjuvant chemotherapy, by tertiles of baseline SUV. For a change in Ki of −100%, the predicted percent change in SUV is (a) −52% for the first tertile (baseline SUV ≤3), (b) −71% for the second tertile (3 ≤ baseline SUV ≤5.2), and (c) −88% for the third tertile (baseline SUV >5.2).

Figure 3, Sample size for two-sample t-test (80% power, type I error rate [α] = 0.05): Impact of measurement error and SUV sensitivity (sens) to a true effect size of 20%. Vertical lines are SUV sens corresponding to tertiles from 75 locally advanced breast cancer patients undergoing neoadjuvant chemotherapy.

Table 2

Selected Power Calculations for Randomized Trial to Detect a True Effect of 20% Difference in Average Percent FDG Uptake Change between Two Treatments (Type I Error Rate [α] = 0.05)

| Trial Scenario | n | Power: First Tertile (SUV baseline ≤3), 50% Sensitivity ∗ | Power: Second Tertile (3≤ SUV baseline ≤5.2), 70% Sensitivity ∗ | Power: Third Tertile (SUV baseline >5.2), 90% Sensitivity ∗ | Power: Kinetic Modeling, 100% Sensitivity ∗ |
|---|---|---|---|---|---|
| Single site, σ † = 10% | 20 | 61% | 88% | 98% | 99% |
| Single site, σ † = 10% | 30 | 78% | 97% | 99% | 99% |
| Multicenter, σ † = 20% | 50 | 42% | 70% | 89% | 94% |
| Multicenter, σ † = 20% | 100 | 71% | 94% | 99% | 99% |
| Multicenter, σ † = 40% | 100 | 24% | 42% | 61% | 71% |
| Multicenter, σ † = 40% | 300 | 58% | 86% | 97% | 99% |

FDG, 18F-fluorodeoxyglucose; SUV, standardized uptake value.
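The power values in Table 2 can be approximated by inverting the total sample size formula for a fixed n. The sketch below does this with a normal approximation that ignores the negligible probability of rejecting in the wrong direction; the function name `approx_power` and the rule that sensitivity scales the true 20-point effect are assumptions, not the article's stated procedure.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n_total, sigma, true_delta, sensitivity, alpha=0.05):
    """Approximate power of a two-arm comparison with n_total / 2 patients per arm."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Assumption: the measured effect is the true effect scaled by sensitivity
    observed_delta = sensitivity * true_delta
    # Standard error of the difference between two arm means, each of size n_total / 2
    se = 2 * sigma / sqrt(n_total)
    return NormalDist().cdf(observed_delta / se - z_alpha)


if __name__ == "__main__":
    # Multicenter scenario with sigma = 20% and 50 patients in total
    for s in (0.5, 0.7, 0.9, 1.0):
        print(f"sensitivity {s:.0%}: power {approx_power(50, 20, 20, s):.2f}")
```

Very high values computed this way can round to 100%, whereas Table 2 lists no value above 99%, so exact agreement at the top of the range should not be expected.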

Discussion

Conclusions

Acknowledgments

References

  • 1. Juweid M.E., Cheson B.D.: Positron-emission tomography and assessment of cancer therapy. N Engl J Med 2006; 354: pp. 496-507.

  • 2. Wahl R.L., Jacene H., Kasamon Y., et al.: From RECIST to PERCIST: evolving considerations for PET response criteria in solid tumors. J Nucl Med 2009; 50: pp. 122S-150S.

  • 3. Eisenhauer E.A., Therasse P., Bogaerts J., et al.: New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur J Cancer 2009; 45: pp. 228-247.

  • 4. Young H., Baum R., Cremerius U., et al.: Measurement of clinical and subclinical tumour response using [18F]-fluorodeoxyglucose and positron emission tomography: review and 1999 EORTC recommendations. European Organization for Research and Treatment of Cancer (EORTC) PET Study Group. Eur J Cancer 1999; 35: pp. 1773-1782.

  • 5. Shankar L.K., Hoffman J.M., Bacharach S., et al.: Consensus recommendations for the use of 18F-FDG PET as an indicator of therapeutic response in patients in National Cancer Institute Trials. J Nucl Med 2006; 47: pp. 1059-1066.

  • 6. Boellaard R.: Standards for PET image acquisition and quantitative data analysis. J Nucl Med 2009; 50: pp. 11S-20S.

  • 7. Weber W., Ziegler S., Thodtmann R., et al.: Reproducibility of metabolic measurements in malignant tumors using FDG PET. J Nucl Med 1999; 40: pp. 1771-1777.

  • 8. Weber W.A.: Positron emission tomography as an imaging biomarker. J Clin Oncol 2006; 24: pp. 3282-3292.

  • 9. Wahl R.L., Zasadny K., Helvie M., et al.: Metabolic monitoring of breast cancer chemohormonotherapy using positron emission tomography: Initial evaluation. J Clin Oncol 1993; 11: pp. 2101-2111.

  • 10. Dunnwald L.K., Gralow J.R., Ellis G.K., et al.: Tumor metabolism and blood flow changes by positron emission tomography: relation to survival in patients treated with neoadjuvant chemotherapy for locally advanced breast cancer. J Clin Oncol 2008; 26: pp. 4449-4457.

  • 11. Ellis M.J., Gao F., Dehdashti F., et al.: Lower-dose vs high-dose oral estradiol therapy of hormone receptor-positive, aromatase inhibitor-resistant advanced breast cancer: a phase 2 randomized study. JAMA 2009; 302: pp. 774-780.

  • 12. Minn H., Zasadny K.R., Quint L.E., et al.: Lung cancer: reproducibility of quantitative measurements for evaluating 2-[F-18]-fluoro-2-deoxy-D-glucose uptake at PET. Radiology 1995; 196: pp. 167-173.

  • 13. Nakamoto Y., Zasadny K.R., Minn H., et al.: Reproducibility of common semi-quantitative parameters for evaluating lung cancer glucose metabolism with positron emission tomography using 2-deoxy-2-[18F]fluoro-D-glucose. Mol Imaging Biol 2002; 4: pp. 171-178.

  • 14. Krak N.C., Boellaard R., Hoekstra O.S., et al.: Effects of ROI definition and reconstruction method on quantitative outcome and applicability in a response monitoring trial. Eur J Nucl Med Mol Imaging 2005; 32: pp. 294-301.

  • 15. Nahmias C., Wahl L.M.: Reproducibility of standardized uptake value measurements determined by 18F-FDG PET in malignant tumors. J Nucl Med 2008; 49: pp. 1804-1808.

  • 16. Doot R.K., Scheuermann J.S., Christian P.E., et al.: Instrumentation factors affecting variance and bias of quantifying tracer uptake with PET/CT. Med Phys 2010; 37: pp. 6035-6046.

  • 17. Fahey F.H., Kinahan P.E., Doot R.K., et al.: Variability in PET quantitation within a multicenter consortium. Med Phys 2010; 37: pp. 3660-3666.

  • 18. Lockhart C.M., MacDonald L.R., Alessio A.M., et al.: Quantifying and reducing the effect of calibration error on variability of PET/CT standardized uptake value measurements. J Nucl Med 2011; 52: pp. 218-224.

  • 19. Takahashi Y., Oriuchi N., Otake H., et al.: Variability of lesion detectability and standardized uptake value according to the acquisition procedure and reconstruction among five PET scanners. Ann Nucl Med 2008; 22: pp. 543-548.

  • 20. Velasquez L.M., Boellaard R., Kollia G., et al.: Repeatability of 18F-FDG PET in a multicenter phase I study of patients with advanced gastrointestinal malignancies. J Nucl Med 2009; 50: pp. 1646-1654.

  • 21. Freedman N.M., Sundaram S.K., Kurdziel K., et al.: Comparison of SUV and Patlak slope for monitoring of cancer therapy using serial PET scans. Eur J Nucl Med Mol Imaging 2003; 30: pp. 46-53.

  • 22. Lammertsma A.A., Hoekstra C.J., Giaccone G., et al.: How should we analyse FDG PET studies for monitoring tumour response? Eur J Nucl Med Mol Imaging 2006; 33: pp. 16-21.

  • 23. McDermott G.M., Welch A., Staff R.T., et al.: Monitoring primary breast cancer throughout chemotherapy using FDG-PET. Breast Cancer Res Treatment 2007; 102: pp. 75-84.

  • 24. Doot R.K., Dunnwald L.K., Schubert E.K., et al.: Dynamic and static approaches to quantifying 18F-FDG uptake for measuring cancer response to therapy, including the effect of granulocyte CSF. J Nucl Med 2007; 48: pp. 920-925.

  • 25. Choi M., Heilbrun L.K., Venkatramanamoorthy R., et al.: Using 18F-fluorodeoxyglucose positron emission tomography to monitor clinical outcomes in patients treated with neoadjuvant chemo-radiotherapy for locally advanced pancreatic cancer. Am J Clin Oncol 2010; 33: pp. 257-261.

  • 26. Dunnwald L.K., Doot R.K., Specht J.M., et al.: PET tumor metabolism in locally advanced breast cancer patients undergoing neoadjuvant chemotherapy: value of static versus kinetic measures of fluorodeoxyglucose uptake. Clin Cancer Res 2011; 17: pp. 2400-2409.

  • 27. Zasadny K.R., Wahl R.L.: Standardized uptake values of normal tissues at PET with 2-[fluorine-18]-fluoro-2-deoxy-D-glucose: variations with body weight and a method for correction. Radiology 1993; 189: pp. 847-850.

  • 28. Jacene H.A., Leboulleux S., Baba S., et al.: Assessment of interobserver reproducibility in quantitative 18F-FDG PET and CT measurements of tumor response to therapy. J Nucl Med 2009; 50: pp. 1760-1769.

  • 29. Hoetjes N.J., van Velden F.H., Hoekstra O.S., et al.: Partial volume correction strategies for quantitative FDG PET in oncology. Eur J Nucl Med Mol Imaging 2010; 37: pp. 1679-1687.

This post is licensed under CC BY 4.0 by the author.