Machine learning-based imaging biomarkers improve statistical power in clinical trials

Radiomic models, which leverage complex imaging patterns and machine learning, are increasingly accurate in predicting patient response to treatment and clinical outcome on an individual patient basis. In this work, we show that this predictive power can be utilized in clinical trials to significantly increase statistical power to detect treatment effects or to reduce the sample size required to achieve a given power. Akin to the historical control paradigm, we propose to use a radiomic prediction model to generate a pseudo-control sample for each individual in the trial of interest. We then incorporate these pseudo-controls into the analysis of the clinical trial of interest using classical, well-established statistical tools, and investigate statistical power. Effectively, this approach compares each individual’s radiomics-based prediction of outcome with the actual outcome, potentially increasing statistical power considerably, depending on the accuracy of the predictor. In simulations of treatment effects based on real radiomic predictive models from brain cancer and prodromal Alzheimer’s Disease, we show that this methodology can decrease the required sample sizes by as much as half, depending on the strength of the radiomic predictor. We further find that this method is most helpful when treatment effect sizes are small and that power grows with the accuracy of radiomic prediction.


BRIEF REPORT
With the growing availability of big data in medical imaging, a central focus has emerged on the development of increasingly complex tools for their analysis, with the primary goal of individualized predictions (5). In this paper, we propose harnessing these powerful machine learning tools for the analysis

Data used in preparation of this article were obtained from the ADNI database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report.
To whom correspondence may be addressed. E-mails: christos.davatzikos@pennmedicine.upenn.edu or rshi@pennmedicine.upenn.edu

GBM therapies, which aim to prolong survival after diagnosis. We conduct our simulated treatment study using 134 patients who were treated for newly diagnosed GBM at the Hospital of the University of Pennsylvania between 2006 and 2013. The actual median survival in this sample was 12 months, and survival data were assessed for all subjects, with no loss to follow-up. Detailed demographics and a clinical description of these subjects have been published previously (4). For studies involving these data, we investigate the use of cross-validated predictions of survival time based on radiomic analyses of pre- and post-contrast T1-weighted, FLAIR, diffusion, and perfusion imaging acquired pre-operatively at diagnosis. This GBM predictive model uses an SVM to differentiate short, medium, and long survival (4).
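Cross-validated predictions ensure that no patient is scored by a model that was fitted to that patient's own outcome. The sketch below illustrates k-fold out-of-fold prediction in plain Python. It is a stand-in under stated assumptions, not the GBM model itself: a single synthetic feature and ordinary least squares replace the multi-sequence radiomic features and the SVM, and the names `fit_line` and `cross_validated_predictions` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validated_predictions(xs, ys, k=5, seed=0):
    """Out-of-fold predictions: each subject is scored by a model fitted
    without that subject, so the score is not inflated by overfitting."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint folds covering everyone
    preds = [0.0] * len(xs)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in order if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = a + b * xs[i]
    return preds


# Hypothetical example: 134 subjects, one synthetic feature, and a
# synthetic continuous outcome (not the GBM data).
rng = random.Random(42)
feature = [rng.gauss(0, 1) for _ in range(134)]
outcome = [12 + 4 * f + rng.gauss(0, 3) for f in feature]
cv_scores = cross_validated_predictions(feature, outcome)
print(len(cv_scores))
```

Each subject appears in exactly one held-out fold, so every entry of `cv_scores` is an out-of-sample prediction.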

Statistical Methods. All hypothesis testing is conducted assuming a 5% type I error rate and two-sided alternatives. For our continuous outcome analyses, we model the outcome with linear regression and use Wald tests to assess whether the treatment groups differed in their outcomes, either i) adjusting for the radiomic predictor by including it as a covariate or ii) not adjusting for the radiomic predictor, which corresponds to the classical approach. For time-to-event outcomes, we assess differences between treatment groups with and without adjustment for the radiomic prediction under an accelerated failure time model.
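For a continuous outcome, the two Wald tests differ only in whether the radiomic prediction enters the design matrix. A minimal sketch, assuming ordinary least squares via the normal equations and a standard normal reference distribution for the Wald statistic (rather than the t distribution); the data and the function names `invert` and `treatment_wald_test` are hypothetical, and the accelerated failure time analysis for survival outcomes is not shown.

```python
import math
import random
from statistics import NormalDist


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        scale = aug[c][c]
        aug[c] = [v / scale for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def treatment_wald_test(y, a, x=None):
    """OLS of y on [1, a] (classical) or [1, a, x] (adjusted for prediction x).

    Returns (estimate, standard error, two-sided p-value) for the treatment
    coefficient; the p-value uses a standard normal reference distribution.
    """
    if x is None:
        design = [[1.0, float(ai)] for ai in a]
    else:
        design = [[1.0, float(ai), float(xi)] for ai, xi in zip(a, x)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    se = math.sqrt(rss / (len(y) - k) * inv[1][1])  # Var = sigma^2 (X'X)^-1
    p_value = 2 * (1 - NormalDist().cdf(abs(beta[1] / se)))
    return beta[1], se, p_value


# Hypothetical trial: 100 controls (a=0), 100 treated (a=1), a radiomic
# prediction x at baseline, and a true treatment effect of 0.5.
rng = random.Random(1)
a = [0] * 100 + [1] * 100
x = [rng.gauss(0, 1) for _ in range(200)]
y = [1 + 0.5 * ai + 2 * xi + rng.gauss(0, 0.5) for ai, xi in zip(a, x)]

print("adjusted:  ", treatment_wald_test(y, a, x))
print("unadjusted:", treatment_wald_test(y, a))
```

Because the prediction absorbs outcome variance that is unrelated to treatment, the adjusted standard error is smaller, which is where the power gain comes from.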

We conduct two sets of real data simulations: one focusing on cognitive decline in AD and one focusing on GBM survival outcomes. For both, we sample without replacement twice from the observed data: for the first group, indexed by $i = 1, \ldots, n/2$, we set our treatment indicator $A_i = 0$ and record the observed outcome $Y_{i0}$, as well as the value of the radiomic predictor $X_i$ at baseline. For the second group, indexed by $i = n/2 + 1, \ldots, n$, we introduce a treatment effect $\gamma$, set our treatment indicator $A_i = 1$, and again record the outcome $Y_{i0}$ and the baseline radiomic predictor measurement $X_i$. We repeat this process 1000 times, recording the p-value of the test for treatment effect each time. We calculate the type I error rate and power as the percentage of replications in which the treatment effect is significant at the $\alpha = 0.05$ level, where $\gamma$ is set to 0 to assess type I error and to a non-zero value to assess power. To quantify the sample size benefits of this method, we repeat the above procedure for a range of sample sizes $n$ and record the smallest $n$ for which power reaches 80%. We explore this for a range of hypothetical effect sizes, defined here as $\gamma$ divided by the standard deviation of the outcome.
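The simulation loop can be sketched as follows for a continuous outcome. Several details are assumptions not stated above: the pool of (prediction, outcome) pairs is synthetic rather than the AD or GBM data, $\gamma$ is added to the outcome of the second group, the two groups are taken as the two halves of a single draw of $n$ distinct subjects, and the Wald test uses a normal reference distribution. The adjusted test is computed by Frisch-Waugh-Lovell partialling-out, which yields the same treatment coefficient as the full regression; all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def residualize(v, x=None):
    """Residuals of v after least-squares regression on an intercept (and x)."""
    mv = sum(v) / len(v)
    if x is None:
        return [vi - mv for vi in v]
    mx = sum(x) / len(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - b * (xi - mx) for vi, xi in zip(v, x)]


def treatment_p_value(y, a, x=None):
    """Two-sided Wald p-value for the treatment coefficient in OLS of y on
    1 + a (+ x), via Frisch-Waugh-Lovell; normal reference distribution."""
    ry, ra = residualize(y, x), residualize(a, x)
    saa = sum(v * v for v in ra)
    g = sum(u * v for u, v in zip(ra, ry)) / saa
    df = len(y) - (2 if x is None else 3)
    rss = sum((u - g * v) ** 2 for u, v in zip(ry, ra))
    z = g / math.sqrt(rss / df / saa)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulated_power(pool, n, gamma, reps=1000, alpha=0.05, seed=0):
    """Rejection rates (adjusted, unadjusted) over reps simulated trials.
    pool: list of (x, y) pairs observed under standard of care."""
    rng = random.Random(seed)
    hits_adj = hits_unadj = 0
    for _ in range(reps):
        draw = rng.sample(pool, n)  # n distinct subjects, without replacement
        a = [0] * (n // 2) + [1] * (n - n // 2)
        x = [xi for xi, _ in draw]
        # Assumed additive treatment effect gamma for the second half.
        y = [yi + gamma * ai for (_, yi), ai in zip(draw, a)]
        hits_adj += int(treatment_p_value(y, a, x) < alpha)
        hits_unadj += int(treatment_p_value(y, a) < alpha)
    return hits_adj / reps, hits_unadj / reps


def smallest_n(pool, gamma, n_grid, adjusted=True, target=0.8, **kw):
    """Smallest n in n_grid whose simulated power reaches target (None if none)."""
    for n in n_grid:
        power_adj, power_unadj = simulated_power(pool, n, gamma, **kw)
        if (power_adj if adjusted else power_unadj) >= target:
            return n
    return None


# Hypothetical pool: 500 subjects whose prediction x correlates with the
# outcome y at roughly rho = 0.7 (both with unit variance).
rng = random.Random(7)
rho = 0.7
pool = []
for _ in range(500):
    xv = rng.gauss(0, 1)
    pool.append((xv, rho * xv + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)))

gamma = 0.5  # effect size gamma / SD(y) is about 0.5 for this pool
grid = range(20, 201, 20)
print("adjusted n:  ", smallest_n(pool, gamma, grid, adjusted=True, reps=200))
print("unadjusted n:", smallest_n(pool, gamma, grid, adjusted=False, reps=200))
```

Setting `gamma = 0` in `simulated_power` returns the empirical type I error rates instead of power.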

Personalized Medicine | Clinical Trials

In recent decades, rapid advances in technology have increased the amount of neuroimaging data available to researchers at an unprecedented rate (1, 2). Machine learning methods enable the integration of these high-dimensional data into powerful individualized predictive markers that have been shown to be useful for tasks such as diagnosis and prognosis in diseases such as Alzheimer's disease and brain cancers (3, 4). Predictive modeling is poised to benefit from the large and varied nature of these data.

of clinical trials by using them as a means to inform statistical analyses with individualized estimates of clinical outcome. We therefore arrive at the concept of individualized evaluation of treatment effects in clinical trials.

There is an extensive literature on the use of historical controls to supplement data from new clinical trials; this work has largely relied on pooling methods or Bayesian modeling (6, 7). Whereas these methods augment a trial by incorporating historical data at the group level, high-dimensional predictors offer the opportunity to augment current trials by incorporating historical data in the form of individualized predictions. This allows a more precise evaluation of the treatment effect for each person, rather than relying only on a group-level estimate of average outcome.

Here, we present a method that draws on these ideas while leveraging powerful predictive biomarkers, and the wealth of data used to build them, to generate personalized predictions of outcome. These predictions can be used directly in the analysis of clinical trial data. We find that this methodology can substantially improve statistical power for detecting treatment effects, depending on the predictive power of the machine learning-based model. Correspondingly, this approach can substantially reduce the sample size needed to achieve the same power in a clinical trial.

Methods

Our method relies on access to two sets of data: i) a current clinical trial designed to study an outcome of interest and ii) a cohort of similar subjects treated according to the current standard of care. We narrow our focus in this work to radiomic predictors and associated studies, so we assume that imaging data have been gathered at study enrollment for both datasets. More broadly, however, we only require a predictive model that is based on sufficient information measured at baseline on each participant in both datasets to predict the outcome under standard of care. The techniques proposed here are also directly applicable to other -omic modeling scenarios and, more generally, to any predictive marker of standard-of-care outcome.

Our basic premise is that we can use previously collected imaging data to build a radiomic prediction model, fully validate it, and use it to generate a single score that summarizes the imaging patterns predictive of a future clinical outcome of interest, such as patient survival, progression-free survival, or response to treatment (Figure 1). The model built on the historical trial can then be applied to the data collected in the current trial to generate individualized values of the radiomic score for each current participant. These individualized scores represent

R.T.S. and C.D. designed research; C.L. performed research; C.L. and M.H. analyzed data; and C.L., M.H., R.T.S., and C.D. wrote the paper. The authors declare no conflict of interest.

Fig. 1. A: Workflow for implementing the proposed method in a new clinical trial. B (continuous outcome) and C (survival outcome): Schematic diagram of the individualized predictions generated for each person in the current trial; solid lines indicate the observed outcome for participants in the current trial, and dashed lines indicate the predicted outcome for those participants had they not been treated.

Fig. 2. Results from simulated studies under two scenarios. With the addition of historical controls, the sample size required for 80% power is markedly lower than with classical two-sample clinical trial analysis.
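The sample-size pattern in Fig. 2 is consistent with a standard large-sample approximation for covariate-adjusted comparisons of means. This approximation is not a formula given in this report: adjusting for a baseline predictor whose correlation with the outcome is ρ multiplies the required sample size by roughly 1 − ρ². The minimal sketch below uses the usual two-sided normal-approximation formula; the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.8, rho=0.0):
    """Approximate per-group n for a two-arm comparison of means.

    effect_size: treatment difference divided by the outcome SD.
    rho: correlation between the baseline predictor and the outcome;
    adjusting for it shrinks the residual variance by (1 - rho**2).
    """
    z = NormalDist().inv_cdf
    base = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(base * (1 - rho ** 2))


print(n_per_group(0.5))           # 63 (unadjusted)
print(n_per_group(0.5, rho=0.7))  # 33 (adjusted for a predictor with rho = 0.7)
```

Under this approximation, halving the sample size corresponds to ρ² ≈ 0.5. The proportional saving does not depend on the effect size, but the absolute number of participants saved is largest when effects are small and the unadjusted n is large.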