Brain Functional Connectivity and Anatomical Features as Predictors of Cognitive Behavioral Therapy Outcome for Anxiety in Youths

Background: Because pediatric anxiety disorders precede the onset of many other problems, successful prediction of response to the first-line treatment, cognitive-behavioral therapy (CBT), could have a major impact. However, existing clinical models are weakly predictive. The current study evaluates whether structural and resting-state functional magnetic resonance imaging can predict post-CBT anxiety symptoms.
Methods: Two datasets were studied: (A) one consisted of n=54 subjects with an anxiety diagnosis, who received 12 weeks of CBT, and (B) one consisted of n=15 subjects treated for 8 weeks. Connectome Predictive Modeling (CPM) was used to predict treatment response, as assessed with the Pediatric Anxiety Rating Scale (PARS); additionally, we investigated models using anatomical features instead of functional connectivity. The main analysis included network edges positively correlated with treatment outcome, and age, sex, and baseline anxiety severity as predictors. Results from alternative models and analyses are also presented. Model assessments utilized 1000 bootstraps, resulting in a 95% CI for R2, r, and mean absolute error (MAE).
Outcomes: The main model showed a MAE of approximately 3.5 points (95% CI: [3.1 – 3.8]), an R2 of 0.08 [−0.14 – 0.26], and an r of 0.38 [0.24 – 0.511]. When testing this model in the left-out sample (B), the results were similar, with a MAE of 3.4 [2.8 – 4.7], an R2 of −0.65 [−2.29 – 0.16], and an r of 0.4 [0.24 – 0.54]. The anatomical metrics showed a similar pattern, with models yielding low R2 overall.
Interpretation: The analysis showed that models based on earlier promising results failed to predict clinical outcomes. Despite the small sample size, the current study does not support extensive use of CPM to predict outcome in pediatric anxiety.
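The bootstrap assessment described in the Methods can be illustrated with a minimal Python sketch. It assumes a percentile bootstrap over subjects and the usual definition R2 = 1 − SS_res/SS_tot; the abstract states neither choice, and all function names and toy values below are hypothetical. The toy case also illustrates how R2 can be negative while r remains positive, as in the external validation: r is insensitive to a constant bias in the predictions, whereas R2 and MAE are not.

```python
import math
import random
import statistics


def mae(obs, pred):
    """Mean absolute error, in the units of the outcome."""
    return statistics.fmean(abs(o - p) for o, p in zip(obs, pred))


def r_squared(obs, pred):
    """1 - SS_res / SS_tot; negative when predictions do worse than the mean."""
    mean_obs = statistics.fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_ci(obs, pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed scheme): resample subjects with replacement."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy values (not study data): predictions track the ordering of the
# observations perfectly but carry a constant bias of 10 points.
obs = [float(k) for k in range(1, 13)]
pred = [o + 10.0 for o in obs]

print("MAE:", mae(obs, pred))
print("r:  ", pearson_r(obs, pred))
print("R2: ", r_squared(obs, pred))
print("MAE 95% CI:", bootstrap_ci(obs, pred, mae))
```

In this toy case the correlation is perfect even though every prediction is off by 10 points, which is why MAE and R2 are the more informative measures of how usable the predictions are.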


Supplementary Results
Processing choices
Across all measures, fingerprinting allowed identification of subjects based on their brain imaging profiles at rates that far exceeded chance. Nevertheless, the accuracy of fingerprinting, as well as that of CPM/APM, varied according to choices made during processing. Global signal regression (GSR) improved fingerprinting accuracy (53/66 vs. 46/66 for baseline used to identify follow-up; 56/66 vs. 48/66 for follow-up used to identify baseline) as well as CPM accuracy (cross-validation: MAE difference = 0.0903, p = 0.0003; external validation: MAE difference = 0.1005, p = 0.0001). For these comparisons we used MAE because it is expressed in the same units as the predicted quantity (PARS), and thus serves as a better proxy for the size of the effect attributable to these differences. Using partial correlations instead of simple correlations reduced the accuracy of fingerprinting (53/66 vs. 33/66 for baseline used to identify follow-up; 56/66 vs. 32/66 for follow-up used to identify baseline), but improved the accuracy of CPM (cross-validation: MAE difference = 0.2101, p = 0.0001; external validation: MAE difference = 0.1154, p = 0.0002). However, other processing choices also affected accuracy, and the generally better models used full (not partial) correlations.
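The identification counts reported above (for example, 53/66) can be illustrated with a minimal Python sketch of correlation-based fingerprinting. It assumes that a subject is identified when the profile from the other session with the highest Pearson correlation belongs to the same subject; the matching rule is not spelled out in this section, so this is an assumption, and the function names and toy profiles are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between two equal-length profiles."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def identification_count(database, targets):
    """Count targets whose best match in the database is the same subject.

    Row i of `database` and row i of `targets` are assumed to belong to
    the same subject, scanned in two different sessions.
    """
    correct = 0
    for i, target in enumerate(targets):
        sims = [pearson_r(target, ref) for ref in database]
        best = max(range(len(sims)), key=sims.__getitem__)
        if best == i:
            correct += 1
    return correct


# Toy profiles (not study data): three subjects, four features each.
baseline = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 1.0, 3.0]]
followup = [[1.1, 2.0, 2.9, 4.2], [3.9, 3.1, 2.2, 0.8], [1.2, 2.8, 0.9, 3.1]]

# Baseline used to identify follow-up, then the reverse direction.
print(identification_count(baseline, followup), "of", len(followup))
print(identification_count(followup, baseline), "of", len(baseline))
```

The two directions are scored separately because the best match is taken across a different set of candidates in each case, which is consistent with the counts differing between "baseline used to identify follow-up" and the reverse.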
Likewise, the accuracy of APM and anatomical fingerprinting varied depending on processing choices, although with a less consistent pattern than for CPM. Accuracy was perfect or near perfect for area, thickness, curvature, and sulcal depth, with at least 64/66 correct identifications for all these measurements, with or without smoothing, but not as close to perfect for gray/white matter contrast (59/66 with smoothing, although 64/66 without). Gray/white matter contrast, however, generally produced the best APM results compared, for example, with thickness (cross-validation: MAE difference = 6.1119, p = 0.0001; external validation: MAE difference = 0.2528, p = 0.0001) or surface area (cross-validation: MAE difference = 1.2642, p = 0.0100; external validation: MAE difference = 0.1274, p = 0.0001). While smoothing reduced the accuracy of fingerprinting only for gray/white matter contrast without affecting the accuracy for other anatomical measures, it did, counterintuitively, reduce the accuracy of APM for most models and measurements considered. For gray/white matter contrast, smoothing in some cases led to no vertices being detected in the first stage of the predictive modeling.
For both CPM and APM, using (a) a regression model that did not include age, sex, and scanner (where applicable) as nuisance variables, (b) a model in which these were regressed out from data and design, or (c) a model in which these were used as predictors, improved the MAE in some cases and worsened it in others. Likewise, the choice of edges in the first stage of CPM as (a) positively correlated only, (b) negatively correlated only, (c) both, or (d) the edges with highest differential power from fingerprinting, also had inconsistent effects on the MAE. Weighting the edges by their p-value on a logarithmic scale (a rough measure of effect size given the fixed sample sizes) also sometimes improved and sometimes reduced accuracy.
The Supplementary Material to this paper includes a large spreadsheet file containing accuracy results (MAE, r, and R2) for CPM and APM, in both cases using cross-validation within Dataset A and external validation using Dataset B. In the spreadsheet, the columns represent:
• filename: A unique identifier for the text file containing the results shown in the corresponding row (not relevant for reading these results).
• model: Indicates whether variables such as age, sex, and baseline PARS were used as predictors (predictors), treated as nuisance variables (residualized), or omitted from the model (nonuisance).
• fwhm (APM only): Indicates the amount of smoothing applied: no smoothing (0) or smoothing with a Gaussian kernel of full width at half maximum of 15 mm (15).
• denoise (CPM only): Indicates whether denoising used AROMA components and white matter and CSF signal (AROMA) or further included global signal regression (AROMA-GSR).
• netmat (CPM only): Indicates whether the connectivity matrix used for this model used partial correlations (partial) or not (full).
• vertices (APM) or edges (CPM): Indicates whether edges/vertices selected in the first regression of the predictive modeling were those with a positive (pos) correlation with PARS, negative (neg), or both (both), or if the edges/vertices selected were those with highest differential power from fingerprinting (finger).
• weighted: Indicates whether a simple sum of values from edges/vertices selected in the first stage of the predictive model was used (FALSE) or whether a weighted sum based on the negative logarithm of p-values was used (TRUE), thus giving stronger weight to more significant edges/vertices.
• pars: Indicates what PARS was tentatively predicted by the model: PARS at the start of treatment (i.e., week 0, parsTotalStart), PARS at the end of treatment (i.e., week 8 or 12 depending on the dataset, parsTotalEnd), the difference between PARS at start and end (parsDelta), or PARS at the end of treatment after taking the PARS at start as nuisance (BaselineAsNuisance). For BaselineAsNuisance, there are no models configured as nonuisance.
• train, test (external validation only): Indicators of what datasets were used for training or for testing (i.e., datasets A or B). The spreadsheet also includes training and testing on the same dataset; such models are expected to have excellent performance, and were run only for sanity checking; results are not meant to be used otherwise.
• MAE, MAE_lowerCI, MAE_upperCI: Mean absolute error, along with lower and upper bounds for the corresponding 95% bootstrap confidence intervals.
• Corr, Corr_lowerCI, Corr_upperCI: Correlation between observed and predicted values, along with lower and upper bounds for the corresponding 95% bootstrap confidence intervals.
• NumberBootStraps: Number of bootstraps used to produce the confidence intervals.
• NumberOfVertices (APM) or NumberOfEdges (CPM): For external validation, this is the number of vertices or edges selected in the first stage of the predictive modeling. For cross-validation, this is the average across leave-one-out folds. If there are zero edges or vertices selected in the first stage, models configured as predictors and residualized are expected to yield the same results.
Colors in the spreadsheet are conditional on the values shown, and convey no information beyond what is already represented by the respective numbers.
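The two-stage procedure behind the edges, weighted, and NumberOfEdges columns can be sketched in Python as follows. This is a simplified illustration under several assumptions that are not taken from the source: p-values come from a Fisher z normal approximation, the selection threshold of 0.05 is hypothetical, the weights use the natural logarithm, only the pos and neg selections are covered, and no covariates enter the model (closest to the nonuisance configuration). All function names are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value from the Fisher z approximation (an assumption)."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    # Floor the result so that -log(p) stays finite for extreme z.
    return max(math.erfc(abs(z) / math.sqrt(2)), 1e-300)


def select_edges(X, y, sign="pos", p_thresh=0.05):
    """Stage 1: map each selected edge index to its p-value."""
    n = len(y)
    selected = {}
    for j in range(len(X[0])):
        r = pearson_r([row[j] for row in X], y)
        p = approx_p_value(r, n)
        right_sign = r > 0 if sign == "pos" else r < 0
        if right_sign and p < p_thresh:
            selected[j] = p
    return selected


def summarize(row, selected, weighted):
    """Plain sum, or sum weighted by -log(p), over the selected edges."""
    if weighted:
        return sum(-math.log(p) * row[j] for j, p in selected.items())
    return sum(row[j] for j in selected)


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        return my, 0.0  # no usable predictor: fall back to the training mean
    b = sum((v - mx) * (w - my) for v, w in zip(x, y)) / sxx
    return my - b * mx, b


def cpm_loocv(X, y, sign="pos", weighted=False, p_thresh=0.05):
    """Stage 2 with leave-one-out cross-validation; returns one prediction per subject."""
    preds = []
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        selected = select_edges(X_tr, y_tr, sign, p_thresh)
        s_tr = [summarize(row, selected, weighted) for row in X_tr]
        a, b = fit_line(s_tr, y_tr)
        preds.append(a + b * summarize(X[i], selected, weighted))
    return preds
```

Here `X` is a list of per-subject edge vectors and `y` the outcome to predict (for example, parsTotalEnd). Edge selection is repeated inside every fold so that the held-out subject never influences which edges are used. When no edge passes the first stage, the summary score is zero for every subject and the sketch predicts the training mean.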

Table 2: Comparison between included and excluded subjects in the CPM and APM analyses (Dataset A).
* Two-sample t-test or Chi-squared test when appropriate.
** Patients may have more than one diagnosis, thus the sum is higher than 100%.

Table 3: Comparison between included and excluded subjects in the CPM and APM analyses (Dataset B).
* Patients may have more than one diagnosis, thus the sum is higher than 100%.

Table 4: Comparison between included and excluded subjects in the fingerprinting analyses.
* Two-sample t-test or Chi-squared test when appropriate.

Table 5: Detailed descriptive statistics for the fingerprinting sample. In the fingerprinting analysis, HV were included to maximize the sample with baseline and follow-up scans.
* Patients may have more than one diagnosis, thus the sum is higher than 100%.
* Two-sample t-test or Chi-squared test when appropriate.