Development and validation of a neuroimaging biomarker for electroconvulsive therapy outcome in depression: a multicenter machine learning analysis

Electroconvulsive therapy (ECT) is the most effective intervention for patients with treatment resistant depression. A clinical decision support tool could guide patient selection to improve the overall response rate and avoid ineffective treatments with adverse effects. Initial small-scale, mono-center studies indicate that both structural magnetic resonance imaging (MRI) and functional MRI biomarkers may predict ECT outcome, but it is not known whether those results can generalize to data from other centers. Here, we used MRI data of 189 depressed patients from seven participating centers of the Global ECT-MRI Research Collaboration (GEMRIC) to develop and validate neuroimaging biomarkers for ECT outcome in a multi-center setting. We used multimodal data (i.e., clinical, structural MRI and resting-state functional MRI) and evaluated which data modalities or combinations thereof could provide the best predictions for treatment response ([≥]50% symptom reduction) or remission (minimal symptoms after treatment) using a support vector machine (SVM) classifier. Remission classification using a combination of gray matter volume with functional connectivity led to good performing models with 0.82-0.84 area under the curve (AUC) when trained and tested on samples coming from all centers, and remained acceptable when validated on other centers with 0.71-0.73 AUC. These results show that multimodal neuroimaging data is able to provide good prediction of remission with ECT for individual patients across different treatment centers, despite significant variability in clinical characteristics across centers. This suggests that these biomarkers are robust, indicating that future development of a clinical decision support tool applying these biomarkers may be feasible.


Introduction
Electroconvulsive therapy (ECT) is currently the most effective intervention for patients with treatment resistant depression 1 . Despite its high efficacy, ECT remains underutilized, as only 1-2% of patients with severe or persistent depression receive ECT 2,3 . Although approximately 48% of treatment resistant patients recover with ECT, it is also associated with adverse cognitive effects and may be regarded as more invasive than other treatment options because the use of anesthesia is essential 4 .
Furthermore, ECT is relatively expensive and non-responsiveness can only be determined after multiple sessions. Information that better predicts treatment outcome would enable patient selection thereby further improving the overall response rate and avoiding ineffective treatment with adverse effects. A personalized recommendation about the expected benefit of ECT would be a valuable addition to the treating physician's clinical judgement, and may increase its use in clinical practice.
Attempts to develop instruments that may predict ECT outcome date back to the 1950s 5 . Meta-analyses have associated several clinical characteristics with beneficial ECT outcome, in particular no history of treatment resistance, older age and psychotic symptoms 6,7 . However, their predictive power is insufficient to guide individual patient selection 4,[8][9][10][11] . Recent studies have started using neuroimaging data to predict ECT outcome at the individual level using machine learning analysis, which can construct multivariate prediction models using all the available data. Initial small-scale studies have shown that both structural magnetic resonance imaging (MRI) and functional MRI . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Despite these promising results, the existing studies have been limited by using small samples and mono-center settings. This reduces the possibility for models to generalize to new samples across centers. Although machine learning models typically perform better when trained on larger samples from the same center, classification accuracy of larger multicenter studies tends to decrease, presumably due to increased clinical (e.g., adults vs. elderly) and technological (e.g., different MRI hardware and protocols) variability across centers [22][23][24] . In order to develop robust and generalizable neuroimaging biomarkers for ECT outcome, we used data from the Global ECT-MRI Research Collaboration (GEMRIC) and validated classification performance in a multicenter setting 25 . We used multimodal data (i.e., clinical, structural MRI (sMRI), and resting-state functional MRI (rs-fMRI)) and evaluated which data modalities or combinations thereof might provide the best predictions. As previous studies and clinical trials have used either treatment response (at least 50% symptom reduction) or remission (minimal symptoms after treatment) as outcome criterion, we assessed prediction accuracy for both criteria. Additionally, we evaluated whether model performance would increase when only data from centers with reasonable sample sizes are used. Finally, we visualized the brain regions that were most informative to the classifications, in order to gain insight into the brain regions predictive of ECT 6 outcome. To adhere to guidelines on transparent reporting of multivariable prediction models for individual prognosis or diagnosis (TRIPOD), the checklist is included in the Supplementary Files 26 .
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint

Participants
We used data from GEMRIC, an international consortium that contains the largest multi-center database of neuroimaging on ECT 25,27 . All contributing sites received ethics approval from their local ethics committee or institutional review board. In addition, the centralized mega-analysis was approved by the Regional Ethics  28 . Treatment response was defined as ≥50% HAM-D decrease compared to baseline and remission as post-ECT HAM-D score ≤7. ECT stimulus parameters varied between different centers, including electrode placement. As GEMRIC consists of samples ranging from very small (<20 patients) to relatively large (>40 patients), we performed all analyses on the entire cohort and for centers with ≥20 patients (three centers, N=109) in order to ensure classifiers were provided with sufficient examples per center. A description of centers-specific ECT procedures and image acquisition is provided elsewhere 25,27 .
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint Supplementary Tables 2-3. Structural T1weighted scans were acquired using 1.5T and 3T scanners with a minimum resolution of 1.33 mm 3 and preprocessed using the CAT12 toolbox for voxel-based morphometry (VBM). Images were segmented into gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF), normalized to MNI space using DARTEL registration, resampled to 1.5 mm 3 isotropic and spatially smoothed with an 8mm isotropic Gaussian kernel 29 . GM data were masked at 0.2 to exclude WM.

MRI acquisition parameters are listed in
150-266 rs-fMRI volumes were acquired with a TR of 1.7-3.0 seconds, in-plane resolution of 2.4-3.75 mm, and slice thickness of 3-5 mm. Preprocessing was performed using ANTs (https://github.com/ANTsX/ANTs) and FSL (http://fsl.fmrib.ox.ac.uk/), including brain extraction, boundary-based co-registration, motion correction, spatial smoothing with a 5mm isotropic Gaussian kernel, and normalization to a 2mm MNI template. Denoising was performed using ICA-AROMA, and depending on the type of analysis, high-pass (f>0.01) or bandpass filtering (0.009<f<0.08) was applied together with WM and CSF nuisance regression 30 . Denoised rs-fMRI data were resampled to 4mm isotropic. Subjects showing excessive motion were excluded 31,32 .
Only subjects that passed quality control for both rs-fMRI and sMRI were included for analysis, leading to a final sample of 189 patients (Supplementary Figure   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint 1 for a flowchart). Details on MRI preprocessing, quality control and machine learning are provided in the Supplementary Methods.

Feature extraction
We extracted commonly used MRI features from the preprocessed data. For sMRI, we used voxel-wise modulated GM maps (VBM) and 142 cortical and subcortical parcellations using the Neuromorphometrics atlas (NMM; provided by Neuromorphometrics, Inc). For rs-fMRI, we used group independent component analysis (ICA) to extract physiologically meaningful resting-state networks and reduce data dimensionality to 70 independent components 33 . Components reflecting nonneural signals were discarded, resulting in 53 spatial components for analysis. Groupinformation guided ICA was used to derive subject-specific time-series and spatial maps for the 53 signal components 34 . Time-series were used to calculate individual functional connectivity (FC) matrices that described pairwise connectivity between signal components with Pearson correlations (ICA-DR FC). Additionally, we used an atlas-based approach from Power et al., and extracted time-series from 264 functional areas to compute FC matrices (Power FC) 35 . Correlations were converted to z-scores with Fisher r-to-z transformation before entering classification.

Machine learning
Machine learning classifications were performed using linear support vector machine (SVM; LIBSVM 36 ) implemented in scikit-learn with stratified shuffle-split cross-. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint validation (CV) with 100 iterations. At each iteration, stratified-splits were made by preserving the proportion of responders/remitters and non-responders/remitters from each center to obtain maximally homogeneous train-test splits in which 80% data was used for classifier training and 20% for testing. This CV procedure is further referred to as 'internal validation'. In addition, we addressed leave-one-site-out (LOSO) CV, in which all but one center was used to train the SVM while the remaining center was used to assess model performance (further referred to as 'external validation').
This procedure was repeated so that each center is used once for testing. LOSO reduces the risk of overfitting data from a single center but may result in large between-sample heterogeneity of training and test sets, resulting in lower classification performance compared to internal validation 37 . Hyper-parameters for SVM were optimized with gridsearch using nested cross-validation. We assessed classification performance using different sets of MRI features (VBM, NMM, ICA-DR FC, Power FC, and ICA spatial components), as well as using clinical data only (i.e., age, sex and pre-ECT HAMD scores) for baseline classification. Clinical data were always included for each classification. The primary performance metric was area under the receiver operator characteristic curve (AUC) and reported metrics were averaged across CV iterations 38,39 . Balanced accuracy, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) are reported in Supplementary Tables 6-13.
Statistical significance of classification performance was assessed using a label permutation-testing framework with 1000 iterations 40 . Obtained p-values were corrected for multiple comparisons using False Discovery Rate (FDR; two-stage (non-. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. negative); alpha=0.05). 95% confidence intervals (CI) for AUC were computed using the modified Wald-method 41 . To reduce computational burden, only spatial ICA classifications that resulted in AUC>0.75 were tested for significance. Finally, we assessed classification performance for multi-modal classifications combining anatomical and functional features: regional neuromorphometrics GM volumes with either ICA or Power-atlas based FC, and voxel-wise GM with either ICA or Power-atlas based FC.

Anatomical localization
To investigate which regions contributed most to the voxel-wise classification, we employed a method to estimate p-values for the weights of the SVM 42 . A statistic was computed incorporating the weight component value and the size of the margin, and an analytical approximation to the null-distribution obtained through permutation testing was used to calculate p-values.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Demographic data
Demographic data is presented in Table 1. Of the 189 included patients, 113 patients were ECT responders and 76 non-responders, and 76 were remitters and 113 nonremitters. As expected, patients with a favorable outcome were older, and higher symptom severity at baseline was associated with ECT response but not remission. No significant differences in sex, initial electrode placement and total number of ECTsessions were observed.
Demographic data for the three largest centers (with N≥20) used for additional analyses are described in Supplementary Tables 4-5. Differences in sample demographics and clinical characteristics between the three largest centers were similar to those seen in the entire sample. These findings highlight that there is considerable clinical heterogeneity between centers.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021.

Response prediction
The majority of the classification models for response prediction performed poorly with AUC<0.7 for internal validation, and none of the models remained significant with external validation after permutation testing with FDR correction. The results for response classification are presented in Figures 1 and 3 and Supplementary Tables 6, 8, 10-11. Although in clinical patient care response to ECT may be beneficial, reaching remission after treatment is most preferable. Therefore, we focus on the results obtained for remission prediction below.

All centers
We first evaluated prediction performance across centers using all data (N=189) with internal validation. Sample size per center ranged from 14 to 42. Prediction performance with internal validation was poor with AUC ranging between 0.58-0.67 across different MRI modalities ( Figure 1A). Classification using clinical variables resulted in a comparable AUC of 0.62. All these AUCs were statistically significant.
Classification using external validation hardly exceeded chance-level, with AUCs ranging between 0.51-0.58 and none were statistically significant. Classification using ICA networks did not exceed AUC>0.75 for either internal or external validation.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021.  Figure 1. Multi-center predictions for ECT treatment response and remission using unimodal MR data modalities. Panel A depicts classification performance using data from all centers and different MR modalities with internal validation (AUC is averaged over 100 stratified cross-validation splits). Panel B shows classification performance using data from all centers with external validation (leave-one-site-out cross-validation, scores are averaged across different center left out for model testing). Panel C depicts classification performance using data from the three largest centers with internal validation. Panel D shows classification performance using data from the three largest centers with external validation. VBM = voxel-based morphometry; NMM = Neuromorphometrics atlas; FC = functional connectivity; ICA = group information guided independent component analysis. Red dashed line depicts chance level performance (0.5 AUC). Asterisks indicate significant difference from chance level after permutation testing with false discovery rate correction for multiple comparisons (p<0.05, corrected).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Three largest centers
We next assessed prediction performance using a subsample of data containing three centers with N≥20 (N=109) to provide the machine learning classifier with sufficient samples per center. Classification performance with internal validation ranged between 0.52-0.83 AUC across different features used, and 0.65 AUC was obtained for classifications using clinical variables only ( Figure 1C). All AUCs obtained with internal validation showed statistical significance. Notably, the highest performance was  Table   9).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Multimodal analysis
Classification using a combination of anatomical and functional MRI measures with samples from all centers led to a maximum of 0.68 AUC using internal validation which was statistically significant, whereas 0.64 AUC for external validation did not obtain significance. We then assessed multimodal classification performance using the three largest centers only. Classification of voxel-wise GM with ICA-based FC led to the best performing model, 0.84 AUC using internal validation, which remained acceptable using external validation with 0.71 AUC. Classifications for voxel-wise GM with Poweratlas FC led to similar performances with 0.82 AUC for internal validation and 0.73 AUC for external validation. All of the aforementioned AUCs were statistically significant for . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint both internal and external validation. Classification performance for regional neuromorphometrics volumes with ICA-based FC resulted in 0.72 AUC with internal validation and 0.52 AUC for external validation. Classifications for regional neuromorphometrics volumes with Power-atlas FC led to 0.67 AUC using internal validation and 0.55 AUC for external validation. AUCs obtained for classifications using regional neuromorphometrics were statistically significant for internal validation but not for external validation (Supplementary Tables 12-13). Figure 3. Multimodal multi-center predictions for ECT response and remission. Panel A depicts classification performance using data from all centers and different combinations of features with internal validation (AUC is averaged over 100 stratified cross-validation splits). Panel B shows classification performance using data from all centers and different combinations of features with external validation. Panel C depicts classification performance using data from the three largest centers with internal validation. Panel D shows classification performance using data from the three largest centers with external validation. VBM = voxel-based morphometry; NMM = Neuromorphometrics atlas; FC = functional connectivity; ICA = group information guided independent component analysis. Red dashed line depicts chance level performance (0.5 AUC). Asterisks indicate significant difference from chance level after permutation testing with false discovery rate correction for multiple comparisons (p < 0.05, corrected).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Learning curves
To evaluate the relation between sample size and classification performance, we examined learning curves for the best performing models (i.e. remission classification using data from the three largest centers) by subsampling the data using different . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Anatomical localization
We investigated which brain regions contributed most to treatment classification for voxel-wise GM data. We only focus on our best performing unimodal model, which for remission classification resulted in 0.83 AUC using data from the three largest samples.
P-values were plotted for GM weights only as we were interested in brain regions rather than the influence of covariates. As shown in Figure 5, regions located in dorsomedial prefrontal (dmPFC), precuneus and thalamus exhibited high contribution to the classification task. The sign of weights within thalamus was mostly negative, implying a high chance for non-remission classification, whereas signs of weights within dmPFC and precuneus were mostly positive, implying a high chance for remission classification. Note that these results reflected the contribution of these brain regions to the multivariate pattern used by the SVM classifier. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion
These results show that neuroimaging data can provide a good prediction of ECT remission for individual patients across different centers. In line with recent metaanalyses, older age and higher depression severity at baseline were associated with better ECT outcome 7,43,44  These results indicate that multimodal neuroimaging data may provide a robust biomarker that could be used to guide clinical decision-making. By providing patients and clinicians a patient-specific prognosis, this could ultimately increase the success rate of ECT, avoid ineffective treatments and accompanying adverse effects, and increase the use of the most effective antidepressive treatment available.
Previous monocenter studies using neuroimaging data to predict ECT outcome with either structural or functional MRI were able to obtain up to 0.84 AUC 21 . Here we achieved similar classification performance in a multicenter setting. Using data from different samples involves many additional sources of technological (e.g., different MR hardware and scanner protocols) and clinical (e.g. different ECT protocols, patient . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint cohort and recruitment procedures) variability 23 . These additional sources of variability may decrease prediction accuracy of MRI measurements for ECT outcome 23,45 .
Conversely, a multicenter study avoids cohort-specific solutions and so helps test generalizability of the results across different samples, increasing the likelihood that features identified as discriminatory between remitters and non-remitters reflect generic properties related to treatment outcome across datasets. Our results showed that generalizability to new samples came at the cost of lower accuracy, as classifications performed with internal validation (AUC≈0.83) outperformed those using external validation (AUC≈0.72). Additionally, we found that using a subsample of the data containing three centers with N≥20 each (N=109) led to better model performance compared to using all eight centers (N=189). This improvement could not be attributed solely to reduced clinical heterogeneity, as differences in sample demographics and clinical characteristics between the three largest centers were found to be similar to those seen in the entire sample (Supplementary Tables 1-2). We therefore hypothesize that the exclusion of smaller centers ensured that the model had sufficient examples per center for training.
Brain regions that contributed most to remission classification using structural MRI data included dmPFC, precuneus and (hypo)thalamus. Our results also indicated a role for thalamus FC, as classification of the thalamus ICA resulted in the best performing functional MRI classifier. The thalamus is a hub connecting all corticocortical circuits with links to hippocampus and medial PFC. It is a central hub in the affective network and plays an important role in emotion dysregulation 12,[46][47][48][49] . There . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint is evidence of decreased thalamic volume in depression [50][51][52][53] and hyperactivity during rest and cognitive and emotion processing 54,55 . Additionally, it has been suggested that seizure propagation between distant brain regions through cortical-thalamocortical and direct cortical-cortical connections is pivotal for ECT effectiveness [56][57][58][59][60] . There is also evidence that links reduced thalamic volume and altered rs-fMRI connectivity with clinical improvement 12,19 . The precuneus is the core of the posterior default mode network and is associated with self-related processing and episodic memory retrieval, and has shown altered FC in depression [61][62][63] . Preliminary evidence links changes in precuneus network connectivity and structure with ECT treatment outcome 64,65 .
Altogether, these results provide evidence for the importance of thalamic and precuneus structure and their functional connectivity with other brain regions for both depression and ECT-related clinical response. Several ECT outcome prediction studies using structural MRI have also implicated the precuneus 15,19,20 , and studies using rs-fMRI have reported functional connectivity with the thalamus as important regions 18,60 . Notably, the identification of brain regions contributing most to the classification resulted from a multivariate analysis, and the localization of these regions should therefore be interpreted with caution as these regions may not only be related to treatment outcome but also contribute to denoising during the classification process 66 .
Several limitations have to be taken into account when interpreting our findings. We used a retrospectively pooled sample from existing data across the world, without harmonized protocols for scanning, inclusion criteria or demographic and clinical characteristics. Not surprisingly, we found significant differences in sample . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint demographics and clinical characteristics between the different data collection centers. These sources of heterogeneity may limit classification performance but also provide an opportunity for model development using independent data sets and the discovery of generalizable biomarkers that are reproducible across centers. However, classification performance might be improved by using standardized acquisition parameters for possible future clinical utility. Additionally, our findings show that the prediction of treatment response was poor, while prediction of remission was good.
This indicates that ECT outcome prediction is limited to remission, which may also provide a better outcome criterion compared to response. Remission has become the gold standard for depression treatment, because patients who do not remit have a poorer prognosis and greater chance of relapse and recurrence than those who do.
Remission is also associated with a lower full symptomatic recurrence rate compared with achieving treatment response 7, 67, 68 . Furthermore, while unimodal and multimodal models performed comparable for remission classification using data from the largest centers with internal validation, only the multimodal classifications remained acceptable with external validation on different centers. We speculate that multimodal data may increase the probability that either the structural or functional MRI data overlaps across centers.
Taken together, this study suggests that ECT remission can be accurately predicted using MRI data in a large, ecologically valid, multi-center sample of patients receiving ECT, indicating that future development of a clinical decision support tool might be feasible. MRI could easily be incorporated during decision making, as ECT is . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint typically provided in a hospital setting. And as MRI is inexpensive compared to ECT, the additional costs are expected to outweigh the costs of unsuccessful treatments.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint

Conflict of interest
Dr. van Wingen has received research grant support from Philips. Dr. Camprodon serves in the Scientific Advisory Board of Hyka Therapeutics and Feelmore Labs, and has been a consultant for Neuronetics. All other individually-named co-authors in the GEMRIC working group reported no biomedical financial interests or potential conflicts of interest.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 31, 2021. ; https://doi.org/10.1101/2021.07.29.21261206 doi: medRxiv preprint