Abstract
Neuron loss is a key feature of neurodegenerative diseases often leading to brain atrophy detectable through magnetic resonance imaging (MRI). Various brain atrophy measures are essential in research of Alzheimer’ disease (AD) and related dementias. This study aims to forecast future annual percentage changes in hippocampal, ventricular, and total gray matter (TGM) volumes in individuals with varying cognitive statuses, from healthy to dementia. We developed a machine learning model using elastic net linear regression and tested two approaches: (1) a baseline model using predictors from a single-time-point and (2) a longitudinal model using predictors derived from longitudinal MRI. Both approaches were evaluated with MRI-only models and models that combined MRI with additional risk factors (age, sex, APOE4, and baseline diagnosis). Cross-validated Pearson correlation scores between predicted and actual annual percentage changes were 0.62 for the hippocampus, 0.51 for the ventricles, and 0.41 for TGM, using the longitudinal MRI + risk factor model. Longitudinal models consistently outperformed baseline models, and models including risk factors outperformed the MRI only model. Validation using an external dataset confirmed these findings, highlighting the value of predictors derived based on longitudinal data. We further studied the value of the predicted atrophy/enlargement rates for clinical status progression prediction across three different datasets. Predicted atrophy was a consistently better indicator of progression to mild cognitive impairment and dementia than present-day regional volumes, with the longitudinal atrophy prediction model typically outperforming the baseline model in terms of clinical status prediction. Future atrophy prediction has significant potential for assessing the risk of cognitive decline, even in cognitively unimpaired individuals, and can aid in selecting participants for clinical trials of disease-modifying drugs for AD.
1 Introduction
Dementia impacts approximately 57 million people globally [1], with Alzheimer’s disease (AD) being its most common cause [2]. Machine learning approaches for predicting future diagnoses have been extensively studied in the context of dementia [3], with numerous studies focusing on the progression from mild cognitive impairment (MCI) to dementia [4]. However, as the clinical focus shifts from prognosis to selecting the most effective treatments, there is a growing need to predict more nuanced outcomes [5, 6]. This shift in focus could also enhance the selection of study participants for clinical trials, ensuring that the most appropriate candidates are chosen based on their predicted responses to treatment.
Predicting the change in cognitive tests scores, such as mini mental state examination (MMSE) and Alzheimer’s Disease Assessment Scale–Cognitive Subscale (ADAS-Cog), is perhaps the most natural way towards these more nuanced outcomes. However, predicting changes in cognitive scores has proven to be challenging, partly due to the validity challenges in the longitudinal measurement of cognitive changes in less impaired populations with the well-established tests [7, 8] and partly due to challenges in assessing cognition in general [9]. These challenges include learning effects, which can skew results due to repeated testing, and high variability, where outcomes depend significantly on the individual’s condition on the test day. For example, in the recent TADPOLE challenge, none of the competing teams was able to perform better than a simple baseline method for the ADAS13 score prediction while the predictions for the future diagnosis and future ventricular volume were significantly better than with the baseline methods [10]. Moreover, while the future cognitive scores may be predictable, this predictive power is many times driven by the current baseline scores, yielding change estimates of limited usability [11]. As a result, there is a pressing need for serial biomarkers that reflect the underlying biology of AD and its connection to dementia pathology.
This work, instead, explores the prediction of future atrophy based on serial magnetic resonance imaging (MRI). The MRI-based brain atrophy has long been identified as an important measure to track dementia progression [12]. It possesses several important characteristics for a prediction target: 1) atrophy measures are quantitative, 2) MRI is widely available and non-invasive, 3) various atrophy measures have been adopted as secondary endpoints in randomized controlled trials of dementia/AD interventions [13, 14] and 4) hippocampal atrophy has been qualified for enrichment in pre-dementia trials by the European Medicines Agency [15]. By focusing on atrophy prediction, this study aims to provide a more reliable and clinically useful tool for both treatment selection and trial participant selection.
We focus on three distinct atrophy measures: hippocampal atrophy, enlarged ventricular volume, and whole-brain atrophy which all have been established as key structural MRI markers for AD and dementia progression, with slightly different time windows of the best predictive utility [16–19]. In clinical trials on dementia and AD, the atrophy of these three key brain regions are often considered as secondary endpoints (e.g. [20]). These regions provide valuable insights into short-term dementia risks and changes in their volumes has been linked with the cognition [21–23].
Surprisingly, while the association of brain atrophy measures to dementia and AD progression is well documented, the prediction of future changes in these measures has not been studied widely. Prediction of the future ventricle volume was one of the tasks in a recent TADPOLE challenge [10], where the best performing model was based on the concept of rate of AD progression [24]. Liedes et al [25] focused on prediction of hippocampal atrophy with promising results. However, in [25] they studied only prediction of hippocampal atrophy and used data from single time-points only for predictions. In this work, we analyze three regions (hippocampus, total gray matter, and ventricles) together within the same dataset and workflow and develop a modeling framework able to use predictors derived from longitudinal data. Finally, we study the value of forecasted atrophy rates to predict the progression of the level of the memory concern from healthy to MCI/dementia and from MCI to dementia. We demonstrate that the models that use predictors derived from longitudinal data consistently outperform corresponding models based on single time-point predictors. Also, we show that predicted atrophy is a better indicator of progression to mild cognitive impairment and dementia than the volumes of the regions at current day.
2 Materials and Methods
2.1 Data for atrophy/enlargement prediction
Data used in this work were obtained from the ADNI (http://adni.loni.usc.edu) and the AIBL study group (https://aibl.org.au). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD. For up-to-date information, see (www.adni-info.org). AIBL study methodology has been reported previously [26].
We included all ADNI participants who had a T1-weighted MRI acquired at the time of entry, 24-month visit, and 48-month visit. Moreover, we required participants to have information on demographics (age, sex), APOE4 status, and diagnosis status. These were obtained from from the ADNIMERGE.csv table. 2,365 participants had MRI data at the time of entry. After 24 months of follow-up, MRI information was available for 1,373 participants, and for 586 participants after 48 months. 565 individuals had MRI data at entry, 24 months, and 48 months. After excluding participants without APOE4 information, we finalized our dataset with 550 individuals who had all the considered information. We refer to this dataset as ADNI-main dataset.
We used the Australian Imaging Biomarkers and Lifestyle Flagship Study of Aging (AIBL, ADNI-subcohort) as a external test data in our study. At the study entry, MRI scans were available for 605 participants. After 18 months of follow-up, 232 individuals had MRI scans, and 149 individuals had scans at the 36-month follow-up. Among them, 95 participants had MRI scans at all three time points. All these 95 individuals have complete demographic data, including age, sex, diagnosis, and APOE4 status. The characteristics of study cohorts are presented in Table 1 and participant’s RIDs are available as supplementary material.
Characteristics of the ADNI and AIBL cohorts, including sample size, age, sex distribution, APOE4 status, and clinical diagnosis. Age is presented as mean (standard deviation). Diagnosis and age are from the time of entry. CN refers to cognitively normal. APOE4 status is displayed as the count of individuals with 0, 1, or 2 alleles.
2.2 MRI preprocessing
The MRI data were preprocessed following the framework described in [27]. We used the volumetric and thickness measures derived from CAT12 toolbox (https://neuro-jena.github.io/cat/, version 8.1) using the default settings [28]: For 136 volume measures, the T1-weighted MRI was segmented into gray matter (GM) and white matter (WM) and non-linearly normalized to a stereotactic space using the shooting approach [29]. Based on spatially normalized GM and WM segments, the neuromorphometrics atlas was used to extract the regional volumes 1 We denote these volume measures by Vij(t) for participant i, region j = 1, …, 136, and time point t. The cortical thickness measures [30] were computed and registered to the FSaverage surface template [31], where DK40 atlas [32] was used to define 69 regionally averaged cortical thickness measurements. These cortical thickness measures are denoted by Vij(t) for participant i, region j = 137 …, 206, and time point t. The MRI data from the participant i at time point t is collected into a vector Vi(t).
The annual percentage change between the time points t1 and t2 (expressed in years) for a region i is defined as
If Δij(t1, t2) is positive, it refers to atrophy, and if Δij(t1, t2) is negative, it refers to enlargement.
2.3 Prediction framework
Our purpose is to predict future Δik(t2, t3) based on the data available during time points t1 and t2 where t3 > t2 > t1. For simplifying the terminology, we name the time-point t1 as past, t2 as current day, and t3 as future. We formulate this prediction as supervised learning problem. Denote future change Δik(t2, t3) by yik and the number of participants by N. Provided the data xi, i = 1, …, N (see Table 2) from the past and current day, we apply the Elastic Net linear regression, a regularization technique that combines both Lasso (L1) and Ridge penalties (L2) [33]. Elastic Net is advantageous in high-dimensional data environments, especially when multicollinearity (correlation among predictors) is present. This method allows us to improve model generalization by reducing over-fitting while selecting a sparse subset of informative features. Dropping the index k, elastic net solves the coefficients
of the linear model as
where β = [β1, …, βp]T, λ controls the overall regularization strength, and α adjusts the balance between L1 and L2 penalties. When α = 1, Elastic Net behaves like Lasso regression, driving some coefficients (βj) to exactly zero, effectively selecting relevant features. Conversely, when α = 0, it acts as Ridge regression, reducing the effect of multicollinearity without eliminating predictors. By setting 0 < α < 1, Elastic Net captures the advantages of both Lasso and Ridge regularization, yielding a stable and interpretable model even in the presence of correlated predictors.
Definitions of the predictive models used in the study. The models vary by the inclusion of MRI data, MRI changes over time, and additional risk factors such as age, sex, diagnosis, and APOE4 status.
We used (nested) cross-validation to optimize the parameter λ, ensuring that the selected model provided the best trade-off between model complexity and predictive accuracy. The parameter α was fixed at 0.5 following our previous works [34]. This approach allowed us to obtain a model that is both interpretable and well-suited for generalization to new data.
We performed the prediction for three different regions of interest: 1) hippocampus, defined as the union of left and right hippocampus; 2) ventricles, defined as the union of left and right lateral ventricles and inferior lateral ventricles, and 3) total gray matter (TGM) volume. Hippocampus and ventricles were defined according to the neuromorphometrics atlas (https://neuromorphometrics.com/Seg/html/segmentation/) and the TGM volume was defined as in the CAT12 toolbox [28].
We designed four models based on two approaches with different feature combinations, presented in Table 2. In the baseline approach, in which we used current day data (ADNI 24 month visits) to predict atrophy in the future (ADNI 48 month visits), we defined two models: the MRI only model was trained using only MRI information and the MRI and risk factors model used in addition demographic/risk factor information containing age, sex, the number of APOE4 alleles, and the current day diagnosis (normal cognition, MCI, dementia). We represented this demographic/risk-factor information by ri for participant i. The longitudinal approach used past and current day data (ADNI 0 and 24 month visits) to predict atrophy in the future (ADNI 48 month visits), and we again defined MRI only and MRI+ risk factors models. Since the MRIs were acquired both with 1.5T and 3T scanners, we provided this information to the learning algorithm as additional predictors (see [35]), encoding possible field-strength combinations as a binary vector fi.
2.4 Implementation and performance evaluation
Following the framework described in [36], we implemented a 10-fold cross-validation approach to prevent overfitting and evaluate model performance, see Figure 1. The data was divided into training and test sets using a nested and stratified 10-fold cross-validation framework [37]. This involved two levels of cross-validation: an outer loop for model evaluation and an inner loop for selecting hyperparameters (λ). Participants were randomly divided into 10 subsets. In each fold of the outer loop, one subset was used for testing, and the remaining nine were designated for training. During the inner loop, optimal λ were selected by minimizing the mean square error (MSE). This nested approach ensures robust model evaluation and minimizes overfitting by effectively utilizing the entire dataset for training and testing.
Schematic representation of the prediction framework. (a) Atrophy prediction framework: The model is trained and validated using a 10-fold cross-validation approach on the ADNI-main dataset. Elastic Net Linear Regression (ELNR) is used to predict the annualized percentage change. The performance is evaluated using Mean Absolute Error (MAE) and Pearson correlation (R). (b) External validation: The trained model is applied to the AIBL-1 test dataset (Table 1) predict the annualized percentage change, and the performance is assessed using MAE and Pearson correlation (R). (c) Progression prediction: The model is trained on the ADNI-main dataset and applied to external test datasets (ADNI-ex and AIBL-2, Table 3) to predict clinical progression. The Area Under the Curve (AUC) is used to evaluate classification performance for progressive vs. stable cases. (d) Selection of the representative run: The median run is selected from 10 iterations of 10-fold cross-validation, and participants are classified into progressive or stable groups based on predicted values. AUC is then calculated for the classification task. We use AIBL-1 and AIBL-2 to refer to two different versions of the AIBL test data, described in Tables 1 and 3, respectively.
We assessed the models’ performance using the cross-validated Pearson correlation coefficient (PCC) and the mean absolute error (MAE) between the predicted and actual rates of cognitive decline expressed in percentage points. To ensure robustness and minimize the influence of random variations, the results were averaged over 10 runs of nested 10-fold cross-validation (CV). To compare the correlation coefficients between the two approaches, we used methods described by Diedenhofen and colleagues [38].
We further utilized the AIBL-cohort as an independent test dataset. The ADNI data served as the training set, while AIBL data was used for testing. The visit schedule of AIBL is different from the ADNI visit schedule: the current day are 18 (AIBL) and 24 (ADNI) month visit, and future are 36 (AIBL) and 48 (ADNI) month visit. However, as the prediction task was formulated as the prediction of annualized change, this causes no differences to the prediction framework presented above. All the analyses were done using R (version 4.1.1), with the following packages: glmnet [39], caret [40], cocor [41], pROC [42], ggplot2 [43], and complexheatmap [44]. The codes to reproduce the work are available at https://github.com/MaryamHadji/Future-prediction
Demographic and clinical characteristics of cohort used for validation of clinical progression prediction. The cohorts include ADNI-Main, AIBL, and ADNI-External datasets, with subgroups categorized as CN Progressive, CN Stable, MCI Progressive, and MCI Stable. Age is presented as mean (standard deviation). Sex is shown as M/F, and APOE4 status is displayed as the count of individuals with 0, 1, or 2 alleles.
2.5 MCI/dementia progression prediction
We evaluated the effectiveness of our predicted atrophy/enlargement rates in assessing clinical status progression, see Figure 1. For this, we considered participants in CN and MCI groups, predicted their future atrophy/enlargement rates
, and evaluated how well
predicted the change in future cognitive status, i.e., MCI-to-dementia or NC-to-MCI progression. The model’s performance in MCI/Dementia progression prediction was assessed using the area under the receiver operating characteristic curve (AUC) and we considered each region of interest k (hippocampus, ventricles, and TGM) separately. We determined two groups based on current visit and diagnosis labels during the follow-up: a) Stable group: CN/MCI individuals who have remained with the same diagnosis status for five years or more following the current day visit; b) Progressive group: 1) Individuals with CN diagnosis at the current day visit who later develop MCI or dementia (the last two diagnoses must be MCI or dementia), and 2) MCI individuals who later develop to dementia (the last two diagnoses must be dementia). We contrasted clinical status progression prediction based on predicted future atrophy scores to clinical status progression prediction using present day brain region volumes and hypothesize that the predicted atrophy predicts clinical progression better than simple present day brain region volume.
We used three different datasets for MCI/Dementia progression prediction: First, we used the main ADNI dataset that we used for the atrophy prediction including data from 550 individuals. Among the 246 CN individuals, 60 individuals progressed MCI/dementia during the follow-up period, and 133 remained stable with a follow-up duration of at least five years. Similarly, in the MCI group, 79 individuals converted to dementia, whereas 119 remained with MCI status, see Table 3. With this dataset, the MCI/dementia progression prediction was done within the cross-validation framework, i.e., we selected the atrophy estimates based on the results of cross-validation run with median PCC and predicted the clinical progression-based atrophy prediction from that run.
Second, we used the main ADNI dataset for training the models to predict
in AIBL, which were then used to predict clinical progression AIBL. In the CN group, 13 individuals progressed from CN to MCI/dementia, while 45 remained stable. Because there were not enough people in the MCI group (only 8) with follow-up data of 5 years after the current day visit (18 month visit), we did not consider MCI to dementia progression prediction in AIBL.
Last, we identified a subset of ADNI participants who did not have 0, 24 and 48 month visits but whose data could be used for progression prediction. In practice, the 48 month MRI data was missing for these participants, but they still had sufficient clinical follow-up information. We refer to this dataset as the ADNI-external test set. In this subset, 31 individuals in the CN group progressed to MCI/dementia, while 65 remained stable, but the mean age of these two groups was highly different. In the MCI group, 53 individuals progressed to dementia, whereas 49 remained unchanged in their clinical status, see Table 3.
3 Results
3.1 Predicting annual atrophy/enlargement
The validation results of all models are presented in Figure 2. Overall, our findings highlight that the longitudinal approach, particularly when incorporating additional risk factors, yielded the best performance for predicting annual atrophy/enlargement.
Prediction of annual percentage change of volumes of different brain regions using 10-fold cross-validation on the main ADNI dataset. (a) Bar plots showing the correlation of different models based on baseline and longitudinal approaches. (b) Scatter plot illustrating the prediction of atrophy rates for the longitudinal approach of MRI + Risk Factors model, derived from a single computation run with median performance.
Hippocampal atrophy
For the baseline approach utilizing only MRI data from the ADNI dataset, the PCC for predicting the hippocampal atrophy rate was 0.55 (95% CI: 0.49, 0.61), with MAE of 1.36 (95% CI: 1.26, 1.46). Incorporating genetic and demographic factors into the baseline MRI model, the PCC improved significantly to 0.59 (95% CI: 0.53, 0.65, p = 0.02), and the MAE decreased to 1.32 (95% CI: 1.22, 1.41). For the longitudinal approach using the MRI-only model, the PCC was 0.60 (95% CI: 0.54, 0.65), with an MAE of 1.31 (95% CI: 1.22, 1.41). When the longitudinal MRI model was combined with risk factors, the PCC significantly improved to 0.62 (95% CI: 0.56, 0.67, p = 0.04), with an estimated MAE of 1.29 (95% CI: 1.20, 1.38). On the other hand, in the MRI-only model, the PCC for the baseline approach significantly improved from 0.55 (95% CI: 0.49, 0.61) to 0.60 (95% CI: 0.54, 0.65) for the longitudinal approach (p = 0.02). Similarly, in the model incorporating both MRI and demographic factors, the PCC increased from 0.59 (95% CI: 0.53, 0.65) in the baseline approach to 0.62 (95% CI: 0.56, 0.67) in the longitudinal approach, representing a significant improvement (p = 0.03).
Ventricle enlargement
For predicting ventricle enlargement rate using the baseline approach relying solely on MRI data, the PCC was 0.40 (95% CI: 0.32, 0.48), with an MAE of 2.67 (95% CI: 2.48, 2.88). Incorporating genetic and demographic factors into the baseline MRI model improved the PCC significantly to 0.46 (95% CI: 0.37, 0.54, p = 0.004), while reducing the MAE to 2.57 (95% CI: 2.38, 2.79). For the longitudinal approach using the MRI-only model, PCC increased to 0.49 (95% CI: 0.39, 0.58), and the MAE decreased to 2.52 (95% CI: 2.33, 2.72). The best performance was achieved with the longitudinal approach combining the MRI model and additional risk factors, yielding the PCC 0.51 (95% CI: 0.41, 0.59) and the MAE of 2.48 (95% CI: 2.29, 2.69), however, this improvement was not significant (p = 0.4) over the longitudinal MRI-only model. In order to compare the models using only MRI information for ventricle enlargement rate, the PCC for the baseline approach was 0.40 (95% CI: 0.32, 0.48). For the longitudinal approach, the PCC increased to 0.49 (95% CI: 0.39, 0.58), representing a significant improvement (p = 0.004). For the model incorporating both MRI and demographic information, there was no significant improvement, with the PCC increasing from 0.46 (95% CI: 0.37, 0.54) in the baseline approach to 0.51 (95% CI: 0.41, 0.59) in the longitudinal approach (p = 0.1).
Gray matter atrophy
To predict the TGM atrophy, based on baseline approach utilizing only MRI data, PCC was 0.28 (95% CI: 0.20, 0.35), with a MAE of 0.90 (95% CI: 0.83, 0.97). Incorporating genetic and demographic factors into the MRI model improved the PCC to 0.32 (95% CI: 0.24, 0.39) and slightly reduced the MAE to 0.89 (95% CI: 0.82, 0.95), the difference between the two models was not significant (p = 0.1). For the longitudinal approach using only the MRI model, the PCC increased further to 0.40 (95% CI: 0.33, 0.47), with the MAE of 0.86 (95% CI: 0.80, 0.93). The best-performing non-significant result from the previous model was observed in the longitudinal approach, which combined MRI data with additional risk factors. This approach achieved a PCC of 0.41 (95% CI: 0.34, 0.48) and an MAE of 0.86 (95% CI: 0.80, 0.93) (p = 0.6). In the MRI-only model, the PCC was 0.28 (95% CI: 0.20, 0.35) for the baseline approach and increased to 0.40 (95% CI: 0.33, 0.47) for the longitudinal approach, demonstrating a significant improvement (p= 0.0009). Furthermore, after incorporating demographic predictors alongside MRI information, we observed a significant improvement from 0.32 (95% CI: 0.24, 0.39) in the baseline approach to 0.41 (95% CI: 0.34, 0.48) in the longitudinal approach (p = 0.006).
Predictor importance
The most important predictors for atrophy/enlargement rates in the longitudinal MRI+risk factor model are displayed in Fig. 3. For predicting hippocampal atrophy, the five most important predictors were dementia diagnosis, right hippocampal volume, and changes in volumes of left middle temporal gyrus, right hippocampus and left cerebral white matter. Both present day volumes and changes in volumes of various brain structures contributed to models. However, only present day cortical thickness values (and not the rate of change in cortical thickness) were included as top predictors. From risk factors, only dementia diagnosis and the number of APOE4 alleles were included in the top predictors. Unsurprisingly, only the left or right (but not both) structure was included in the top predictors, with the exception of hippocampus volume. For predicting ventricle enlargement, the five most important predictors were changes in volumes of left lateral ventricle, left middle temporal gyrus, and left parahippocampal gyrus, dementia diagnosis, and the number of APOE4 alleles. The change rates of volumes of various structures appeared more often in the top predictors than the present day volumes of structures. Few cortical thickness measures were included as well as change rate of left rostral middle frontal cortex. Again, from risk factors, only dementia diagnosis and the number of APOE4 alleles were included in the top predictors. For TGM atrophy, the five most important predictors were dementia diagnosis, change rates of right parahippocampal gyrus volume, left transverse temporal cortical thickness, right cerebellum white matter volume, and left isthmus cortical thickness. Again, the change rates of volumes of various structures appeared more often in the top predictors than the present day volumes of structures. Both changes of and present day cortical thicknesses were included as top 20 predictors. The only risk factor in the top predictor list was the dementia diagnosis. For TGM, one of variables coding the MRI field strength entered to the model (strength3, indicating that all the measurements were done with 3.0T scanner).
Importance of predictors across 10 runs of 10-fold cross-validation for the longitudinal, MRI+Risk factors model. Each heatmap consists of a separate single-column heatmap showing the correlation scores between each variable and the target label. Additionally, a bar graph illustrates the importance of each predictor, calculated as the mean of absolute values of regression coefficients across the cross-validation folds. Note that the regression coefficients are computed for standardized models. For brain imaging predictors, “ch” indicates change and “CT” indicates cortical thickness. Predictor names without “CT” refer to volumes. “DX” refers to the diagnosis at 24 months, defined across three columns: “DX-CN”, “DX-Dementia”, and “DX-MCI”. Strength refers to variable coding of the field strengths of the scanners used to acquire the scans.
3.2 External evaluation in AIBL
Figure 4 presents the external evaluation results for atrophy/enlargement prediction using the AIBL dataset. These results demonstrate that the longitudinal approach outperforms the baseline approach when tested on an external dataset.
Prediction of annual percentage change of volumes in different brain regions on the AIBL (test) dataset. (a) Bar plots showing the correlation scores of different models based on baseline and longitudinal approaches. (b) Scatter plots illustrating the prediction of atrophy rates for the longitudinal approach of MRI + Risk Factors model, derived from a single computation run with median performance.
The external evaluation for hippocampal atrophy prediction using the baseline approach with the MRI only model resulted in a PCC of 0.62 (95% CI: 0.43, 0.76) and a MAE of 1.00 (95% CI: 0.87, 1.15). When applying the longitudinal approach and incorporating genetic information alongside MRI data, the PCC improved to 0.71 (95% CI: 0.29, 0.65), while the MAE decreased to 0.89 (95% CI: 0.78,1.01).
The external evaluation for ventricle enlargement prediction using the baseline approach, which relied solely on MRI data, resulted in the PCC of 0.42 (95% CI: 0.08, 0.47) and the MAE of 2.40 (95% CI: 2.05, 2.79). In comparison, the longitudinal approach that combined MRI data with genetic and risk factor information achieved a higher PCC of 0.56(95% CI: 0.11, 0.52) and a lower MAE of 2.14 (95% CI: 1.83, 2.46).
The external evaluation for TGM atrophy prediction using the baseline approach, which relied solely on MRI data, resulted in the PCC of 0.28 (95% CI: 0.00, 0.40) and the MAE of 0.82 (95% CI: 0.69,0.96). In comparison, the longitudinal approach that combined MRI data with genetic and risk factor information achieved a higher PCC of 0.43 (95% CI: 0.14, 0.53) and a slightly lower MAE of 0.78(95% CI: 0.66,0.90).
3.3 Prediction of MCI/dementia progression
The ROC curves assessing performance of the predicted atrophy/enlargement rates in clinical status progression prediction are presented in Figure 5 for ADNI main dataset and in Supplementary Figures 1 and 2 for AIBL and ADNI external datasets. All AUC statistics are collected in Table 4. Overall, the clinical status progression prediction results highlight that a) predicted atrophy yielded better than chance progression predictions and the predicted atrophy scores typically reached higher AUCs than the corresponding present day volume information.
AUCs for different models for clinical status progression prediction across three databases. The values in parentheses refer to 95% confidence intervals of AUC.
ROC curves of clinical status progression prediction in a CN and b MCI groups with different models and brain regions using ADNI-Main dataset.
ADNI-Main Cohort
For the ADNI-Main cohort, using present day hippocampus volume achieved an AUC of 0.67 (95% CI: 0.59, 0.75) for the CN-to-MCI progression prediction. The predicted hippocampal atrophy (with longitudinal MRI + risk factors model) resulted in the improved progression prediction performance, yielding an AUC of 0.72 (95% CI: 0.65, 0.80). However, this improvement was not statistically significant (p = 0.10, DeLong test). Similarly, the present day ventricle volume produced an AUC of 0.57 for CN-to-MCI progression prediction (95% CI:0.48, 0.66), which increased to 0.64 (95% CI:0.55,0.72) with the predicted ventricle enlargement (longitudinal MRI + risk factors model). This improvement was not statistically significant (p = 0.30, DeLong test). The present day TGM volume predicted CN-to-MCI progression with an AUC of 0.57 (95% CI: 0.48, 0.66), which increased to 0.60 (95% CI: 0.51, 0.69) with the predicted TGM atrophy (longitudinal MRI +risk factors). This increment was not statistically significant (p = 0.60).
Within MCI group, present day hippocampus volume predicted progression to dementia with an AUC of 0.79 (95% CI: 0.72, 0.86). Predicted atrophy (longitudinal MRI + risk factors model) did not significantly improve the performance (p = 0.50), yielding an AUC of 0.81 (95% CI: 0.74, 0.87). For ventricle volume, the AUC was 0.62 (95% CI: 0.54, 0.70), which significantly improved to 0.74 (95% CI: 0.67, 0.81) when predicting progression to dementia using predicted atrophy (longitudinal MRI + risk factors model, p = 0.04). For the TGM volume, the AUC for the TGM volume was 0.65 (95% CI: 0.57, 0.73) and predicted atrophy for longitudinal MRI + risk factors model yielded a similar AUC of 0.64 (95% CI:0.56, 0.72) (p=0.9).
The comparison between different atrophy prediction models (baseline vs. longitudinal, MRI only vs. MRI + risk factors) did not yield any clearly best model, but all the models performed similarly in the progression prediction.
ADNI-External Cohort
For the ADNI-external cohort among the CN group, present day hippocampus volume achieved an AUC of 0.69 (95% CI: 0.56, 0.82) for the CN-to-MCI progression prediction. The predicted hippocampal atrophy (with longitudinal MRI + risk factors model) resulted in the increased progression prediction performance, yielding an AUC of 0.83 (95% CI: 0.72,0.93). This improvement was statistically significant (p = 0.04, DeLong test). Similarly, the present day ventricle volume produced an AUC of 0.77 for CN-to-MCI progression prediction (95% CI:0.66, 0.88), which reduced to 0.67 (95% CI:0.54,0.80) with the predicted ventricle enlargement (longitudinal MRI + risk factors model). This reduction was not statistically significant (p = 0.2, DeLong test) and we hypothesize that the high AUC for ventricle volume based prediction was driven by a large age difference between the stable and progressive groups (see Table 3). The present day TGM volume predicted CN-to-MCI progression with an AUC of 0.49 (95% CI: 0.35, 0.62), which increased to 0.60 (95% CI: 0.47, 0.73) with the predicted TGM atrophy (longitudinal MRI +risk factors). This increment was not statistically significant (p = 0.2).
For the MCI group in the ADNI-external, the model using hippocampal volume alone achieved an AUC of 0.78 (95% CI: 0.69, 0.88). With the predicted hippocampal atrophy (longitudinal MRI + risk factors model) the progression prediction performance improved significantly, yielding an AUC of 0.86 (95% CI: 0.79, 0.94) (p=0.03). The ventricle volume model achieved an AUC of 0.61 (95% CI: 0.49,0.73), which increased to 0.75 (95% CI: 0.64,0.85) with the predicted ventricle enlargement (longitudinal MRI + risk factors mode), but the increase was not significant (p = 0.1). The model using TGM volume resulted in an AUC of 0.58 (95% CI: 0.46,0.70), while the predicted TGM atrophy (longitudinal MRI + risk factors model) resulted to a similar AUC to 0.59 (95% CI: 0.46, 0.71) (p = 0.9).
Notably, the AUCs for clinical status progression prediction were consistently higher for atrophy scores predicted based on longitudinal model than for the atrophy scores based on baseline model. Addition of risk factors to the atrophy score prediction did not have much influence on the progression prediction, with the exception of the hippocampus and the baseline model, where atrophy scores predicted based on MRI + risk factors model provided better progression predictions than those based on MRI only model.
AIBL Cohort
Using hippocampus volume, we could predict the progression from CN to MCI with an AUC of 0.62 (95% CI: 0.42, 0.82). Using predicted atrophy scores significantly improved the prediction performance, resulting in an AUC of 0.87 (95% CI: 0.76, 0.98) (p=0.02), highlighting the enhanced predictive value of this approach. The ventricle volume achieved an AUC of 0.65 (95% CI: 0.49, 0.81) while the AUC with predicted ventricle enlargement improved to 0.83 (95% CI: 0.67, 0.99), however, the improvement was not significant (p = 0.1). Using TGM volume, the AUC was 0.56 (95% CI: 0.34, 0.77) while the AUC with predicted TGM was increased to 0.69 (95% CI: 0.51, 0.88) (p = 0.3).
Here, the longitudinal MRI + risk factors model was clearly superior to the baseline MRI only model for progression prediction using predicted hippocampal atrophy and ventricle enlargement. With TGM, all the four models performed similarly in terms of progression prediction.
4 Discussion
Hippocampal atrophy, ventricle enlargement, and whole-brain atrophy are three key MRI markers used to predict AD/dementia [16, 45, 46]. This study aimed to predict the annual change in the hippocampus, ventricles, and total gray matter volumes (used as a proxy for whole-brain atrophy), in individuals with varying levels of cognitive impairment, ranging from healthy controls to dementia. To achieve this, we developed an ML model based on ENLR. We explored two approaches: (1) a baseline method relying on single-time-point data and (2) a longitudinal method incorporating information from two years prior to baseline. Both approaches were tested using models that utilized only MRI data and models that combined MRI data with additional risk factors, including age, sex, and baseline diagnosis. Our analysis also explored how incorporating longitudinal data affected the performance of these models.
Our findings showed that incorporating additional information into the baseline MRI model significantly enhanced its ability to predict hippocampal atrophy and ventricular enlargement. In our previous works, we demonstrated the advantages of combining multiple data modalities—such as demo-graphics, APOE4, MRI, and cognitive assessments—for predicting AD at early stages[27, 36, 47]. Other studies have also emphasized the importance of integrating diverse data sources for enhanced predictive accuracy[25, 48, 49]. Our current results align with these findings but with a notable difference in our approach to combining cognitive assessment measures. Instead of using baseline cognitive assessments, we utilized baseline diagnoses categorized as CN, MCI, and dementia. Baseline diagnosis serves as a summary of cognitive assessments, encapsulating an individual’s cognitive status. Using a single summary variable reduced data dimensionality and redundancy, as many cognitive measures convey overlapping information. This streamlined approach not only simplifies the modeling process but also maintains the robustness of the predictions.
Investigating the advantage of incorporating data from two years prior to current day through the development of a longitudinal model revealed significant improvements in predictive performance. The improvement was statistically significant when comparing baseline versus longitudinal using MRI-only models across all cases, including hippocampal atrophy, TGM atrophy, and ventricular enlargement. For MRI + risk factor models, adding historical data also improved performance, but the improvement was smaller compared to the MRI-only models. However, the improvement remained statistically significant for hippocampal and TGM atrophy but not for ventricular enlargement. Overall, the best predictions were achieved using the longitudinal MRI + risk factor model across all three brain regions. External validation using the AIBL dataset confirmed these findings, demonstrating that longitudinal models consistently provided more accurate predictions for all three types of annual change. Our findings extend those of earlier studies that have shown the importance of considering longitudinal data in prediction tasks based on brain imaging data [50, 51]. We speculate that the greater improvement observed in the MRI-only model compared to the MRI + risk factor model may be due to the absence of cognitive status information in MRI-only models. Adding historical data to baseline MRI data provides additional insights into individual structural changes over time, as the differences between past and baseline MRI scans vary across diagnostic groups (CN, MCI, AD). This temporal information helps the model differentiate individuals more effectively. In contrast, the MRI + risk factor model already includes diagnostic group information, which partially accounts for cognitive status, thereby reducing the relative contribution of historical data.
Moreover, in longitudinal models, we investigated the role of various predictors, including MRI features and risk factors, and their annualized percentage changes between baseline and two years prior. A closer examination of the coefficient values in the longitudinal model reveals that baseline diagnosis is one of the most important features for predicting the annual percentage change in all three studied biomarkers. Among the top selected features, APOE4 emerged as a significant predictor for hippocampal atrophy and ventricular enlargement, but not for TGM atrophy. Interestingly, many longitudinal MRI features were identified as important within the top selected features, emphasizing the value of incorporating longitudinal information into the models. Another notable observation is that, for predicting the annual percentage change in each biomarker, multiple brain regions contribute significantly to the prediction task, rather than the respective region alone holding the highest importance. Among the top selected MRI features for predicting hippocampal atrophy and ventricular enlargement were the hippocampus, middle temporal gyrus, inferior lateral ventricles, and amygdala, all of which are highly relevant in AD progression [45]. However, the top selected features for predicting TGM atrophy differed slightly from those for hippocampal and ventricular predictions and were not as strongly associated with AD. These results align with our expectations, as the hippocampus and ventricles are the primary brain regions affected by AD, even in the early stages. In contrast, TGM represents the whole brain and exhibits a weaker association with AD compared to the hippocampus and ventricles. Overall, our findings indicate that multiple brain regions, along with their longitudinal information and associated risk factors, play a crucial role in predicting the annual percentage change in all these biomarkers.
The individuals in our cohort from the ADNI dataset were drawn from different ADNI phases, and MRI data was acquired using different field strengths. This variability posed a challenge [52, 53], particularly as our analysis considered multiple time points for each participant, two time points (current day and future) for baseline models, and three time points (past, current day, and future) for longitudinal models. For many participants, the field strength was not consistent across all time points. To address this issue, we introduced a new categorized variable representing the field strength at each time point and included it as a predictor in our models (see [35]). By doing so, we accounted for the variability in field strength and ensured it was considered in the analysis, reducing its potential impact on the model’s performance. To streamline our analysis, we opted for a cross-sectional pipeline for MRI preprocessing using CAT12, prioritizing its simplicity and efficiency in reducing processing time. Our preliminary analyses revealed no significant performance differences between the cross-sectional and longitudinal preprocessing approaches for atrophy prediction in CAT12. Consequently, the cross-sectional pipeline was selected as a practical and effective choice for our study and the finding that longitudinal predictors outperformed cross-sectional ones supports this choice. However, the optimal approach for addressing the construction of the longitudinal predictors should be addressed systematically in future studies.
Although hippocampal and TGM atrophy, as well as ventricular enlargement, are critical factors in dementia progression, predicting volume changes in these regions has not been extensively studied. In particular, limited research has evaluated all three regions within the same dataset and pipeline to assess their roles in dementia progression. While various studies have investigated the associations of these biomarkers with progression to dementia [46, 54, 55], the prediction of changes over time and their relationship with progression to MCI or dementia remains underexplored. The prediction of total ventricular volume has been previously examined in the TADPOLE challenge [10], where the tasks were predicting clinical diagnosis, ADAS-Cog13 scores, and total ventricular volume. Another study by Liedes et al. [25] focused on predicting the annualized rate of hippocampal atrophy using multivariate ML algorithms on the ADNI dataset. Our study shares similarities with Liedes et al [25] in predicting annualized hippocampal atrophy but differs in three key ways: (1) we analyzed all three regions—hippocampus, TGM, and ventricles—together within the same dataset and workflow, rather than focusing solely on hippocampal atrophy; (2) we developed a longitudinal model that incorporates data from prior time points, enabling us to investigate the impact of longitudinal predictors on the performance of our predictive models; and (3) we evaluated the ability of our models to predict progression to MCI or dementia in both healthy and MCI groups.
We assessed the effectiveness of our predictive models in estimating the annual percentage changes in hippocampal, TGM, and ventricular volumes to predict dementia progression in both healthy and MCI groups. Results from the main ADNI dataset show that predicted hippocampal atrophy and ventricular enlargement outperform actual biomarker volumes in predicting progression to MCI or dementia in both CN and MCI groups. The most notable improvements came from predicted hippocampal atrophy, particularly using longitudinal approaches. For CN individuals, the predictive accuracy for progression to MCI/dementia improved from an AUC of 0.67 (using actual hippocampal volume) to 0.73 (using predicted atrophy with the longitudinal method). Similarly, in MCI individuals, the AUC for predicting progression to dementia increased from 0.79 to 0.83. However, no such improvement was observed for TGM atrophy, as predicted TGM atrophy did not provide higher predictive power compared to actual TGM volume. Results from the external validation datasets, (AIBL and ADNI-external) showed similar trends. While our models were not specifically designed to predict progression to MCI or dementia, the predicted annual percentage changes—especially for hippocampal atrophy—performed well and the performance was comparable to previous MRI-based approaches for predicting dementia progression in CN individuals [56, 57]. This analysis is particularly significant, as the primary goal of our study is to enhance the early prediction of dementia. By comparing predicted values to actual biomarker volumes, we sought to determine whether our models provide better predictive power for early-stage dementia progression.
In summary, our study highlights the importance of incorporating longitudinal data to improve predictions of annual changes in key biomarkers linked to dementia. By integrating MRI data and risk factors into machine learning models, we significantly enhanced predictive accuracy, particularly for hippocampal atrophy and ventricular enlargement. The inclusion of longitudinal data allowed us to capture individual structural changes over time, improving the model’s ability to differentiate between diagnostic groups. These findings emphasize the advantages of combining diverse data sources and tracks brain changes over time to better predict dementia progression. Future research should further explore the integration of multiple data modalities and assess the long-term impact of longitudinal data to refine predictive models for dementia.
Data Availability
All data produced are available online at http://adni.loni.usc.edu https://aibl.org.au
Availability of data and materials
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (https://adni.loni.usc.edu/) and and the Australian Imaging Biomarkers and Lifestyle flagship study of ageing (AIBL) (https://aibl.csiro.au/adni/index.html). Details about data access are detailed there. The participant’s RIDs are provided in Supplementary Materials. The codes to reproduce the work are available at https://github.com/MaryamHadji/Future-prediction. CAT12 toolbox is available at https://neuro-jena.github.io/cat.
Competing interests
The authors have no actual or potential conflicts of interest.
Funding
This research has been supported by The Academy of Finland, grant 351849 under the frame of ERA PerMed (“Pattern-Cog”), grants 346934 (PRIMAL), and 358944 (Flagship of Advanced Mathematics for Sensing Imaging and Modeling) from the Research Council of Finland.
Author Contributions
M.H: Methodology, formal analysis, investigation, software, visualization, writing—original draft, and writing—review and editing; E.M: Methodology, formal analysis, software, writing—original draft, and writing—review and editing; J.T: Conceptualization, methodology, resources, funding acquisition, writing—original draft, and writing—review and editing.
Acknowledgments
Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
During the preparation of this work, the authors used GPT-4 from OpenAI and Microsoft Copilot to improve language and readability. After using this tool/service, the authors reviewed and edited the content as needed. The authors take full responsibility for the content of the publication.
Footnotes
↵1 The atlas is derived from the maximum probability tissue labels derived from the “MICCAI 2012 Grand Challenge and Workshop on Multi-Atlas Labeling” http://www.neuromorphometrics.com/2012_MICCAI_Challenge_Data.htm with the MRIs from OASIS project and the labeled data provided by Neuromorphometrics, Inc. (Neuromorphometrics.com) under academic subscription.
References
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].↵
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].
- [18].
- [19].↵
- [20].↵
- [21].↵
- [22].
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].↵
- [36].↵
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵
- [53].↵
- [54].↵
- [55].↵
- [56].↵
- [57].↵










