Abstract
A major challenge in multimorbid aging is understanding how diseases co-occur and identifying high-risk groups for accelerated disease development, but to date associations in the relative onset acceleration of disease diagnoses have not been used to characterise disease patterns. This study presents the development and evaluation of a neural network Cox model for predicting onset acceleration risk for age-associated conditions, using demographic, anthropomorphic, imaging and blood biomarker traits from 60,396 individuals and 218,530 outcome events from the UK Biobank. Risk prediction was evaluated with Harrell’s concordance index (C-index). The model performed well on internal (C-index 0.6830 ± 0.0902, n = 8, 931) and external (C-index 0.6461 ± 0.1264, n = 855) test sets, attaining C-index ≥ 0.6 on 38 out of 47 (80.9%) conditions. Inclusion of body composition and blood biomarker input traits were independently important for predictive performance. Kaplan-Meier curves for predicted risk quartiles (log-rank p ≤ 1.16E − 16) indicated robust stratification of individuals into high and low risk groups. Analysis of risk quartiles revealed cardiometabolic, vascular-neuropsychiatric, and digestive-neuropsychiatric disease clusters with strong statistically significant inter-correlated onset acceleration (r ≥ 0.6, p ≤ 3.46E − 5), while 13 and 19 conditions were strongly associated with onset acceleration of all cause mortality and all cause morbidity respectively. In prognostic survival analysis, the proportional hazards assumption was met (Schoenfeld residual p > 0.05) in 435 out of 435 or 100% (1238 out of 1334 or 92.8%) of cases across outcomes, aHR = 6.11 ± 9.00 (aHR = 3.67 ± 5.78) with (without) Bonferroni correction. The neural architecture of OnsetNet was interpreted with saliency analysis and several significant body composition and blood biomarkers were identified. The results demonstrate that neural network survival models are able to estimate prognostically informative onset acceleration risk, which could be used to improve understanding of synchronicity in the onset of age-associated diseases and reprioritize patients based on disease-specific risk.
1 Introduction
As we age, the probability of developing more than one condition greatly increases [1]. Multimorbidity, which can be defined as co-occurrence of more than one condition in the same individual, is thus an important problem in aging societies [2, 3], being associated with lower quality of life [4], higher mortality risk [5], and greater usage of healthcare resources than healthy people or those with just one chronic condition [6, 7]. There is a high and increasing prevalence of multimorbidity in the general population, with elderly and socioeconomically deprived groups particularly at risk [1, 8–10]. Chowdhury et al. [11] determined that the global prevalence of multimorbidity was 37.2%, rising to 51.0% in people over 60 years old, while Nunes et al. [12] found that mortality risk of individuals with multimorbidity was 1.73 times greater than those without multimorbidity. Despite these concerns, healthcare delivery systems and quality evaluation remain focused on the treatment of single diseases [13, 14]. An important part of generating an evidence base for improvements in clinical practice is ascertaining patterns of disease and inter-disease associations beyond chance [15].
While epidemiological studies of multimorbidity vary widely in sample size, age settings and statistical methodology, there is a growing consensus that age-associated chronic diseases do not co-occur randomly but in associated fashion, including conditions across the physical and mental health divide [1, 16]. A 2014 meta-review of 14 studies [15] identified cardiometabolic, neuropsychiatric-gastrointestinal, and musculoskeletal-vascular-neuropsychiatric-gastrointestinal co-occurrence clusters, while more recently [17] identified cardiometabolic, neuropsychiatric, and allergic disease co-occurrence clusters as being commonly reported among 41 studies. The majority of reviewed studies used agglomeration methods such as agglomerative hierarchical clustering and explanatory factor analysis to link diseases on the basis of co-occurrence in individuals, and were not restricted to age-associated conditions. While co-occurrence is important for characterising disease associations, another significant factor is age of disease onset [1]. For example, whether an individual develops cardiovascular disease or hearing loss may be less informative for their health status than age of onset, given that prevalence is estimated at 79-86% and 80.6% in those older than 80 and 85 respectively in the United States [18, 19]. The timing of disease onset has been shown to be informative for risk of secondary diagnoses; early onset cancer is associated with increased risk of cardiovascular disease and secondary cancers compared to later onset cancer [20, 21]. Accelerated biological age estimated using DNA methylation or phenotypic aging clocks has been associated with increased risk of mortality and adverse health outcomes compared to less accelerated biological age [22, 23]. A recent study on chronic diseases [24] found that certain lifestyle factors affected only age of onset and not odds of diagnosis for 10 cardiometabolic, autoimmune and respiratory conditions. Taken together, this suggests that the temporal dimension of multimorbidity may be important for characterising disease patterns.
The present study aims to determine (1) the extent to which onset acceleration risk of age-associated diseases can be accurately predicted from individual traits, and (2) how strongly early onset risk of one disease is associated with early onset risk of another disease beyond chance. To this end, Cox models were trained and evaluated for prediction of individual risk for early onset of diverse age-associated diseases. Using data from 60, 396 individuals in the UK Biobank imaging cohort [25], we trained a single multimodal neural network, OnsetNet, to predict time-to-event log-risk of the first diagnosis of 47 conditions with birth as temporal baseline, including two composite conditions of all-cause mortality and all-cause morbidity.
First, we evaluated performance of neural network models across disease conditions using C-index measured on internal center and external center held-out test sets, where model hyperparameters were selected by fivefold cross-validation and internal test and C-index ≥ 0.6 was considered acceptable prediction [26]. We compared performance of neural network models on eight different combinations of five input trait groups (demographic, anthropomorphic, summary body composition, spatial body composition, and blood biomarkers) totalling 43 input variables. We assessed performance differences between neural network and linear architectures to evaluate the benefit of using a neural network, which is state-of-the-art and may be a more appropriate choice where it is too simplistic to assume linearity of the log-risk function [27]. OnsetNet score quartiles were computed to illustrate within-disease heterogeneity across onset acceleration risk quartiles. Between-disease risk relationships were analysed by computing all-to-all correlations in OnsetNet risk quartiles between diseases, and clusters containing disease conditions with high inter-correlated onset acceleration risk were identified with agglomerative hierarchical clustering. We trained standard prognostic linear adjusted Cox models (baseline as measurement time, pre-existing diagnoses excluded) with OnsetNet score as an input trait, yielding adjusted hazard ratios across disease pairs. The proportional hazards assumption was tested by examination of Schoenfeld residuals. Finally, the model was analysed with gradient-based saliency to infer relative importance of individual biomarkers and to assess for biological validity of learned associations between biomarkers and disease risk.
OnsetNet’s output is interpretable as a metric of accelerated biological aging, given that accelerated biological age can be characterized as event acceleration risk of age-associated disease from birth [28]. While the present study is purely associative, OnsetNet’s ability to stratify individuals into high and low risk groups may be useful for clinical settings, where important problems include early identification of high-risk individuals in order to effectively reduce disease burden on the general population [29] and characterising within-disease heterogeneity due to diversity in pathology [30, 31].
2 Methods
2.1 Study participants
This study was a secondary analysis of longitudinal data from the UK Biobank resource [32] under application number 44584. Data was obtained from the UK Biobank imaging cohort [33] for 60,396 individuals who were initially assessed between April 2007 and October 2010 and had outcomes recorded up to November 30th 2022 (follow-up 13.9 ± 0.819 years, min 12.2 years, max 15.6 years). Eligible individuals had full records of all covariates (29,873 male, 30,523 female, age at imaging 66.2 ± 8.01 years). Where multiple measurements were made, the latest measurement for each individual was taken. Data was split into a training and validation dataset (train-val) containing individuals assessed at Cheadle, Reading, Newcastle and unspecified centers (24,984 male, 25,626 female, age at imaging 66.2 ± 8.00 years), an internal test set containing individuals assessed at train-val centers (4,455 male, 4,476 female, age at imaging 66.0 ± 8.10 years), and an external test set contained individuals assessed at the Bristol center (434 male, 421 female, age at imaging 67.9 ± 7.54 years).
2.2 Covariates
Input variables were selected based on risk factors identified in literature review and were separated into 5 groups. Demographic (“demog”) contained sex, race, education level, and Townsend deprivation index. Basic body (“bb”) contained weight, height, body mass index (BMI), waist circumference, grip strength, systolic blood pressure, alcohol consumption, and smoking history. Basic body composition computed from neck-to-knee magnetic resonance imaging scans (“bbc”) contained total mass (MASS), total subcutaneous adipose tissue volume (SAT), total muscle volume (MUSC), total visceral adipose tissue volume (VAT), and total thigh intramuscular and intermuscular adipose tissue mass (TMAT). Multidimensional spatial body composition (“sbc”) contained MASS, SAT, MUSC, VAT, and TMAT measured along 370 slices across the vertical height dimension. Blood-based biomarkers (“blood”) contained low-density lipoprotein cholesterol (LDL-C), triglycerides, apolipoprotein A (ApoA), apolipoprotein B (ApoB), cholesterol, high-density lipoprotein cholesterol (HDL-C), glucose, glycated haemoglobin (HbA1c), insulin-like growth factor 1 (IGF-1), urate, urea, total protein, alanine aminotransferase (ALT), albumin (ALB), alkaline phosphatase (ALP), aspartate aminotransferase (AST), c-reactive protein (CRP), calcium (Ca), creatinine, cystatin C, gamma glutamyltransferase (GGT), phosphate, sex hormone binding globulin (SHBG), total bilirubin, and vitamin D.
For non-demographic variables, age at trait measurement was provided as an additional input variable with the measurement (Figure 1). Continuous input variables were scaled to zero mean and unit standard deviation using statistics from the train-val dataset, and discrete input variables were represented with one-hot encodings. Body composition variables were computed by automated segmentation of MRI instances by a previously published deep learning system that was used to produce publicly available UK Biobank body composition variables [34–37]. In brief, UNet [38] based deep learning models were trained to segment body composition tissues from UKBiobank neck-to-knee Dixon acquisitions using manual annotations. The UKBiobank protocol utilized the Siemens Aera 1.5T scanner (Syngo MR D13) and is detailed in [33].
Architecture of OnsetNet, a neural network Cox model trained to predict onset acceleration as log-risk of first incidence of 47 disease conditions from individual traits, using birth as temporal baseline. Abbreviations: body mass index (BMI), body mass (MASS), muscle (MUSC), visceral adipose tissue (VAT), glycated haemoglobin (HbA1c), apolipoprotein B (ApoB), insulin-like growth factor 1 (IGF-1), c-reactive protein (CRP), low-density lipoprotein cholesterol (LDL-C).
Outcomes for 47 age-related conditions were chosen using literature on aging and multimorbidity [39, 40]. We obtained date of first occurrence for 45 non-composite disease categories defined by ICD10 codes (Appendix A.4) from the UK Biobank first occurrence register, which incorporates primary care, hospital inpatient, death register and self-reported records. Date of death was taken from the death register and date for all-cause morbidity was taken as first occurrence date for any of the 45 non-composite disease categories.
2.3 Deep learning estimation of onset acceleration risk
Neural network models trained to predict onset acceleration risk consisted of separate encoders per trait group each producing a feature vector of fixed length k ∈ {128, 256, 512}, followed by feature concatenation and a shared predictor head with one scalar output for each disease condition (Figure 1). Neural networks had an average of 2.31E6 ± 1.00E6 parameters (min 8.54E5, max 4.92E6, Appendix B).
Fivefold cross-validation was used to train models on the train-val dataset by optimizing objective l(θ), the Cox proportional hazard (Cox PH) partial log-likelihood objective function with weight decay regularization [41] over 47 disease conditions d ∈ D:
where θ are model parameters, λ is the weight decay coefficient, xi are input traits for individual i, log-risk
is the scalar output of the model for disease condition d and individual i, Ed is the censoring vector for d where
(uncensored) if individual with index i received a diagnosed for d and
(censored) otherwise.
is the number of uncensored individuals for d. Birth was used as temporal baseline and time to event was measured in years (Figure 2).
is age at first diagnosis for individual i with a diagnosis of d and otherwise age at censor date (November 30th 2022), risk set
is the set of individuals undiagnosed with d at age t.
Cox models commonly predict risk of event earliness from measurement baseline while OnsetNet models event earliness from birth.
To evaluate performance, test sets were unseen during model selection of neural network Cox models and linear Cox comparison baselines. Hyperparameter values (λ, k, learning rate, dropout rate) for each input trait group combination were chosen by average validation fold C-index. Computation of C-index was batched with batch size of 2048 for tractability [27]. Neural networks were implemented in Python with GPU-accelerated PyTorch 2.4.1 and linear models in R 4.1.2. The code will be made available upon publication.
2.4 Statistical analysis
To investigate onset acceleration risk correlation between conditions, Pearson’s r was computed between OnsetNet score quartiles q ∈ [1, 4] for all pairs of conditions with internal test C-index performance ≥ 0.6. Correlation of r ≥ 0.6 was considered strong and p ≤ 3.46E − 5 was considered significant. Using the all-pairs correlation matrix, a clustering dendrogram was computed using unweighted pair group method with arithmetic mean (UPGMA) hierarchical agglomerative clustering for the linkage function (SciPy cluster.hierarchy.linkage) with cosine distance between correlation vectors as the clustering metric. Statin users (12,157 out of 60,396 in the study cohort reporting statin medication usage in verbal interview) were removed, and sensitivity analyses were conducted for removal of Bonferroni correction and inclusion of statin users.
To evaluate the prognostic ability of onset acceleration scores, a prognostic Cox model was trained for each pair of disease conditions (A, B), with OnsetNet risk quartile for condition A as input and individual log-risk of time-to-event for first occurrence of condition B as output. Baseline time was taken as imaging assessment time, corresponding to the latest date across input trait measurements, outcome events were within 10 years of baseline, individuals with pre-existing diagnoses were excluded, and time-to-event was measured in months. Demographic traits were included as input covariates for primary adjustment of hazard ratios, and secondary adjustment for both demographic and basic body traits was also performed to test for sensitivity. 6 outcome conditions with greater than 75% allocation to either sex (female: gynecological problems, thyroid diseases, osteoporosis; male: prostatic hyperplasia, hyperuricemia/gout, sexual dysfunction) were trained for the dominant sex only. Outcome events for 3 conditions with fewer than 50 uncensored individuals after removal of pre-imaging diagnoses were removed (hemorrhoids, Parkinson’s disease, and somatoform disorders).
Two-tailed tests were used to compute p-values and Bonferroni corrections were made at p = 0.05 level. The study adhered to the EQUATOR TRIPOD+ AI [42] guidelines for transparent reporting of an AI multivariable prediction model and STROBE [43] guidelines for reports of observational studies. Completed checklists are given in Appendix D.
3 Results
3.1 Predictive performance
A summary of participant characteristics for training and test datasets is given in Table 1. Eight different combinations of trait group inputs were evaluated for the neural network OnsetNet architecture and compared to a linear architecture (Table 2). High-dimensional spatial body composition was utilized for neural network models but not linear models due to correlation in body composition along the height dimension. The OnsetNet full traits model (demog+bb+bbc+sbc+blood) attained the highest internal test C-index (0.6830 ± 9.02E − 2), highest combined internal and external test C-index (0.6647 ± 1.11E − 1), and highest validation C-index (0.6856 ± 9.08E − 1) on average over conditions and cross-validation data folds. On the external test set, OnsetNet demog+bb+bbc+blood (0.6472 ± 1.23E − 1) performed best on C-index, followed by OnsetNet full traits (0.6461 ± 1.26E − 1). While performance differences between top OnsetNet models and the linear model were not statistically significant by estimated p-value, larger differences between models were statistically significant (Table 2, Table C10), and the linear model was consistently outperformed by neural networks on all metrics. Within specific conditions, diabetes mellitus non-T1 (0.940 ± 3.06E − 3 for OnsetNet full traits) and severe vision reduction (0.550 ± 2.54E − 3 for OnsetNet full traits) performed best and worst on internal test respectively (Figure 3).
Internal test set C-index across disease conditions for linear and neural network architectures. Grey error bars illustrate 95% confidence intervals.
We analysed the importance of adding individual trait groups to model inputs (Table C10). The largest performance gains on internal test C-index from adding traits to model inputs were adding bb to demog (Δ0.0542, p = 9.68E − 15), sbc to demog+bb (Δ0.0295, p = 4.81E − 5), and blood to demog+bb (Δ0.0272, p = 4.48E − 4). On internal and external test sets, utilizing body composition traits increased performance and was approximately twice as important in the absence of blood biomarkers. On internal test set C-index, adding bbc to demog+bb increased performance by Δ0.0244 (p = 6.85E − 4), sbc to demog+bb by Δ0.0295 (p = 4.81E − 5), bbc to demog+bb+blood by Δ0.0122 (p = 1.42E − 1), and sbc to demog+bb+blood by Δ0.0143 (p = 8.59E − 2). For models with blood traits included, small gains were observed on internal test C-index from utilizing increasingly complex body composition inputs (demog+bb+bbc+blood: 0.6799 ± 9.08E − 2, demog+bb+sbc+blood: 0.6820 ± 9.04E − 2, demog+bb+bbc+sbc+blood: 0.6830 ± 9.04E − 2), but performance on the external test set was highest for basic body composition (demog+bb+bbc+blood: 0.6472 ± 1.233E − 1, demog+bb+sbc+blood: 0.6446 ± 1.269E − 1, demog+bb+bbc+sbc+blood: 0.6461 ± 1.264E − 1). C-index was lower (Δ − 0.0306 ± 0.0080) with higher variance on the external test set compared to the internal test set for all input trait combinations.
Following model selection and performance evaluation, the full dataset was utilized for post-hoc analysis. The best performing neural network by both average validation C-index and test C-index was demog+bb+bbc+sbc+blood which provided OnsetNet scores for analysis. Onset acceleration score quartiles were computed by discretizing OnsetNet scores using sex and condition specific quartile bins computed on the train-val dataset. Kaplan-Meier estimates for onset acceleration score quartile illustrated within-condition heterogeneity (Figure 4).
Kaplan-Meier survival curves of age in years (horizontal axis) against survival probability (vertical axis) for OnsetNet risk quartiles. All conditions and at-risk counts are given in Appendix C.
3.2 Inter-disease onset acceleration analysis
Pearson’s r was computed between OnsetNet score quartiles for each pair of 38 conditions where both conditions had internal test C-index performance ≥ 0.6 (Figure 5, Figure 6). 1420 out of 1444 correlations were significant at p ≤ 3.46E −5 level (Bonferroni corrected, p = 5.87E − 9 ± 9.81E − 8) including all strong correlations (r ≥ 0.6). Disease clusters and strong correlations were reproduced in sensitivity analysis without Bonferroni correction (p ≤ 0.05) or with inclusion of statin users (Appendix C).
Clusterplot illustrating structure in the onset acceleration risk of age-associated diseases. Onset acceleration risk quartile correlations for conditions with internal test C-index ≥ 0.6 were clustered by hierarchical agglomerative clustering with cosine distance between correlation vectors as the clustering metric. Color denotes risk quartile correlation strength. Statin users were removed and 24 out of 1444 entries with p > 3.46E − 5 (Bonferroni corrected) were set to 0 for cosine distance. Row and column colours indicate condition types.
Graph of inter-disease OnsetNet risk quartile correlations for conditions with internal test C-index ≥ 0.6 and correlations with r ≥ 0.3 and p ≤ 3.46E−5 (Bonferroni corrected). Statin users were removed and r scales with edge width.
Onset acceleration risk quartile for all cause mortality and all cause morbidity were strongly correlated with each other (r = 0.78) and jointly strongly correlated with anxiety (r = 0.86, 0.82), dementia (r = 0.84, 0.82), depression (r = 0.80, 0.83), migraine/chronic headache (r = 0.77, 0.69), cardiac valve disorders (r = 0.75, 0.65), sexual dysfunction (r = 0.73, 0.67), hypotension (r = 0.69, 0.63), cardiac arrhyth-mias (r = 0.64, 0.63), and rheumatoid arthritis/chronic polyarthritis (r = 0.63, 0.79). Insomnia (r = 0.81), chronic gastritis/GERD (r = 0.78), intestinal diverticulosis (r = 0.75), liver disease (r = 0.72), cardiac insufficiency (r = 0.69), neuropathies (r = 0.68), urinary incontinence (r = 0.68), obesity (r = 0.67), and urinary tract calculi (r = 0.63) were strongly correlated with all cause morbidity but not with all cause mortality. Correlation between onset acceleration risk in obesity and all cause mortality was r = 0.43. Conversely, all cause mortality was strongly correlated with Parkinson’s disease (r = 0.72), anemia (r = 0.68), and atherosclerosis/PAOD (r = 0.62), which were not strongly correlated with all cause morbidity. The two strongest onset acceleration correlations with Parkinson’s disease were dementia (r = 0.73) and all cause mortality (r = 0.72).
Within non-composite conditions, onset acceleration risk clusters were identified by cosine distance between correlation vectors. A cardiometabolic cluster (corners of Figure 5) was identified where lipid metabolism disorders were associated with chronic ischemic heart disease (r = 0.79), hypertension (r = 0.79), cerebral ischemia/chronic stroke (r = 0.70), diabetes mellitus non-T1 (r = 0.63), hyperuricemia/gout (r = 0.61), and chronic cholecystitis/gallstones (r = 0.60) in a cluster that did not include obesity (r = 0.35).
Second, chronic gastritis/GERD was associated with a mixed digestive and neuropsychiatric cluster (lower right, Figure 5) containing intestinal diverticulosis (r = 0.85), rheumatoid arthritis/chronic polyarthritis (r = 0.79), all cause morbidity (r = 0.78), urinary tract calculi (r = 0.77), insomnia (r = 0.77), depression (r = 0.75), anxiety (r = 0.73), liver disease (r = 0.73), neuropathies (r = 0.71), chronic chole-cystitis/gallstones (r = 0.71), urinary incontinence (r = 0.66), obesity (r = 0.63), dementia (r = 0.63), chronic ischemic heart disease (r = 0.60), cardiac insufficiency (r = 0.60), and thyroid diseases (r = 0.60).
Third, a vascular-neuropsychiatric cluster was identified (upper left, Figure 5), where onset acceleration risk in cardiac insufficiency was strongly associated with cardiac arrhythmias (r = 0.80), dementia (r = 0.75), hypotension (r = 0.73), rheumatoid arthritis/chronic polyarthritis (r = 0.71), cardiac valve disorders (r = 0.71), depression (r = 0.70), all cause morbidity (r = 0.69), atherosclerosis/PAOD (r = 0.69), insomnia (r = 0.68), thyroid diseases (r = 0.68), anxiety (r = 0.68), urinary incontinence (r = 0.67), parkinson’s disease (r = 0.64), anemia (r = 0.62), neuropathies (r = 0.61), chronic gastritis/GERD (r = 0.60), and bronchitis/COPD (r = 0.60).
Notable associations within the vascular-neuropsychiatric cluster included anxiety and depression (r = 0.9) which demonstrated the strongest correlation in onset acceleration risk overall, with top shared correlations for all cause mortality (r = 0.86, 0.80), all cause morbidity (r = 0.82, 0.83), dementia (r = 0.81, 0.81), migraine/chronic headache (r = 0.78, 0.77), rheumatoid arthritis/chronic polyarthritis (r = 0.77, 0.79), hypotension (r = 0.76, 0.73), sexual dysfunction (r = 0.74, 0.71), anemia (r = 0.74, 0.67), chronic gastritis/GERD (r = 0.73, 0.75), cardiac valve disorders (r = 0.71, 0.67), insomnia (r = 0.71, 0.79), intestinal diverticulosis (r = 0.69, 0.73), liver disease (r = 0.69, 0.69), cardiac insufficiency (r = 0.68, 0.70), urinary incontinence (r = 0.67, 0.69), cardiac arrhythmias (r = 0.64, 0.63), neuropathies (r = 0.63, 0.66), parkinson’s disease (r = 0.62, 0.62), gynecological problems (r = 0.61, 0.60), and urinary tract calculi (r = 0.60, 0.62). The strongest association with anemia was hypotension and vice versa (r = 0.84).
3.3 Prognostic survival analysis
Adjusted hazard ratios (aHR) for OnsetNet Q4 (Q1 as reference) for 47 input conditions and 44 outcome event conditions were rendered (Figure 7). 1334 out of 2068 (64.5%) hazard ratios were significant at p = 0.05 level after primary adjustment without Bonferroni correction, of which Schoenfeld residuals were insignificant (p > 0.05) for 1238 or 92.8% (aHR = 3.67 ± 5.78), and 1037 or 77.7% remained statistically significant after secondary adjustment. 435 out of 2068 (21.0%) hazard ratios were significant at p = 9.05E − 7 level (Bonferroni corrected p = 0.05) after primary adjustment (aHR = 6.11±9.00), where all Schoenfeld residuals were insignificant (p > 0.05), and 184 (42.3%) remained statistically significant after secondary adjustment. With Bonferroni correction, 25 out of 44 (56.8%) input conditions yielded statistically significant Q4 hazard ratios for the same outcome condition (aHR = 11.09 ± 17.8), and 27 statistically significant Q4 hazard ratios were observed for all cause mortality (aHR = 6.26 ± 9.33) of which 23 were strong with at least threefold increase in mortality risk compared to Q1 (aHR ≥ 3, aHR = 6.85 ± 9.98). Using binarized OnsetNet score instead of quartiles (above median vs. below median, Appendix C) yielded 23 statistically significant adjusted hazard ratios for all cause mortality with Bonferroni correction (aHR = 2.24 ± 0.408) of which only 2 were strong (all cause mortality and dementia, aHR = 3.39 ± 0.222), suggesting that onset acceleration risk Q4 was disproportionately associated with mortality hazard compared to lower quartiles.
Adjusted log hazard ratios for OnsetNet risk Q4 (Q1 as reference) given primary adjustment with demographic variables. Color red (blue) indicates positive (negative) log hazard ratios and color intensity scales with log hazard ratio magnitude. Gray cells denote log hazard ratios that were statistically insignificant following primary adjustment (p > 9.05E − 7, Bonferroni corrected) or where Schoenfeld residual p ≤ 0.05. Entries that remained statistically significant after secondary adjustment with basic body traits are underlined to denote robustness to change in adjustment.
3.4 Biomarker saliency
To investigate the relationship learned by the neural network between input traits and onset acceleration risk of conditions, we computed sex-specific normalized absolute partial derivative of network input variables with respect to each output [44, 45]. Univariate traits (Figure 8) and multivariate spatial body composition (Figure 9) were normalized separately.
Saliency map for univariate traits for (a) male and (b) female participants. Top 5 salient traits per condition are labelled. Average saliency across conditions given in the bottom row.
Spatial body composition saliency maps averaged over 47 conditions for (a) male and (b) female participants. Condition-specific maps are given in Appendix C.5.
Known relationships inferred by saliency without a-priori knowledge include creatinine and cystatin C for renal insufficiency, HbA1c for diabetes mellitus non-T1, urate for hyperuricemia/gout, GGT for liver disease, total MASS for obesity, systolic blood pressure for hypertension and hypotension. Notably, BMI and total MASS were not in the top 5 traits for onset acceleration risk of diabetes mellitus non-T1, which was dominated by HbA1c. The 10 most important individual traits on average over conditions for the full traits OnsetNet model were total MASS, total SAT, total MUSC, total VAT, grip strength, GGT, HbA1c, HDL-C for both males and females, ApoB and cholesterol for males, and AST and total TMAT for females. Additional traits in the top 10 for the full univariate traits model included smoking, total TMAT, and urate for males, and alcohol, smoking and urate for females (Appendix C). The most salient spatial body composition regions were VAT and MASS located at the abdomen for males, and muscle and MASS around the abdomen for females on average over conditions. Less overt relationships inferred for both sexes included HBA1c, HDL-C and urate for anemia; GGT, ApoB, AST and body composition for dementia; urate for osteoporosis; total MASS, HBA1c and grip strength for insomnia. Total MASS was highly salient for chronic low back pain, and total TMAT was highly salient for cardiac conditions (arrhythmias, insufficiency, valve disorders). For both sexes, body composition traits were more salient than blood biomarkers for predicting onset acceleration of dementia and Parkinson’s disease; the top five univariate biomarkers for Parkinson’s disease were basic body composition traits. ALT was salient for liver disease but not in the top 5 traits, which featured AST and GGT.
4 Discussion
For the methodological contribution of the study, we validated that onset acceleration risk of diverse age-associated diseases can be modelled using neural network and linear Cox architectures, with neural networks outperforming linear models on in-distribution and out-of-distribution test data. While there was high variability in predictive performance across conditions, the final model selected by cross-validation utilized all input traits and attained internal test C-index ≥ 0.6 on 38 out of 47 conditions (80.9%). Model applicability was not limited by temporal differences in trait measurement times (11.3 ± 3.03 years between blood samples and imaging scans for individuals in this study) since the relevant period is lifespan instead of post-measurement time. Inclusion of body composition was found to be important for predictive performance across disease conditions, with performance gain from adding body composition metrics to basic demographic and anthropomorphic traits being comparable to the performance gain from adding blood biomarkers.
In post-hoc analysis, a number of connections between diseases with existing support in the literature were inferred without a-priori knowledge, such as associations in onset correlation risk between Parkinson’s disease, dementia and all cause mortality, cardiovascular and cerebrovascular conditions [16, 39, 46]. Strong correlations in onset acceleration risk were inferred for cardiometabolic, digestive-neuropsychiatric, and vascular-neuropsychiatric condition groups, partially corroborating a previous review of cross-sectional multimorbidity studies across different populations which identified cardiometabolic, neuropsychiatric-gastrointestinal, and musculoskeletal-vascular-neuropsychiatric-gastrointestinal co-occurrence clusters [15] despite the use of different methodology that associated diseases by co-occurrence. Automated inference of associations between biomarkers and disease identified pre-supported associations in a precise manner, including biomarkers for renal insufficiency, diabetes and liver disease, adding to previous work that used partial derivative based saliency to identify biologically salient factors for disease [45]. Saliency analysis validated the importance of grip strength and body composition traits [47].
This study is subject to several limitations. Firstly, existing work showed that the UK Biobank cohort contains a healthy participant bias and is predominantly composed of individuals from White ethnic backgrounds [48]. Although models adjusted for sociodemographic factors, conclusions drawn may not hold for more representative or ethnically diverse populations. All conditions except all cause mortality were reliant on the timing and definition of diagnoses, which depends on factors such as access to healthcare services and diagnostic criteria. Underestimation of disease prevalence in the UK Biobank is known to occur secondary to underdiagnosis [49, 50]. While Cox models are the dominant survival analysis approach in medical research and clinical trials [51, 52] and have been used for time-to-death analysis of aging clocks [22], it is an approximate model of survival as the proportional hazards assumption generally does not hold perfectly. In this exploratory analysis, results from evaluation against ground truth event times (Table 2, Figures 3, 4, 7) indicated that the Cox PH objective was capable of training models to predict onset acceleration risk, but alternative models may also be applied in future work. Additionally, models were assessed on held-out in-distribution and out-of-distribution test sets but are yet to be evaluated on an entirely independent cohort, and direct clinical utility and explicit causal relationships were not assessed. Prospective and intervention based studies are required to show clinical utility and support the validity of associations inferred in this study.
In summary, we introduced and evaluated the use of Cox models for modelling onset acceleration risk in the context of multimorbid aging, and performed analysis on inter-disease relationships, within-disease heterogeneity across individuals, and biomarker saliency for onset acceleration risk prediction. This work opens up novel approaches for the analysis of age-related diseases and associated biomarker predictor traits, with potential clinical applicability to early identification of high-risk groups in the population.
Declarations
Data Availability
Participant data from the UK Biobank cohort was obtained through UK Biobank Access Application number 44584. The UK Biobank has approval from the North West Multi-center Research Ethics Committee (REC reference: 11/NW/0382). All methods were performed in accordance with the relevant guidelines and regulations, and informed consent was obtained from all participants. Researchers may apply to use the UK Biobank data resource by submitting a health-related research proposal that is in the public interest. More information may be found on the UK Biobank researchers and resource catalogue pages (www.ukbiobank.ac.uk).
Sources of Funding
Institutional funding for the study was provided by Calico Life Sciences LLC.
Disclosures
The authors report no conflicts of interest.
Author statement
M. X. Ji conducted literature search, study design, data curation and verification, model training and data analysis, results interpretation, figures, writing first draft, and writing review and edits. M. Thanaj conducted literature search, study design, data curation and verification, results interpretation, and writing review and edits. L. Néhale-Ezzine, B. Whitcher, E. L. Thomas, and J. D. Bell conducted results interpretation, and writing review and edits.
Data access statement
Participant data from the UK Biobank cohort was obtained through UK Biobank Access Application number 44584. The UK Biobank has approval from the North West Multi-center Research Ethics Committee (REC reference: 11/NW/0382). All methods were performed in accordance with the relevant guidelines and regulations, and informed consent was obtained from all participants. Researchers may apply to use the UK Biobank data resource by submitting a health-related research proposal that is in the public interest. More information may be found on the UK Biobank researchers and resource catalogue pages (www.ukbiobank.ac.uk). No participant or public involvement occurred during the design, conduct, reporting, interpretation, or dissemination of this study.
Acknowledgements
The authors would like to thank the UK Biobank Reading team, Madeleine Cule, Ramprakash Srinivasan, Adam Kosiorek, and Martin Tobin for their support.
Appendix A Dataset
Dataset details for n = 60, 396 individuals are provided, including demographics, outcome counts, and biomarker descriptions.
A.1 Demographics
A.2 Trait distributions across dataset
A.3 Data censoring
The UK Biobank dataset used in this study contained participants of middle age and older (50.18 years old was mean age minus two standard deviations) who had medical records collected within a finite time window, therefore data was subject to left and right censoring. Cox PH survival analysis accounted for right censoring via relative risk adjustment; participants whose medical record window were not extended late enough to include future diagnoses were optimised to have higher risk scores than people with diagnoses within the window. Limitation imposed by left censoring was ameliorated by focusing on age-associated diseases which are more prevalent from middle age onwards. Finally, large dataset size assisted the model in learning robust associations inspite of limited window size.
A.4 Disease conditions
A.5 Quartile definitions
A.6 Data pre-processing
The UK Biobank study protocol is publically available [25]. The inclusion flowchart is given in A2.
Body composition variables
The sbc variable with dimension [5 × 370] consisted of mass, subcutaneous adipose tissue, muscle, visceral adipose tissue, thigh intramuscular and intermuscular adipose tissue channels measured along 370 vertical slices, segmented and landmarked (at mid-thigh, hip, T12 vertebra as top of the iliopsoas muscle, and shoulder) with a previously published deep learning system [34–37]. Spatial body composition was computed from tissue segmentation maps by summing tissue volume for each vertical slice to yield a [1 × 370] vector per tissue type along the height dimension, followed by linear interpolation to register vectors to sex-specific average landmark positions and concatenation to combine vectors into a matrix with dimension [5 × 370].
Appendix B Architecture
OnsetNet’s architecture is illustrated in Figure B3.
Architecture for neural network OnsetNet with channel sizes and PyTorch naming convention. Input variables shown in green. mg denotes input dimensions of trait group g. Encoders produce feature vectors of fixed length k ∈ {128, 256, 512}, where k is a hyperparameter.
Appendix C Additional Results
C.1 Addition of biomarker inputs
C.2 Test performance
External test set C-index across disease conditions for neural network and linear architectures. Gray error bars illustrate 95% confidence intervals.
Internal test set C-index across disease conditions for neural network architectures. Gray error bars illustrate 95% confidence intervals.
External test set C-index across disease conditions for neural network architectures. Gray error bars illustrate 95% confidence intervals.
C.3 Kaplan-Meier curves
C.4 Inter-disease risk analysis
C.4.1 Onset acceleration correlation
The clustering dendrogram in Figure 5 utilized UPGMA (unweighted pair group method with arithmetic mean) hierarchical agglomerative clustering for the linkage function (SciPy cluster.hierarchy.linkage) with cosine distance between correlation vectors as the clustering metric, where defined the linkage function between non-identical clusters u and v, i (j) indexes members of cluster u (v), and c denotes cosine distance. Color red (blue) indicates positive (negative)
All inter-disease Pearson correlations r for non-statin users. Color red (blue) indicates positive (negative) correlation and color intensity scales with correlation magnitude. Conditions with internal test C-index < 0.6 and correlations with p > 3.46E − 5 (Bonferroni corrected) indicated in gray.
All inter-disease Pearson correlations r, statin users included. Color red (blue) indicates positive (negative) correlation and color intensity scales with correlation magnitude. Conditions with internal test C-index < 0.6 and correlations with p > 3.46E − 5 (Bonferroni corrected) indicated in gray.
All-to-all OnsetNet score quartile correlation for conditions with internal test C-index ≥ 0.6, clustered by cosine distance between correlation vectors. Color red (blue) indicates positive (negative) correlation and color intensity scales with correlation magnitude. Statin users were included and 28 out of 1444 entries with p > 3.46E − 5 (Bonferroni corrected) were set to 0 for cosine distance computation.
All inter-disease Pearson correlations r for non-statin users without Bonferroni correction. Color red (blue) indicates positive (negative) correlation and color intensity scales with correlation magnitude. Conditions with internal test C-index < 0.6 and correlations with p > 0.05 indicated in gray.
All-to-all OnsetNet score quartile correlation for non-statin users and conditions with internal test C-index ≥ 0.6, clustered by cosine distance between correlation vectors. Color red (blue) indicates positive (negative) correlation and color intensity scales with correlation magnitude. 12 out of 1444 entries with p > 0.05 (no Bonferroni correction) were set to 0 for cosine distance computation.
C.4.2 Associations in onset acceleration correlation
C.4.3 Prognostic survival analysis
Prognostic risk evaluation
Although commonly used exclusively for prognostic prediction in biomedical contexts [27], Cox PH models do not assume events must occur after trait measurement baseline. The Cox PH objective function specifies a well-defined optimization problem for risk prediction functions regardless of whether variable xi (Section 2.3) contains traits of the individual measured before or after the event occurred. Model hd was trained and evaluated as a general (not exclusively prognostic) risk prediction model taking variable xi containing imperfect information for event time of condition d whether retrospective or prospective. Prognostic value of the model was separately verified in Section 3.3 with exclusion of events that occurred before trait measurement time.
All-to-all adjusted log hazard ratios for OnsetNet risk Q4 (Q1 as reference) given primary adjustment with demographic variables. Color red (blue) indicates positive (negative) log hazard ratios and color intensity scales with log hazard ratio magnitude. Results with adjusted hazard ratio p > 0.05 (without Bonferroni correction) or Schoenfeld residual p ≤ 0.05 omitted in gray. Entries that remained statistically significant after secondary adjustment with basic body traits are underlined to denote robustness to change in adjustment.
All-to-all log adjusted hazard ratios for binarized OnsetNet risk (above-median; below-median as reference) given primary adjustment with demographic variables. Color red (blue) indicates positive (negative) log hazard ratios and color intensity scales with log hazard ratio magnitude. Results with adjusted hazard ratio p > 0.05 (without Bonferroni correction) or Schoenfeld residual p ≤ 0.05 omitted in gray. Entries that remained statistically significant after secondary adjustment with basic body traits are underlined to denote robustness to change in adjustment.
All-to-all log adjusted hazard ratios for binarized OnsetNet risk (above-median; below-median as reference) given primary adjustment with demographic variables. Color red (blue) indicates positive (negative) log hazard ratios and color intensity scales with log hazard ratio magnitude. Statistically insignificant adjusted hazard ratios with p > 9.05E − 7 (Bonferroni corrected p > 0.05) or Schoenfeld residual p ≤ 0.05 are omitted in gray. Entries that remained statistically significant after secondary adjustment with basic body traits are underlined to denote robustness to change in adjustment.
C.5 Biomarker saliency
Saliency maps for full traits (demog+bb+bbc+sbc+blood) and full univariate traits (demog+bb+bbc+blood) OnsetNet models are provided.
C.5.1 Full traits model
C.5.2 Full univariate traits model
Appendix D Reporting checklists





Footnotes
Improved readability in the text and figures.
References
- [1].↵
- [2].↵
- [3].↵
- [4].↵
- [5].↵
- [6].↵
- [7].↵
- [8].↵
- [9].
- [10].↵
- [11].↵
- [12].↵
- [13].↵
- [14].↵
- [15].↵
- [16].↵
- [17].↵
- [18].↵
- [19].↵
- [20].↵
- [21].↵
- [22].↵
- [23].↵
- [24].↵
- [25].↵
- [26].↵
- [27].↵
- [28].↵
- [29].↵
- [30].↵
- [31].↵
- [32].↵
- [33].↵
- [34].↵
- [35].
- [36].
- [37].↵
- [38].↵
- [39].↵
- [40].↵
- [41].↵
- [42].↵
- [43].↵
- [44].↵
- [45].↵
- [46].↵
- [47].↵
- [48].↵
- [49].↵
- [50].↵
- [51].↵
- [52].↵