Abstract
Clinical and molecular heterogeneity of Alzheimer disease (AD) is increasingly recognized as a major factor impacting the diagnosis, the design of therapeutic interventions and clinical trials, and therefore possibly delaying the development of effective treatments for AD. We sought to determine whether cross-omics integration reveals molecular profiles of AD with clinical and biological relevance. By leveraging cross-omics approaches, we integrated high-throughput transcriptomic, proteomic, metabolomic, and lipidomic profiles from AD and control cases from multiple cortical regions and cohorts. We identified four distinct molecular profiles in which a specific profile was associated with significantly higher Clinical Dementia Rating (CDR) at death, shorter survival after onset, more severe neurodegeneration and astrogliosis, and decreased levels of metabolomic profiles. Its molecular signatures show significant dysregulation of synaptic genes indicating neuron/synapse losses and dysfunction at later stages of AD, and present in multiple cortical regions. Among other molecules, we found that the expression of alpha-synuclein (SNCA) and SNAP25 were downregulated in this profile. Overall, our results suggest that AD has a more prominent synaptic loss and dysfunction associated with worse cognition. These novel molecular findings open the possibility for new biomarkers for the molecular staging of AD and potential therapeutic targets to ameliorate cognitive decline and AD progression.
Main
Alzheimer disease (AD) is a neurodegenerative disorder pathologically characterized by amyloid (A³) plaques, neurofibrillary tangles (NFTs), neuroinflammation, and synaptic and neuronal loss. AD is the most common cause of dementia among the elderly, currently affecting approximately 6.2 million individuals of age 65 and older in the United States and 50 million worldwide1. However, AD shows a large genetic, clinical, and pathological heterogeneity. Similar to the variability of patterns and rates at which amyloid deposition, tauopathy, and neurodegeneration are observed among the affected population2–4, participants with variable degrees of cognitive impairment at the time of death can exhibit very similar AD associated neuropathology. In addition, AD pathology hallmarks are commonly accompanied by variable burdens of one or more other additional neurodegenerative pathologic features, including cerebrovascular lesions, Lewy body pathology, and TDP-43 proteinopathy3. Concordant with these observations, growing evidence suggests that molecular profiles also differ substantially within AD5, 6. This multifaceted heterogeneity suggests that multiple biological pathways are involved in AD 3 presenting a major challenge for the development of therapeutic interventions and companion diagnostics. However, the heterogeneity of AD has not been fully considered in the design of therapeutic interventions and clinical trials, possibly because of a detailed map of the AD molecular heterogeneity is still missing. Characterizing this heterogeneity on a biological level 3 by identifying specific changes in synapse/neuronal population, demyelination, glial activation, for instance 3 may provide critical insights to develop new peripheral biomarkers, companion diagnostics, and therapeutic interventions, tailor the design of clinical trials, and bring closer precision medicine for AD.
In recent years, multiple clinical and pathological AD subtypes have been identified based on distinct spatiotemporal trajectories of tau pathology, brain atrophy, postmortem brain transcriptomics profiles, and cerebrospinal fluid proteomics. Differences in characteristics associated with these subtypes including, age at onset, sex, APOE ’4 carrier status, cognitive status, disease duration, and cerebrospinal fluid (CSF) biomarker levels, were previously reported2, 3, 5–11. Prior research efforts have leveraged molecular profiles and biomarkers to characterize the AD population into subgroups, and have provided information on AD progression, that in turn could be used in clinical trials. However, the influence that such AD profiles may have on cognitive decline remains poorly understood and under-studied, representing a barrier to using this information effectively in clinical trials of potential disease-modifying therapies.
Previous studies have proposed multiple AD subtypes examining clinical, neuropathological, and molecular features7. Recently, distinct spatiotemporal tau deposition trajectories have been identified using tau PET imaging2, which suggest significant differences between sex, APOE e4, and age. In parallel, three AD subtypes associated with multiple dysregulated pathways, such as susceptibility to tau-mediated neurodegeneration, amyloid-³ neuroinflammation, synaptic signaling, immune activity, mitochondria organization, and myelination using a system biology approach to study transcriptomics from multiple cortical regions5. Despite the substantial effort in identifying molecular subtypes of AD described in this study, broader conclusions are limited by its nature as a single-omics analyses typically captures changes in a single component of the biological cascade.
Comprehensive studies that aggregate multiple modalities of omics (cross-omics) have proven to be a powerful way to identify molecular subtypes for complex human diseases including cancer12–17. Omics layers describe the biological system at different biomolecular levels18, allowing more comprehensive segregation of AD biology into molecular profiles that the partial analyses of any individual modality might miss. Cross-omics approaches can reveal not only how complex biomolecular profiles change with the disease, but also the relationships and correlations between distinct classes of biological molecules. Their potential for studying neurodegeneration has been previously described19–28. However, only recently deep molecular characterization of multiple cortical regions from diverse brain banks and cohorts has only recently been generated5, 6, which now enables a more detailed and comprehensive examination of cross-omics data in AD.
In the current study, we leveraged machine learning, digital deconvolution, and traditional statistical approaches to integrate and analyze multiple high-throughput omics from different cortical regions and cohorts. Our cross-omics data integration approach identified distinct molecular profiles of AD associated with cognitive function and clinical attributes. Our results highlight that molecular heterogeneity is a common characteristic in AD brains and by virtue of molecular stratification, cross-omics approaches reveal multiple molecular pathways associated with AD progression. Overall, these results provide a foundation for implementing precision medicine approaches for AD.
Results
Study design
We leveraged high-throughput transcriptomic, proteomic, metabolomic and lipidomic data from the parietal cortex of participants with neuropath-confirmed AD from the Knight Alzheimer Disease Research Center (Knight ADRC) for our discovery cohort (see Methods, Fig. 1 and Table 1). Omics data were generated using next-generation sequencing, SomaScan, and Metabolon Precision Metabolomics platforms for transcriptomic (60,0754 transcripts)29–31, proteomic (1,092 proteins)32, and metabolomic (627 metabolites)33 respectively (see Methods). Preprocessing, quality control (QC), and feature selection were performed on each omics data type, and only samples that passed the QC criteria in the three omics modalities were retained, resulting in 278 samples (255 AD and 23 control). We employed a Bayesian integrative clustering method named iClusterBayes34 to integrate AD multi-omics data (see Methods) and identify molecular profiles of AD. Control cases were later integrated for comparative analyses.
To investigate the biological relevance of the AD molecular profiles, we performed multiple association analyses including association with clinical and neuropathological attributes (e.g. sex, age of death, age at onset, post-mortem interval, CDR, Braak, etc.), survival, and differential expression analyses to characterize AD molecular profiles. We leveraged digital deconvolutions approaches to infer the cellular population structure using bulk RNA-seq data31, and tested for association of these profiles with cell-type abundances. We have also integrated single-nuclei RNA-seq data (snRNA-seq) from the parietal cortex from 67 Knight ADRC participants to determine the specific cell-type effects.
For replication, we leveraged transcriptomics (RNA-seq) and proteomics (tandem mass tag;TMT) from 116 brains (93 AD and 23 control, Table 1) extracted from the parahippocampal gyrus (BM36) from The Mount Sinai Brain Bank (MSBB) study35 as well as transcriptomics data from 237 brains (144 AD and 93 control, Table 1) from the dorsolateral prefrontal cortex (DLPFC) from The Religious Orders Study and Memory and Aging Project (ROSMAP) cohort36 accessed through the AMP-AD Synapse portal. Preprocessing, QC, and feature selection were performed similar to the discovery datasets.
Identification of distinct molecular profiles of Alzheimer disease
We sought to investigate how multiple omic modalities capture optimally the molecular heterogeneity in AD, without any additional conditioning on additional neuropathological or clinical data. We used Bayesian information criteria (BIC) to evaluate clustering solutions between 2 and 10 clusters and identified four clusters (Knight-Clusters, Fig. 2a) which maximized compactness (intra-cluster similarity). Among the four clusters, we observed that the molecular profiles of Cluster4 (named Knight-C4 hereafter, n=42) exhibited more pronounced dysregulation for many molecules compared to AD participants from other clusters. For instance, the transcriptomic profiles of the top 75% quantile genes show significant dysregulation signatures associated with Knight-C4 (Fig. 2b).
Cross-omics analyses identified a molecular profile of AD brains associated with worse cognitive function and molecular attributes
We initially evaluated whether the alternative molecular profiles were associated with genetic risk factors, neuropathological status, or clinical and demographic variables. We found that AD cases in Knight-C4 showed more severe cognitive decline, measured by significantly higher Clinical Dementia Rating (CDR) scores at death (p=5.24×10-3; Fig. 2c), shorter survival time following disease onset (8 months shorter, Fig. 2d, p=4.2 ×10-3, HR=1.6) and younger age at death (Fig. 2d, p=4.2 ×10-3, HR=1.7) compared to donors in Knight-C1-3. However, the Knight-C4 cluster was not associated with either AD polygenic risk score (Supplementary Fig. 2) or APOE ’4 allele carrier status (Supplementary Fig. 3a). We did not detect any significant association between Knight-C4 and either Braak neurofibrillary tangle stage or amyloid-β plaque score (Supplementary Fig. 3b,c). It is worth mentioning that these annotations are not complete for all the brains, restraining the statistical power.
Next, we leveraged digital deconvolution methods to infer the cellular population structure from bulk RNA-seq31 (see Methods). We identified that the parietal cortices from donors in Knight-C4 show significantly higher astrocyte and lower neuronal proportions (Fig. 2e and Supplementary Table 1). We then evaluated whether a metabolomic profile that we previously reported associated with AD duration33 would reveal additional molecular differences among the clusters. Lower values of this profile (or eigenmetabolite) were associated with a longer disease duration driven by earlier onset33. We found that brains in Knight-C2 and Knight-C4 exhibited significantly lower eigenmetabolite scores (Fig. 2f) using ordinal logistic regression (p=6.0×10-03).
Altogether, these results show the feasibility of our approach in extracting novel insights from AD subpopulations by identifying a late-stage AD profiles associated with more neuronal loss and astrogliosis that caused more alterations in different molecular processes.
The AD molecular profile associated with worse cognition is represented consistently across multiple cortical regions
We sought to investigate whether the molecular profile associated with worse cognitive outcomes was specific to the parietal cortex or also present in additional cortical regions affected at distinct stages of AD. Our analyses of the transcriptomics and proteomics from AD parahippocampal gyrus (BM36) from the MSBB study35 (Table 1) identified two distinct molecular profiles (Fig. 3a). We identified that brains in Cluster1 (MSBB-C1) recapitulate the transcriptomic signatures we had previously identified in Knight-C4. We also ascertained the extent of overlap between the molecules dysregulated in Knight-C4 (Supplementary Table 2) and MSBB-C1 (Supplementary Table 3) and determined a coincident profile of dysregulated genes (Supplementary Fig. 4a, p < 2.2 ×10-16 hypergeometric test, Supplementary Table 4). Furthermore, these donors in MSBB-C1 also exhibited more severe dementia at expiration, shown by a significantly higher CDR scores (p=1.5×10-3, Fig. 3b), and estimated higher and lower proportions of astrocytes and neurons, respectively (p=5.5×10-07 and 3.2×10-09 respectively; Fig. 3c, Supplementary Table 1) compared to other AD cases in MSBB. Furthermore, MSBB-C1 was associated with higher Braak tau scores (p=3.9×10-03, Supplementary Fig. 5a). Although not significant (p=8.0×10-02), these donors show a trend suggesting earlier age of death (Supplementary Fig. 5b). In contrast, we did not identify a significant overlap of dysregulated proteins between regions (p > 0.05, Supplementary Fig. 4b).
Furthermore, we leveraged RNA-seq generated from MSBB donors in additional cortical regions and identified that the distinguishing molecular profile of MSBB-C1 is consistently present in multiple cortical regions including frontal cortex (BM10), superior temporal gyrus (BM22), and inferior frontal gyrus (BM44) (Supplementary Fig. 5c).
Similarly, our analyses of the RNA-seq from the dorsolateral prefrontal cortex (DLPFC) from the ROSMAP cohort36 identified subset of AD donors (Fig. 3d). Brains in Cluster1 (ROSMAP-C1) recapitulated the transcriptomic profiles of Knight-C4 (p < 2.2 ×10-16 hypergeometric test; Supplementary Fig. 4c and Supplementary Table 4 and 5). Donors from ROSMAP-C1 showed evidence of greater cognitive impairment and increased neuropathological AD scores, indicated by significantly lower Mini-Mental State Examination scores (MMSE; p=4.4×10-02; Fig. 3e), higher tau tangle densities (p=1.3×10-02) and neurofibrillary tangle burdens (PHF tau tangles, p=2.1×10- 02; Supplementary Fig. 6a,b). Although not significant (p > 0.05), ROSMAP-C1 showed a trend to higher plaque densities—Global burden of AD pathology (Supplementary Fig. 6c, see Methods and Bennett et al.,201236 for details on how these neuropathological variables are identified). Cell proportions from deconvolution analysis (Supplementary Table 1) also showed that ROSMAP-C1 has significantly higher astrocyte (p=3.1×10-08) and lower neuron (p=9.7×10-14) proportions (Fig. 3f).
Altogether, these results indicate that cellular and molecular profiles of AD cases with worse cognitive status are similar across multiple cortical regions, usually affected at different stages of AD pathology.
The parietal cortex of subjects with worse cognitive function exhibited a remarkable molecular dysregulation
Differential expression (DE) analyses (see Methods, Supplementary Tables 2,6,7) showed that Knight-C4 has a ∼10-fold increase in the number of significantly differentially expressed genes (DEGs) compared to other AD participants (Table 2 and Supplementary Fig. 7a); furthermore, 90% of the 4,661 DEGs were unique to participants in the Knight-C4. (Table 2 and Fig. 4a-left panel). Similarly, we observed an increased number of proteins and metabolites (Table 2 and Supplementary Fig. 7b,c) with significant abundance levels in Knight-C4. However, we determined that Knight-C2 has the largest number of dysregulated proteins. We identified 24 (34%), 232 (51%), and 250 (52%) significant proteins unique to Knight-C1, 2, and 4, respectively (Fig. 4b-left panel and Table 2). We did not detect significant proteins dysregulated uniquely in Knight-C3.
Knight-C4 was associated with cluster-specific significantly different metabolic reading levels (90 hits; Table 2 and Supplementary Fig. 7c) while only one hit was detected in Knight-C2. Out of the 90 hits, 80 (90%) significant metabolites were also significantly different between Knight-C4 and other AD cases. These results suggest that cross-omics analyses capture molecular heterogeneity among AD cases that are otherwise missed in non-clustered omics specific approaches.
To quantify the benefits provided by this cross-omics study that clusters AD brain donors using multi-omics profiles, we performed DE analyses comparing all AD cases combined (un-clustered) to control brains. Then, we determined the percentage of significant features detected in each cluster that are not captured by un-clustered analyses (Table 2, Fig. 4a-c-right panel). We observed that Knight-C4 exhibited the highest percentage of missed hits with 92%, 98%, and 100% for transcriptomics, proteomics, and metabolomics respectively (Fig. 4a-c-right panel). Notably, a high percentage of hits was also observed with Knight-C2 in proteomics (98%). On the other hand, un-clustered analyses identified a small percentage of significant features that are not captured by clustered analyses. For instance, only 5% and 0.5% of un-clustered significant hits for transcriptomics and proteomics, respectively were not captured by Knight-C4. No un-clustered significant metabolites were missed by Knight-C4.
To further characterize distinct profiles of AD brains, we performed DE analyses to compare AD brains in each cluster to brains in other clusters (Supplementary Tables 2,6,7). Knight-C4 showed the highest percentage of significant hits compared to other AD brains (Fig. 4d and Table 3). This significant difference was observed in transcriptomics with 6,822 significant hits compared to 678, 22, and 1 for Knight-C1-3 (Fig. 4d-outer track and Table 3). Similarly, this difference was observed in metabolites with 228 significant hits in Knight-C4 compared to 105, 0, and 0 for Knight-C1-3 respectively (Fig. 4d-inner track and Table 3). No significant differences in the number of hits were observed between the four clusters in proteomics with all clusters showed relatively similar detections (Fig. 4d-middle track and Table 3). We also verified that none of the omics dominate this solution, each of them capturing molecular heterogeneity among AD brains that are not necessarily detected by other omics modalities.
In summary, these results show that molecular profiles of AD are heterogeneous, and cross-omics integration approach can better identify the substantial differences within AD brains.
The AD molecular profile associated with worse cognitive function is significantly enriched in dysregulated synaptic genes and is associated with AD-related pathways
We performed pathway analyses to characterize the brains with worse cognitive function. We found that the hits in Knight-C4 are significantly enriched in dysregulated synaptic genes (Fig. 4e,f, Supplementary Table 8) using the annotation from The SynGO consortium37 (p=1.0×10-15 and p=4.8×10-45 for Knight-C4 vs Control and Knight-C4 vs Knight-C1-3 sets respectively) and curated list from Fei et al., 201838 (p=2.3×10-07 and p=1.1×10-21 for Knight-C4 vs Control and Knight-C4 vs Knight-C1-3 sets respectively). Coincidently, molecular data from the MSBB and ROSMAP studies replicated these results, as MSBB-C1 and ROSMAP-C1 were significantly enriched in synaptic genes (Supplementary Fig. 8a,b and Supplementary Table 8). These results suggest that synaptic and neuronal loss occurs at later stages of AD.
In addition, the pathway analyses of Knight-C4 identified multiple AD-related pathways (Fig. 4g,h). KEGG pathways identified intracellular trafficking and signaling pathways (Fig. 4g) including Endocytosis (p=6.5×10-04), Phagosome (p=1.2×10-06), and mTOR signaling pathway (p=3.3×10-02), all of which were shown to play major roles in the pathological processes of AD39–41. KEGG pathways also identified lysosomes (p=4.7×10-02, Fig. 4g), a critical component within cells. Growing evidence has shown that the dysfunction of lysosomes plays a major role in pathogenesis of multiple neurodegenerative diseases including the progression of AD43–46. Another significant example associated with Knight-C4 was Autophagy (p=1.6×10-02, Fig. 4g), a pathway associated with several neurodegenerative disorders including AD47–50. For instance, the dysfunction of autophagy was shown to be linked to the accumulation of misfolded protein aggregates51. KEGG pathways results also show that the downregulation of Knight-C4, compared to other AD cases, was associated with pathways related to multiple neurodegenerative diseases (Fig. 4g), including AD (p=2.5×10-14), Huntington disease (p=3.3×10-14), and PD (p=2.1×10-12) suggesting that neuronal/synaptic loss and gliosis are common characteristics of most neurodegenerative diseases at late-stages. These analyses also identified synaptic pathways, including dopaminergic synapse (p=7.0×10-03) and glutamatergic synapse (p=1.0×10-02) (Fig. 4g). Furthermore, GO biological process analyses identified significant neuronal differentiation (p=1.8×10-02) and development (p=5.7×10-03) associated with Knight-C4 (Fig. 4h)
Compared to the control group, Knight-C4 showed two interesting KEGG pathways Herpes simplex virus 1 infection (p=4.4×10-03), a risk factor for AD52–55 that lead to neuronal damage and induced gliosis56, 57, in addition to synaptic vesicle cycle (p=4.6×10-05) shown to play a key role in the pathobiology of AD58 (Supplementary Fig. 9a). GO biological process analyses identified several significant synaptic signaling and transmission pathways (Supplementary Fig. 9b). Furthermore, pathway analyses using significant proteins identified AD-related KEGG pathways including PI3K—Akt signaling (p=6.5×10-19), Rap1 signaling (p=5.8×10-14), Axon guidance (p=2.1×10-12), MAPK signaling (p=1.4×10-20), and Ras signaling (p=1.8×10-15) pathways associated with the downregulation of Knight-C4 (Supplementary Fig. 10a). Top GO pathways were related to cell/gene regulatory pathways (Supplementary Fig. 10b).
Altogether, these results show that synapse dysfunction is more pronounced in participants with worse cognitive function (late-stage AD).
Molecular heterogeneity among AD clusters established neuropathological biomarkers
Our DE analyses identified several known neurodegenerative disease genes dysregulated in Knight-C4 including APP, MAPT, CLU, SNAP25, GFAP, SNCA, LRRK2, TARDBP, CHI3L1, MMP9, and C9ORF72 (Fig. 5a). The transcriptomics profiles of Synaptosome Associated Protein 25 (SNAP25) were significantly downregulated in Knight-C4 (Fig. 5b) as has been previously shown in AD59, supporting the likelihood of impaired synaptic function in neurons59. SNAP25 has also been shown as a candidate biomarker in the cerebral spinal fluid (CSF) for synapse degeneration in AD patients60. SNAP25 was significantly DE between Knight-C4 and control (p=5.9×10-3, Fig. 5b) and between Knight-C4 and other AD cases (p=4.2×10-08, Fig. 5b). However, SNAP25 was not significantly DE (p > 0.05, Fig. 5b) in non-clustered AD cases compared to the control, which may suggest that synaptic dysfunction in the parietal cortex is not substantially affected until later stages of AD. The decreased expression of SNAP25 in Knight-C4 was replicated in MSBB-C1 and ROSMAP-C1 cohorts (Supplementary Fig. 11).
We inspected the results for Glial Fibrillary Acidic Protein (GFAP), an emerging biomarker in brain disorders61 usually employed to identify astrocytic reactivity state, a process well documented in AD and neurodegeneration. GFAP was overexpressed uniquely in Knight-C4 compared to Knight-C1-3 (Fig. 5a,c, p=1.7×10-2). In spite of the significant astrogliosis of Knight-C4, GFAP was not significantly DE in Knight-C4 compared to controls, nor was it DE in all AD cases compared to the control (p > 0.05, Fig. 5c). To query the possibility that collinearity between case-control status and cellular population structure introduced spurious results, we repeated these analyses using a simplified model that did not correct for cellular population structure but could not obtain significant association. We obtained similar results when we analyzed AQP4, another marker gene of astrocyte reactivity state, as we only detected significant differences in the levels of AQP4 between Knight-C4 and other cases (p=1.0×10-02). In addition, we did not identify significant differences in GFAP proteomic levels (p > 0.05, Supplementary Fig. 12), however, proteomic levels of GFAP were significantly upregulated in Knight-C2 (Supplementary Fig. 12) compared to the control and AD brains in Knight-C1,3,4 (p=1.0×10-4 and 1.4×10-21 respectively). The increased transcriptomic GFAP levels were also identified in the subset of brains with worse cognitive function from the replication cohorts, MSBB-C1 and ROSMAP-C1 (Supplementary Fig. 13). Altogether these results support heterogeneity in the presence of astrocytic reactivity state and its association with decline of cognition.
Our cross-omics integration analyses also identified cluster-specific dysregulation of Clusterin (CLU), a gene associated with AD62, 63. Genome-wide associations studies have shown that CLU (also known as APOJ) is the third most associated late-onset AD (LOAD) risk gene after APOE and BIN163–66. Proteomic levels of CLU were significantly downregulated in Knight-C2 compared to the control (p=6.4×10-3) and AD cases in Knight-C1,3,4 (p= 9.8×10-7; Fig. 5d,e). CLU was also significantly downregulated in Knight-C4 compared to the control (p=3.4×10-02; Fig. 5e). To assess the clinical implications of CLU dysregulation, we evaluated its levels in the cerebrospinal fluid (CSF) from 553 participants of the Knight ADRC by performing survival analyses for progression using CDR. These analyses indicate that lower CSF CLU is associated with an increased risk of dementia progression (HR=0.42; p=1.0×10-07, Fig. 5f). Nonetheless, we did not observe any significant differences in CLU transcriptomic and proteomics levels in the MSBB and ROSMAP cohorts.
The cross-omics approach identified subset of AD brains with unique abundance levels for AD biomarkers. These results show the capability of cross-omics approaches in detecting AD molecular biomarkers that typical non-clustered approaches sometimes miss when AD cases are analyzed altogether. These results provide support for alternative strategies to reposition traditional biomarkers for precision medicine approaches.
Metabolic differential abundance analyses identified biological pathways associated with synaptic dysregulation and cognitive function
We next looked at the significant metabolites that uniquely characterize brain samples in Knight-C4. To determine whether significant metabolites replicated in additional cohorts, we accessed metabolomics from the same ROSMAP cohort36 (Table 1, Methods) and performed differential abundance analyses comparing the levels from donors in the previously identified ROSMAP-C1 cluster to controls and to other AD cases (Supplementary Table 5). We identified 57 significant metabolites (43 increased and 14 decreased) in Knight-C4 compared to controls that replicated in the ROSMAP-C1 (Fig. 6a, Supplementary Table 9). Similarly, we identified 74 metabolites (70 increased and 4 decreased) in the Knight-C4 compared to other AD cases that also replicated in ROSMAP-C1 (Fig. 6b, Supplementary Table 9).
Our pathway analyses identified several pathways significantly related to AD. Among the set of metabolites decreased in AD (Fig. 6a,b) gamma-aminobutyrate (GABA) and choline, both involved in major neurotransmitter pathways, were identified. GABA, synthesized in neurons, was significantly decreased in Knight-C4 (p= 5.1×10-04) and ROSMAP-C1 (p=1.3×10-02) compared to the control (Supplementary Fig. 14a). The same observation was found when these profiles were compared to other AD cases (p=1.2×10-05 and p= 1.6×10-02 for the Knight-C4 and ROSMAP-C1 respectively, Supplementary Fig. 14a). These results support that those cases with worse cognitive outcomes are associated with dysregulated neuronal or synaptic pathways. As an inhibitory neurotransmitter, GABA has been shown to play a key role in synchronizing the activity of the human cerebral cortex67, 68. Similarly, the metabolomics profiles of choline were decreased in these profiles in both Knight-C4 (p= 4.1×10-06) and ROSMAP-C1 (p=1.2×10-04) compared to the control (Supplementary Fig. 14b). These reading levels of choline also show significantly different abundance in these clusters compared to other AD cases (p= 2.9×10-07 and p=4.7×10-3 for Knight-C4 and ROSMAP-C1 respectively, Supplementary Fig. 14b). Choline was also shown to have a critical role in neurotransmitter function because of its effect on the function of acetylcholine and dopamine69. However, we did not find significant differences in the reading levels of acetylcholine and dopamine. We also did not observe significant differences for additional neurotransmitters included in the panel (serotonin and glutamate). This indicates that the dysregulation of neurotransmitters is possibly specific to GABA and choline. We then studied tryptophan, an amino acid essential for metabolites that serve as neurotransmitters and signaling molecules70. We found that its levels were increased in the Knight-C4 when compared to the control and other AD cases (p=8.3×10-3 and 3.0×10-06 respectively) (Supplementary Fig. 15a). Similarly, we found that brains in ROSMAP-C1 showed increased tryptophan levels compared to the control and other AD brains (p=1.4×10-4 and 2.8×10-07 respectively) (Supplementary Fig. 15a)
Furthermore, several increased carbohydrate/sugar metabolites were identified in the set of metabolites increased in Knight-C4 (Fig. 6a,b) including glucose, mannitol, sorbitol, ribose, and UDP-glucose, indicating a possibility of slower carbohydrate metabolism. As an example, glucose, a metabolite that has been implicated in the pathogenesis of AD71, 72, was significantly increased in the Knight-C4 and ROSMAP-C1 (Supplementary Fig. 15b). Components of glycolysis and the tricarboxylic acid (TCA) cycle were also dysregulated in a manner that suggests decreased glucose metabolism and cellular respiration. Specifically, levels of citrate were decreased in both Knight-C4 (p= 9.7×10-05) and ROSMAP-C1 (p= 1.4×10-03) compared to other AD cases (Supplementary Fig. 16a) indicating possible impaired cellular respiration through the TCA cycle, which produces citrate. NAD+, an important oxidizing cofactor in glucose metabolism and cellular respiration, was also decreased in both Knight-C4 (p=2.2×10-02) and ROSMAP-C1 (p=2.2×10-02) compared to other AD cases (Supplementary Fig. 16b). NAD+ is synthesized from tryptophan, and levels of NAD+ have been shown to decrease with age, potentially contributing to the pathogenesis of AD73. This decreased availability of NAD+ could also contribute to impaired glucose metabolism. These observations are in line with previous findings that associate increased brain glucose and slowed glucose metabolism with more severe AD72, 74.
In addition, several increased sphingolipid metabolites were identified (Fig. 6a,b). Previous studies showed the implication of sphingolipid metabolism deregulation in AD75. For instance, sphingosine, found to have higher levels in AD brains75, showed significantly increased metabolism levels in both Knight-C4 and ROSMAP-C1 (Supplementary Fig. 16c).
Together, these results further characterize the molecular profiles in this subset of AD brains and show synaptic/neuronal dysregulation association and also possible comorbidity between AD and other diseases such as diabetes74 illustrated by the increased carbohydrate/sugar metabolites in this subset.
Single-cell data suggests that pathways in different cell types are more prominently altered in distinct molecular profiles of AD
We studied the cell type specific pathways altered in the AD profiles by leveraging single-nuclei RNA-seq from a subset of the Knight ADRC participants. We selected the top 5% of genes that showed the most specific cell-type expression profile in neurons, astrocytes, oligodendrocytes, oligodendrocyte precursor cells (OPC), and microglia. Then, we used this information to determine if any AD molecular profiles showed a stronger signature of cell-type specific dysregulation. These analyses showed that Knight-C4 was enriched in genes overexpressed in astrocytes and endothelial cells (38% and 23% for all hits respectively; Fig. 7a-left panel and Table 4). KEGG pathways analyses of astrocytic genes identified astrocyte-related pathways (Fig. 7b) including Axon guidance (p=2.8×10-05), and Hedgehog signaling pathway (p=1.6×10-06). Previous work has shown the role of astrocytes in axon guidance during development and repair76 and hedgehog signaling pathways involved in neuroinflammation, neuronal cell differentiation, and neuronal death77. Many of these pathways were also enriched in overexpressed genes in endothelial cells (Supplementary Fig. 17a).
Knight-C4 was also enriched in genes downregulated in microglial and neuronal genes (Fig. 7a-right panel and Table 4). Neuronal genes showed an association with synaptic/neuronal-related pathways (Fig. 7c-d), including Synaptic vesicle cycle (p=1.0×10-06), Retrograde Endocannabinoid Signaling (p=3.6×10-03) key modulators of synaptic function78–80, and Gap-junction (p=2.2×10-02) formation known to play an essential role in intercellular metabolic communication and transmission across electrical synapses81. Microglial downregulated genes implicated AD related pathways including Phagosome (p=1.0×10-06), Herpes simplex virus 1 infection (p=2.0×10-04)—previously associated with AD risk52–55 leading to neuronal damage and induce gliosis56, 57, Gap junction (p=2.2×10-02), and AD (p=5.4×10-08) (Supplementary Fig. 17b). Further explorations showed that Knight-C4 is significantly enriched in downregulated genes related to homeostatic microglia, including C-X3-C Motif Chemokine Receptor 1 (CX3CR1; p= 5.10x-03, Fig. 7e,f) and purinergic receptor P2Y13 (P2RY13; p=1.2×10-2, Fig. 7e,f).
Additionally, we identified genes dysregulated in specific clusters and cell types, including the TP53 induced glycolysis regulatory phosphatase (TIGAR) overexpressed in microglia and upregulated in Knight-C1 (Fig. 7e,f). TIGAR is a p53-inducible protein that regulates energy metabolism and oxidative stress. Among the upregulated endothelial-specific genes in Knight-C1, we identified MYOCD, CNN1, PLN, MYH11, S100A4, SLC13A4, DES, implicating pathways associated with cell cycle, junction, anion transport, Aβ clearance and actin. In particular, MYH11, overexpressed in Knight-C4 (Fig. 7e,f), has been reported to be associated with the risk of dementia82. Moreover, oligodendrocyte-specific genes dysregulated in Knight-C2 include STX4, ACTN4, and FBXO32 relating to pathways associated with vesicle docking, neural tube, junction, apoptosis and actin. Particularly, Syntaxin-4 (STX4), upregulated in Knight-C2,4 (Fig. 7e,f), is critical for vesicle docking and the Wnt frizzled receptor (FZD9) and coreceptor (LRP5), which regulate the synaptic vesicle cycle26. STX4 has also been shown as one of the disease risk genes in AD83.
These results suggest that different pathways mediate distinct molecular profiles of AD in multiple cell types.
Integrative cross-omics analyses identified reduced transcript levels of SNCA across multiple brain regions in AD participants with worse cognitive function
Transcriptomic and proteomic assays showed SNCA (alpha-synuclein) was among the top molecules that best discriminate Knight-C4 versus Knight-C1-3 molecular profiles. Our results show that SNCA transcriptomic levels were significantly downregulated in Knight-C4 compared to Knight-C1-3 (p= 2.0×10-05) and control brains (p=2.2×10-02; Fig. 8a). Interestingly, we did not observe a significant association (p > 0.05) in non-clustered AD compared to the control (Fig. 8a-left panel). This suggests that SNCA levels in the parietal cortex may be altered only in late-stage AD. Alpha-synuclein (αSyn) protein levels were marginally downregulated in brains from Knight-C4 (Fig. 8b) compared to the control (p=5.16×10-02) and significantly lower in Knight-C2 compared to the control (p=7.8×10-5) and Knight-C1,3,4 (p=6.2×10-09).
We also identified downregulation of SNCA/αSyn in MSBB-C1 (Fig. 8c,d) compared to the controls (p=1.8×10-02 and p=1.0×10-05 for transcriptomics and proteomics respectively). Similarly, SNCA levels in MSBB-C1 were significantly lower compared to other AD (p=1.0×10-02 and p=3.4×10-03 for transcriptomics and proteomics respectively, Fig. 8c,d). However, no significant differences in transcriptomic or proteomic levels in the additional donors (MSBB-C2) were observed (p > 0.05, Fig. 8c,d). Moreover, the downregulation of SNCA in the MSBB-C1 donors was also seen in the frontal pole (BM10), superior temporal gyrus (BM22), and inferior frontal gyrus (BM44) brain regions from the same study (Supplementary Fig. 18a-c). Similarly, SNCA transcriptomic levels were significantly downregulated in ROSMAP-C1 (Fig. 8e) compared to the control (p=3.5×10-02) and other AD cases (p=4.5×10-03).
The downregulation of SNCA in Knight-C4 further supports the association of this profile with synaptic dysfunction. SNCA encodes α-synuclein (αSyn), an abundant presynaptic protein that plays a role in regulating synaptic vesicle trafficking and subsequent neurotransmitter release84[PMID:7646890]. αSyn itself is released from neurons via multiple cell biological mechanisms including exocytosis and is detectable in the extracellular space as a monomer and pathologically aggregated species85–87. Mutations in SNCA, overexpression of wildtype αSyn, and injection of pre-formed aSyn fibrils (PFFs) are sufficient to cause neuropathology and parkinsonian syndromes in both humans and animal models88–94.
To further study synapses in a model of pathological αSyn aggregation, we integrated and analyzed NanoString gene expression data (see Methods) generated from a panel of 807 gene transcripts measured in brain tissue of A53T αSyn transgenic mice crossed onto APOE backgrounds95. Of the 807 genes included in the panel, 89 (11%) were classified as synaptic genes (SynGO consortium37). We performed DE analysis to compare mice with high pSyn pathology (defined as > 5% coverage by IHC) to mice with low pathology (< 5% coverage by IHC, see Methods) and identified 255 DEGs, including 13 synaptic genes. We then interrogated which of synaptic genes in DE in Knight-C4 were also dysregulated in the A53T αSyn-APOE transgenic mice (Supplementary Fig. 19). We identified that all of the 13 genes identified in the mouse model were also dysregulated in Knight-C4 compared with other AD cases (Fisher p=3.6×10-09) (Supplementary Fig. 19a). These results suggest that the profile of Knight-C4 might be driven by synaptic dysregulation. When comparing to the control, only five synaptic genes were shared between the three datasets (Supplementary Fig. 19b). These findings were limited by the fact that the NanoString panel contains only 12% of the significant genes in Knight-C4.
Altogether, these results provide more evidence of the association of Knight-C4 with synaptic dysfunction, and suggest that the lower levels of soluble SNCA observed in this profile with worse cognitive function in multiple brain regions may be associated with αSyn (Lewy body) pathology which is present in 50-60% of AD brains96–98.
Discussion
In the current study, we leveraged machine learning approaches to integrate high-throughput multi-omics datasets from AD cases in multiple cohorts and brain regions. Our analyses led us to identify multiple molecular profiles, genes/pathways, and cellular population structures affected in specific AD cases. We found a distinct molecular profile associated with worse cognitive function, earlier age at onset/death, and dysregulated neuronal/synaptic pathways in the parietal cortex. Our results suggest that this molecular profile captures additional neuropathological events not fully captured by β-amyloid or tau staging as previously reported99, 100. Similar to previous studies8, we did not find a significant association between any molecular profile and APOE ε4 or PRS, suggesting that these molecular profiles are not driven by genetic factors associated with genetic AD risk. We also determined, through an exhaustive replication, that this molecular profile is also present in multiple brain regions from the MSBB and ROSMAP cohorts. Further, we demonstrated that in each of these studies, this molecular profile is associated with worse clinical and cognitive function, offering new insights into the molecules that may wax or wane as cognition declines.
Notably, we observed that these donors with worse cognitive function showed a significantly increased proportions of astrocytes and reduced proportions of neurons. This might indicate a neuronal loss at later stages of AD, leading to the development of severe cognitive impairment. Consistent with these findings of gliosis and neuronal loss, we found that the molecular profile of this subset of participants is significantly enriched in dysregulated synaptic genes, which may reflect greater synaptic losses in participants with worse cognitive function. Furthermore, we identified dysfunction of several neuronal and synaptic pathways that are significantly associated with this profile including dopaminergic synapse and glutamatergic pathways. Although the involvement of dopamine in AD is still debatable, several studies have shown an association between AD pathologies (e.g. A³ deposition) and the dopaminergic dysfunction101–109. Similarly, glutamatergic synapse is a major excitatory neurotransmitter in the central nervous system (CNS) that regulates several neuronal functions. Increasingly evidence has demonstrated an association between glutamate alterations and AD pathology110–113. Moreover, metabolic differential abundance analyses identified significant metabolites associated with biologically relevant AD pathways, most of which are related to major neurotransmitter pathways. This may suggest that donors in this profile have a more profound neuronal or synaptic dysregulation. These brains also exhibit a metabolic profile associated with disease duration, supporting its implication in later stages AD. Altogether, these results suggest that these donors have a more prominent synaptic loss and dysfunction associated with worse cognition and advanced AD indicating the feasibility of our approach in identifying molecular changes that occur at late-stage AD. These novel molecular findings can be leveraged to identify new biomarkers and potentially therapeutic targets 3 not necessarily related to amyloid and tau 3 that might be effective in later stages of disease, when therapeutics targeting amyloid seem to be not effective.
Our subsequent transcriptomic and proteomic differential expression analyses showed a significantly higher number of hits associated with this profile. Further investigations showed that many of the genes detected in this profile were neuronal/synaptic genes. Among these, we identified SNAP25 downregulated in this profile in the discovery and replication cohorts. SNAP25 is implicated in several physiological pathways and neuropathologies, including AD and Down syndrome (DS)59. Transcriptomic and proteomic DE analyses also identified several genes associated with AD including GFAP and CLU that are more noticeably dysregulated in the distinct profiles. These two genes were not significantly altered in non-clustered AD cases compared to controls. Increased astrocyte reactivity and overexpression of GFAP has been reported in multiple studies61,114–121. For instance, Greber et al., 199959, showed that elevated expression of GFAP was found in multiple brain regions including frontal, parietal, temporal and occipital cortex regions of AD patients59. In addition, we leveraged CSF proteomics and longitudinal clinical evaluations to show that lower levels of CSF CLU are associated with increased risk of dementia progression in an extended cohort from the Knight ADRC. Multiple studies have shown the ability of CLU to play a neuroprotective role in AD by altering A³ aggregations and promote A³ clearance66, 122–125. More recently, additional studies have identified CLU as mediator regulating A³-induced neurotoxicity66, 126, 127. The heterogeneity among AD cases opens the possibility to develop novel approaches using these and additional genes for biomarkers for precision medicine approaches.
By integrating single-nuclei RNA-seq, we extracted additional insights that support distinct patters of cell types implicated in molecular AD profiles. For instance, we determined that CX3CR1, a homeostatic microglia gene, is significantly downregulated in Knight-C4. The lower expression of CX3CR1 may indicate loss of homeostatic microglia, leading to neuronal damage and loss. The association between homeostatic microglia function and the degree of neuronal cell loss has been shown to be linked to AD128. More recently, the loss of CX3CR1 was shown to be associated with microglial dysfunction illustrated by dampened TGF³-signaling, increased oxidative stress responses, and dysregulated pro-inflammatory activation129. However, we could not determine a significant overexpression of microglia activated genes, nor replicate this in additional cortical regions.
Our results highlight the utility of integrating multi-omics and indicate that cross-omics signatures can better capture differences and discriminate molecular variations in complex and heterogenous diseases such as AD. An increasing body of evidence endorses broad clinical and neuropathological heterogeneity among individuals with AD. Similarly, there is evidence that molecular profiles differ substantially within AD5. Recognizing this heterogeneity is highly important as the statistical noise introduced by molecular heterogeneity within a population has potential to conceal dysregulated pathways and risk biomarkers that might be altered in only subsets of the AD general population. Therefore, identifying cross-omics molecular profiles of AD and their molecular signatures, including genes and pathways, may extract new insights into AD and provide a platform to identify novel treatments and biomarkers and eventually be used toward precision medicine.
We identified that alpha-synuclein is among the top molecules that better discriminate AD profiles. We demonstrated that the transcriptomic/proteomic levels of SNCA were under-expressed in a subset of AD cases with more advanced tau pathology and worse cognition and replicated this finding in several cohorts and brain regions. Alpha-synuclein is the major constituent of Lewy bodies, the histopathologic hallmark of PD and DLB. LB pathology is also present in a majority of advanced AD cases, including autosomal dominant AD and DS96, 97, 130–134. In addition, previous work has demonstrated the relationship between alpha-synuclein and AD pathologies including A³ plaques and tau aggregations using different mouse models135, 136 and implicated alpha-synuclein pathology in neuronal/synaptic dysfunction and death137, 138. Despite the substantial efforts in studying the association between this protein and AD pathology, the mechanism by which alpha-synuclein contributes to the pathogenesis and progression of AD remains elusive. Several regulatory mechanisms that contribute to SNCA levels have been proposed. For instance, previous studies based on primary neuronal cultures87, mice models139, and live cell imaging140 have shown that the neuronal or synaptic activities regulate the aggregation of SNCA. The downregulation of SNCA in a subset of AD cases with unfavorable outcomes suggests that SNCA might be a marker of neuron/synapse losses that can be used for identifying AD stages.
Although this work provides new insights for understanding the genetic and molecular heterogeneity in AD brains, additional work is needed to further validate and better understand the role of key drivers of these profiles using model organisms (e.g., mouse models). Therefore, future studies analyzing longitudinal data from mouse models and longitudinal CSF/plasma from large cohorts can certainly provide novel insights into the role of synaptic dysregulation in AD, and the possible implication in additional neurodegenerative diseases. Additional work is also needed to investigate the possible role that alpha-synuclein in AD, and its association with worse cognitive function. In addition, this work has promising implications in drug repositioning and staging biomarkers that may lead to identify novel therapeutic potentials for AD and therefore bring AD a step closer to precision medicine care and treatment.
Methods
Study Cohorts
Discovery (Knight ADRC)
Frozen postmortem parietal lobe tissue samples from Knight Alzheimer Disease Research Center (Knight ADRC) participants, collected with informed consent for research use, were provided by the Knight ADRC Neuropathology Core. Analysis of these samples was approved by the institutional review board of Washington University in St. Louis. Only sporadic AD participants and controls with transcriptomics, proteomics, and metabolomics data available were included in this study, resulting in 255 sporadic AD and 23 controls.
- RNA-seq
The data generation for the RNA-seq dataset has been previously described29–31. Briefly, total RNA was extracted from frozen parietal cortex tissue using a Tissue Lyser LT and purified using RNeasy Mini Kits (Qiagen). The Nanodrop 8000 (Thermo Scientific) and TapeStation 4200 (Agilent Technologies) were used to perform quality control of the RNA’s concentration and purity, and degradation. The RNA integrity number (RIN) was calculated using an RNA 6000 Pico assay on a Bioanalyzer 2100 and TapeStation 4200 (Agilent Technologies). The software determines the RINe on the Bioanalyzer and TapeStation taking into account the entire electrophoretic trace of the RNA, including the presence or absence of degradation products. The DV200 value is defined as the percentage of nucleotides greater than 200nt. All cDNA libraries were prepared using a TruSeq Stranded Total RNA Sample Prep with Ribo-Zero Gold kit (Illumina) and sequenced on an Illumina HiSeq 4000 using 2 x 151 paired-end reads at the McDonnell Genome Institute at Washington University in St. Louis. The average number of reads per sample was 43,195,080.
- Proteomics
Proteomic data from the Knight ADRC were generated on the SomaLogic platform (SomaLogic Operating Co., Inc., Boulder, CO) from frozen parietal cortical tissue samples (∼500mg) and CSF samples. The SomaLogic platform is a multiplexed, aptamer-based protein quantification platform141. The platform measured 1,305 aptamers in total. Full details of the proteomic data generation have been described previously32.
- Metabolomics
Metabolomic data from the Knight ADRC were generated on the Metabolon Precision MetabolomicsTM UPLC-MS/MS platform (Metabolon, Inc., Morrisville, USA), from 50mg frozen parietal cortical tissue samples. The platform measured 880 metabolites from nine <super pathways=: amino acids, carbohydrates, cofactors and vitamins, energy, lipids, nucleotides, peptides, xenobiotics, and partially characterized molecules. Metabolon provided structural identity annotations for 815 metabolites; the remaining 65 were excluded from the final dataset. Full details of the metabolomics data generation have been described previously33.
- Single-nuclei (snRNA-seq)
The data generation of snRNA-seq has been previously described in142, 143. Briefly, 74 frozen parietal tissues were processed according to the ‘Nuclei extraction and library preparation’ protocol described in142. In this protocol, the tissue was homogenized, and the nuclei were isolated using a density gradient. The nuclei were then sequenced using the 10X Chromium single cell Reagent Kit v3, with 10,000 cells per sample and 50,000 reads per cell for each of the 74 samples.
Replication (MSBB, BM36)
- RNA-seq
The RNA-seq raw data from The Mount Sinai Brain Bank (MSBB) study was downloaded from the Synapse portal (syn3157743). It is publicly available as part of the Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD). In brief, this data were generated and sequenced from RNA extracted from four postmortem brain regions, including frontal pole (BM10), superior temporal gyrus (BM22), parahippocampal gyrus (BM36), and inferior frontal gyrus tissue (BM44) from 313 subjects. RNA-seq libraries were prepared using the TruSeq RNA Sample Preparation Kit v2 (Illumina, San Diego, CA). The rRNAs were depleted using the Ribo-Zero rRNA Removal Kit (human/mouse/rat) (Illumina). Single-end non-standard reads of 101bp were generated by Illumina HiSeq 2500 (Illumina)35. The average number of reads per sample was 32,201,200. RNA-seq data from region parahippocampal gyrus (BM36) was selected for the cross-omics integration approach because of the availability of proteomics data. Data from the remaining regions were used to explore the identified profiles’ expression profiles to determine whether these are consistent across cortical brain regions.
- Proteomics (TMT)
The proteomics TMT data were obtained from the AMP-AD Synapse portal (syn25006650) processed by Johnson et al., 2022144. The raw proteomic TMT data is publicly available at AMP-AD and can be downloaded from Synapse portal (syn21347564). In short, before TMT labeling, the 198 samples were randomized by co-variates (protein quality, sample concentration, diagnosis, age, and sex) into 20 batches (10 cases per batch). The digested peptides were resuspended in 50 mM HEPES (pH 8.5) and labeled with the TMT 11-plex kit (Thermo Fisher) according to the manufacturer protocol. In each batch, GIS samples were labeled using TMT channel 126. All 11 channels were mixed equally, desalted with a 100 mg C18 Sep-Pak column (Waters) for the subsequent fractionation144, 145.
Replication (ROSMAP, DLPFC)
- RNA-seq
The RNA-seq raw data from The Religious Orders Study and Memory and Aging Project (ROSMAP) Study was downloaded from the Synapse portal (syn17008934) available from AMP-AD. Briefly, RNA-seq samples were extracted using Qiagen’s miRNeasy mini kit (cat. no. 217004) and the RNase-free DNase Set (cat. no. 79254) and quantified by Nanodrop. Agilent Bioanalyzer evaluated quality. Library preparation was performed by poly-A selection followed by first strand-specific cDNA synthesis, then dUTP for second strand-specific cDNA synthesis followed by fragmentation and Illumina adapter ligation for library construction. Paired-end read sequences with a length of 101bp were generated by Illumina HiSeq. Fifty-seven samples were submitted later to the platform and run on an updated protocol (batch 2 protocol). For consistency, we removed those samples from this cohort. The average number of reads per sample was 47,434,600. Only No Cognitive Impairment (NCI) and AD with NO other cause of cognitive impairment cases were used in this study. All cases with <unknown= BraakTau scores were discarded. Plaque density and Neurofibrillary tangle burden (PHF tau tangles) are a summary of a AD pathology derived from three AD pathologies: neuritic plaques, diffuse plaques, and neurofibrillary tangles based on five regions: midfrontal cortex, midtemporal cortex, inferior parietal cortex, entorhinal cortex, and hippocampus. Tangles are identified by molecularly specific immunohistochemistry based on the mean of eight regions: tangles_hip (hippocampus), tangles_ec (entorhinal cortex), tangles_mf (midfrontal cortex), tangles_it (inferior temporal), tangles_ag (angular gyrus), tangles_calc (calcarine cortex), tangles_cg (anterior cingulate cortex), angles_sf (superior frontal cortex).
- Metabolomics
Data from the Religious Orders Study and Memory and Aging Project (ROSMAP) were generated by the Duke Metabolomics and Proteomics Shared Resource, a member of the ADMC, using protocols published previously for blood samples146–148. A custom protocol developed for the brain samples can be found on Synapse at syn10235609. The dorsolateral prefrontal cortex (DLPFC) metabolomic data from the ROSMAP studies quantified on the Metabolon Precision Metabolomics platform and preprocessed by the ADMC as described in149 were downloaded from Synapse in July 2021 (syn25878459). The platform measured a total of 1,055 metabolites in the ROSMAP dataset.
Data processing
RNA-seq QC, alignment, and gene expression quantification
All RNA-seq datasets from the discovery and replication were processed and aligned using our in-house RNA-seq pipeline (https://github.com/HarariLab/RNA-seq-Pipeline). Genome reference and gene models were selected similar to the TOPmed pipeline (https://github.com/broadinstitute/gtex-pipeline/blob/master/TOPMed_RNAseq_pipeline.md). Reference genome GRCh38 and GENCODE 33 annotation, including the addition of ERCC spike-in annotations, were used. We excluded ALT, HLA, and Decoy contigs from the reference genome due to the lack of RNA-seq tools that properly handle these regions. Before alignment, the quality of raw read sequences for all libraries was assessed using FastQC (v0.11.9)150. All raw read sequences were then aligned to the human reference genome (GRCh38) using STAR (v.2.7.1a)151. The alignment quality was evaluated using sequencing metrics such as reads distribution, ribosomal content, or alignment quality provided by STAR using Picard tools (v.2.8.2)152. All samples that failed to pass the QC or were outliers (Knight ADRC=9, MSBB=18, ROSMAP=6) were removed from the downstream analyses. Raw read counts for transcripts and genes were generated using STAR and computed transcript/gene expression levels as normalized in FPKM (Fragments Per Kilobase of transcript per Million mapped reads) format.
Metabolomics QC and Quantification
Metabolomics data from Knight ADRC and ROSMAP were processed similarly. Full details of the QC process have been previously described153. Briefly, 188 metabolites with readings missing in over 20% of donors were excluded, readings were log10- transformed, means per metabolite were adjusted to zero, and outlier readings (outside 1.5xIQR) were excluded. Missing values for metabolites in < 20% of donors were replaced by their mean value. No samples were removed due to the missingness rate. Principal component analysis was performed with the FactoMineR R package154 to identify outlier samples; four outlier samples were excluded. The final dataset consisted of 627 metabolites. A similar procedure was used to clean the ROSMAP metabolomic dataset. Metabolites without assigned structural identities and metabolites with greater than 20% missing readings were excluded, readings were log10- transformed, the mean of each metabolite’s distribution was adjusted to zero, and outlier readings were removed. Two samples missing greater than 20% of readings were excluded. The final dataset consisted of 595 metabolites.
Proteomics QC and Quantification
Proteomic data for brain and CSF samples from the Knight ADRC were first filtered based on a limit of detection (LOD). Samples with greater than 15% outlier analytes (analyte readings below the LOD) were excluded. Analytes were then filtered based on scale factor difference, meaning the absolute value of the maximum difference between the calibration scale factor per aptamer and the median for each plate run. Analytes were excluded if their scale factor difference exceeded 0.5. Based on log10-transformation of protein readings, samples exceeding 1.5-fold of the interquartile range (IQR) per analyte were considered outliers. Analytes with greater than 15% outlier samples were excluded. Similarly, samples that were outliers for greater than 15% of analytes were excluded. Full details of the proteomic data QC have been described previously32. Missing analyte readings and removed outlier readings were imputed using the impute.knn function from the impute R package (v1.56.0)155. A plate-wise batch correction was performed using the ComBat function from the sva R package (v3.30.1)156. The final brain dataset consisted of 1,092 analytes, and the final CSF dataset consisted of 713 analytes. Before using this data, analytes with missing expressions in > 20% of donors were excluded, and missing expressions of those in < 20% were replaced with their mean value.
The processed data for TMT proteomics from MSBB BM36 was obtained from AMP-AD (syn25006647) and processed by Johnson et al.,2022144. In brief, 760 raw files generated from 19 TMT 11-plexes were analyzed using the Proteome Discoverer suite (version 2.3, Thermo Fisher Scientific). The UniProtKB human proteome database containing Swiss-Prot and TrEMBL human reference protein sequences was used to search mass spectra. Peptide spectral matches and peptides were filtered to a false discovery rate (FDR) of less than 1% using Percolator software. After spectral assignment, peptides were assembled into proteins and were further filtered based on the combined probabilities of their constituent peptides to a final FDR of 1%144. Unique and razor (that is, parsimonious) peptides were considered for quantification. TMT reporter abundance was transformed into a ratio followed by log2 transformation. Quality control analysis was performed, and outliers were removed. Similar to the Knight ADRC, analytes with missing expressions in > 20% of donors were excluded, and missing expressions of those in < 20% were replaced with their mean value.
Single-nuclei (snRNA-seq) QC, alignment, and gene expression quantification
Full details of the data process of snRNA-seq data have been previously described in 142, 143. In short, CellRanger (v2.1.1 10XGenomics)157 software was employed to align the sequences and quantify gene expression. Read sequences were aligned to a custom pre-mRNA reference constructed from genome assembly GRCh38 and generated as described by 10X Genomics technical manual. Filtering and QC analyses were performed individually using the Seurat package (v2.20 and 2.30) on each subject. All nuclei with high mitochondria gene expression were removed and only genes expressed in > 3 cells were kept for downstream analysis. Cells with < 1800 or > 8000 genes expressed in them were excluded. The data were normalized using the LogNormalize function that normalizes the gene expression measurements for each cell by the total expression, scales by a factor equal to the median counts of all genes, and log-transforms the expression.
Clustering of AD participants
To cluster AD participants, iClusterBayes (Integrative clustering of multiple genomic data types)34 available from iClusterPlus R package (v1.22.0)158 was employed. For each integration process, shared samples across omics datasets were extracted, and the top 2000 most variant features were selected. iClusterBayes was performed by fitting multiple Bayesian models using a different number of clusters ranging from 1 to 10. W ran 22,000 Markov Chain Monte Carlo (MCMC) iterations for each, of which the first 12,000 were discarded as burn-in. The prior probability for the indicator variable gamma of each data set was set to 0.1, whereas the posterior probability cutoff was set at 0.5. The standard deviation of the random walk proposal for the latent variable was set to 0.5. The remaining parameters were left with the default values.
Cell proportion analysis
CellMix R package (v1.6.2)159, employed multiple digital deconvolution methods, was used to estimate cellular population structure from gene expression data (TPM) generated by the Salmon quantification tool. The machine learning model and the gene panel (genes marker expressed highly in specific cell types) previously described in31. Four cell types, astrocytes, microglia, neurons, and oligodendrocytes, were tested using two algorithms, meanProfile159 and ssNMF (semi-supervised nonnegative matrix factorization)160. The significant difference in cell proportion between clusters was computed using a generalized linear model (GLM) test and corrected by sex and age of death (AOD).
Differential expression analysis of AD clusters
RNA-seq
differential expression (DE) analyses were performed to compare subjects in each cluster to the control subjects as well as other AD subjects using the DESeq2 R package (v.1.22.2)161. Our models were adjusted for sex, age at death, and the percentage of astrocytes and neurons. To enhance our confidence in the differentially expressed genes we discovered, lowly expressed genes were removed. Only genes with expression)>)0.5 CPM (count per million) in at least 25% of samples in either group being compared were retained for downstream analyses. The false discovery rate (FDR) was estimated using Benjamini-Hochberg (BH)162. All genes with FDR)<)0.05 were considered differentially expressed genes.
Proteomics
differential expression analyses between clusters were carried out with linear models (R function glm) with age at death and sex as covariates. P-values were adjusted using Benjamini-Hochberg (BH)162.
Metabolomics: differential abundance analyses were also carried out with linear models, with age at death, sex, and post-mortem interval (PMI) as covariates. P-values were adjusted using Benjamini-Hochberg (BH)162.
Single-nuclei
differential expression analyses were performed to compare the overexpression of each gene in each cell-type relative to other cell types using glmmTMB R Package (v1.1.3). This approach fits generalized linear mixed models using a template builder (glmmTMB). Our model was corrected by sex covariate. To ensure cell-type specific overexpression, for each cell type, only top genes whose effect size is greater than 90% percentile cutoff were retained for downstream integration analyses.
Survival analyses
Kaplan-Meier survival analyses and Cox proportional hazards models were used to determine the association of different clusters with survival outcomes and also to determine the association with the risk of dementia progression using survival (v3.2-3)163 and surviplot (v1.1.1)164 R packages.
Significance of overlap
The significance of overlap for all analyses was computed using Fisher’s exact test available from the R stats package (v4.3.0)165.
Pathway Analysis
All pathway analyses for transcriptomics and proteomics were performed using the EnrichR R package (v3.0)166 using genes/proteins significantly up-and downregulated between clusters vs. control and between distinct clusters. Two widely used databases, gene ontologies (GOs), related to molecular function, biological process, and cellular components, and KEGG databases were used. All pathways with an adjusted p-value < 0.05 were considered statistically significant. Similar pathways analyses were performed for single-nuclei (snRNA-seq) using EnrichR using significantly up-and downregulated genes in specific cell types. For metabolomics, pathways analyses were performed using MetaboAnalyst (v5.0) software167 using the metabolites that were increased- and decreased between clusters and the control and across clusters.
Replication analysis
Our replication strategy was designed to focus on the molecular profiles identified in the discovery analyses of the Knight ADRC cohort by extracting the significant genes and proteins identified in the Knight ADRC cohort. Using this set of features, iClusterBayes was run on both datasets (MSBB and ROSMAP) using the same parameters used with the discovery cohort.
NanoString gene expression analysis
The data generation and full details of the QC processes have been previously described95. Briefly, A53T αSyn-Tg mice were crossed with human APOE2, APOE3, or APOE4 knockin mice or Apoe knockout mice to generate A53T mice homozygous for one of the three human alleles or completely null for Apoe. Sample sizes and genotypes for NanoString gene expression analysis were selected on the basis of αSyn pathology and biochemistry results. RNA was isolated from 12-month old A53T/EKO (n=10), A53T/E2 (n=2), and A53T/E4 (n=10) mouse midbrain. 793 transcripts were quantified with the NanoString nCounter multiplexed target platform using a customized chip based on the Mouse Neuroinflammation panel (www.nanostring.com). The geometric mean of negative control lanes was subtracted from gene transcript counts, and gene expression data were normalized to the geometric mean of positive control lanes and a list of housekeeping genes95. QC analyses including principal components analysis were performed using nSolver 4.0 and the Advanced Analysis 2.0 plugin (NanoString). For differential expression analysis, pSyn pathology was expressed as low (< 5% coverage by IHC) or high (> 5% coverage by IHC) for each animal, and APOE genotype and pSyn pathology were selected as predictor covariates. Differentially expressed analyses were performed using linear regression models in R with sex as covariates. P-values were adjusted using Benjamini-Hochberg (BH)162.
Data availability
Transcriptomics data from the Knight ADRC are available by request from the National Institute on Aging Genetics of Alzheimer’s Disease Data Storage Site (NIAGADS) with accession number ng00083 (https://www.niagads.org/datasets/ng00083). Proteomics data from the Knight ADRC donors are available at the NIAGADS Knight ADRC collection with accession number ng00102 (https://www.niagads.org/datasets/ng00102). Metabolomics data from the Knight ADRC are available at the NIAGADS Knight ADRC collection with accession number NG00113 (https://dss.niagads.org/datasets/ng00113/). The single nucleus data from the Knight ADRC are available at NIAGADS Knight ADRC collection with accession number NG00108 (https://dss.niagads.org/datasets/ng00108/). MSBB transcriptomics and proteomics data are publicly available at Synapse under Synapse IDs syn3157743 (https://www.synapse.org/#!Synapse:syn3157743) and syn25006650 (https://www.synapse.org/#!Synapse:syn25006650) respectively. ROSMAP transcriptomics and metabolomics data are publicly available at Synapse under Synapse IDs syn17008934 (https://www.synapse.org/#!Synapse:syn17008934) and syn25878459 (https://www.synapse.org/#!Synapse:syn25878459) respectively.
Code availability
The analysis code used for this study will be available at the time pf publication.
Data Availability
Transcriptomics data from the Knight ADRC are available by request from the National Institute on Aging Genetics of Alzheimer's Disease Data Storage Site (NIAGADS) with accession number ng00083 (https://www.niagads.org/datasets/ng00083). Proteomics data from the Knight ADRC donors are available at the NIAGADS Knight ADRC collection with accession number ng00102 (https://www.niagads.org/datasets/ng00102). Metabolomics data from the Knight ADRC are available at the NIAGADS Knight ADRC collection with accession number NG00113 (https://dss.niagads.org/datasets/ng00113/). The single nucleus data from the Knight ADRC are available at NIAGADS Knight ADRC collection with accession number NG00108 (https://dss.niagads.org/datasets/ng00108/). MSBB transcriptomics and proteomics data are publicly available at Synapse under Synapse IDs syn3157743 (https://www.synapse.org/#!Synapse:syn3157743) and syn25006650 (https://www.synapse.org/#!Synapse:syn25006650) respectively. ROSMAP transcriptomics and metabolomics data are publicly available at Synapse under Synapse IDs syn17008934 (https://www.synapse.org/#!Synapse:syn17008934) and syn25878459 (https://www.synapse.org/#!Synapse:syn25878459) respectively.
https://www.niagads.org/datasets/ng00083
https://www.niagads.org/datasets/ng00102
https://dss.niagads.org/datasets/ng00113
https://dss.niagads.org/datasets/ng00108
https://www.synapse.org/#!Synapse:syn3157743
https://www.synapse.org/#!Synapse:syn25006650
ROSMAP (RNA-seq)
Study data were provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago. Data collection was supported through funding by NIA grants P30AG10161 (ROS), R01AG15819 (ROSMAP; genomics and RNAseq), R01AG17917 (MAP), R01AG30146, R01AG36042 (5hC methylation, ATACseq), RC2AG036547 (H3K9Ac), R01AG36836 (RNAseq), R01AG48015 (monocyte RNAseq) RF1AG57473 (single nucleus RNAseq), U01AG32984 (genomic and whole exome sequencing), U01AG46152 (ROSMAP AMP-AD, targeted proteomics), U01AG46161(TMT proteomics), U01AG61356 (whole genome sequencing, targeted proteomics, ROSMAP AMP-AD), the Illinois Department of Public Health (ROSMAP), and the Translational Genomics Research Institute (genomic). Additional phenotypic data can be requested at www.radc.rush.edu. Study data were provided through NIA grant 3R01AG046171-02S2 awarded to Rima Kaddurah-Daouk at Duke University, based on specimens provided by the Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, where data collection was supported through funding by NIA grants P30AG10161, R01AG15819, R01AG17917, R01AG30146, R01AG36836, U01AG32984, U01AG46152, the Illinois Department of Public Health, and the Translational Genomics Research Institute.
Alzheimer’s Disease Metabolomics Consortium (ADMC)
The results published here are in whole or partly based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.org). Metabolomics data is provided by the Alzheimer’s Disease Metabolomics Consortium (ADMC) and funded wholly or in part by the following grants and supplements: NIA R01AG046171, RF1AG051550, 3U01AG024904-09S4, RF1AG057452, R01AG059093, RF1AG058942, U01AG061359, U19AG063744, and FNIH: #DAOU16AMPA awarded to Dr. Kaddurah-Daouk at Duke University in partnership with a large number of academic institutions. As such, the investigators within the ADMC, not listed specifically in this publication’s author’s list, provided data along with its pre-processing and prepared it for analysis but did not participate in the analysis or writing of this manuscript. A complete listing of ADMC investigators can be found at: https://sites.duke.edu/adnimetab/team/.
MSBB
The results published here are in whole or in part based on data obtained from the AD Knowledge Portal (https://adknowledgeportal.org/). These data were generated from postmortem brain tissue collected through the Mount Sinai VA Medical Center Brain Bank and were provided by Dr. Eric Schadt from Mount Sinai School of Medicine. Proteomics data were provided by Dr. Levey from Emory University based on postmortem brain tissue collected through the Mount Sinai VA Medical Center Brain Bank provided by Dr. Eric Schadt from Mount Sinai School of Medicine.
Competing interests
The authors declare no competing interests.
Supplementary information
Supplementary Figures
Supplementary Tables 1-9
Acknowledgements
We thank all the participants and their families, as well as the many involved institutions and their staff, whose help and participation made this work possible. Research reported in this work was supported by the Knight Alzheimer Disease Research Center at Washington University School of Medicine through the National Institute on Aging (NIA: grant no. P30 AG066444 - JCM), Healthy Aging and Senile Dementia (HASD: grant no. P01AG003991 - JCM), and Antecedent Biomarkers for Alzheimer Disease: The Adult Children Study (ACS: grant no. P01AG026276 - JCM). This work was supported by grants from the National Institutes of Health: NIA R01AG057777 (OH), K01AG046374 (CMK), R56AG067764 (OH), the Chan Zuckerberg Initiative (CMK). O.H. is an Archer Foundation Research Scientist. A.M.E. is a scholar recipient of the Knight ADRC Research Education Component (NIA: grant no. P30 AG066444). This work was supported by access to equipment made possible by the Hope Center for Neurological Disorders, and the Departments of Neurology and Psychiatry at Washington University School of Medicine.
References
- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.
- 10.
- 11.↵
- 12.↵
- 13.
- 14.
- 15.
- 16.
- 17.↵
- 18.↵
- 19.↵
- 20.
- 21.
- 22.
- 23.
- 24.
- 25.
- 26.↵
- 27.
- 28.↵
- 29.↵
- 30.
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.
- 41.↵
- 42.
- 43.↵
- 44.
- 45.
- 46.↵
- 47.↵
- 48.
- 49.
- 50.↵
- 51.↵
- 52.↵
- 53.
- 54.
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.
- 65.
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.↵
- 78.↵
- 79.
- 80.↵
- 81.↵
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.
- 87.↵
- 88.↵
- 89.
- 90.
- 91.
- 92.
- 93.
- 94.↵
- 95.↵
- 96.↵
- 97.↵
- 98.↵
- 99.↵
- 100.↵
- 101.↵
- 102.
- 103.
- 104.
- 105.
- 106.
- 107.
- 108.
- 109.↵
- 110.↵
- 111.
- 112.
- 113.↵
- 114.
- 115.
- 116.
- 117.
- 118.
- 119.
- 120.
- 121.↵
- 122.↵
- 123.
- 124.
- 125.↵
- 126.↵
- 127.↵
- 128.↵
- 129.↵
- 130.
- 131.
- 132.
- 133.
- 134.↵
- 135.↵
- 136.↵
- 137.↵
- 138.↵
- 139.↵
- 140.↵
- 141.↵
- 142.↵
- 143.↵
- 144.↵
- 145.↵
- 146.↵
- 147.
- 148.↵
- 149.↵
- 150.↵
- 151.↵
- 152.↵
- 153.↵
- 154.↵
- 155.↵
- 156.↵
- 157.↵
- 158.↵
- 159.↵
- 160.↵
- 161.↵
- 162.↵
- 163.↵
- 164.↵
- 165.↵
- 166.↵
- 167.↵