Abstract
Frontotemporal dementia (FTD) is a heterogeneous neurodegenerative disorder characterized by frontal and temporal lobe atrophy, typically manifesting with behavioural or language impairment. Because of its heterogeneity and the lack of available diagnostic laboratory tests, there can be a substantial delay in diagnosis. Cell-free circulating microRNAs are increasingly investigated as biomarkers for neurodegeneration, but their value in FTD is not yet established. In this study, we investigate microRNAs as biomarkers for FTD diagnosis. We performed next generation small RNA sequencing on cell-free plasma from 52 FTD cases and 21 controls. The analysis revealed the diagnostic importance of 20 circulating endogenous miRNAs in distinguishing FTD cases from controls. The study was repeated in an independent second cohort of 117 FTD cases and 35 controls. The combinatorial microRNA signature from the first cohort also distinguished FTD samples from controls in the second cohort. To further increase the generalizability of the prediction, we implemented machine learning techniques in a merged dataset of the two cohorts, which resulted in a comparable or improved classification precision with a smaller panel of miRNA classifiers. In addition, there are intriguing molecular commonalities with the cell-free miRNA signature of ALS, a motor neuron disease that resides on a pathological continuum with FTD. However, the signature that describes the ALS-FTD spectrum is not shared with blood miRNA profiles of patients with multiple sclerosis. Thus, microRNAs are promising FTD biomarkers that might enable earlier detection of FTD and improve accurate identification of patients for clinical trials.
Introduction
Frontotemporal dementia (FTD) is a clinically and neuroanatomically heterogeneous neurodegenerative disorder characterized by frontal and temporal lobe atrophy. It typically manifests between the ages of 50 and 70 with behavioral or language problems, and below the age of 65 is the second most common form of dementia, after Alzheimer’s disease (1).
Due to heterogeneity in clinical presentation FTD can be difficult to diagnose (2). Three main phenotypes are described: behavioral variant frontotemporal dementia (bvFTD), characterized by changes in social behaviour and conduct, semantic dementia (SD), characterized by the loss of semantic knowledge, leading to impaired word comprehension, and progressive non-fluent aphasia (PNFA), characterized by progressive difficulties in speech production (2, 3).
FTD is also pathologically heterogeneous with inclusions seen containing hyperphosphorylated tau (4), TDP-43 (5), or fused in sarcoma (FUS) (6, 7). Mutations in the genes encoding for these proteins, as well as in other genes such as progranulin (GRN), chromosome 9 open reading frame 72 (C9ORF72), valosin-containing protein (VCP), TANK-binding kinase 1 (TBK1) and charged multivesicular body protein 2B (CHMP2B) are also associated with FTD (8-11).
FTD overlaps clinically, pathologically and genetically with several other degenerative disorders. In particular, there is often overlap with amyotrophic lateral sclerosis (ALS): one in five ALS patients meets the clinical criteria for a concomitant diagnosis of FTD, and one in eight FTD patients is also diagnosed with ALS. TDP-43 inclusions are observed in the brains of people with FTD and of people with ALS, and genetic evidence supports that these diseases reside along a continuum (5, 12-14).
Previous studies have aimed to develop cell-free biomarkers for FTD, including TDP-43 (15), tau (16), and neurofilament light chain (NfL) (17), but none of these has yet proved useful for diagnosis. MicroRNAs (miRNAs), endogenous non-coding RNAs, can be quantified in biofluids (18), and have been shown previously to be dysregulated in ALS and in FTD (19). Furthermore, they may be biomarkers of disease progression in other brain diseases, including ALS (20). Previous studies have assessed the initial potential of microRNAs as diagnostic FTD biomarkers, including miRNA analysis in plasma (21, 22), CSF and serum (23), and CSF exosomes (24), but no definitive markers have so far been found. We therefore aimed to study plasma miRNAs in a large cohort of patients with different clinical phenotypes and pathological forms of FTD, to see whether miRNAs can reliably distinguish cases from controls, and different forms of FTD from each other.
Here, we provide an unbiased signature of plasma miRNAs that has good diagnostic power in a large and heterogeneous cohort of patients with FTD, which is further predictive in an independent second cohort and may contribute to FTD subtyping. Therefore, circulating miRNAs hold a fascinating potential as diagnostic biomarkers and as means for patient stratification in clinical trials.
Results
A plasma miRNA classifier for FTD
In order to characterize the potential of plasma miRNAs as biomarkers for FTD we assembled a cohort of 73 plasma samples (subject information in Table 1), purified RNA and performed next generation sequencing (NGS). As many as 2313 individual miRNA species were aligned to the human genome (GRCh37/hg19) across all samples. However, only 137 miRNA species exceeded a cut-off of ≥100 UMIs per sample, averaged across all samples. Of the 137 detected miRNAs, 20 miRNAs changed in a statistically significant manner in FTD plasma relative to control (p-value < 0.05, Wald test; Figure 1A). Two miRNAs, whose levels decreased to the greatest extent in FTD compared to controls, namely miR-379-5p and miR-654-3p (1.4 fold), remained significant after adjustment for multiple hypothesis testing (Figure 1B).
We next studied miRNA capacity as binary disease classifiers, by generating receiver-operating characteristic (ROC) curves. ROC area under the curve (AUC) suggested modest predictive capacity for miR-379-5p and for miR-654-3p (AUC for both: 0.71±0.07, p<0.01; Figure 1C).
We further utilized the combinatorial signature of the 20 miRNAs that were differentially expressed between FTD patient plasma and control (Table S1). Using these, an ROC AUC of 0.79±0.05 (p<0.0001, Fig. 1C) was found, which was superior to the prediction capacity of any individual miRNA.
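To make the AUC values quoted above concrete, the following minimal sketch computes an ROC AUC from per-sample scores using the pairwise (Mann-Whitney) formulation: the AUC is the fraction of case-control pairs in which the case receives the higher score. It is an illustration only and not the analysis code used in the study; the function name and the example scores are hypothetical. For a miRNA that decreases in FTD, the score would be the negated expression level.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for an FTD case, 0 for a control.
    scores: higher values are assumed to be more case-like.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for h in controls:
            if c > h:
                wins += 1.0
            elif c == h:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(cases) * len(controls))


# Hypothetical scores for three cases and three controls.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)  # 8 of 9 pairs ranked correctly
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.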
Assessment of plasma miRNA classifier for FTD in a second cohort
We then performed a replication study in an independent cohort of 117 FTD cases and 35 age-matched controls (Table 1). In this study, the levels of 58 miRNAs decreased and 89 increased in a statistically significant manner in FTD, relative to control plasma (p-value < 0.05, Wald test; Figure 2A and Table S1). Notable miRNAs included miR-125b-2-3p (× 26 up, p = 1.4×10^−25), miR-34b-5p (× 23 up, adj. p = 9.8×10^−23) and miR-379-5p (× 2.2 down, p = 1.9×10^−14). Of the 147 miRNAs, 144 remained significant after adjustment for multiple comparisons by the Benjamini–Hochberg procedure (adjusted p-value < 0.05).
The expression of the 20 miRNAs that were most differentially expressed in the first cohort correlated with their respective expression in the second cohort (Pearson R of log2 fold-change = 0.75, p=0.0001, Fig. 2B). Furthermore, the combined predictive power of the 20 miRNAs, which were selected as a classifier based on data from the first cohort, was slightly superior in the replication cohort, with an AUC of 0.82±0.04 (p<0.0001, Fig. 2C).
In addition to external validation in a second cohort, we sought to assess generalizability by applying K-fold cross-validation, an internal validation technique used to evaluate performance and limit overfitting (25, 26). To this end, we divided the 225 datasets (from 56 control and 169 FTD samples) randomly into three equal parts, or ‘folds’, of 75 datasets each. In each iteration, a machine learning model was trained on two of the three folds (150 samples), and the resulting prediction rule was applied to the 75 samples of the remaining fold to estimate prediction performance. This step was repeated k = 3 times, so that every fold was used twice for training and once for testing. The 136 miRNAs that were measured above noise levels in all cohorts were included, yielding the following AUCs: 0.90 for fold 1; 0.87 for fold 2; and 0.93 for fold 3, with an average AUC of 0.90 (Fig. 2D).
We next aimed to reduce the complexity of the measurements required for prediction by identifying the top 20 miRNA predictors per fold, i.e. the 20 miRNAs with the highest weighted importance in predicting disease status (Fig. 3A-C). We reduced the number of miRNAs gradually, starting from a 43-miRNA panel composed of miRNAs that were among the top 20 predictors in at least one fold (i.e., in one, two or three folds), which resulted in AUCs of 0.87, 0.87 and 0.94, and an average AUC of 0.89 (Fig. 3D). We then utilized 13 miRNAs that were among the top 20 in at least two folds, which resulted in AUCs of 0.85, 0.89 and 0.93, and an average of 0.89 (Fig. 3E). Finally, we used only four miRNAs (miR-26a-5p, miR-326, miR-203a-3p and miR-629-5p) that were among the top 20 predictors in all three folds. Their combinatorial AUCs after cross-validation were 0.81, 0.83 and 0.89, with an average of 0.85 (Fig. 3F). All panels of miRNAs used for the cross-validation are listed in Table S1.
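The three-fold partition described above can be sketched as follows. This is a minimal illustration of the splitting logic only, under the assumption of a simple random (non-stratified) split; the function name and the random seed are hypothetical, and the classifier itself is not shown.

```python
import random


def k_fold_indices(n_samples, k=3, seed=0):
    """Randomly partition sample indices into k folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold,
    so that each fold serves once as the test set and k-1 times for training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary assumption
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


# 225 samples (56 controls + 169 FTD) split into three folds of 75.
splits = k_fold_indices(225, k=3)
fold_sizes = [(len(train), len(test)) for train, test in splits]
```

With 225 samples and k = 3, each iteration trains on 150 samples and tests on the remaining 75, matching the numbers given in the text.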
These measurements were comparable to the AUC obtained with 136 miRNAs (Fig. 2D), revealing that the diagnostic power was not compromised by a substantial reduction of the miRNA numbers.
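The panel reduction described above (miRNAs among the top predictors in at least one, two or all three folds) amounts to counting, for each miRNA, the number of folds in which it ranks among the top predictors. The sketch below illustrates this with hypothetical miRNA names and importance values; the function name is an assumption, and `top_n` is set to 2 rather than 20 only to keep the example small.

```python
from collections import Counter


def panel_by_fold_support(importances_per_fold, top_n=20, min_folds=1):
    """Return miRNAs that rank in the top_n predictors in >= min_folds folds.

    importances_per_fold: one dict per fold, mapping miRNA name -> importance.
    """
    support = Counter()
    for importances in importances_per_fold:
        ranked = sorted(importances, key=importances.get, reverse=True)
        support.update(ranked[:top_n])
    return sorted(m for m, n in support.items() if n >= min_folds)


# Hypothetical importances for four placeholder miRNAs in three folds.
fold1 = {"miR-A": 0.50, "miR-B": 0.30, "miR-C": 0.10, "miR-D": 0.05}
fold2 = {"miR-A": 0.40, "miR-B": 0.35, "miR-C": 0.20, "miR-D": 0.01}
fold3 = {"miR-A": 0.60, "miR-D": 0.30, "miR-B": 0.05, "miR-C": 0.02}
folds = [fold1, fold2, fold3]

any_fold = panel_by_fold_support(folds, top_n=2, min_folds=1)
two_folds = panel_by_fold_support(folds, top_n=2, min_folds=2)
all_folds = panel_by_fold_support(folds, top_n=2, min_folds=3)
```

Raising `min_folds` from 1 to 3 shrinks the panel in the same way that the study moved from 43 to 13 to four miRNAs.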
Overlap between FTD miRNA signature and ALS miRNA signature
FTD and ALS are two diseases on a neuropathology continuum. We aimed to determine whether the miRNA signatures found in FTD and in ALS reveal any molecular similarity. For this purpose, we sequenced and analyzed the differences between 115 ALS cases and 103 controls (see Table 2). We also sequenced 17 samples from patients with multiple sclerosis (MS), because this disease is mechanistically different from FTD and involves autoimmune-related demyelination, so molecular similarity to FTD is not expected to be seen. 161 miRNA species were differentially expressed in either one of the diseases (FTD, ALS or MS) vs controls. Differentially expressed miRNAs in either FTD or ALS were correlated in fold-change values between the diseases (Pearson R for log-transformed values = 0.35, p<0.0001, Figure 4A), but no such correlation was found between FTD and MS (R= −0.15, p=0.15, Figure 4B). Intriguingly, muscle-specific miR-206 robustly increases in ALS, in agreement with previous reports (27-30) with no change at all in FTD.
We next tested the degree of overlap between miRNAs differentially expressed in FTD vs. ALS. Seven out of 20 miRNAs changed exclusively in FTD, and the remaining 13 miRNAs changed in a significant manner in both FTD and ALS (Figure 4C; Table S1). Remarkably, the directionality of change for these miRNAs (increase/decrease) was consistent across diseases for all of the miRNAs but one, miR-29a-3p which decreased in FTD and increased in ALS (Figure 4A). Moreover, the fold-change values in this subset of 13 miRNAs that have changed in both ALS and FTD, were highly correlated between the diseases (Pearson R = 0.90, p<0.0001). In contrast, only five out of the 20 miRNAs that changed in FTD, also changed in MS (Figure 4D; Table S1). Taken together, the miRNA signature in FTD plasma shows a similarity to the ALS plasma signature, but not to the MS signature, in accordance with pathological and clinical similarities between FTD and ALS.
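The cross-disease comparison above rests on the Pearson correlation of log2-transformed fold-change values. A minimal sketch of that calculation is given below; the function names and the fold-change values are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def log2_fold_changes(fold_changes):
    """Convert linear fold-change ratios (disease / control) to log2 scale."""
    return [math.log2(fc) for fc in fold_changes]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical fold changes of the same three miRNAs in two diseases.
ftd_fc = [2.0, 0.5, 4.0]
als_fc = [4.0, 0.25, 16.0]
r = pearson_r(log2_fold_changes(ftd_fc), log2_fold_changes(als_fc))
```

Working on the log2 scale treats a two-fold increase and a two-fold decrease symmetrically, which is why fold changes are log-transformed before correlation.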
Finally, we applied the FTD predictor, based on the 20 miRNAs that change in FTD, to the ALS and healthy control cohorts. The signature of 20 miRNAs distinguished ALS from controls better than chance (ROC AUC = 0.63, p<0.001, Table S2), while the seven miRNAs that changed exclusively in FTD were not able to distinguish between ALS and control in a statistically significant manner (ROC AUC = 0.57, p=0.06). Thus, miRNAs that are differentially expressed in FTD have a modest capacity to predict ALS.
miRNA signatures of FTD subtypes and of FTD patients with different pathologies
We next tested whether specific miRNAs changed in the main FTD subtypes, bvFTD, SD and PNFA. After statistical adjustment for multiple comparisons, four miRNAs decreased in a significant manner in PNFA, and two miRNAs decreased and one miRNA increased significantly in bvFTD, whereas the small number of SD samples (n=8) did not allow detection of microRNAs that changed significantly after adjustment for multiple comparisons (Fig. S1A-C).
The 20 miRNAs showed reasonable combinatorial predictive power in distinguishing bvFTD, SD and PNFA from healthy controls. For bvFTD vs. healthy controls, the AUC was 0.85±0.06 (p<0.0001) in the original cohort and 0.80±0.05 (p<0.0001) in the replication cohort (Fig. S1D); for SD vs. controls, the original cohort AUC was 0.86±0.08 (p=0.003) and the replication cohort AUC was 0.79±0.06 (p=0.0003; Fig. S1E); for PNFA vs. controls, the original cohort AUC was 0.81±0.08 (p=0.002) and the replication cohort AUC was 0.81±0.05 (p<0.0001; Fig. S1F). We concluded that the combinatorial 20-miRNA signature distinguishes FTD and its subtypes from controls with comparable AUCs for all three subtypes.
The overlap of symptoms between subtypes of FTD poses a diagnostic challenge (31). We therefore tested whether FTD subtypes could be distinguished based on miRNA signature. We analyzed miRNA differential expression of PNFA cases vs. non-PNFA, which pooled together bvFTD and SD cases, due to a similar molecular signature of SD and bvFTD. Fourteen miRNAs changed in a significant manner in PNFA vs non-PNFA: miR-625-3p, miR-625-5p, miR-126-5p, miR-146a-5p, miR-146b-5p, miR-340-5p, miR-181a-5p (all increased in PNFA compared to non-PNFA) and miR-342-3p, let-7d-3p, miR-122-5p, miR-192-5p, miR-16-5p, miR-203a-3p (decreased; Fig. S1G). The combinatorial signature of these fourteen miRNAs yielded an AUC of 0.81±0.08 (Fig. S1H; p=0.0007), indicating that PNFA can be differentiated from other types of FTD with a high accuracy.
We also tested whether specific miRNAs changed between FTD cases with different likely underlying pathologies, i.e. tau and TDP-43. 19 FTD cases with predicted Tau pathology based on genetics (4 in cohort I + 15 in cohort II) were compared to 63 cases with predicted TDP-43 pathology (23 in cohort I + 40 in cohort II). Fourteen miRNAs changed in a statistically significant manner, but none remained significant after correction for multiple hypotheses (Fig. S2A). The combinatorial signature of these 14 miRNAs had a weak classification power, though it was statistically significant (AUC of ROC = 0.7±0.06, p=0.009, Fig. S2B). Taken together, the miRNA profile in our dataset has limited diagnostic power for pathological subtypes of FTD, as opposed to FTD vs control and different clinical subtypes of FTD.
Discussion
Our study utilizes a large cohort of FTD blood samples. It is the first work that employs next generation sequencing technology for FTD biomarkers. We defined a signature, composed of 20 miRNAs, that is able to classify FTD. This signature, which was discovered in an initial cohort, was informative when applied to a second cohort. These observations suggest that miRNAs can potentially be utilized in clinical sampling as diagnostic FTD markers, which are needed because of non-specific early symptoms and overlap with other degenerative and non-degenerative conditions. Ours is the largest cohort used for miRNA profiling, and its use of unbiased exhaustive next generation sequencing can potentially explain the discrepancies from past studies with smaller cohorts and biased miRNA choices (21-24).
A classifier panel of 20 miRNAs distinguished FTD from controls with an AUC of ∼0.8 in the first cohort. Reassuringly, it was comparably informative in a second cohort. In addition to external (second cohort) validation, we applied machine learning to the whole dataset of 225 samples. Using three-fold cross-validation, a signature of 136 miRNAs achieved an average AUC of 0.90. We then reduced the signature to the 43 miRNAs with the highest classification power, which retained an average AUC of 0.89. For clinical diagnostic use, the predictor developed by machine learning should be tested on an independent cohort, preferably of different ethnicity.
Interestingly, the miRNA signature of FTD is akin to that of ALS, perhaps reflecting a shared patho-mechanism for these two neurodegenerative disorders on the ALS-FTD continuum. This similarity does not extend to multiple sclerosis, a disease that is driven by a different, autoimmune, mechanism. Nonetheless, FTD and ALS remain distinct entities, and accordingly only 10% of the miRNAs that changed in either disease were shared.
In summary, we have characterized a large FTD plasma cohort for miRNA expression by next generation sequencing and found specific patterns of changes that can contribute to diagnosis of FTD. These patterns seem to involve the ALS-FTD continuum, alluding to differences and commonalities in the underlying mechanisms that drive molecular changes in ALS and FTD.
Materials and Methods
Standard protocol approvals, registrations, and patient consents
Approvals were obtained from the local research ethics committee and all participants provided written consent (or gave verbal permission for a carer to sign on their behalf). For ALS samples, recruitment, sampling procedures and data collection have been performed according to Protocol (Protocol number 001, version 5.0 Final – 30th November 2015).
Study design
Based on power analysis, we found that about 20 control and 50 FTD samples are required to obtain an ROC AUC of 0.7 with a power of 80% and a significance level of 0.05. We determined the sample size based on these calculations. Because sample processing was done in different batches, samples were randomly allocated to the batches and, within each batch, the number of control and FTD/ALS/MS samples was balanced in order to reduce batch-associated bias.
Participants and sampling
Participants were enrolled in the longitudinal FTD cohort studies at UCL. Frozen plasma samples from the UCL FTD Biobank were shipped to the Weizmann Institute of Science for molecular analysis. Study cohort I: 52 FTD patients, 21 healthy controls. Study cohort II: 117 FTD patients, 35 healthy controls. FTD patients were further assigned into two groups with predicted pathology of TDP-43 or tau, based on genetics and clinical phenotype. Patients who carried C9ORF72 repeat expansions or progranulin (GRN) mutations, and/or who presented with semantic dementia, were predicted to have TDP-43 pathology, while patients with MAPT mutations were predicted to have tau pathology. Demographic data are detailed in Table 1.
ALS and MS samples and their respective healthy controls (N = 115, 17 and 103, respectively) were obtained from the ALS biomarker study. ALS patients were diagnosed according to standard criteria by experienced ALS neurologists (32). Healthy controls were typically spouses or relatives of patients. Demographic data are detailed in Table 2.
Plasma samples were stored at −80 °C until RNA extraction and subsequent small RNA next generation sequencing.
Small RNA next generation sequencing
Total RNA was extracted from plasma using the miRNeasy micro kit (Qiagen, Hilden, Germany) and quantified with a Qubit fluorometer using the RNA broad range (BR) assay kit (Thermo Fisher Scientific, Waltham, MA). For small RNA next generation sequencing (NGS), libraries were prepared from 7.5 ng of total RNA using the QIAseq miRNA Library Kit and QIAseq miRNA NGS 48 Index IL (Qiagen), by an experimenter who was blinded to the identity of samples. Following 3’ and 5’ adapter ligation, small RNA was reverse transcribed using primers that carry unique molecular identifiers (UMIs) of random 12-nucleotide sequences. In this way, precise linear quantification of miRNA is achieved, overcoming potential PCR-induced biases (18). cDNA libraries were amplified by PCR for 22 cycles, with a 3’ primer that includes a 6-nucleotide unique index. Following size selection and cleaning of libraries with magnetic beads, quality control was performed by measuring library concentration with a Qubit fluorometer using the dsDNA high sensitivity (HS) assay kit (Thermo Fisher Scientific, Waltham, MA) and confirming library size with Tapestation D1000 (Agilent). Libraries with different indices were multiplexed and sequenced on a single NextSeq 500/550 v2 flow cell (Illumina), with 75bp single read and 6bp index read. Fastq files were demultiplexed using the User-friendly Transcriptome Analysis Pipeline (UTAP) developed at the Weizmann Institute (33). Sequences were mapped to the human genome using the Qiagen GeneGlobe analysis web tool.
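The principle behind UMI-based quantification can be illustrated with a short sketch: reads that share both the miRNA and the UMI are treated as PCR copies of one original molecule and are counted once. This is a simplified illustration and not the GeneGlobe implementation, whose internal steps are not described here; it assumes that reads have already been assigned to a miRNA and ignores sequencing errors within UMIs. The function name and the example reads are hypothetical.

```python
from collections import Counter


def count_unique_umis(reads):
    """Count distinct UMIs per miRNA.

    reads: iterable of (mirna_name, umi_sequence) pairs, one per sequenced read.
    Reads with the same miRNA and the same UMI are collapsed into one molecule.
    """
    unique_molecules = set(reads)
    return Counter(mirna for mirna, _umi in unique_molecules)


# Hypothetical reads with 12-nucleotide UMIs.
example_reads = [
    ("miR-379-5p", "ACGTACGTACGT"),
    ("miR-379-5p", "ACGTACGTACGT"),  # PCR duplicate of the read above
    ("miR-379-5p", "TTGACCATGGCA"),
    ("miR-654-3p", "ACGTACGTACGT"),  # same UMI on a different miRNA counts separately
]
example_counts = count_unique_umis(example_reads)
```

Because duplicates are collapsed before counting, the resulting UMI counts do not scale with the number of PCR cycles.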
Statistical analysis and machine learning
Plasma samples with ≥40,000 total miRNA UMIs were included. miRNAs with an average abundance of ≥100 UMIs per sample, across all samples, were considered above noise levels. miRNA NGS data were analyzed with the DESeq2 package in R Project for Statistical Computing (34, 35), under the assumption that miRNA counts follow a negative binomial distribution, and data were corrected for library preparation batch in order to reduce its potential bias. The ratio of normalized FTD counts to normalized control counts is presented after log2 transformation. P values were calculated by Wald test (35, 36) and adjusted for multiple testing according to Benjamini and Hochberg (37). For binary classification by miRNAs, receiver operating characteristic (ROC) curves for individual miRNAs or combinations of miRNAs were plotted based on voom transformation of gene expression data in R (38). Graphs were generated with GraphPad Prism 5.
Machine learning was performed in Python 3.6. Cohorts were merged and the case-control imbalance was mitigated by applying the ADASYN algorithm (https://imbalanced-learn.readthedocs.io/en/stable/api.html), which generates synthetic healthy-control samples from the existing data. Then, K-fold cross-validation was performed on the pooled data set with K=3. An ROC curve was generated for each of the three folds, and individual and mean AUCs were calculated.
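Two of the steps above, the abundance filters and the Benjamini-Hochberg adjustment, can be sketched in a few lines. The code below is an illustration only and does not reproduce the DESeq2 analysis used in the study; function names and example data are hypothetical, and it is an assumption of this sketch that the per-miRNA average is taken over the samples that pass the 40,000-UMI threshold.

```python
def filter_counts(counts, min_total_umis=40_000, min_mean_umis=100):
    """Apply the sample-level and miRNA-level abundance filters.

    counts: dict mapping sample name -> dict of miRNA name -> UMI count.
    Returns (kept_samples, kept_mirnas).
    """
    kept_samples = {s: c for s, c in counts.items()
                    if sum(c.values()) >= min_total_umis}
    if not kept_samples:
        return {}, []
    mirnas = set().union(*(c.keys() for c in kept_samples.values()))
    n = len(kept_samples)
    kept_mirnas = sorted(
        m for m in mirnas
        if sum(c.get(m, 0) for c in kept_samples.values()) / n >= min_mean_umis
    )
    return kept_samples, kept_mirnas


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical UMI counts: s1 fails the sample filter, miR-y the miRNA filter.
example_counts = {
    "s1": {"miR-x": 39_000, "miR-y": 50},
    "s2": {"miR-x": 50_000, "miR-y": 150},
    "s3": {"miR-x": 45_000, "miR-y": 40},
}
kept_samples, kept_mirnas = filter_counts(example_counts)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A miRNA is then reported as significant after adjustment when its adjusted p-value falls below 0.05.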
Data Availability
Source data are available for Figures 1-4.
Funding
EH was supported by the ISF Legacy 828/17 grant, Target ALS 118945 grant, European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 617351. Israel Science Foundation, the ALS-Therapy Alliance, AFM Telethon 20576 grant, Motor Neuron Disease Association (UK), The Thierry Latran Foundation for ALS research, ERA-Net for Research Programmes on Rare Diseases (FP7), A. Alfred Taubman through IsrALS, Yeda-Sela, Yeda-CEO, Israel Ministry of Trade and Industry, Y. Leon Benoziyo Institute for Molecular Medicine, Benoziyo Center Neurological Disease, Kekst Family Institute for Medical Genetics, David and Fela Shapell Family Center for Genetic Disorders Research, Crown Human Genome Center, Nathan, Shirley, Philip and Charlene Vener New Scientist Fund, Julius and Ray Charlestein Foundation, Fraida Foundation, Wolfson Family Charitable Trust, Adelis Foundation, MERCK (UK), Maria Halphen, Estates of Fannie Sherr, Lola Asseof, Lilly Fulop, E. and J. Moravitz. Teva Pharmaceutical Industries Ltd. as part of the Israeli National Network of Excellence in Neuroscience (NNE) postdoc Fellowship to IM 117941. The Dementia Research Centre is supported by Alzheimer’s Research UK, Brain Research Trust, and The Wolfson Foundation. This work was supported by the NIHR Queen Square Dementia Biomedical Research Unit, the NIHR UCL/H Biomedical Research Centre and the Leonard Wolfson Experimental Neurology Centre (LWENC) Clinical Research Facility as well as an Alzheimer’s Society grant (AS-PG-16-007). JDR is supported by an MRC Clinician Scientist Fellowship (MR/M008525/1) and has received funding from the NIHR Rare Disease Translational Research Collaboration (BRC149/NS/MH). PF is supported by an MRC/MND LEW Fellowship and by the NIHR UCLH BRC. This work was also supported by the Motor Neuron Disease Association (MNDA) 839-791
Competing interests
The authors state that they have no competing interests.
Acknowledgements
We thank Vittoria Lombardi (UCL) for technical assistance. We acknowledge patients with FTD, ALS, MS and healthy volunteers for their contribution and ALS biomarkers study co-workers for biobanking, which has made this study possible (REC 09/H0703/27). We also thank the North Thames Local Research Network (LCRN) for its support. EH is the Mondry Family Professorial Chair and Head of the Nella and Leon Benoziyo Center for Neurological Diseases.