Abstract
The number of people affected by Type 2 Diabetes Mellitus (T2DM) is close to half a billion and is on a sharp rise, representing a major and growing public health burden. As the case for many other complex diseases, early diagnosis is key to prevent irreversible end-organ damages. However, given its mild initial symptoms, T2DM is often diagnosed several years after its onset, leaving half of diabetic individuals undiagnosed. While several classical clinical and genetic biomarkers have been identified, improving early diagnosis by exploring other kinds of omics data remains crucial. In this study, we have combined longitudinal data from two population-based cohorts CoLaus and DESIR (comprising in total 493 incident cases vs 1’360 controls) to identify new or confirm previously implicated metabolomic biomarkers predicting T2DM incidence more than five years ahead of clinical diagnosis. Our longitudinal data have shown robust evidence for valine, leucine, carnitine and glutamic acid being predictive of future conversion to T2DM, and also confirmed to be causal by 2-sample Mendelian randomisation (based on independent data). Interestingly, for valine and leucine a strong reverse causal effect was detected, indicating that the genetic predisposition to T2DM may trigger early changes of these metabolites, which appear well-before any clinical symptoms. These findings indicate that molecular traits linked to the genetic basis of T2DM may be particularly promising early biomarkers.
1. Introduction
Type 2 Diabetes Mellitus (T2DM) is a major public health concern and its prevalence is increasing. Almost 500 million individuals are currently affected worldwide by diabetes and almost 700 million may be affected by 2045 [https://www.diabetesatlas.org/en/sections/worldwide-toll-of-diabetes.html]. Diabetes remains among the leading causes of cardiovascular disease, blindness, kidney failure, and lower-limb amputation. By the time T2DM is diagnosed, many individuals have already established end-organ damage including neuropathy, kidney failure and/or premature cardiac or brain atherosclerosis. Diabetes and pre-diabetes are diagnosed by routinely assessed clinical markers (glycaemia and HbA1c levels) above a given threshold. Still, agreement between the different markers in diagnosing T2DM is not optimal [1], and their screening capacity for pre-diabetes is low [2]. While these markers are powerful predictors of the disease, they are far from perfect for the identification of individuals who are prone to develop T2DM. Early detection of individuals with high T2DM predisposition is important as non-pharmacological approaches (i.e. lifestyle changes) can reduce substantially (and at a reduced cost) the risk of developing T2DM [3]. Thus, the use of predictive biological tests or clinical scores enabling a more precise identification of individuals at risk is a pressing need so that preventive measures can be applied to limit the increase of T2DM prevalence world-wide. Several predictive scores using clinical data and/or biological markers have been developed (e.g. [4]), but their performance (AUC ∼0.75) is far from optimal. Over 90% positive- and negative predictive value would be clinically relevant, which is achievable for some diseases [5]. Recently, polygenic scores based on cross-sectional T2DM case-control status have been explored and may be of great value to enhance the predictive capacity of the disease onset and to better understand the clinical heterogeneity of T2DM [6]. Since many metabolites and proteins are expected to be altered in pre-diabetic state, using different omics profiles in addition to the classical clinical, biological and genetic risk factors is expected to increase the prediction accuracy.
Several cross-sectional and longitudinal metabolomic studies, focussing on blood samples using targeted approaches, have been initiated to identify candidate biomarkers of pre-diabetes, with a few exceptions employing untargeted approaches (e.g. a cross-sectional study of 115 T2DM individuals [7]). In the population-based cooperative health research of Augsburg (KORA), 140 metabolites were quantified for 4,297 participants and several metabolites altered in pre-diabetic individuals have been identified [8]. Using metabolite-protein network and targeted approaches on serum samples, Wang-Sattler et al. identified seven T2DM-related genes associated with these metabolites by multiple interactions with four enzymes. Lysophosphatidylcholine (18:2) and glycine were strong predictors of glucose intolerance, even 7 years before disease onset. These metabolites, in addition to sugar metabolites, acylcarnitines and other aminoacids, have been identified as predictors of T2DM also in the European Prospective Investigation into Cancer and nutrition cohort [9]. More recently, Padberg et al. described a metabolic signature that includes glyoxylate associated with T2DM and prediabetic individuals [10]. Wang et al, using longitudinal data on 201 incident T2DM cases, identified a signature of five branched-chain and aromatic metabolites for which individuals in the top quartile exhibited a five-fold higher risk to develop T2DM [11]. Particularly, a combination of three amino acids predicted future T2DM, with a more than five-fold higher risk for individuals in top quartile, suggesting that amino acid profiles could aid in diabetes risk assessment. These results were confirmed by a recent meta-analysis from 8 prospective studies on 8’000 individuals, which found a higher risk of T2DM for isoleucine, leucine, valine and phenylalanine [12].
By targeting serum carnitine metabolites on 173 incident T2DM cases among 2’519 patients with coronary artery disease, Strand et al. demonstrated that trimethyl-lysine, g-butyrobetaine, as both precursors on free carnitine and palmitoyl-carnitine, predict long-term risk of T2DM independently of traditional risk factors [13].
As another example, shotgun lipidomics was applied in a transversal study on plasma of pre-diabetic mice from different genetic backgrounds and revealed a group of ceramides correlated with glucose tolerance and insulin secretion [14]. These results were interestingly confirmed by quantitative analysis in the plasma of individuals from two population-based prospective cohorts showing that dihydro-ceramides were significantly elevated in the plasma of individuals who will progress to diabetes up to 9 years before disease onset [14]. Other studies have struggled to identify the contribution of individual metabolites and focused more on metabolome-wide prediction [15], which are difficult to replicate.
The previously listed studies provide several important candidate metabolites to benchmark our experimental and modelling setup. Here, we used a subset of the CoLaus study intentionally enriched for T2DM incident cases to maximise discovery power of baseline metabolite levels being associated with developing T2DM at a later follow-up stage. We compared our findings with a similarly sized population-based cohort, DESIR, and also with bidirectional Mendelian randomisation using metabolite- and T2DM QTLs as instruments.
2. Methods
2.1 The CoLaus study
The CoLaus study (www.CoLaus-psycolaus.ch) is a population-based prospective study based on a single random sample of 6’733 participants from the overall population aged between 35-75 living in Lausanne (10). The baseline survey was conducted between 2003 and 2006. Each participant was extensively phenotyped regarding personal, lifestyle and cardiovascular risk factors; extensive blood and urine characterization was performed, and over 500’000 SNPs were directly genotyped and a further 20.4 million imputed (with r2-hat>0.3). The first follow-up was performed between April 2009 and September 2012; median follow-up time was 5.4 (average 5.6, range 4.5-8.8) years; it included 5’064 participants, and the 5.5-year incidence of T2DM was 6.5%, with 284 incident cases. The second follow-up was performed between May 2014 and April 2017; median follow-up was 10.7 (average 10.9, range 8.8-13.6) years. In this study, we selected 262 T2DM incident cases at the first follow-up and 524 controls matched for sex, age and baseline glucose. For each case, two types of controls were selected: one with a very low risk of T2DM (as assessed by a multivariable T2DM risk score [16]) and one with pre-diabetes (with a high-risk score, but no T2DM at the CoLaus second follow-up). Incident T2DM cases were defined as fasting glucose ≥ 7 mmol/L and/or presence of antidiabetic drug treatment and/or HbA1c ≥ 6.5%. The most important study characteristics are included in Table 1. The study protocols were approved by the Ethical Committee of the Canton de Vaud and all participants provided written informed consent.
2.2 The DESIR cohort
The prospective D.E.S.I.R. cohort is a nine-year follow up study of 2,391 middle-aged European ancestry participants [17-19]. We analysed participants from a case-cohort design embedded within the larger cohort that includes 231 cases of incident T2DM and 836 participants randomly sampled from the entire cohort. Baseline and follow-up clinical characteristics of participants included in the training population are shown in [4] (see Table 1). T2DM was defined using one of the following criteria: use of glucose lowering medication, fasting plasma glucose [FG] ≥7 mmol/L, or glycated hemoglobin A1c [HbA1c] ≥6.5% (48 mmol/mol). Clinical and biological evaluations were performed at inclusion and after three, six, and nine years, as previously described [20]. All participants provided written informed consent and the study protocol was approved by the Ethics Committee for the Protection of Subjects for Biomedical Research of Bicêtre Hospital, France. Metabolites measurements have been described elsewhere in full details [20].
2.3 Targeted Metabolomics analysis
Plasma and urine samples collected at the baseline of the CoLaus cohort were processed for targeted metabolomics analysis as described elsewhere [21]. Briefly, metabolites were extracted from 100 μL of plasma or urine samples and Quality Control (QC) samples using a cold methanol-ethanol solvent mixture in a 1:1 ratio. After centrifugation at 14’000 rpm for 15 minutes, supernatant was recovered, evaporated and resuspended in 100μL (for plasma) or 200μL (for urine) of H2O:MeOH (9:1). 5μL of the samples were analyzed by LC-MRM/MS on a hybrid triple quadrupole-linear ion trap QqQLIT (Qtrap 5500, Sciex) hyphenated to a LC Dionex Ultimate 3000 (Dionex, Thermo Scientific). Analyses were performed in positive and negative electrospray ionization using a TurboV ion source. The chromatographic separation was performed on a Kinetex column C18 (100×2.1 mm, 2.6 μm). The mobile phases were constituted by A: H2O with 0.1% FA and B: ACN with 0.1% FA for the positive mode. In the negative mode, the mobile phases were constituted by A: ammonium fluoride 0.5 mM in H 2O and B: ammonium fluoride 0.5 mM in ACN. The linear gradient program was 0-1.5 min 2%B, 1.5-15 min up to 98%B, 15-17 min held at 98% B, 17.5 min down to 2%B at a flow rate of 250 μL/min.
The MRM/MS method included 299 and 284 transitions in positive and negative mode respectively, corresponding to 583 endogenous metabolites. For each biological matrix, the 786 samples were prepared and analyzed in 8 batches. In order to monitor the signal drift and system performance over time, and to avoid repeated thawing-freezing cycles of the study samples, quality control (QCs) surrogate samples were used. These QC samples were prepared in the same way and at the same time of the study samples from aliquots of a pool of human plasma or urine that was the same for all the analytical batches. QC samples were injected every 8 samples in both positive and negative modes.
The MS instrument was controlled by Analyst software v.1.6.2 (AB Sciex). Peak integration was performed with MultiQuant software v.3.0 (AB Sciex). The integration algorithm was MQ4 with a Gaussian smoothing of a half-width equal to 1.5 points. For plasma samples, the analysis was narrowed to the 124 and 48 metabolites that were detected in all samples with a noise percentage of 80% and a gaussian peak shape, in positive and negative modes, respectively. For urine samples, we detected 124 and 77 metabolites in positive and negative modes, respectively. To correct for batch effect, raw data were normalized with the dbnorm package [22], by using the ber model.
2.4 Statistical approaches
We performed logistic- and linear regression analysis to test for association between baseline metabolite levels and T2DM incidence and glucose level changes, respectively. We included the following covariates: family history of diabetes, smoking status, body-mass index (BMI), HDL cholesterol, triglycerides, insulin, glucose and HOMA measure at baseline. Since none of our association P-values passed strict Bonferroni correction for multiple testing (P<0.05/172, we declared P-values below 0.01 as suggestively significant.
2.5 Bi-direction Mendelian Randomization
For each metabolite associated with glucose, we performed two-sample bidirectional Mendelian randomization (MR), an instrumental variable method to distinguish correlation from causation in observational data [23]. The idea of MR is to use genetic variants as instrumental variables to attempt causal inference about the effect of modifiable risk factors, which can overcome some types of confounding and reverse causation.
We tested whether genetically predicted levels of a particular metabolite affect the risk for elevated glucose and type 2 diabetes and whether genetically increased risk of type 2 diabetes or elevated glucose is associated with circulating levels of a particular metabolite. The associations between the instrumental variables and the exposure and the outcome are estimated from independent studies, either the metabolite GWAS or fasting glucose/T2DM GWAS.
For each metabolite associated with glucose, we performed two-sample bidirectional MR. We tested whether genetically varying levels of a particular metabolite affect the risk for elevated glucose and T2DM (we call this MR) and whether genetically increased risk of T2DM or elevated glucose is associated with circulating levels of a particular metabolite (we call this reverse MR). The associations between the instrumental variables and the exposure and the outcome are estimated from independent studies. We used summary statistics from the UKBB for glucose and those published by the DIAGRAM Consortium for T2DM [24]. The metabolite data are from a large GWAS performed on 7,824 adult individuals from European population studies.
3. Results
3.1 Study characteristics
Selected basic features of the CoLaus study are listed in Table 1. We selected 788 participants, including 263 T2DM incident cases at the first follow-up and 525 controls matched for sex, age and baseline glucose. Summary data are expressed either as counts (and percentage) for categorical variables and as median [interquartile range] or mean ± standard deviation for continuous variables.
3.2 Metabolites association scan
Based on quality criteria such as sensitivity and peak shape, we detected 172 urine and plasma metabolites (MS) in the 788 selected CoLaus participants. The analytical samples were collected at the baseline and included 525 participants without diabetes and 263 participants who became diabetic over the following 10 years. For each metabolite, we ran association scans to find those correlated to change in diabetes or glucose levels. All the results are based on the first follow-up which is the most predictive analysis. Indeed, when we compared the effects estimated in the first and second follow-up we observed significant weaker effects in the second follow-up (Pt-test =1.38 ×10−05, see Figure1).
The metabolome-wide association scan revealed seven metabolites associated with glucose change at suggestively significant level (P<0.01) in the CoLaus study, see Table 2. When we meta-analysed metabolome-wide results from the CoLaus and DESIR studies, we similarly found leucine and four additional suggestively significant (P<0.01) metabolites associated with glucose change (Table 3).
Finally, three out of the seven metabolites associated with glucose were testable with MR. Interestingly, T2DM showed a significant causal effect on valine (P = 0.003), leucine (P = 5.8 × 10−05) and glutamate (P= 2.8 × 10−06).
4. Discussion
Using mass-spectrometry targeted metabolomics analysis, we identified a panel of metabolites whose levels are associated with glucose changes before the onset of T2DM in the CoLaus Cohort. We replicated our findings in an independent study (from DESIR), which reassuringly revealed five metabolites with combined P-value below 0.01. While we focused our analysis on predicting T2DM for a 5-years follow-up period, we observed that the effect of the potentially predictive metabolites diminished over time. This is unsurprising as risk factors change over time, hence more and more unknowns contributing to diabetes conversion accumulate with time. We have noticed, furthermore, that more than 80% (140/172) of the metabolite effects are positive, meaning that generally an increased level of metabolites represents a risk factor for diabetes. This observation needs to be considered with caution, since it might be due to a latent diabetes-associated confounding factor, which is linked to overall metabolite concentration. This explanation is rather unlikely since we accounted for metabolomic principal components in all association scan.
The confirmed metabolites have been repeatedly supported by various types of evidence in both human and model organisms. Individuals with obesity and T2DM have elevated levels of branched-chain amino acids (leucine, isoleucine and valine) [11, 25-28]. Such changes are already present before the onset of diabetes [11, 12] and their causal role is believed to be exerted via the modulation of the mTOR pathway. Increased leucine levels can lead to insulin resistance via activation of the TORC1 pathway, with induction of beta cell proliferation and insulin secretion [11, 29] and disruption of insulin signal in skeletal muscle [30]. On the other hand, insulin resistance enhances protein catabolism in skeletal muscle, which can increase the release of branched-chain amino acids [31]. Moreover, hyperglycemia negatively correlates with adipose tissue expression of genes involved in branched-chain amino acid oxidation, which can further contribute to raise the levels of BCAA [31]. Thus, so far it remains unclear whether the observed BCAA changes are only a consequence of hyperglycemia or if they have a causative role in the development of T2DM.
Another metabolite which is displaying a significant association with glucose changes in the CoLaus cohort is glutamic acid, although this is not replicated in the DESIR study. A link between increased glutamate levels and insulin resistance traits was already observed in multiple cohorts [26, 32], and more recently, a meta-analysis conducted in 18 prospective studies highlighted glutamate as positively correlated with T2DM [33]. Glutamate is a glucogenic aminoacid, which can enter into the Krebs cycle through its conversion to α-ketoglutarate. In addition, it can favour gluconeogenesis by increasing the transamination of pyruvate to alanine [34], and can directly stimulate glucagon release from pancreatic α-cells [35]. However, its possible causal role remains controversial and several reports suggest that it is rather a reduced ratio between glutamine, a glutamate derivative, and glutamate itself, that is informative of metabolic risk [32, 36].
In our study, free carnitine, cortisol, phenylacetylglutamine and pantothenic acid also appeared as significantly associated to the development of T2DM, although only when our data were combined with the French cohort (CoLaus+DESIR). Carnitine esterification with fatty acids is required for the shuttling of the latter into the mitochondria for fatty acid oxidation. Lower levels of free carnitine were reported in diabetic individuals [13, 37, 38], while changes of acylcarnitines and/or carnitine precursors were highlighted in several studies as indicators of prediabetes/T2DM [8, 9, 27], although data are not always consistent. Our targeted LC-MS method included several of these acylcarnitines, such as acetylcarnitine, propionylcarnitine, and isovalerylcarnitine, but no significant association was found with subsequent development of T2DM.
Cortisol dysregulation has been linked to T2DM in cross-sectional and longitudinal studies [39, 40]. More specifically, diabetic individuals present a flattened diurnal cortisol curve compared to non-diabetic ones [41], with lower morning and higher afternoon and evening concentrations [42]. Interestingly, high levels of evening cortisol were also shown to be predictive of T2DM development in an occupational cohort [43]. Even though the mechanisms underlying this association are not completely understood, cortisol contributes to many metabolic processes that can potentially perturb glucose homeostasis [44]. Cortisol has a major role in raising glucose levels through gluconeogenesis activation. Moreover, it induces lipolysis, thus increasing the release of free fatty acids that may favour the impairment of glucose uptake. Of note, diabetes is considered a common complication in clinical states characterized by prolonged hypercortisolaemia, such as in Cushing disease [45]. However, the causality of this association remains to be determined. Phenylacetylglutamine is a nitrogenous metabolite almost exclusively derived from the gut microbiota conversion of phenylalanine [46]. Its accumulation is known to occur in uraemia and was shown to be increased in type 2 diabetic patients, particularly in association with renal damage [47, 48]. Other findings with variable degree of evidence in our study have also solid corroborating literature. Sarcosine (N-methylglycine) is an intermediate and by-product in glycine synthesis and has been found to be a moderately strong (OR=1.3) predictor of T2DM incidence [49]. More specifically, the addition of urine sarcosine to other established predictors of incident T2DM was shown to improve model performance and T2DM risk prediction in a cohort of 4’164 patients with suspected stable angina pectoris. Diabetic individuals have higher circulating proline levels and, moreover, proline-induced insulin transcription impairment may contribute to the β-cell dysfunction observed in T2DM [50].
4.1 Causal inference
Drawing causal inference is extremely difficult. While most human studies are observational and cross-sectional predominantly, only correlations are calculated between a disease status and the levels of various predictors. Such a simple measure cannot tease apart forward-, reverse causation or confounding. Longitudinal studies provide more specific directional link (called Granger causality) between a potential biomarker and disease outcomes. With its roots in differential equation modelling, an association between the baseline level of a predictor and the change of the outcome over time may imply a causal relationship. An orthogonal axis of evidence for causality can be provided by Mendelian randomization (MR), where exposure-associated genetic markers act as instruments to tease out the causal relationship between a potential risk factor and an outcome [23]. Interventional studies, due to their intrusive nature are performed mostly in model organisms and can help triangulating causal evidence. Comparisons between the different approaches for causality have been very scarce due to the little overlap between the respective scientific communities. A pioneering work [51] in this aspect has shown good agreement between disease-to-biomarker MR results, observational correlation and longitudinal associations. However, their longitudinal association included exposure (adiposity) change regressed on outcome (metabolite level) change, which implies no directionality and is not the intuitive way to perform such analysis. Among the metabolites that we identified here as early biomarkers of T2D, we have shown that leucine has bidirectional causal relationship with diabetes, with a reverse (diabetes to metabolite) causal effect that seems to be stronger. Interestingly, (predisposition to) diabetes has a significant causal effect also on glutamate and valine, while their direct effect on diabetes is not significant. Hence small molecules targeting these metabolites may be more effective for treating downstream organ damage of T2DM, such as cardiovascular disease, neuropathies or nephropathy. In line with this hypothesis, glutamate accumulation in the retina can cause neurotoxicity and the development of diabetic retinopathy [52, 53], even though the actual connection between plasma and retinal glutamate levels remains to be assessed [36].
4.2 Strengths and limitations
Our study has numerous strengths. First, is our use of two well characterized prospective cohorts (one for replication), whose participants have been followed longitudinally for more than ten years. This approach allowed us to investigate potential biomarkers in blood samples collected when individuals were still free of diabetes. Second, the robustness of our targeted methods and of our results is evidenced by the fact that we confirm many previous findings. Third, we triangulate evidence by combining these longitudinal association results with other causal inference techniques, such as Mendelian Randomisation.
The major limitation of this study is the relatively low number of incident diabetes cases that we could analyse, which prohibited us from new discoveries with unequivocal statistical evidence. In the light of these findings, we recommend future research focussing more on untargeted metabolomic approaches better exploring the vast space of metabolite species and the investigation of other omics biomarkers in parallel.
4.2 Conclusions
Our study has confirmed most of the identified-to-date metabolites in a medium-sized longitudinal population-based study (enriched for incident cases) and provided complementary evidence from bi-directional Mendelian randomisation. However, the quest for early metabolic biomarkers predicting the development of T2DM require more research effort including larger studies in order to understand the potentially minute contributions of many circulating metabolites.
Data Availability
The individual data from CoLAus and DESIR cohorts are not publicly available.
Acknowledgements
The study was supported by the Swiss National Science Foundation (#32473B-166450; FN 32003B_173092, and #31003A_182420).