Trans-ancestry genome-wide association study of gestational diabetes mellitus highlights genetic links with type 2 diabetes

ABSTRACT
Gestational diabetes mellitus (GDM) is associated with increased risk of pregnancy complications and adverse perinatal outcomes. GDM often reoccurs and is associated with increased risk of subsequent diagnosis of type 2 diabetes (T2D). To improve our understanding of the aetiological factors and molecular processes driving the occurrence of GDM, including the extent to which these overlap with T2D pathophysiology, the GENetics of Diabetes In Pregnancy (GenDIP) Consortium assembled genome-wide association studies (GWAS) of diverse ancestry in a total of 5,485 women with GDM and 347,856 without GDM. Through trans-ancestry meta-analysis, we identified five loci with genome-wide significant association (p<5×10⁻⁸) with GDM, mapping to/near MTNR1B (p=4.3×10⁻⁵⁴), TCF7L2 (p=4.0×10⁻¹⁶), CDKAL1 (p=1.6×10⁻¹⁴), CDKN2A-CDKN2B (p=4.1×10⁻⁹) and HKDC1 (p=2.9×10⁻⁸). Multiple lines of evidence pointed to genetic contributions to the shared pathophysiology of GDM and T2D: (i) four of the five GDM loci (not HKDC1) have been previously reported at genome-wide significance for T2D; (ii) significant enrichment for associations with GDM at previously reported T2D loci; (iii) strong genetic correlation between GDM and T2D; and (iv) enrichment of GDM associations mapping to genomic annotations in diabetes-relevant tissues and transcription factor binding sites. Mendelian randomisation analyses demonstrated significant causal association (5% false discovery rate) of higher body mass index on increased GDM risk. Our results provide support for the hypothesis that GDM and T2D are part of the same underlying pathology but that, as exemplified by the HKDC1 locus, there are genetic determinants of GDM that are specific to glucose regulation in pregnancy.
medRxiv preprint doi: https://doi.org/10.1101/2021.10.11.21264235; this version posted October 14, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Gestational diabetes mellitus (GDM), defined as hyperglycaemia with onset or first recognition during pregnancy, is associated with increased risk of pregnancy complications and adverse perinatal outcomes, including pre-eclampsia, stillbirth, large for gestational age, neonatal hypoglycaemia, preterm birth, low Apgar scores and admission to neonatal intensive care [1][2][3][4]. Whilst hyperglycaemia commonly resolves postpartum, GDM often reoccurs [5] and is associated with subsequent diagnosis of type 2 diabetes (T2D) and coronary heart disease [6][7]. Although the global prevalence of GDM is increasing, it varies according to population characteristics (such as maternal age, ancestry and obesity rates) and the criteria used for screening and diagnosis [8].
GDM and T2D share both genetic and non-genetic risk factors, including obesity, poor diet and sedentary lifestyle [9][10]. Family studies have demonstrated that women with GDM have a 30.1% probability of having at least one parent with T2D, compared to just 13.2% for pregnant women with normal glucose tolerance [11]. Furthermore, women with a history of GDM appear to have a nearly 10-fold higher risk of developing T2D than those with a normo-glycaemic pregnancy [7]. Taken together, these observations support the hypothesis that the two diseases are part of the same underlying pathology, with pregnancy potentially acting as a stress test that reveals women at increased risk of GDM and/or T2D [12][13].
There have been considerable advances in our understanding of the genetic contribution to T2D through large-scale genome-wide association studies (GWAS) across diverse populations [14][15][16][17]. In contrast, despite the observed familial clustering of GDM [18], most genetic association studies of the disease have focussed on evaluating the impact of previously reported loci for T2D and glycaemic traits in modest sample sizes [19]. The most comprehensive systematic review of genetic susceptibility to GDM (from 23 studies) revealed association with T2D risk variants from seven loci, of which six are related to insulin secretion and one to insulin resistance [20]. A genetic risk score (GRS) of risk variants across 34 loci associated with T2D and/or fasting glucose was significantly associated with GDM and improved predictive power over a model including only clinical variables [21]. Variants associated with both insulin secretion and insulin resistance have also been used to construct an aggregated GRS that was shown to predict GDM risk, with and without adjustment for body mass index (BMI), maternal age, and gestational age, although this score was not compared with established clinical predictors [22]. To date, the largest GWAS of GDM has been undertaken in women from a Korean population, including 468 cases and 1,242 non-diabetic controls in the discovery stage, with an additional 931 cases and 783 non-diabetic controls in the follow-up stage [23]. Two loci were associated with GDM at genome-wide significance (p<5×10⁻⁸), mapping near MTNR1B and CDKAL1, both of which have also been previously implicated in T2D risk.
To gain novel insight into the genetic architecture of GDM, the GENetics of Diabetes In Pregnancy (GenDIP) Consortium assembled GWAS of diverse ancestry in a total of 5,485 women with GDM and 347,856 women without GDM: the effective sample size was 72.2% European, 13.4% East Asian, 9.9% South Asian, 2.8% Hispanic/Latino and 1.7% African (Tables S1 and S2). To maximise sample size, we used a phenotype definition that makes best use of the information available in each study, including data from health records, oral glucose tolerance tests and self-report (Table S1). Each GWAS was imputed to reference panels from the 1000 Genomes Project [24], Haplotype Reference Consortium [25], or population-specific whole-genome sequence data (Table S3). Within each GWAS, GDM association summary statistics were derived for all single nucleotide variants (SNVs) passing quality control after appropriate adjustment to account for population structure (Supplementary Materials and Methods, Table S3). With these resources, we aimed to improve our understanding of the aetiological factors and molecular processes driving the occurrence of GDM, including the extent to which these overlap with T2D pathophysiology, and investigate the effects of potential causal metabolic risk factors on the disease through Mendelian randomisation (MR).
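As a rough illustration of the effective sample size quoted above, one common convention for a case-control GWAS is N_eff = 4 / (1/N_cases + 1/N_controls). The sketch below applies that convention to the pooled totals. The choice of formula and the function name `effective_sample_size` are assumptions, not taken from the study, and a consortium figure would normally be summed over the contributing studies, so the pooled value is only indicative.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control study (assumed convention)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Pooled totals reported in the text; per-study counts are in Tables S1 and S2.
pooled_n_eff = effective_sample_size(5485, 347856)
print(f"pooled effective sample size: {pooled_n_eff:.0f}")
```

With so many more controls than cases, the effective sample size is driven almost entirely by the number of cases, which is why it is far smaller than the total number of participants.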
We began by aggregating GDM association summary statistics across GWAS through trans-ancestry meta-analysis. The most powerful methods allow for potential allelic effect heterogeneity on disease between ancestry groups that cannot be accommodated in a fixed-effects model [26]. Our primary analysis used MR-MEGA [27], which models heterogeneity between GWAS by including axes of genetic variation that represent ancestry as covariates in a meta-regression model. We considered three axes of genetic variation that separated the five ancestry groups, but which also revealed finer-scale genetic differences between GWAS of the same ancestry (Figure S1). We also conducted trans-ancestry and ancestry-specific fixed-effects meta-analyses. We identified five loci at genome-wide significance in the trans-ancestry meta-regression (Table 1, Figures S2 and S3), including the previously reported associations from GDM GWAS at MTNR1B (rs10830963, p=4.3×10⁻⁵⁴) and CDKAL1 (rs9348441, p=1.6×10⁻¹⁴). The remaining three loci for GDM mapped to/near TCF7L2 (rs7903146, p=4.0×10⁻¹⁶), CDKN2A-CDKN2B (rs10811662, p=4.1×10⁻⁹) and HKDC1 (rs9663238, p=2.9×10⁻⁸). Through approximate conditional analyses, conducted using ancestry-matched linkage disequilibrium (LD) reference panels for each GWAS (Supplementary Materials and Methods), we observed no evidence for multiple distinct association signals at genome-wide significance at any of the five GDM loci (Figure S4).
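For readers less familiar with the fixed-effects model mentioned above, the following is a minimal sketch of an inverse-variance weighted fixed-effects meta-analysis for a single SNV, together with Cochran's Q as a simple heterogeneity statistic. It is not the MR-MEGA meta-regression used for the primary analysis, and the function name and input values are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis of log-odds ratios.

    Returns the pooled effect, its standard error, a two-sided p-value from
    a normal approximation, and Cochran's Q heterogeneity statistic.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    se = math.sqrt(1.0 / total_w)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, p, q


# Hypothetical per-study log-ORs and standard errors for one SNV (not study data).
example_betas = [0.30, 0.38, 0.25]
example_ses = [0.05, 0.10, 0.08]
beta, se, p, q = fixed_effects_meta(example_betas, example_ses)
print(f"pooled log-OR={beta:.3f}, SE={se:.3f}, p={p:.2e}, Q={q:.2f}")
```

A fixed-effects model assumes one shared allelic effect across studies. A meta-regression relaxes this by letting the effect vary with covariates such as axes of genetic variation.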
We next sought to investigate the impact of differences in ancestry and phenotype definition between GWAS on heterogeneity in allelic effects at GDM loci. To do this, we extended the MR-MEGA meta-regression model to include an additional covariate to represent whether GDM status in the study was confirmed via "a universal blood-based test" (Supplementary Materials and Methods, Table S1). Here, we use this term to refer to a blood-based test that was applied to all participants, including a diagnostic oral glucose tolerance test (OGTT) or a screening glucose challenge or fasting glucose test, in contrast to clinician decision, risk factor screening, or a lack of clarity on what basis women did or did not have a diagnostic OGTT. This model enables partitioning of heterogeneity into three components (Table 2). The first component captures heterogeneity that is correlated with genetic ancestry (that can be explained by the three axes of genetic variation), which can occur because of differences in the structure of LD between ancestry groups or interactions with lifestyle factors that vary across populations. The second component measures heterogeneity that can be explained by the use of a universal blood-based test to screen for or diagnose GDM. The final component reflects residual heterogeneity due to study design that cannot be explained by the first two components. The greatest evidence of ancestry-correlated heterogeneity (after accounting for the use of a universal blood-based test) was observed at the CDKAL1 locus (pHET=3. […] effects on GDM in GWAS of East Asian ancestry than in other populations, despite the risk allele being common in all ancestry groups (Figure S5, Table S4).
A similar pattern of ancestry-correlated heterogeneity in allelic effects on T2D has been reported at the CDKAL1 locus [16]. Weaker evidence of ancestry-correlated heterogeneity was observed at the CDKN2A-CDKN2B locus (pHET=0.0022), where there were marked differences in the effects on GDM of the lead SNV between GWAS undertaken in different ancestry groups (Figure S5, Table S4). In contrast, there was no evidence of heterogeneity due to phenotype definition for any lead SNV, suggesting that differences in allelic effects between GWAS are more likely due to factors related to genetic ancestry than the use of a blood-based test in all women to screen for or diagnose GDM.

Figure 1. Correlation between GDM and T2D association summary statistics for lead
Of the five GDM loci identified at genome-wide significance in the trans-ancestry meta-regression, four have been previously implicated in T2D susceptibility: MTNR1B, TCF7L2, CDKAL1 and CDKN2A-CDKN2B. In fact, in previously reported trans-ancestry GWAS meta-analyses of 180,834 T2D cases and 1,159,055 controls from the DIAMANTE Consortium [16], the lead T2D SNV is the same as we report for GDM at MTNR1B, TCF7L2 and CDKAL1, and is in strong linkage disequilibrium (LD) at the CDKN2A-CDKN2B locus (rs10811661, r²=0.91 across diverse populations in the 1000 Genomes Project [24]). To further investigate the genetic correlation between the two diseases, we extracted GDM association summary statistics from our trans-ancestry meta-analysis for lead SNVs at 222 previously reported loci for T2D from the DIAMANTE Consortium [16] (Figure 1, Table S5). We observed a strong positive correlation in log-ORs for the T2D risk allele between the two diseases: Pearson r=0.573 (p<2.2×10⁻¹⁶). There was also a highly significant enrichment of GDM associations at T2D loci (50 of 222 lead SNVs with p<0.05 and same direction of effect, binomial test p<2.2×10⁻¹⁶), suggesting that additional T2D loci would reach genome-wide significance for GDM with larger effective sample sizes. Indeed, after excluding the four overlapping GDM-T2D loci (Supplementary Materials and Methods), a weighted genetic risk score of lead T2D SNVs was significantly associated with GDM (p=9.7×10⁻¹²³, pseudo-R²=2.86%). Extending our analyses genome-wide using LD-score regression, we observed strong genetic correlation between GDM and T2D: rG (95% CI): 0.744 (0.052, 1.437). Weaker genetic correlations between GDM and other glycaemic traits were also observed (Table 3, Table S6). These results are consistent with sharing of genetic determinants of GDM and T2D, although we acknowledge that LD-score regression has limited statistical power because of the relatively small GDM sample size, and we note that the correlation from LD-score regression is not bounded by -1 and 1, particularly when power is low.
a Genetic correlation obtained from LD-score regression is not bounded by -1 and 1, and estimates can therefore be found outside these limits due to high imprecision caused by factors such as low sample size in the association summary statistics used.
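The enrichment test described above can be illustrated with an exact binomial upper tail. The sketch assumes that, under the null, a lead T2D SNV shows p<0.05 for GDM with probability 0.05 and a concordant direction of effect with probability 0.5, giving a null probability of 0.025. That null is an assumption for illustration, since the text does not state the value used, and the function name is hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# 50 of 222 lead SNVs had p<0.05 with a concordant direction of effect.
# Assumed null: p<0.05 (probability 0.05) and matching direction (probability 0.5).
null_prob = 0.05 * 0.5
tail = binomial_upper_tail(50, 222, null_prob)
print(f"P(X >= 50 | n=222, p={null_prob}) = {tail:.3g}")
```

Under this assumed null only about 5 or 6 concordant nominal associations would be expected among 222 SNVs, so observing 50 gives a tail probability far below the reported bound of 2.2×10⁻¹⁶.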
The most obvious difference in allelic effect sizes between GDM and T2D was observed at the MTNR1B locus (Figure 1). The lead SNV, rs10830963, is the same for both diseases, but the allelic effect on GDM is substantially greater than on T2D: OR (95% CI) for GDM is 1.41 (1.35-1.47) and for T2D is just 1.09 (1.08-1.10). The MTNR1B lead SNV is associated, at genome-wide significance, with fasting glycaemic traits in non-diabetic individuals from the Meta-Analysis of Glucose and Insulin-related traits Consortium (MAGIC) Investigators [28][29]. The GDM risk allele at the lead SNV is also associated with higher fasting plasma glucose and 1-hour plasma glucose in pregnant women from the Hyperglycemia and Adverse Pregnancy Outcomes (HAPO) Study [30]. This SNV also has the strongest association of the maternal glucose-raising allele with higher offspring birth weight in women from the Early Growth Genetics Consortium [31], in line with the known effects of maternal hyperglycaemia on fetal growth. In non-diabetic individuals from the MAGIC Investigators [32], the MTNR1B lead SNV has a much larger impact on fasting glucose than those at TCF7L2, CDKAL1 and CDKN2A-CDKN2B [33] (Table S7). Therefore, the difference in allelic effect sizes between GDM and T2D at MTNR1B may reflect the fact that thresholds of fasting plasma glucose used to diagnose GDM are lower than those used to diagnose T2D, meaning that a larger proportion of GDM than T2D cases will have higher fasting glucose that is regulated within the normal range.
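To put the MTNR1B comparison above on a common scale, the reported odds ratios can be converted to log-ORs with standard errors recovered from the 95% confidence intervals. The sketch then forms a z-statistic for the difference, which assumes the GDM and T2D estimates are independent. That independence is an assumption made for illustration (the two meta-analyses may share participants), and the function name is hypothetical.

```python
import math


def log_or_and_se(odds_ratio: float, ci_low: float, ci_high: float):
    """Recover the log-OR and its standard error from an OR and 95% CI."""
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * 1.96)
    return log_or, se


# Odds ratios for rs10830963 as reported in the text.
gdm_beta, gdm_se = log_or_and_se(1.41, 1.35, 1.47)
t2d_beta, t2d_se = log_or_and_se(1.09, 1.08, 1.10)

# z-statistic for the difference, assuming independent estimates (an assumption).
diff = gdm_beta - t2d_beta
diff_se = math.sqrt(gdm_se ** 2 + t2d_se ** 2)
z_diff = diff / diff_se
print(f"difference in log-OR={diff:.3f} (SE {diff_se:.3f}), z={z_diff:.1f}")
```

Because the published intervals are rounded to two decimal places, the recovered standard errors are approximate.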
To gain insight into the molecular processes and tissues through which GDM association signals are mediated, genome-wide, we then undertook fGWAS enrichment analyses within three categories of functional and regulatory annotations: (i) genic regions [34]; (ii) chromatin immunoprecipitation sequencing (ChIP-seq) binding sites for 165 transcription factors [35][36]; and (iii) 13 unique and recurrent chromatin states in four diabetes-relevant tissues (pancreatic islets, liver, adipose, and skeletal muscle) [37]. We observed significant joint enrichment (p<0.05) for GDM associations mapping to protein coding exons, binding sites for FOXA2, NFE2 and TFAP2, and chromatin states in adipose tissue and skeletal muscle that mark enhancers and transcribed regions (Table S8). FOXA2 is a pioneer factor involved in pancreatic and hepatic development, and T2D association signals have been previously reported to be enriched for FOXA2 binding sites [38]. Skeletal muscle is the most prominent site of insulin-mediated glucose uptake in humans, and enhancers in skeletal muscle have been reported to overlap association signals for metabolic disorders, including T2D, insulin resistance and obesity [39]. These enrichment analyses highlight molecular processes and tissues that are broadly consistent with those important in mediating T2D association signals [16], although the involvement of pancreatic islets appears to be less prominent for GDM.
In contrast to the other GDM loci reported in this investigation, the lead SNV at the HKDC1 locus (rs9663238) demonstrates only weak statistical evidence of T2D association in previously reported trans-ancestry GWAS meta-analyses from the DIAMANTE Consortium [16] (p=0.0083, compared with p<10⁻⁶⁵ at the other four loci). GDM risk alleles at variants in strong LD (European ancestry r²>0.9) with the lead SNV have been previously associated, at genome-wide significance, with higher 2-hour plasma glucose (2HPG) in pregnant women in the HAPO Study and two replication studies of European ancestry [30], as well as with higher birth weight of first child (likely via greater maternal glucose availability), higher own birth weight (fetal effect independent of the maternal effect on birth weight), and comparative height and body size at age 10 in UK Biobank [40][41] (Table S9). In addition to demonstrating the association of the maternal SNVs at this locus with GDM in the current study, we observed that 99% credible set variants are lead SNVs for HKDC1 expression quantitative trait loci in a range of tissues in the GTEx Project [42], including visceral adipose, subcutaneous adipose and pancreas (Supplementary Materials and Methods, Table S10). HKDC1 (Hexokinase Domain Containing 1) catalyzes the phosphorylation of hexose to hexose 6-phosphate and is involved in glucose homeostasis and hepatic lipid accumulation. Haplotypes of variants associated with 2HPG in pregnancy disrupt regulatory element activity and reduce HKDC1 expression across diverse tissues (including metabolically relevant liver stellate cells and pancreatic islet beta cells), which has been demonstrated to reduce hexokinase activity in multiple cellular models [43]. Knockout of hepatic HKDC1 in pregnant mice has also been demonstrated to significantly impair glucose tolerance, highlighting the importance of liver HKDC1 on glucose metabolism during pregnancy [44].
Taken together, the evidence from our study and others suggests a more important role for HKDC1 in glucose metabolism during pregnancy than outside of pregnancy, in addition to independent maternal and offspring effects on early growth, and highlights that while GDM shares many similarities with T2D, there are differences in at least one underlying pathway.
Finally, we used two-sample MR to investigate causal effects on GDM of 282 metabolic measures and risk factors available in the MR-Base GWAS catalogue (www.mrbase.org) [45], including metabolites, anthropometric measures, hormones, immune system phenotypes, kidney traits and metals (Supplementary Materials and Methods, Table S11). We did not consider glycaemic traits (including HbA1c) because they are used to define GDM status. For each metabolic measure, we selected independent SNVs attaining genome-wide significance with the trait as instrumental variables. For each SNV, we extracted association summary statistics for GDM from the European ancestry-specific meta-analysis because we assessed independence of genetic instruments using LD from European ancestry haplotypes from the 1000 Genomes Project [24]. Of the 282 exposures considered, only BMI demonstrated significant evidence for a causal effect on GDM risk at a false discovery rate of 5% (Table S11). The estimated causal effect of higher BMI on higher GDM risk was directionally consistent across multiple MR models (Figure 2). The causal relationship of BMI with GDM is consistent with its effect on T2D [46].
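Testing 282 exposures calls for multiple-testing control, and the 5% false discovery rate mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure. The text does not say which FDR procedure was applied, so this choice is an assumption, and the p-values in the sketch are hypothetical rather than study results.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking which hypotheses pass the BH procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    significant = [False] * m
    for idx in order[:max_rank]:
        significant[idx] = True
    return significant


# Hypothetical p-values for 282 exposures: one strong signal, the rest null-like.
example_p = [1e-6] + [0.2 + 0.8 * i / 281 for i in range(281)]
flags = benjamini_hochberg(example_p, fdr=0.05)
print(f"{sum(flags)} of {len(flags)} exposures pass a 5% FDR")
```

With 282 tests, the smallest p-value must fall below 0.05/282 (about 1.8×10⁻⁴) for a single exposure to pass on its own.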

Figure 2. Effects of BMI on GDM from MR analyses. Each point corresponds to an independent SNV (genetic instrument), plotted according to the effect on BMI (on the x-axis) and the effect on GDM (log-OR, on the y-axis). Horizontal and vertical bars represent the standard errors of effect estimates. The coloured regression lines represent the effect of BMI on GDM from six MR models.
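Each regression line in a plot of this kind summarises the per-SNV effects as a single causal estimate. A minimal sketch of one widely used estimator, the inverse-variance weighted (IVW) method, is given here. The six models in Figure 2 are not named in the text, so treating IVW as one of them is an assumption, and the instrument effects in the sketch are hypothetical values, not study data.

```python
import math


def ivw_mr(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance weighted MR estimate (regression through the origin).

    Each SNV contributes a Wald ratio beta_outcome / beta_exposure, weighted by
    beta_exposure**2 / se_outcome**2 (first-order weights).
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exposure, se_outcome))
    estimate = num / den
    se = math.sqrt(1.0 / den)
    return estimate, se


# Hypothetical instruments: SNV effects on BMI and on GDM (log-OR); not study data.
bx = [0.08, 0.05, 0.03, 0.06]
by = [0.040, 0.030, 0.012, 0.033]
sy = [0.010, 0.012, 0.015, 0.011]
causal_estimate, causal_se = ivw_mr(bx, by, sy)
print(f"IVW estimate={causal_estimate:.3f} log-OR per unit BMI (SE {causal_se:.3f})")
```

The IVW estimate is the slope of a weighted regression of SNV-outcome effects on SNV-exposure effects with the intercept fixed at zero. Estimators that relax the zero-intercept constraint or down-weight outlying SNVs yield differently sloped lines for the same points.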
In conclusion, we have conducted the largest and most ancestrally diverse GWAS meta-analysis for GDM, where we identified associations mapping to MTNR1B, TCF7L2, CDKAL1, CDKN2A-CDKN2B and HKDC1. Our results demonstrated strong correlation in the effects of previously reported associations for T2D and those observed for GDM, and highlighted overlapping molecular mechanisms and tissues that mediate associations for both diseases. In contrast, variation at the HKDC1 locus is not strongly associated with T2D, but instead plays a more important role in glucose metabolism during pregnancy than outside of pregnancy. The genetic diversity of GWAS contributing to our meta-analysis enabled identification of ancestry-correlated heterogeneity in allelic effects on GDM at two loci. Such heterogeneity could reflect variable impact of different pathophysiology driving glycaemic dysregulation in pregnancy between ancestries and emphasizes the need for increased sample sizes in underrepresented population groups. In contrast, results were consistent between GWAS in which all women had a universal blood-based test and those that did not, suggesting little impact from misclassification due to selective use of diagnostic tests only in those deemed to be at high risk. Finally, MR analyses revealed a significant causal effect of higher BMI on GDM risk, consistent with the causal association observed with T2D. Taken together, these results provide further support for the hypothesis that T2D and GDM are part of the same underlying pathology. However, they also highlight that there are pathways to GDM that impact on glucose regulation only in pregnancy, and that additional GDM-specific associations will be revealed through GWAS in larger sample sizes.
EGCUT. This study was funded by the European Union through the European Regional […]
STORK. The STORK study received additional funding from the Norwegian Diabetes Association, the Norwegian Odd Fellow Research Fund and Johan Selmer Kvanes' Endowment for Research in Diabetes.

STORKG. We acknowledge the Hormone Laboratory, Oslo University Hospital, for DNA extraction.
UKBB. UK Biobank analyses were conducted using the UK Biobank resource under application 11867.
VIVA. Grants from the US National Institutes of Health (R01 HD034568, UH3 OD023286). Project Viva is thankful to all Project Viva participants for their participation in research over many years.
of Genentech and a holder of Roche stock. G.T., V.S., and K.S. are employees of deCODE genetics/Amgen, Inc.

AUTHOR CONTRIBUTIONS