Discovery of 318 novel loci for type-2 diabetes and related micro- and macrovascular outcomes among 1.4 million participants in a multi-ethnic meta-analysis

Marijana Vujkovic; Jacob M. Keaton; Julie A. Lynch; Donald R. Miller; Jin Zhou; Catherine Tcheandjieu; Jennifer E. Huffman; Themistocles L. Assimes; Renae L. Judy; Jie Huang; Kyung Min Lee; Derek Klarin; Saiju Pyarajan; John Danesh; Olle Melander; Asif Rasheed; Nadeem Qamar; Saqib S. Sheikh; Shahid Hameed; Irshad H. Qureshi; Muhammad N. Afzal; Uzma Jahazaib; Anjum Jalal; Shahid Abbas; Xin Sheng; Long Gao; Klaus H. Kaestner; Katalin Susztak; Yan V. Sun; Scott L. Duvall; Kelly Cho; Jennifer S. Lee; John M. Gaziano; Lawrence S. Philips; James B. Meigs; Peter D. Reaven; Peter W. Wilson; Todd L. Edwards; Daniel J. Rader; Scott M. Damrauer; Christopher J. O’Donnell; Philip S. Tsao; HPAP Consortium, Regeneron Genetics Center, VA Million Veteran Program; Kyong-Mi Chang; Benjamin F. Voight; Danish Saleheen

doi:10.1101/19012690

Abstract

We investigated type 2 diabetes (T2D) genetic susceptibility in a multi-ethnic meta-analysis of 228,499 cases and 1,178,783 controls in the Million Veteran Program (MVP) and other biobanks. We identified 558 autosomal and 10 X-chromosome T2D-associated variants, of which 286 autosomal and 7 X-chromosome variants were previously unreported. Ancestry-specific analyses identified 25 additional novel T2D-susceptibility variants. Transcriptome-wide association analysis detected 3,568 T2D-associations with T2D-colocalized genetically predicted gene expression of 804 genes in 52 tissues, of which 687 are novel. Fifty-four of these genes are known to interact with FDA-approved drugs and chemical compounds. T2D polygenic risk score was strongly associated with increased the risk of T2D-related retinopathy, and additionally showed evidence for association with chronic kidney disease (CKD), neuropathy, and peripheral artery disease (PAD). We investigated the genetic etiology of T2D-related vascular outcomes in the MVP and observed statistical SNP-T2D interactions at 13 variants, including 3 for coronary heart disease, 1 for PAD, 2 for stroke, 4 for retinopathy, 2 for CKD, and 1 for neuropathy. Our findings may identify potential novel therapeutic targets for T2D and genomic pathways that link T2D and its vascular outcomes.

Type 2 diabetes mellitus (T2D) - a leading cause of morbidity globally - is projected to affect up to 629 million people by 2045¹. People with T2D are at increased risk of developing a wide range of macro- and microvascular outcomes², and there are large disparities in prevalence, severity and co-morbidities across global populations. T2D is highly polygenic and over 400 common variants have been identified that confer disease susceptibility; however, most studies have been performed in European and Asian ancestry populations, and dominant risk alleles in other ethnic groups remain to be discovered. The genetic etiology of T2D-related sequelae remain poorly understood. Identifying genetic risks and critical genes for T2D-related micro- and macrovascular disease could inform clinical management strategies, including patient stratification and optimizing study design of randomized controlled trials for novel treatments. The lack of large-scale multi-ethnic deeply phenotyped cohorts linked to genetic data has made it difficult to address these questions.

We conducted a multi-ethnic association study of T2D risk comprised of 228,499 T2D cases and 1,178,783 controls of European, African American, Hispanic, South Asian, and East Asian ancestry. We investigated the association of a T2D polygenic risk score with 3 major T2D-related macrovascular outcomes (coronary heart disease [CHD], ischemic stroke, and peripheral artery disease [PAD]) and 3 microvascular disease (chronic kidney disease [CKD], retinopathy and neuropathy) in the Million Veteran Program (MVP)³. We conducted a genome-wide SNP-T2D interaction analysis on vascular outcomes in MVP to identify genetic variants where the effect of SNP on the vascular outcome depends on the context of T2D presence. We also performed association analyses of genetically-predicted expression levels and expression quantitative trait-T2D colocalization analyses to identify the effects of gene-tissue pairs that influence T2D risk through inter-individual variation in expression.

This study differs from prior GWAS of T2D risk by leveraging large-scale clinical data in conjunction with polygenic scores, extensive evaluation of context specificity for genetic effects on T2D vascular sequelae and describing the regulatory circuits that influence T2D risk. These observations move the study of T2D genetics beyond lists of variants and provide novel biological insights with translational potential.

Results

Study Populations

We performed a genome-wide, multi-ethnic T2D-association analysis (228,499 cases and 1,178,783 controls) encompassing five ancestral groups (Europeans, African Americans, Hispanics, South Asians and East Asians) by meta-analyzing genome-wide association study (GWAS) summary statistics derived from the Million Veteran Program (MVP)³ and other studies with non-overlapping participants: DIAMANTE Consortium⁴, Penn Medicine Biobank⁵, Pakistan Genomic Resource⁶, Biobank Japan⁷, Malmö Diet and Cancer Study⁸, Medstar⁹, and PennCath⁹ (Online Methods, Supplementary Table 1 and 3). MVP participants (n = 273,409) are comprised predominantly of male subjects (91.6%) and were classified as Europeans (72.1%), African Americans (19.5%), Hispanics (7.5%), and Asians (0.9%)(Supplemental Table 2).

View this table:

Table 1:

T2D locus discovery in African Americans

Single-variant autosomal analyses

We identified a total of 558 independent sentinel SNPs (286 novel, 272 previously reported)^4,7,10,11 associated with T2D (Figure 1, Table 1, Supplemental Tables 4, 5, 6, 10, and 11, Supplemental Figure 1a-e). Variants were classified as novel if they were at least 500 kb distant from a previously reported sentinel SNP^4,7,10,11 and/or if they were not in linkage disequilibrium (LD, r² ≤ 0.05) with an established sentinel SNP^4,7,10,11. A total of 21 SNPs showed genome-wide significance in Europeans only (Supplemental Table 6). We found that novel loci had, on average, smaller magnitudes of effect (0.032 ± 0.012 per allele) than previously established SNPs (0.054 ± 0.045 per allele, derived from data shown in Supplementary Table 5), presumably resulting from enhanced power to discover weaker effects due to the large sample size and ancestral diversity. Genome-wide chip heritability analysis explained 19% of T2D risk⁴.

Figure 1:

Trans-ancestry GWAS meta-analysis identifies 318 loci associated with T2D. The graph represents a circos plot. The outer track corresponds to –log10 (P) for association with T2D in the trans-ethnic meta-analysis (y axis truncated at 30), by chromosomal position. The red line indicates genome-wide significance (P=5.0×10−08). Purple gene labels indicated genes identified in skeletal muscle eQTLs by S-PrediXcan analysis, red-labeled gene names in adipose eQTLs, black-labeled gene names in pancreas eQTLs, and blue-labeled gene names were identified in eQTLs from arteries. The green band corresponds to measures of heterogeneity related to the index SNPs associated with T2D. Dot sizes are proportional to I2 or ancestry-related heterogeneity. The inner track corresponds to – log10(P) for association with skeletal muscle, adipose, pancreas, and artery tissue eQTLs from S-PrediXcan analysis (y axis truncated at 20), by chromosomal position. The red line indicates genome-wide significance (P=5.0×10−08). Inset, effects of all 318 index SNPs on T2D by minor allele frequency, stratified and colored by ancestral group.

In an analysis of African American participants (Table 1), we observed a total of 21 loci associated with T2D susceptibility at genome-wide significance, 16 of which were in strong LD with established T2D variants. Three variants were novel and the effect on T2D appeared specific to African-Americans. The frequency of the T2D risk-increasing allele is notably higher in African reference populations than European for these SNPs, with the ancestral allele the major allele in African populations and the derived allele being the major allele in European populations. Single variant analysis in the Hispanics subset identified 2 SNPs, both of which tagged previously reported T2D loci (Supplemental Table 10). No novel variants were observed in the subjects of Asian ancestry (Supplemental Table 11).

Chromosome-X analyses

A total of 10 association signals for T2D were identified on chromosome X in a multi-ethnic meta-analysis, of which 7 were novel (Table 2a-b, Supplemental Figure 2a-e). A European-restricted analysis identified 4 chromosome-X loci associated with T2D risk, all of which were identified in the multi-ethnic meta-analysis. One novel chromosome-X locus was associated with T2D that is specific to African Americans. Of interest is one novel, multi-ethnic locus near the androgen receptor gene that was in strong LD with a previously reported variant (rs4509480) known to be associated with male-pattern baldness (EUR r²=0.98, rs200644307).

View this table:

Table 2a:

T2D chromosome X analysis (overall results)

View this table:

Table 2b:

T2D chromosome X analysis (stratified results by gender)

Figure 2:

T2D gPRS is mainly predictive of micro vascular outcomes. A T2D gPRS was calculated and categorized into deciles based on the scores in controls. The PRS-outcome associations are shown for macrovascular outcomes (CHD, CKD and PAD) and for microvascular outcomes (retinopathy, neuropathy, and acute ischemic stroke). Effect sizes and 95% confidence intervals are shown per decile per micro- or macrovascular outcome. For each of the complication outcomes separate logistic regression model are fitted for people with T2D, and the models include the following independent variables: T2D PRS (from Diamante consortium), age, gender, BMI, and 10 PCA’s. For coronary heart disease, a CHD PRS (from CardiogramplusC4DplusUKBB) is included in the regression model as an additional covariate. For acute ischemic stroke, a stroke PRS (from Megastroke consortium) is included in the regression model as an additional covariate. For chronic kidney disease, a CKD PRS (from CKDgen consortium) is included in the regression model as an additional covariate.

Effect heterogeneity between Europeans and African Americans

For most loci, we found no evidence of significant heterogeneity of effect estimates between Europeans and African Americans, whereas 44 (7.9%) had significantly different effect estimates (Supplemental Table 20). Overall, the strength of effect was found to be highest in Europeans, an expected result given the abundance of individuals of that ancestry in our study. However, 4 loci nearby the genes SLC30A8, PTPRQ, GRB10, and COLB showed higher effect sizes for T2D at stronger levels of significance in African Americans compared with Europeans. Of these loci, associations with loss of function variants in the SLC30A8 gene were previously reported in Europeans, African Americans and South Asians^12,13.

Secondary signal analysis

We detected a total of 233 conditionally independent SNPs that flank 49 novel and 108 previously reported lead SNPs in Europeans (Supplementary Table 7 and 9), and identified 9 independent variants in Europeans and 3 in African Americans flanking 3 novel and 6 previously reported sentinel SNPs. We observed no novel conditionally independent variants in participants of South Asian, East Asian and Hispanic ancestry.

Rare coding variant mapping

To identify coding variants that may influence T2D risk by changing protein structures, we investigated predicted loss of function (pLoF) and missense variants in LD with the identified T2D lead variants (Supplementary Table 12). We identified 2 pLoF (LPL and ANKDD1B) and 45 missense variants in 47 genes that were in LD with at-least one of the T2D lead SNPs (r² > 0.5, MVP reference panel in Europeans) and were associated at a P-value < 1.0×10⁻⁴. We also identified 9 missense variants that were either within 5 Mb of sentinel SNPs or in moderate to low LD (r² < 0.5) and were associated with T2D at a P-value < 1.0×10⁻⁴. Of the 56 pLoF and missense variants, 14 missense variants were found to be the sentinel T2D SNPs and 19 variants were in LD with novel lead SNPs, and 37 variants were previously reported.

We then performed a genome-wide screen of all pLoF’s (not bound by proximity to sentinel T2D lead variants) and missense variants available for analysis (Supplemental Table 13) and identified one additional pLoF variant in the CCHCR1 gene, whereas 37 novel missense variants were associated with T2D at a P-value threshold of 5×10⁻⁸.

Rare coding variant PheWAS

To understand biological pleiotropy and underlying mechanisms, we performed a PheWAS of the 3 pLoF variants associated with T2D in MVP participants of European ancestry (Table 5). These loci included ANKDD1B p.Trp480*, CCHCR1 p.Trp78*, and LPL p.*474Ser in the MVP, and were significantly associated with metabolic and inflammatory conditions. For example, ANKDD1B p.Trp480* was associated with dyslipidemia, hypercholesterolemia, and diabetic neurological manifestations. CCHCR1 p.Trp78* was associated with type 1 diabetes, epistaxis, celiac disease, microscopic hematuria, and psoriatic arthropathy. Finally, the recent report by Klarin et al. includes a Phewas of LPL p.*474Ser in MVP¹⁴, which was associated with dyslipidemia, coronary atherosclerosis and other chronic ischemic heart disease.

Transcriptome-wide association analyses

Common variants from the European T2D GWAS meta-analysis were used to evaluate the association of genetically predicted gene expression levels with T2D risk across 52 tissues including kidney and islet cells using S-PrediXcan¹⁵ (Supplemental Table 8, Supplemental Figure 3). We identified 4,468 statistically significant gene-tissue combination pairs genetically predictive of T2D risk, of which 4,211 transcript eQTLs were in LD (r² > 0.5) with T2D signals. We identified 873 genes in this analysis that would not have been identified by nearest-gene annotation alone. The strongest gene-tissue combination signals were for NRAP in the cerebellum and TCF7L2 in the aortic artery.

We additionally used COLOC to identify the subset of significant genes where there was a high posterior probability that the set of model SNPs in the S-PrediXcan analysis for each gene were both causal for the expression and T2D. This analysis refined the results of the TWAS and excluded some results that might be the consequence of LD between causal SNPs for gene expression and T2D. We detected 3,568 gene-tissue pairs where there was statistically significant association with T2D risk and high posterior probability (P4>0.5) of colocalization. The colocalized and significant results included 804 genes. When comparing the 804 genes to the GWAS catalog mapped and reported genes for all prior studies of diabetes or diabetes complications, 687 had not been identified previously.

Hypergeometric enrichment analysis showed that most enriched gene expression signals were in the following tissues: cervical spinal cord, basal ganglia, glomerular kidney, tibial nerve, transformed fibroblasts, and skeletal muscle (Supplemental Table 15). A cross-section of the putative genes identified through transcriptome-wide analysis and the genes identified from the genome-wide coding variant lookup (Supplemental Table 12) identified the genes: PCNT p.Ile539Thr, NMI p.Leu16Ser, TRIM66 p.His322Arg, STARD3 p.Gln117Arg, and ZNF641 p.Gln363Pro.

Assessment of gene–drug relationships

Of the 991 genes identified in S-PrediXcan analyses, 54 genes have documented interactions with a total of 283 FDA-approved drugs and chemical compounds that do not have an indication for T2D treatment or reported ADE in diabetic patients using the SIDER database of drugs and side effects¹⁶. Using the Drug-Gene Interaction Database¹⁷ (DGIdb version 3.0) a total of 322 gene-drug combinations were identified for which it is predicted that they would lower blood glucose based on direction of effect on T2D risk with increasing gene expression and drug action (activator or inhibitor)(Supplemental Table 14). Gene-drug combinations included several established T2D loci including KCNJ11 targeted by 15 compounds (e.g. sulfonylureas, glinides, and p-glycoprotein inhibitors), SCNA3 targeted by 57 compounds (e.g. anti-arrythmetics, anti-epileptics), PIK3CB targeted by 46 compounds (e.g. cancer drugs), ACE targeted by 36 compounds (e.g. angiotensin-converting enzyme [ACE] inhibitors), HMGCR targeted by 18 compounds (e.g. HMG-CoA reductase inhibitors), PIK3C2A targeted by 15 compounds (anti-cancer drugs), F2 targeted by 11 compounds (anti-coagulants), and BLK targeted by 9 compounds (protein kinase inhibitors).

The gene most significant in S-PrediXcan targeted by a non-antiglycemic medication was PPARG in EBV-transformed lymphocytes (Supplemental Table 8). The gene most significant in S-PrediXcan targeted by a drug with report of ADE in diabetic patients was TH in non-sun-exposed suprapubic skin. The gene most significant in S-PrediXcan targeted by an established anti-glycemic medication was KCNJ11 in sun-exposed skin from the lower leg.

Pathway and functional enrichment analysis

To explore whether our results recapitulate the pathophysiology of T2D, we performed gene-set enrichment analysis with all the variants using DEPICT (P < 1×10⁻⁵, Supplemental Table 16). MeSH-based analysis showed that several different adipose tissues and sites were enriched (abdominal subcutaneous fat, abdominal fat, adipose tissue, subcutaneous fat, and white adipose tissue). Finally, DEPICT analysis showed that the most significant gene-set involved the AKT2 subnetwork, lung cancer, the GAB1 signalosome, protein kinase binding, signal transduction, and EGFR signaling (Supplemental Table 17).

Genetic correlation between T2D and other phenotypes

Genome-wide genetic correlations of T2D were calculated with a total of 774 complex traits and diseases by comparing allelic effects using LD score regression (Online Methods). A total of 270 significant associations were observed (P < 5.0×10⁻⁸, Supplementary Table 19). The strongest positive correlations were observed with waist circumference, overall health, BMI, and fat mass of arms, legs, body and trunk, hypertension, coronary artery disease, dyslipidemia, alcohol intake, years of education, wheezing, and cigarette smoking. There was also a strong negative correlation with years of education.

T2D-related micro- and macrovascular outcomes

Using a genome-wide approach, we investigated SNP-T2D interaction effects associated with T2D-related vascular outcomes among European-descent MVP participants (P < 5×10⁻⁸, Online Methods, Table 3a-b). The analysis included a total case count of 67,403 for CKD, 56,285 for CHD, 35,882 for PAD, 11,796 for acute ischemic stroke, 13,881 for retinopathy, and 40,475 for neuropathy. Several genome-wide significant interactions were identified where the genetic associations with T2D-related vascular outcomes were modified by T2D (Table 3a-b). We identified 2 loci for CHD (rs1831733 in 9p21 and rs602633 near SORT1) and 1 for CKD (rs34857077 in UMOD) for which the difference in the effect estimates between T2D strata was genome-wide significant (P < 5.0×10⁻⁰⁸) and at least one T2D-stratum was genome-wide significant as well. We additionally identified 1 locus for CHD (rs71039916 near PDE3A), 1 for CKD (rs2177223 near TENM3), 1 for PAD (rs3104154 in PTDSS1), 1 for neuropathy (rs78977169 near NRP2), 4 for retinopathy (rs76754787 nearby GJA8, rs10733997 in SVILP2, rs2255624 near SLC18A2, and rs4132670 in TCF7L2) and 2 for acute ischemic stroke (rs491203 near TMEM51, and rs2134937 near TRIQK) that showed a genome-wide significance for difference in effect estimates between the T2D strata and nominal significance (P < 0.001) for at least one T2D stratum.

View this table:

Table 3a:

Genome-wide interaction analysis of vascular and non-vascular complications (overall results)

View this table:

Table 3b:

Genome-wide interaction analysis of vascular and non-vascular complications (T2D strata specific effect estimates)

Polygenic risk scores and T2D-related micro- and macrovascular outcomes

Genome-wide polygenic risk scores (gPRS) for T2D were calculated in Europeans based on the T2D effect estimates from the previously reported DIAMANTE consortium⁴ and then categorized into deciles (Table 4a-b). As expected, participants with the highest T2D gPRS scores (90-100% T2D gPRS percentile) showed the highest risk for T2D (OR = 5.21, 95% CI 4.94-5.49, Supplemental Figure 5) when compared to the reference group (0-10% T2D gPRS percentile) in a cross-sectional study design.

View this table:

Table 4a:

Polygenic risk scores and vascular complications

View this table:

Table 4b:

Polygenic risk scores and non-vascular complications

View this table:

Table 5:

Phewas of 2 LoF variants in MVP whites

We evaluated whether the T2D gPRS was associated with the risk of micro- and macrovascular outcomes in an analysis restricted to participants with T2D. The P-value are calculated using gPRS as a continuous exposure and odds ratios are shown by contrasting the top to the bottom gPRS decile (Figure 2 and Table 4a-b). The T2D gPRS was more strongly associated with microvascular than macrovascular outcomes, in particular with retinopathy, but also with neuropathy and CKD. For macrovascular outcomes, T2D gPRS was associated with the risk of PAD, but not with the risk of CHD or acute ischemic stroke.

Discussion

We report the discovery of 318 novel autosomal and X-chromosome variants associated with T2D susceptibility in a large multi-ethnic T2D study. Furthermore, we observed 13 variants associated with differences in T2D-related micro- and macrovascular outcomes between diabetic and non-diabetics. The substantial locus discovery was achieved by combining data from several large-scale biobanks and consortia, where the MVP data constituted over 40% of all T2D cases. Furthermore, we present the largest cohort of African Americans including over 56K subjects, substantially greater than any African-specific T2D study published to date.

Analyses of coding variants identified 44 variants for T2D, including three pLoF variants in LPL, ANKDD1B and CCHCR1. 804 putative causal genes at both novel and previously reported loci were identified and 54 genes were found to be possible targets for FDA-approved drugs and chemical compounds. Our SNP-T2D interaction analyses identified several loci where the association between a genetic variant and a vascular outcome differed between people with T2D as compared to those without. We further found that a high polygenic risk for T2D strongly increased the risk for retinopathy in patients with T2D, and also for CKD, neuropathy, and PAD.

T2D is highly prevalent in people of African ancestry; however there are a total of three published T2D GWAS reports in this ancestral group with only 4 definitely detected loci^18,19,20. In our study with over 56K participants of recent African ancestry, we report 4 novel loci for T2D that are solely observed in this ancestral group, including one that is located on chromosome X. Of the previously reported loci, only rs3842770 (INS-IGF2) was replicated here. The reported HLA-B variant rs2244020¹⁸ did not replicate in our study; but we did observe a significant association with another SNP in the HLA region (rs10305420, OR 1.15, P = 8.5×10⁻⁹). We did not replicate the association with the variant rs7560163²⁰. Finally, 1 locus (rs73284431) that was recently identified in a large study conducted in Sub-Saharan African countries¹⁹ failed to replicate in our study consisting of African Americans despite comparable minor allele frequencies (9% vs 10%). Finally, we observed that one SLC30A8 variant showed a very prominent T2D effect in African Americans that was absent in other populations; mouse studies have shown that the human R138X LoF variant in SLC30A8 results in increased insulin secretory capacity²¹.

The sole presence of a coding variant near a tagging SNP does not constitute enough evidence to infer a causal association. However, recent exome-array genotyping of over 350K subjects identified 40 coding variants associated with T2D, of which 26 mapped to near known risk-associated loci²².

Similarly, an exome sequencing study in over 40K participants reported 15 variants associated with T2D, of which only 2 were not previously reported by GWAS¹³. Sequencing efforts are indispensable for identifying causal variants and genes related to disease, as well as providing insight into the contributions of ultra-rare alleles, but have also substantiated the value of array-based association studies.

Our transcriptome-wide analyses identified 804 putatively causal genes, including 54 genes that appear to be regulated by approved drugs and 687 genes that have not been previously reported. Some of these genes are already well established for T2D etiology (e.g. KCNJ11). Except for skeletal muscle, the tissues that showed strongest associations are not known to be of importance in T2D etiology. This raises the hypothesis that the eQTL associations are universal and not tissue dependent; if the gene is expressed in the respective tissue at reasonable levels an association will be detected. We did not observe any significant association in the alpha and beta islet cells, which could be the result of the small sample size (e.g. 30 alpha cells and 19 beta cells). In addition, whole islet transcriptomes are notoriously variable due to the large differences in islet composition among humans, and a few transcripts make up half the transcriptome²³.

Of particular clinical importance, we identified several genes which are therapeutic targets for medications in patients that are treated for cardiometabolic conditions. We identified two genes, SCN3A and SV2A whose expression is modified by anti-epileptic agents, and evidence exists showing that anti-epileptic agents may influence glucose regulation. A randomized-controlled trial has reported that the anticonvulsant valproic acid lowers blood glucose concentrations²⁴. The information from the gene-drug screen may facilitate future drug repurposing screens.

DEPICT enrichment analysis identified AKT2 subnetwork gene set to be show the highest association with T2D. Of interest is that AKT2 is critical to insulin signaling and not beta-cell function. The majority of the early era T2D GWAS studies²⁵ predominantly identified genetic variants that alter T2D risk through reduced beta-cell function, such as KCNJ11, HHEX, SLC30A8, CDKAL1, TCF7L2, CDKN2A/2B, and IGF2BP2. However, as the study sample size have substantially increased and the number of T2D-associated variants is approaching the 1.000 number mark, the cumulative effect of genetic variants on T2D is pointing towards a predominant role for insulin resistance.

When we evaluated polygenic risk scores and T2D-related outcomes, we observed that among vascular outcomes, the T2D gPRS was most significantly associated with retinopathy. It is possible that the use of the T2D gPRS provides an opportunity to identify patients who are at the highest risk of developing microvascular complications, such as retinopathy. Using the T2D gPRS, we observed significant associations with other T2D-related outcomes such as CKD, PAD, and neuropathy. Studies at specific loci using both common and rare coding variants will be required to understand pathways leading to T2D-related vascular outcomes.

In a SNP-T2D interactions analysis on T2D-related vascular outcomes we identified 13 loci where the effect on outcome was different between the strata of T2D, of which 3 occurred at previously established variants and 10 had not been previously reported. Our findings have clinical translational potential for risk-stratification and identify diabetic patients who are predisposed to develop subsequent vascular outcomes and present therapeutic opportunities to attenuate the risk of diabetes progression in individuals with T2D.

For T2D-related retinopathy, four variants were found to have different effect sizes between people with and without T2D. The strongest signal for interaction in relation to retinopathy was observed for GJA8. Deletion of this gene has been associated with eye abnormalities and retinopathy of prematurity in premature infants, inherited cataracts, visual impairment and cardiac defects and eye abnormalities^26–28. TCF7L2 is a known diabetes locus and its association with progression to retinopathy has been previously established²⁹. SLC18A2 is expressed in adult retina and retinal pigment epithelium tissues³⁰; this gene is involved in the transport of monoamines into secretory vesicles for exocytosis³¹. SVILP1 has been previously shown to be associated with thiamine (vitamin B1) prescription, which is frequently prescribed to people with blurry vision³².

For chronic kidney disease, two loci, UMOD and TENM3, were identified for gene-T2D interaction effects. UMOD encodes for uromodilin which is exclusively produced by the kidney tubule where it plays an important role in kidney and urine function³³. A large-scale study in over 133K participants has shown that the serum creatinine-lowering allele in UMOD (rs12917707) is more prevalent in diabetic individuals with CKD as compared to diabetic participants without CKD³⁴. Variation in TENM3 has been associated with cholangitis and kidney disorders in the UK Biobank ³⁵.

SNP-T2D interaction analysis of neuropathy identified one locus, NRP2. NRP2 encodes for neuropilin-2 which is an essential cell surface receptor involved in VEGF-dependent angiogenesis and sensory nerve regeneration^36,37.

For coronary heart disease, we identified several SNP-T2D interactions. Variation at 9p21 has been reported previously for CHD and T2D. SORT1 is a lipid-associated locus; in our analyses, allelic variation at this locus that decreases CHD risk and decreases lipids conferred a stronger protection in people with T2D compared to those without T2D. Coupled with findings in mice that identified SORT1 as a novel target of insulin signaling³⁸, our findings raise the hypothesis that SORT1 may contribute to altered hepatic apoB metabolism under insulin-resistant conditions.

The SNP rs71039916 is located nearby PDE3A, and collocates with a SNP (rs3752728, D’ = 0.867, r² = 0.08) that is associated with diastolic blood pressure^39,40. As a phosphodiesterase that reduces cAMP levels, the PDE3A protein limits protein kinase A/cAMP signaling, and has been shown to affect proliferation of vascular smooth muscle cells⁴¹. Cell line research has shown that cAMP levels might impact the regulation of insulin secretion in pancreatic β-cells⁴², and more recent gene ablation studies in mice have established that cAMP/CREB signaling controls the insulinotropic and anti-apoptotic effects of GLP-1 signaling in adult mouse β-cells⁴³. Subcutaneous adipose tissue of patients with T2D show increased PDE activity, and inverse correlations between total PDE3 activity and BMI have been reported in adipocytes⁴⁴.

We have conducted the largest discovery study for T2D to date, including over 220K T2D cases and 1.2 million controls. Over 21% of our discovery sample comprised of non-European participants; our African American sample included over 56K subjects. A unique strength in our study is the comprehensive individual level phenotype data available for analyzing T2D and related micro- and macrovascular outcomes. This allowed us to test for genetic interaction instead of evaluating effect differences between strata using summary statistics.

In summary, we have identified 318 novel genetic variants associated with T2D risk and T2D-related vascular outcomes, including 3 population-specific autosomal loci in African Americans, 8 chromosome-X variants, plus an additional 13 variants associated with differences in T2D-related micro- and macrovascular outcomes across diabetic stratum. Coding variant analyses and transcriptome wide analyses have identified many putative causal genes that could be pursued in functional mechanistic studies and provide an updated map of effects for gene-tissue pairs for T2D risk. Our gene-drug analyses provide a genetic basis to guide future drug repurposing studies. Lastly, our SNP-T2D interaction analyses elucidate differences in pathogenetic mechanisms of T2D-related vascular outcomes with potential clinical implications for preventing development of microvascular complications in T2D.

Online Methods

We conducted a large-scale multi-ethnic autosomal and X-chromosome T2D GWAS of common variants in over 1.4 million participants. We subsequently conducted analyses to facilitate the prioritization of these individual findings, including transcriptome-wide predicted gene expression for T2D, secondary signal analysis, T2D-related vascular outcomes analysis using SNP-T2D interaction analysis, coding variant mapping, phenome-wide association analyses of pLoF variants, and a drug repurposing screen.

Discovery cohort

The Million Veteran Program (MVP) is a large cohort of fully consented veterans of the United States military forces recruited from 63 participating Department of Veterans Affairs (VA) medical facilities³. Recruitment started in 2011, and all veterans were eligible for participation. Compared with the general population, the Million Veteran Program is overrepresented with participants of male gender, but the study is representative of the US population in terms of race and ethnicity. This study analyzed clinical data through July 2017 for participants who enrolled between January 2011 and October 2016. Across ancestries, the average age at study enrollment ranged from 56.1 for Asian to 68.2 for European participants (Supplementary Table 3). Average body mass index (BMI) ranged from 28.5 for Asians to 30.8 for African Americans. The proportion of males ranged from 87.2% for African Americans to 93.9% for Asians. The prevalence of T2D was 35.5% for Europeans, 36.4% for Asians, 42.1% for Hispanics, and 43.6% for African Americans. All MVP study participants provided blood samples for DNA extraction and genotyping, and completed surveys about their health, lifestyle, and military experiences. Consent to participate and permission to re-contact was provided after counseling by research staff and mailing of informational materials. Study participation includes consenting to access to the participant’s electronic health records for research purposes, data that captures a median follow-up time of 10.0 years at time of study enrollment. Each veteran’s electronic health care record is integrated into the MVP biorepository, including inpatient International Classification of Diseases (ICD-9-CM and ICD-10-CM) diagnosis codes, Current Procedural Terminology (CPT) procedure codes, clinical laboratory measurements, and reports of diagnostic imaging modalities. Researchers are provided data that is de-identified except for dates, and do not have the ability or authorization to link these details with a participant’s identity. Blood samples are collected by phlebotomists and banked at the VA Central Biorepository in Boston, where DNA is extracted and shipped to two external centers for genotyping. The Million Veteran Program received ethical and study protocol approval from the VA Central Institutional Review Board (cIRB) in accordance with the principles outlined in the Declaration of Helsinki.

Genotyping

DNA extracted from buffy coat was genotyped using a custom Affymetrix Axiom biobank array. The MVP 1.0 genotyping array contains a total of 723,305 SNPs, enriched for 1) low frequency variants in African and Hispanic populations, and 2) variants associated with diseases common to the VA population³.

Genotype quality-control

The MVP genomics working group applied standard quality control and genotype calling algorithms to the data in two batches using the Affymetrix Power Tools Suite (v1.18). Standard quality control pipelines were used to exclude duplicate samples, samples with more heterozygosity than expected, samples with an excess (> 2.5%) of missing genotype calls and samples with discordance of genetically inferred sex versus self-report. We excluded related individuals (halfway between second- and third-degree relatives or closer) as measured by the KING software⁴⁵. Before imputation, variants that were poorly called or that deviated from their expected allele frequency based on reference data from the 1000 Genomes Project were excluded ⁴⁶. After prephasing using EAGLE v2⁴⁷, genotypes from the 1000 Genomes Project phase 3, version 5 reference panel were imputed into MVP participants via Minimac3 software⁴⁸. The top 30 principal components were computed using FlashPCA⁴⁹ on a genotype dataset that included all MVP participants and an additional 2,504 individuals from 1000 Genomes. These 30 PCs were then used for the unification of self-reported race/ancestry and genetically inferred ancestry to compose ancestral groups used for stratification in subsequent association analyses⁵⁰.

Race and ethnicity

Information on race (Europeans, African Americans, Asians and Native Americans) and ethnicity (Hispanic, yes or no) was obtained based on self-report through centralized VA data collection methods using standardized survey forms, or through the use of information from the VA Corporate Data Warehouse or Observational Medical Outcomes Partnership data, when information from self-report survey was missing. Even though self-reported race/ethnicity was missing in 3.67% of study participants, 39.4% of participants had some form of discordant information between the various sources of which the data was extracted. Race and ethnicity categories were merged to form the following race or ancestral groups: Europeans, African Americans, Hispanics, and Asians using a unifying classification algorithm based on self-identified race/ethnicity and genetically inferred ancestral information, termed HARE (Harmonized Ancestry and Race/Ethnicity)⁵⁰. Using this approach, of the total of 351,820 genotyped individuals all but 6,257 (1.78%) were assigned to one of the four ancestral groups.

Phenotype classification

ICD-9-CM diagnosis codes from electronic health care records were available for MVP participants from as early as 1998. Participants were classified as a T2D case if they had 2 or more T2D-related diagnosis codes (ICD-9-CM 250.2x) from VA or fee basis inpatient stays or face-to-face primary care outpatient visits in the 731 days before the enrollment date up to July 1^st of 2017, excluding those with co-occurring diagnosis codes for T1D (250.1x), secondary or other diabetes or a medical condition that may cause diabetes (249.xx). Participants were selected as controls if they had no ICD-9-CM diagnosis code for type 1, type 2, or secondary diabetes mellitus up to July 2017.

For T2D-related vascular outcomes, the following definitions were used: CHD, at least one admission to a VA hospital with discharge diagnosis of admission for myocardial information, or at least one procedure code for revascularization (coronary artery bypass grafting, percutaneous coronary intervention), or at least 2 ICD-9-CM codes for CAD (410 to 414) registered on at least 2 separate encounters. Subjects were classified as having PAD if they had at least two of ICD-9-CM codes or CPT codes as outlined in the report of Klarin et al.¹⁴, or having one code and at least two visits to a vascular surgeon within a 14 month period. Acute ischemic stroke was defined if at least 1 ICD-9-CM discharge diagnosis code for stroke excluding head injury or rehab (433.x1, 434 [excluding 434.x0], and 436) was present⁵¹. CKD was classified as an estimated glomerular filtration rate <60 mL/min−1·1.73 m−2 on 2 separate measurements 90 days apart, or ICD-9-CM diagnosis codes for chronic renal failure (585) and/or a history of kidney transplantation (ICD-9-CM V42)⁵²; Neuropathy was defined using the following ICD-9-CM diagnosis codes: diabetic neuropathy (356.9, 250.6), amyotrophy (358.1), cranial nerve palsy (951.0, 951.1, 951.3), mono-neuropathy (354.0-355.9), Charcot’s arthropathy (713.5), polyneuropathy (357.2), neurogenic bladder (596.54), autonomic neuropathy (337.0, 337.1), orthostatic hypotension (458). Retinopathy was defined using ICD-9-DM diagnosis codes for: T2D with ophthalmic manifestations (250.50, 250.52), retinal detachments and defects (361.0, 361.1), disorders of vitreous body (379.2), other retinal disorders (362.0, 362.1, 362.3, 362.81, 362.83, 362.84), excluding ICD-9-CM codes associated with macular degeneration (362.5).

MVP analysis

We tested imputed SNPs that passed quality control (e.g. HWE > 1.0×10⁻¹⁰, INFO > 0.3, call rate > 0.975) for association with T2D through logistic regression assuming an additive model of variants with MAF > 0.1% in Europeans, and MAF > 1% in African Americans, Hispanics and Asians using PLINK2a software⁵³. Covariates included age, gender, and 10 principal components of genetic ancestry.

Other studies: Summary statistics available from previous reports in the DIAMANTE Consortium⁴, Biobank Japan⁷, Penn Medicine Biobank⁵, Malmö Diet and Cancer Study⁸, MedStar/PennCath studies⁹, and the Pakistani Genomic Resource⁶ were obtained for meta-analysis (Supplemental Table 2). All cohort were imputed using the 1000 Genomes Project phase 3, version 5 reference panel with exception of the DIAMANTE consortium where genotype calls were imputed using the Haplotype Reference Consortium reference panel⁵⁴. Briefly, following central and study-specific quality control protocols, an additional 917,392 Europeans, 2,647 African Americans, and 213,834 Asians were available for T2D meta-analysis. Only SNPs with ancestry-specific MAF > 1% in these studies were used. Within each study a logistic regression model was used where T2D was set as the dependent variable, imputed SNP as the independent variable, and analyses were adjusted for age, gender, and the top ten principal components of genetic ancestry.

Meta-analysis

We aggregated association summary statistics from the above mentioned studies to perform ancestry-specific and multi-ethnic meta-analysis. The association summary statistics for each analysis were meta-analyzed in a fixed-effects model using METAL with inverse-variance weighting of log odds ratios⁵⁵. From the meta-analysis, all variants were extracted that passed quality control.

Between-study allelic effect size heterogeneity was assessed with Cochran’s Q statistic as implemented in METAL. Variants were considered genome-wide significant if they passed the conventional p-value threshold of 5.0×10-8. We excluded variants with a high amount of heterogeneity (I² statistic > 75%) across the four evaluated ancestral groups.

Novel loci

Novel loci were defined as those in which the lead variant was not in LD or physically nearby (500 Kb) to a previously reported lead GWAS variant(s). For each locus, we first searched ± 500 Kb surrounding the lead SNP to ensure that potential long-range genetic influences were assessed.

Chromosome-X analysis

X chromosome genotypes were processed separately from autosomal genotypes. During prephasing and imputation an additional flag of -chrX was added. In post-imputation stage, the XWAS QC pipeline was employed to remove variants in the pseudo-autosomal regions, variants that were not in HWE in females (P > 1.0×10⁻⁶), variants with differential allele frequencies, and variants with differential missingness (p<10−7) between males and female controls (Supplemental Figure 2)⁵⁶. For each ancestry-specific subset, we performed sex-stratified analysis where dosages (number of chromosome copies) for the X-chromosome in T2D cases are equivalent to controls within each sex stratum. For the meta-analysis of sexes, differences in dosage compensation should be considered between males (with one X chromosome) and females (with two X chromosomes). In a separate verification analysis we performed a chromosome-X association analysis on dosage data using PLINK (--xchr-model) which produced consistent results compared with analysis of best-guess data where the X chromosome was coded as 0/2 for males instead of 0/1. The output from sex-stratified chromosome-X analyses were first meta-analyzed into a multi-ethnic sex-stratified analysis. Then, the multi-ethnic results from males and multi-ethnic results from females were meta-analyzed, where none of the analyzed variants was detected using the Cochrane test for heterogeneity (P < 5.0×10⁻⁸). Results presented include overall effect estimates, male-specific effect estimates, female-specific estimates, and the heterogeneity p-value (Supplementary Table 13).

Secondary signal analysis

GCTA⁵⁷ was used to conduct conditional analyses to detect ancestry-specific distinct association signals at each of the lead SNPs utilizing the meta-analyzed GWAS summary statistics in Europeans and African Americans. Race-stratified MVP cohorts (197,066 Europeans and 53,445 African Americans) were used to model LD patterns between variants. The reference panel of genotypes consisted of the variants with allele frequencies > 0.1% in Europeans and >1% in African Americans that passed quality control criteria in the MVP-specific T2D GWAS (INFO > 0.3, HWE P > 1.0×10⁻¹⁰, call rate > 0.975). For each lead SNP, conditionally independent variants that reached locus-wide significance (P < 1.0×10⁻⁵) were considered as secondary signals of distinct association. If the minimum distance between any distinct signals from two separate loci was less than 500 Kb, we performed additional conditional analysis including both regions and reassessed the independence of each signal. Finally, the predicted conditionally independent variants were tested in a logistic regression model in the MVP study only to empirically validate the signal, and results are shown in Supplemental Tables 7 and 10.

Coding variant mapping

All imputed variants in MVP were evaluated with Ensemble variant effect predictor⁵⁸, and all predicted LoF and missense variants were extracted. The LD was calculated with established variants, and in a second round of analysis the effect of the missense variant was calculated conditioning on the lead SNP to assess how much residual variance the SNP explains in T2D risk. A p-value of 0.05 was considered as statistically significant.

S-PrediXcan and colocalization analyses

Genetically predicted gene expression and its association with T2D risk was estimated using S-PrediXcan¹⁵. Input included meta-analyzed summary statistics from the European T2D GWAS for common variants and reference eQTL summary statistics for 52 tissues including 48 tissues from GTEx⁵⁹, 2 cell types in kidney tissue (glomerulus and tubulus)⁶⁰, and 2 cell types in pancreatic islet tissue (alpha and beta)⁶¹. Analyses incorporated genotype covariance matrices based on 1000 Genomes⁴⁶ European populations to account for LD structure. Colocalization analysis was performed to address the issue of LD-contamination in S-PrediXcan analyses⁶². Input data were identical to those evaluated by S-PrediXcan and colocalization was restricted to only variants included in the S-PrediXcan gene⁶³. The output is shown in Supplemental Table 8.

Phenotypic variance explained by SNPs

The MVP European cohort (69,869 T2D cases and 127,197 controls) was used to calculate the variance explained by the 558 genome-wide-significant sentinel SNPs from the multi-ethnic meta-analysis. A logistic regression model was performed assuming additive effects where T2D was set as the dependent variable, and the sentinel SNPs as independent variables, adjusting for sex, age, and 10 principal components. The baseline model included age, gender and 10 principal components. The variance explained was calculated with the r² statistic, calculated as 1 – (residual deviance / null deviance).

Phenome-wide association analysis

For the three LoF variants that were identified using coding variant analysis, we performed a PheWAS study⁶⁴ to fully leverage the diverse nature of the Million Veteran Program as well as the full catalog of relevant ICD-9-CM diagnosis and CPT procedure codes (Supplementary Table 18). Of genotyped veterans, participants were included in the PheWAS analysis if their respective electronic health record reflected two or more separate encounters in the VA Healthcare System in each of the two years prior to enrollment in MVP. Using this method, a total of 277,531 veterans spanning 21,209,658 available ICD-9 diagnosis codes were available for PheWAS analysis. We restricted our analysis on the subgroup of 197,066 European participants. Diagnosis and procedure codes were collapsed to clinical disease groups and corresponding controls using predefined groupings⁶⁵. Phenotypes were required to have a case count over 25 in order to be included in the PheWAS analysis, and a multiple testing thresholds for statistical significance was set to P < 2.8×10⁻⁵ (Bonferroni method). Each of the previously unpublished LoF variants (ANKDD1B p.Trp480* and CCHCR1 p.Trp78*) were tested using logistic regression adjusting for age, sex, and 10 principal components in an additive effects model using the PheWAS R package in R v3.2.0⁶⁶. The results from these analyses are reported as odds ratios, in which the estimate is the average change in odds of the PheWAS trait per weighted T2D-increasing allele (Supplementary Table 18, Supplementary Figure 4).

Analysis of T2D-related outcomes

Genetic data on European participants was separately analyzed using outcomes as a binary outcome, and T2D as an interaction variable with SNPs using interaction analysis with robust variance to reduce effect heteroscedacity (e.g. unequal variance between strata) using SUGEN software (v8.8)⁶⁷. We evaluated the interaction between SNP and presence of T2D status using an interaction term for the two independent variables. Due to the binary nature of the outcome, the standard output from the interaction effect estimate are interpreted on a multiplicative scale, e.g., the departure from the product of the two independent effects. To obtain interaction on an additive scale (e.g. departure from the sum of the independent effects), we calculated the relative excess risk due to interaction (RERI) metric, a well-known analytic method in classic epidemiologic literature⁶⁸. In case-control studies using the linear additive odds-ratio model as proposed by Richardson and Kaufman⁶⁸ in our study has the form of:

In which the coefficient β₃ measures the departure from additivity of exposure effect on an odds ratio scale; that is

Where and note the absence of the respective covariates.

In contrast to the standard exponential odds model, where the effects of explanatory variables are additive on an exponential scale, we performed analysis using a linear odds model, in order to quantify the excess odds per unit of the given explanatory variables on the outcome. In this model, the excess relative risk due to interaction is an estimate of the excess odds on a linear scale due to the interaction between two explanatory variables (in our case, T2D state and genotype dosage of the tested variant). In the SNPxT2D interaction analysis we used a significance threshold of P < 5.0×10⁻⁰⁸ to denote variants that statistically different effect sizes between participants with T2D as compared those without. An additional filter was applied, and variants for which the effect size in at least one of the two T2D strata was nominally significant at P < 0.001 were included. Manhattan plots and the table are used to represent the interaction coefficients on this scale.

Polygenic risk scores and risk of T2D and related outcomes

We constructed a genome-wide polygenic risk score (gPRS) for T2D in the MVP participants of European ancestry by calculating a linear combination of weights derived from the Europeans in the DIAMANTE Consortium⁴ using the prune and threshold method in PRSice software⁶⁹. After an initial sensitivity analysis, the r² threshold for pruning was set to 0.8, and the P-value for significance threshold was set to 0.05. The gPRSs were divided into deciles and the risk of T2D was assessed using a logistic regression model using the lowest decile as a reference (e.g. the 10% of participants with lowest of T2D gPRS), together with the potential confounding factors of age, gender, BMI, and the first 10 principal components of European ancestry. An additional outcomes analysis was performed to evaluate to what extent a T2D gPRS is predictive of T2D-induced morbidities. The dataset was restricted to subjects with T2D, and a stratum-restricted T2D gPRS deciles were generated. Logistic regression models were applied where the micro- and macrovascular conditions were modeled as outcomes, and independent variables included strata-restricted gPRS deciles, age, gender, and the first 10 principal components of European ancestry. The data were visualized using shape-plots.

Heritability estimates and genetic correlations with other complex traits and diseases

LD-score regression was used to estimate the heritability coefficient, and subsequently population and sample prevalence estimates were applied to estimate heritability on the liability scale⁷⁰. A genome-wide genetic correlation analysis was performed to investigate possible co-regulation or a shared genetic basis between T2D and other complex traits and diseases. Pairwise genetic correlation coefficients were estimated between the meta-analyzed T2D GWAS summary output in Europeans and each of 774 precomputed and publicly available GWAS summary statistics for complex traits and diseases by using LD score regression through LD Hub v1.9.3 (http://ldsc.broadinstitute.org). Statistical significance was set to a Bonferroni-corrected level of P < 6.5×10⁻⁵.

Enrichment and pathway analyses

Tissue enrichment for S-PrediXcan results was evaluated by calculating exact P-values for under- or over-enrichment based on the cumulative distribution function of the hypergeometric distribution. The Bonferroni-corrected threshold for significance was P < 0.001 considering evaluation of 52 tissues.

Enrichment analyses in DEPICT ⁷¹ were conducted using genome-wide significant (P < 5.0×10⁻⁸) T2D GWAS lead SNPs. DEPICT is based on predefined phenotypic gene sets from multiple databases and Affymetrix HGU133a2.0 expression microarray data from >37k subjects to build highly-expressed gene sets for Medical Subject Heading (MeSH) tissue and cell type annotations. Output includes a P-value for enrichment and a yes/no indicator of whether the FDR q-value is significant (P < 0.05). Tissue and gene-set enrichment features were considered.

Evaluation of drug classes for genes with associations with gene expression

To identify drug-gene pairs that may be leads for repurposing or may be attractive leads for novel inhibitory drugs, we identified drugs targeting genes whose predicted expression was significantly associated with T2D risk in S-PrediXcan analyses and which we predicted would lower blood glucose based on direction of effect on T2D risk with increasing gene expression and drug action (activator or inhibitor). Medications with a primary indication for diabetes and medications with adverse drug events for diabetic patients were evaluated using the SIDe Effect Resource (SIDER)¹⁶. Medications targeting genes were queried using DGIdb¹⁷. These drug targets represent a set of genes that are both likely to be involved in glucose regulation in one or more tissues and can be targeted by drugs. Genes and medications identified in this analysis are presented in Supplementary Table 14.

Data Availability

The full summary level association data from the trans-ancestry meta-analysis from this report will be available through dbGAP (accession codes will be available before publication).

Competing Interests

None of the sponsors of the following authors had a role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; and preparation, review, or approval of the manuscript. D.S. has received support from the British Heart Foundation, Pfizer, Regeneron, Genentech, and Eli Lilly pharmaceuticals. L.S.P. has served on Scientific Advisory Boards for Janssen, and received research support from Abbvie, Merck, Amylin, Eli Lilly, Novo Nordisk, Sanofi, PhaseBio, Roche, Abbvie, Vascular Pharmaceuticals, Janssen, Glaxo SmithKline, Pfizer, Kowa, and the Cystic Fibrosis Foundation. L.S.P. is a cofounder, officer, board member, and stockholder of a diabetes management-related software company names Diasyst, Inc. S.L.D. has received research grant support from the following for-profit companies through the University of Utah or the Western Institute for Biomedical Research (VA Salt Lake City’s affiliated non-profit): AbbVie Inc., Anolinx LLC, Astellas Pharma Inc., AstraZeneca Pharmaceuticals LP, Boehringer Ingelheim International GmbH, Celgene Corporation, Eli Lilly and Company, Genentech Inc., Genomic Health, Inc., Gilead Sciences Inc., GlaxoSmithKline PLC, Innocrin Pharmaceuticals Inc., Janssen Pharmaceuticals, Inc., Kantar Health, Myriad Genetic Laboratories, Inc., Novartis International AG, and PAREXEL International Corporation.

Ethics statement

The Central Veterans Affairs Institutional Review Board (IRB) and site-specific Research and Development Committees approved the Million Veteran Program study. The Vanderbilt University Medical Center IRB approved the use of BioVU data for this study. All other cohorts participating in this meta-analysis have ethical approval from their local institutions. All relevant ethical regulations were followed.

Consortia

VA Million Veteran Program

Regeneron Genetics Center

The HPAP Consortium

Contributions

M.V., J.M.K., K.-M.C., D.S., B.F.V., P.S.T., C.J.O. were responsible for the concept and design. The acquisition, analysis or interpretation of data were performed by M.V., J.M.K., K.-M.C., D.S., B.F.V., P.S.T., R.L.J., C.T., T.L.A., J.E.H., J.Z., J.H., K.M.L, D.K., S.P., J.D., O.M., A.R., N.Q., S.S.S., S.H., I.H.Q, M.N.A, U.J., A.J., S.A., X.S., L.G., K.K., K.S., Y.V.S., S.L.D., K.C., J.S.L., J.M.G., L.S.P., D.R.M., J.B.M., P.R., P.W.W., T.L.E., D.J.R., S.M.D., and C.J.O. The authors M.V. and D.S. drafted the manuscript. The critical revision of the manuscript for important intellectual content was carried out by M.V., J.M.K., K.-M.C., D.S., B.F.V., P.S.T., C.T., J.Z., J.H., D.K., X.S., L.G., K.K., K.S., L.S.P., J.B.M., P.R., T.L.E., S.M.D., and C.J.O. Finally, K.-M.C., D.S., and B.F.V. provided administrative, technical, or material support.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

The full summary level association data from the trans-ancestry, European, African American, Hispanic, and Asian meta-analysis from this report will be available through dbGAP (Accession codes will be available before publication).

Code availability

Imputation was performed using MiniMac4 and EAGLE, association analysis was performed using PLINK2A and XWAS. Post-GWAS processing software include: FlashPCA, KING, METAL, GCTA-COJO, S-PrediXcan, SUGEN, DEPICT, SIDER, and DGidb as outlined in the Online Methods. Clear code for analysis is available at their associated websites. Additional analyses were performed in R-3.2.

Acknowledgements

This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration and was supported by award no. MVP000. This publication does not represent the views of the Department of Veterans Affairs, the US Food and Drug Administration, or the US Government. This research was also supported by funding from: the Department of Veterans Affairs award I01-BX003362 (P.S.T. and K.M.C) and the VA Informatics and Computing Infrastructure (VINCI) VA HSR RES 130457 (S.L.D) B.F.V. acknowledges support for this work from the NIH/NIDDK (DK101478), the NIH/NHGRI (HG010067) and a Linda Pechenik Montague Investigator award. K.M.C, S.M.D, J.M.G, C.J.O, L.S.P, and P.S.T. are supported by the VA Cooperative Studies Program. S.M.D. is supported by the Veterans Administration [IK2-CX001780]. D.K. is supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health [T32 HL007734]. L.S.P. is supported in part by VA awards I01-CX001025, and I01CX001737, NIH awards R21DK099716, U01 DK091958, U01 DK098246, P30DK111024, and R03AI133172, and a Cystic Fibrosis Foundation award PHILLI12A0. We thank all study participants for their contribution. Data on T2D have been contributed by investigators from DIAMANTE Consortium, Biobank Japan, Malmö Diet and Cancer Study, PennCath, MedStar, Pakistan Genomic Resource, Penn Medicine Biobank, and Regeneron Genetics Center. Data on stroke were provided by MEGASTROKE investigators, and data on CKD have been contributed by CKDgen investigators. Data on alpha and beta islet cells have been contributed by the HPAP Consortium. Data on coronary artery disease have been contributed by the CARDIoGRAMplusC4D investigators.

References

1.↵
IDF Diabetes Atlas, 8th edn. International Diabetes Federation (2017).
2.↵
Standards of Medical Care in Diabetes, 2018. Diabetes Care 41, S1–S2 (2018).
OpenUrl FREE Full Text
3.↵
Gaziano, J.M. et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70, 214–23 (2016).
OpenUrl CrossRef PubMed
4.↵
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet 50, 1505–1513 (2018).
OpenUrl CrossRef PubMed
5.↵
Levin, M.G. et al. Genomic Risk Stratification Predicts All-Cause Mortality After Cardiac Catheterization. Circ Genom Precis Med 11, e002352 (2018).
OpenUrl
6.↵
Saleheen, D. et al. Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature 544, 235–239 (2017).
OpenUrl CrossRef PubMed
7.↵
Suzuki, K. et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat Genet 51, 379–386 (2019).
OpenUrl CrossRef
8.↵
Berglund, G., Elmstahl, S., Janzon, L. & Larsson, S.A. The Malmo Diet and Cancer Study. Design and feasibility. J Intern Med 233, 45–51 (1993).
OpenUrl CrossRef PubMed Web of Science
9.↵
Reilly, M.P. et al. Identification of ADAMTS7 as a novel locus for coronary atherosclerosis and association of ABO with myocardial infarction in the presence of coronary atherosclerosis: two genome-wide association studies. Lancet 377, 383–92 (2011).
OpenUrl CrossRef PubMed Web of Science
10.↵
Bonas-Guarch, S. et al. Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. Nat Commun 9, 321 (2018).
OpenUrl CrossRef
11.↵
Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat Commun 9, 2941 (2018).
OpenUrl CrossRef PubMed
12.↵
Flannick, J. et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 46, 357–63 (2014).
OpenUrl CrossRef PubMed
13.↵
Flannick, J. et al. Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature 570, 71–76 (2019).
OpenUrl CrossRef PubMed
14.↵
Klarin, D. et al. Genome-wide association study of peripheral artery disease in the Million Veteran Program. Nat Med 25, 1274–1279 (2019).
OpenUrl
15.↵
Barbeira, A.N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun 9, 1825 (2018).
OpenUrl CrossRef PubMed
16.↵
Kuhn, M., Letunic, I., Jensen, L.J. & Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res 44, D1075–9 (2016).
OpenUrl CrossRef PubMed
17.↵
Cotto, K.C. et al. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database. Nucleic Acids Res 46, D1068–D1073 (2018).
OpenUrl CrossRef PubMed
18.↵
Ng, M.C. et al. Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet 10, e1004517 (2014).
OpenUrl CrossRef PubMed
19.↵
Chen, J. et al. Genome-wide association study of type 2 diabetes in Africa. Diabetologia 62, 1204–1211 (2019).
OpenUrl
20.↵
Palmer, N.D. et al. A genome-wide association search for type 2 diabetes genes in African Americans. PLoS One 7, e29202 (2012).
OpenUrl CrossRef PubMed
21.↵
Kleiner, S. et al. Mice harboring the human SLC30A8 R138X loss-of-function mutation have increased insulin secretory capacity. Proc Natl Acad Sci U S A 115, E7642–E7649 (2018).
OpenUrl Abstract/FREE Full Text
22.↵
Mahajan, A. et al. Refining the accuracy of validated target identification through coding variant fine- mapping in type 2 diabetes. Nat Genet 50, 559–571 (2018).
OpenUrl CrossRef PubMed
23.↵
Carrano, A.C., Mulas, F., Zeng, C. & Sander, M. Interrogating islets in health and disease with single- cell technologies. Mol Metab 6, 991–1001 (2017).
OpenUrl
24.↵
Martin, C.K., Han, H., Anton, S.D., Greenway, F.L. & Smith, S.R. Effect of valproic acid on body weight, food intake, physical activity and hormones: results of a randomized controlled trial. J Psychopharmacol 23, 814–25 (2009).
OpenUrl CrossRef PubMed
25.↵
Perry, J.R. & Frayling, T.M. New gene variants alter type 2 diabetes risk predominantly through reduced beta-cell function. Curr Opin Clin Nutr Metab Care 11, 371–7 (2008).
OpenUrl CrossRef PubMed Web of Science
26.↵
Buse, M. et al. Expanding the phenotype of reciprocal 1q21.1 deletions and duplications: a case series. Ital J Pediatr 43, 61 (2017).
OpenUrl
27.
Devi, R.R. & Vijayalakshmi, P. Novel mutations in GJA8 associated with autosomal dominant congenital cataract and microcornea. Mol Vis 12, 190–5 (2006).
OpenUrl PubMed Web of Science
28.↵
Mackay, D.S., Bennett, T.M., Culican, S.M. & Shiels, A. Exome sequencing identifies novel and recurrent mutations in GJA8 and CRYGD associated with inherited cataract. Hum Genomics 8, 19 (2014).
OpenUrl
29.↵
Luo, J. et al. TCF7L2 variation and proliferative diabetic retinopathy. Diabetes 62, 2613–7 (2013).
OpenUrl Abstract/FREE Full Text
30.↵
Young, T.L. et al. Whole genome expression profiling of normal human fetal and adult ocular tissues. Exp Eye Res 116, 265–78 (2013).
OpenUrl
31.↵
Eiden, L.E., Schafer, M.K., Weihe, E. & Schutz, B. The vesicular amine transporter family (SLC18): amine/proton antiporters required for vesicular accumulation and regulated exocytotic secretion of monoamines and acetylcholine. Pflugers Arch 447, 636–40 (2004).
OpenUrl CrossRef PubMed Web of Science
32.↵
Sharma, P. & Sharma, R. Toxic optic neuropathy. Indian J Ophthalmol 59, 137–41 (2011).
OpenUrl PubMed
33.↵
Tokonami, N. et al. Uromodulin is expressed in the distal convoluted tubule, where it is critical for regulation of the sodium chloride cotransporter NCC. Kidney Int 94, 701–715 (2018).
OpenUrl PubMed
34.↵
Pattaro, C. et al. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat Commun 7, 10023 (2016).
OpenUrl CrossRef PubMed
35.↵
Canela-Xandri, O., Rawlik, K. & Tenesa, A. An atlas of genetic associations in UK Biobank. Nat Genet 50, 1593–1599 (2018).
OpenUrl CrossRef PubMed
36.↵
Guo, H.F. & Vander Kooi, C.W. Neuropilin Functions as an Essential Cell Surface Receptor. J Biol Chem 290, 29120–6 (2015).
OpenUrl Abstract/FREE Full Text
37.↵
Lieb, W. et al. Genome-wide meta-analyses of plasma renin activity and concentration reveal association with the kininogen 1 and prekallikrein genes. Circ Cardiovasc Genet 8, 131–40 (2015).
OpenUrl Abstract/FREE Full Text
38.↵
Li, J., Matye, D.J. & Li, T. Insulin resistance induces posttranslational hepatic sortilin 1 degradation in mice. J Biol Chem 290, 11526–36 (2015).
OpenUrl Abstract/FREE Full Text
39.↵
Ehret, G.B. et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nat Genet 48, 1171–1184 (2016).
OpenUrl CrossRef PubMed
40.↵
Sung, Y.J. et al. A Large-Scale Multi-ancestry Genome-wide Study Accounting for Smoking Behavior Identifies Multiple Significant Loci for Blood Pressure. Am J Hum Genet 102, 375–400 (2018).
OpenUrl CrossRef
41.↵
Maass, P.G. et al. PDE3A mutations cause autosomal dominant hypertension with brachydactyly. Nat Genet 47, 647–53 (2015).
OpenUrl CrossRef PubMed
42.↵
Landa, L.R., Jr.. et al. Interplay of Ca2+ and cAMP signaling in the insulin-secreting MIN6 beta-cell line. J Biol Chem 280, 31294–302 (2005).
OpenUrl Abstract/FREE Full Text
43.↵
Shin, S. et al. CREB mediates the insulinotropic and anti-apoptotic effects of GLP-1 signaling in adult mouse beta-cells. Mol Metab 3, 803–12 (2014).
OpenUrl
44.↵
Omar, B., Banke, E., Ekelund, M., Frederiksen, S. & Degerman, E. Alterations in cyclic nucleotide phosphodiesterase activities in omental and subcutaneous adipose tissues in human obesity. Nutr Diabetes 1, e13 (2011).
OpenUrl PubMed
45.↵
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–73 (2010).
OpenUrl CrossRef PubMed Web of Science
46.↵
Genomes Project, C. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
OpenUrl CrossRef PubMed
47.↵
Loh, P.R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 48, 1443–1448 (2016).
OpenUrl CrossRef PubMed
48.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed
49.↵
Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. PLoS One 9, e93766 (2014).
OpenUrl CrossRef PubMed
50.↵
Fang, H. et al. Harmonizing genetic ancestry and self-identified race/ethnicity in genome-wide association studies. Am J Hum Gen 105, 763–772 (2019).
OpenUrl
51.↵
Tirschwell, D.L. & Longstreth, W.T., Jr.. Validating administrative data in stroke research. Stroke 33, 2465–70 (2002).
OpenUrl Abstract/FREE Full Text
52.↵
Levey, A.S. et al. Definition and classification of chronic kidney disease: a position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney Int 67, 2089–100 (2005).
OpenUrl CrossRef PubMed Web of Science
53.↵
Chang, C.C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
OpenUrl CrossRef PubMed
54.↵
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279–83 (2016).
OpenUrl CrossRef PubMed
55.↵
Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–1 (2010).
OpenUrl CrossRef PubMed Web of Science
56.↵
Gao, F. et al. XWAS: A Software Toolset for Genetic Data Analysis and Association Studies of the X Chromosome. J Hered 106, 666–71 (2015).
OpenUrl CrossRef PubMed
57.↵
Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76–82 (2011).
OpenUrl CrossRef PubMed
58.↵
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
OpenUrl CrossRef PubMed
59.↵
Consortium, G.T. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–5 (2013).
OpenUrl CrossRef PubMed
60.↵
Ko, Y.A. et al. Genetic-Variation-Driven Gene-Expression Changes Highlight Genes with Important Functions for Kidney Disease. Am J Hum Genet 100, 940–953 (2017).
OpenUrl CrossRef PubMed
61.↵
Ackermann, A.M., Wang, Z., Schug, J., Naji, A. & Kaestner, K.H. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes. Mol Metab 5, 233–244 (2016).
OpenUrl
62.↵
Gamazon, E.R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–8 (2015).
OpenUrl CrossRef PubMed
63.↵
Hormozdiari, F. et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet 99, 1245–1260 (2016).
OpenUrl CrossRef PubMed
64.↵
Verma, A. & Ritchie, M.D. Current Scope and Challenges in Phenome-Wide Association Studies. Curr Epidemiol Rep 4, 321–329 (2017).
OpenUrl
65.↵
Denny, J.C. et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene- disease associations. Bioinformatics 26, 1205–10 (2010).
OpenUrl CrossRef PubMed Web of Science
66.↵
Carroll, R.J., Bastarache, L. & Denny, J.C. R PheWAS: data analysis and plotting tools for phenome- wide association studies in the R environment. Bioinformatics 30, 2375–6 (2014).
OpenUrl CrossRef PubMed Web of Science
67.↵
Lin, D.Y. et al. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 95, 675–88 (2014).
OpenUrl
68.↵
Richardson, D.B. & Kaufman, J.S. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol 169, 756–60 (2009).
OpenUrl CrossRef PubMed Web of Science
69.↵
Choi, S.W. & O’Reilly, P.F. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8(2019).
70.↵
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–41 (2015).
OpenUrl CrossRef PubMed
71.↵
Pers, T.H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 6, 5890 (2015).
OpenUrl CrossRef PubMed

Posted November 25, 2019.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Genetic and Genomic Medicine

Subject Areas

All Articles

Addiction Medicine (317)
Allergy and Immunology (621)
Anesthesia (162)
Cardiovascular Medicine (2318)
Dentistry and Oral Medicine (283)
Dermatology (205)
Emergency Medicine (373)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (821)
Epidemiology (11653)
Forensic Medicine (10)
Gastroenterology (687)
Genetic and Genomic Medicine (3656)
Geriatric Medicine (344)
Health Economics (625)
Health Informatics (2340)
Health Policy (922)
Health Systems and Quality Improvement (874)
Hematology (337)
HIV/AIDS (765)
Infectious Diseases (except HIV/AIDS) (13222)
Intensive Care and Critical Care Medicine (763)
Medical Education (362)
Medical Ethics (102)
Nephrology (394)
Neurology (3407)
Nursing (193)
Nutrition (512)
Obstetrics and Gynecology (658)
Occupational and Environmental Health (657)
Oncology (1787)
Ophthalmology (529)
Orthopedics (212)
Otolaryngology (284)
Pain Medicine (227)
Palliative Medicine (66)
Pathology (442)
Pediatrics (1014)
Pharmacology and Therapeutics (423)
Primary Care Research (411)
Psychiatry and Clinical Psychology (3118)
Public and Global Health (6047)
Radiology and Imaging (1243)
Rehabilitation Medicine and Physical Therapy (721)
Respiratory Medicine (816)
Rheumatology (373)
Sexual and Reproductive Health (364)
Sports Medicine (319)
Surgery (391)
Toxicology (50)
Transplantation (171)
Urology (144)

[1] 1.↵
IDF Diabetes Atlas, 8th edn. International Diabetes Federation (2017).

[2] 2.↵
Standards of Medical Care in Diabetes, 2018. Diabetes Care 41, S1–S2 (2018).
OpenUrl FREE Full Text

[3] 3.↵
Gaziano, J.M. et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol 70, 214–23 (2016).
OpenUrl CrossRef PubMed

[4] 4.↵
Mahajan, A. et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet 50, 1505–1513 (2018).
OpenUrl CrossRef PubMed

[5] 5.↵
Levin, M.G. et al. Genomic Risk Stratification Predicts All-Cause Mortality After Cardiac Catheterization. Circ Genom Precis Med 11, e002352 (2018).
OpenUrl

[6] 6.↵
Saleheen, D. et al. Human knockouts and phenotypic analysis in a cohort with a high rate of consanguinity. Nature 544, 235–239 (2017).
OpenUrl CrossRef PubMed

[7] 7.↵
Suzuki, K. et al. Identification of 28 new susceptibility loci for type 2 diabetes in the Japanese population. Nat Genet 51, 379–386 (2019).
OpenUrl CrossRef

[8] 8.↵
Berglund, G., Elmstahl, S., Janzon, L. & Larsson, S.A. The Malmo Diet and Cancer Study. Design and feasibility. J Intern Med 233, 45–51 (1993).
OpenUrl CrossRef PubMed Web of Science

[9] 9.↵
Reilly, M.P. et al. Identification of ADAMTS7 as a novel locus for coronary atherosclerosis and association of ABO with myocardial infarction in the presence of coronary atherosclerosis: two genome-wide association studies. Lancet 377, 383–92 (2011).
OpenUrl CrossRef PubMed Web of Science

[10] 10.↵
Bonas-Guarch, S. et al. Re-analysis of public genetic data reveals a rare X-chromosomal variant associated with type 2 diabetes. Nat Commun 9, 321 (2018).
OpenUrl CrossRef

[11] 11.↵
Xue, A. et al. Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes. Nat Commun 9, 2941 (2018).
OpenUrl CrossRef PubMed

[12] 12.↵
Flannick, J. et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 46, 357–63 (2014).
OpenUrl CrossRef PubMed

[13] 13.↵
Flannick, J. et al. Exome sequencing of 20,791 cases of type 2 diabetes and 24,440 controls. Nature 570, 71–76 (2019).
OpenUrl CrossRef PubMed

[14] 14.↵
Klarin, D. et al. Genome-wide association study of peripheral artery disease in the Million Veteran Program. Nat Med 25, 1274–1279 (2019).
OpenUrl

[15] 15.↵
Barbeira, A.N. et al. Exploring the phenotypic consequences of tissue specific gene expression variation inferred from GWAS summary statistics. Nat Commun 9, 1825 (2018).
OpenUrl CrossRef PubMed

[16] 16.↵
Kuhn, M., Letunic, I., Jensen, L.J. & Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res 44, D1075–9 (2016).
OpenUrl CrossRef PubMed

[17] 17.↵
Cotto, K.C. et al. DGIdb 3.0: a redesign and expansion of the drug-gene interaction database. Nucleic Acids Res 46, D1068–D1073 (2018).
OpenUrl CrossRef PubMed

[18] 18.↵
Ng, M.C. et al. Meta-analysis of genome-wide association studies in African Americans provides insights into the genetic architecture of type 2 diabetes. PLoS Genet 10, e1004517 (2014).
OpenUrl CrossRef PubMed

[19] 19.↵
Chen, J. et al. Genome-wide association study of type 2 diabetes in Africa. Diabetologia 62, 1204–1211 (2019).
OpenUrl

[20] 20.↵
Palmer, N.D. et al. A genome-wide association search for type 2 diabetes genes in African Americans. PLoS One 7, e29202 (2012).
OpenUrl CrossRef PubMed

[21] 21.↵
Kleiner, S. et al. Mice harboring the human SLC30A8 R138X loss-of-function mutation have increased insulin secretory capacity. Proc Natl Acad Sci U S A 115, E7642–E7649 (2018).
OpenUrl Abstract/FREE Full Text

[22] 22.↵
Mahajan, A. et al. Refining the accuracy of validated target identification through coding variant fine- mapping in type 2 diabetes. Nat Genet 50, 559–571 (2018).
OpenUrl CrossRef PubMed

[23] 23.↵
Carrano, A.C., Mulas, F., Zeng, C. & Sander, M. Interrogating islets in health and disease with single- cell technologies. Mol Metab 6, 991–1001 (2017).
OpenUrl

[24] 24.↵
Martin, C.K., Han, H., Anton, S.D., Greenway, F.L. & Smith, S.R. Effect of valproic acid on body weight, food intake, physical activity and hormones: results of a randomized controlled trial. J Psychopharmacol 23, 814–25 (2009).
OpenUrl CrossRef PubMed

[25] 25.↵
Perry, J.R. & Frayling, T.M. New gene variants alter type 2 diabetes risk predominantly through reduced beta-cell function. Curr Opin Clin Nutr Metab Care 11, 371–7 (2008).
OpenUrl CrossRef PubMed Web of Science

[26] 26.↵
Buse, M. et al. Expanding the phenotype of reciprocal 1q21.1 deletions and duplications: a case series. Ital J Pediatr 43, 61 (2017).
OpenUrl

[27] 27.
Devi, R.R. & Vijayalakshmi, P. Novel mutations in GJA8 associated with autosomal dominant congenital cataract and microcornea. Mol Vis 12, 190–5 (2006).
OpenUrl PubMed Web of Science

[28] 28.↵
Mackay, D.S., Bennett, T.M., Culican, S.M. & Shiels, A. Exome sequencing identifies novel and recurrent mutations in GJA8 and CRYGD associated with inherited cataract. Hum Genomics 8, 19 (2014).
OpenUrl

[29] 29.↵
Luo, J. et al. TCF7L2 variation and proliferative diabetic retinopathy. Diabetes 62, 2613–7 (2013).
OpenUrl Abstract/FREE Full Text

[30] 30.↵
Young, T.L. et al. Whole genome expression profiling of normal human fetal and adult ocular tissues. Exp Eye Res 116, 265–78 (2013).
OpenUrl

[31] 31.↵
Eiden, L.E., Schafer, M.K., Weihe, E. & Schutz, B. The vesicular amine transporter family (SLC18): amine/proton antiporters required for vesicular accumulation and regulated exocytotic secretion of monoamines and acetylcholine. Pflugers Arch 447, 636–40 (2004).
OpenUrl CrossRef PubMed Web of Science

[32] 32.↵
Sharma, P. & Sharma, R. Toxic optic neuropathy. Indian J Ophthalmol 59, 137–41 (2011).
OpenUrl PubMed

[33] 33.↵
Tokonami, N. et al. Uromodulin is expressed in the distal convoluted tubule, where it is critical for regulation of the sodium chloride cotransporter NCC. Kidney Int 94, 701–715 (2018).
OpenUrl PubMed

[34] 34.↵
Pattaro, C. et al. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat Commun 7, 10023 (2016).
OpenUrl CrossRef PubMed

[35] 35.↵
Canela-Xandri, O., Rawlik, K. & Tenesa, A. An atlas of genetic associations in UK Biobank. Nat Genet 50, 1593–1599 (2018).
OpenUrl CrossRef PubMed

[36] 36.↵
Guo, H.F. & Vander Kooi, C.W. Neuropilin Functions as an Essential Cell Surface Receptor. J Biol Chem 290, 29120–6 (2015).
OpenUrl Abstract/FREE Full Text

[37] 37.↵
Lieb, W. et al. Genome-wide meta-analyses of plasma renin activity and concentration reveal association with the kininogen 1 and prekallikrein genes. Circ Cardiovasc Genet 8, 131–40 (2015).
OpenUrl Abstract/FREE Full Text

[38] 38.↵
Li, J., Matye, D.J. & Li, T. Insulin resistance induces posttranslational hepatic sortilin 1 degradation in mice. J Biol Chem 290, 11526–36 (2015).
OpenUrl Abstract/FREE Full Text

[39] 39.↵
Ehret, G.B. et al. The genetics of blood pressure regulation and its target organs from association studies in 342,415 individuals. Nat Genet 48, 1171–1184 (2016).
OpenUrl CrossRef PubMed

[40] 40.↵
Sung, Y.J. et al. A Large-Scale Multi-ancestry Genome-wide Study Accounting for Smoking Behavior Identifies Multiple Significant Loci for Blood Pressure. Am J Hum Genet 102, 375–400 (2018).
OpenUrl CrossRef

[41] 41.↵
Maass, P.G. et al. PDE3A mutations cause autosomal dominant hypertension with brachydactyly. Nat Genet 47, 647–53 (2015).
OpenUrl CrossRef PubMed

[42] 42.↵
Landa, L.R., Jr.. et al. Interplay of Ca2+ and cAMP signaling in the insulin-secreting MIN6 beta-cell line. J Biol Chem 280, 31294–302 (2005).
OpenUrl Abstract/FREE Full Text

[43] 43.↵
Shin, S. et al. CREB mediates the insulinotropic and anti-apoptotic effects of GLP-1 signaling in adult mouse beta-cells. Mol Metab 3, 803–12 (2014).
OpenUrl

[44] 44.↵
Omar, B., Banke, E., Ekelund, M., Frederiksen, S. & Degerman, E. Alterations in cyclic nucleotide phosphodiesterase activities in omental and subcutaneous adipose tissues in human obesity. Nutr Diabetes 1, e13 (2011).
OpenUrl PubMed

[45] 45.↵
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–73 (2010).
OpenUrl CrossRef PubMed Web of Science

[46] 46.↵
Genomes Project, C. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
OpenUrl CrossRef PubMed

[47] 47.↵
Loh, P.R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat Genet 48, 1443–1448 (2016).
OpenUrl CrossRef PubMed

[48] 48.↵
Das, S. et al. Next-generation genotype imputation service and methods. Nat Genet 48, 1284–1287 (2016).
OpenUrl CrossRef PubMed

[49] 49.↵
Abraham, G. & Inouye, M. Fast principal component analysis of large-scale genome-wide data. PLoS One 9, e93766 (2014).
OpenUrl CrossRef PubMed

[50] 50.↵
Fang, H. et al. Harmonizing genetic ancestry and self-identified race/ethnicity in genome-wide association studies. Am J Hum Gen 105, 763–772 (2019).
OpenUrl

[51] 51.↵
Tirschwell, D.L. & Longstreth, W.T., Jr.. Validating administrative data in stroke research. Stroke 33, 2465–70 (2002).
OpenUrl Abstract/FREE Full Text

[52] 52.↵
Levey, A.S. et al. Definition and classification of chronic kidney disease: a position statement from Kidney Disease: Improving Global Outcomes (KDIGO). Kidney Int 67, 2089–100 (2005).
OpenUrl CrossRef PubMed Web of Science

[53] 53.↵
Chang, C.C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).
OpenUrl CrossRef PubMed

[54] 54.↵
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 48, 1279–83 (2016).
OpenUrl CrossRef PubMed

[55] 55.↵
Willer, C.J., Li, Y. & Abecasis, G.R. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 26, 2190–1 (2010).
OpenUrl CrossRef PubMed Web of Science

[56] 56.↵
Gao, F. et al. XWAS: A Software Toolset for Genetic Data Analysis and Association Studies of the X Chromosome. J Hered 106, 666–71 (2015).
OpenUrl CrossRef PubMed

[57] 57.↵
Yang, J., Lee, S.H., Goddard, M.E. & Visscher, P.M. GCTA: a tool for genome-wide complex trait analysis. Am J Hum Genet 88, 76–82 (2011).
OpenUrl CrossRef PubMed

[58] 58.↵
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol 17, 122 (2016).
OpenUrl CrossRef PubMed

[59] 59.↵
Consortium, G.T. The Genotype-Tissue Expression (GTEx) project. Nat Genet 45, 580–5 (2013).
OpenUrl CrossRef PubMed

[60] 60.↵
Ko, Y.A. et al. Genetic-Variation-Driven Gene-Expression Changes Highlight Genes with Important Functions for Kidney Disease. Am J Hum Genet 100, 940–953 (2017).
OpenUrl CrossRef PubMed

[61] 61.↵
Ackermann, A.M., Wang, Z., Schug, J., Naji, A. & Kaestner, K.H. Integration of ATAC-seq and RNA-seq identifies human alpha cell and beta cell signature genes. Mol Metab 5, 233–244 (2016).
OpenUrl

[62] 62.↵
Gamazon, E.R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat Genet 47, 1091–8 (2015).
OpenUrl CrossRef PubMed

[63] 63.↵
Hormozdiari, F. et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet 99, 1245–1260 (2016).
OpenUrl CrossRef PubMed

[64] 64.↵
Verma, A. & Ritchie, M.D. Current Scope and Challenges in Phenome-Wide Association Studies. Curr Epidemiol Rep 4, 321–329 (2017).
OpenUrl

[65] 65.↵
Denny, J.C. et al. PheWAS: demonstrating the feasibility of a phenome-wide scan to discover gene- disease associations. Bioinformatics 26, 1205–10 (2010).
OpenUrl CrossRef PubMed Web of Science

[66] 66.↵
Carroll, R.J., Bastarache, L. & Denny, J.C. R PheWAS: data analysis and plotting tools for phenome- wide association studies in the R environment. Bioinformatics 30, 2375–6 (2014).
OpenUrl CrossRef PubMed Web of Science

[67] 67.↵
Lin, D.Y. et al. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet 95, 675–88 (2014).
OpenUrl

[68] 68.↵
Richardson, D.B. & Kaufman, J.S. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol 169, 756–60 (2009).
OpenUrl CrossRef PubMed Web of Science

[69] 69.↵
Choi, S.W. & O’Reilly, P.F. PRSice-2: Polygenic Risk Score software for biobank-scale data. Gigascience 8(2019).

[70] 70.↵
Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 47, 1236–41 (2015).
OpenUrl CrossRef PubMed

[71] 71.↵
Pers, T.H. et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 6, 5890 (2015).
OpenUrl CrossRef PubMed

Discovery of 318 novel loci for type-2 diabetes and related micro- and macrovascular outcomes among 1.4 million participants in a multi-ethnic meta-analysis

Abstract

Results

Study Populations

Single-variant autosomal analyses

Chromosome-X analyses

Effect heterogeneity between Europeans and African Americans

Secondary signal analysis

Rare coding variant mapping

Rare coding variant PheWAS

Transcriptome-wide association analyses

Assessment of gene–drug relationships

Pathway and functional enrichment analysis

Genetic correlation between T2D and other phenotypes

T2D-related micro- and macrovascular outcomes

Polygenic risk scores and T2D-related micro- and macrovascular outcomes

Discussion

Online Methods

Discovery cohort

Genotyping

Genotype quality-control

Race and ethnicity

Phenotype classification

MVP analysis

Meta-analysis

Novel loci

Chromosome-X analysis

Secondary signal analysis

Coding variant mapping

S-PrediXcan and colocalization analyses

Phenotypic variance explained by SNPs

Phenome-wide association analysis

Analysis of T2D-related outcomes

Polygenic risk scores and risk of T2D and related outcomes

Heritability estimates and genetic correlations with other complex traits and diseases

Enrichment and pathway analyses

Evaluation of drug classes for genes with associations with gene expression

Data Availability

Competing Interests

Ethics statement

Consortia

Contributions

Reporting Summary

Data availability

Code availability

Acknowledgements

References

Citation Manager Formats

Subject Area