Discovery of 318 novel loci for type-2 diabetes and related micro- and macrovascular outcomes among 1.4 million participants in a multi-ethnic meta-analysis

Marijana Vujkovic*, Ph.D., MSCE, Jacob M. Keaton*, Ph.D., Julie A. Lynch, Ph.D., R.N., Donald R. 6 Miller, Sc.D., Jin Zhou, Ph.D., Catherine Tcheandjieu, Ph.D., Jennifer E. Huffman, 7 Ph.D., Themistocles L. Assimes, M.D., Ph.D., Renae L. Judy, Jie Huang, Ph.D., Kyung Min Lee, 8 Ph.D., Derek Klarin, M.D., Saiju Pyarajan, John Danesh, M.D., Ph.D., Olle Melander, M.D., 9 Ph.D., Asif Rasheed, Ph.D., Nadeem Qamar, M.D., Ph.D., Saqib S. Sheikh, M.D., Ph.D., Shahid 10 Hameed, M.D., Ph.D., Irshad H. Qureshi, M.D., Ph.D., Muhammad N. Afzal, M.D., Uzma 11 Jahazaib, M.D., Anjum Jalal, M.D., Ph.D., Shahid Abbas, Ph.D., Xin Sheng, Ph.D., Long Gao, 12 Ph.D., Klaus H. Kaestner, Ph.D., Katalin Susztak, M.D., Yan V. Sun, Ph.D., Scott L. Duvall, 13 Ph.D., Kelly Cho, Ph.D., MPH, Jennifer S. Lee, M.D., Ph.D., John M. Gaziano, M.D., Lawrence 14 S. Philips, M.D., James B. Meigs, M.D., Peter D. Reaven, M.D., Peter W. Wilson, 15 M.D., Todd L. Edwards, Ph.D., Daniel J. Rader, M.D., Scott M. Damrauer, M.D., Christopher J. 16 O'Donnell, M.D., Philip S. Tsao, Ph.D., HPAP Consortium, Regeneron Genetics Center, VA Million 17 Veteran Program, Kyong-Mi Chang*, M.D., Benjamin F. Voight*, Ph.D., Danish Saleheen*, M.D., 18 Ph.D. 19

the large sample size and ancestral diversity. Genome-wide chip heritability analysis explained 19% of 147 T2D risk 4 . 148 In an analysis of African American participants (Table 1), we observed a total of 21 loci 149 associated with T2D susceptibility at genome-wide significance, 16 of which were in strong LD with 150 established T2D variants. Three variants were novel and the effect on T2D appeared specific to  Americans. The frequency of the T2D risk-increasing allele is notably higher in African reference 152 populations than European for these SNPs, with the ancestral allele the major allele in African 153 populations and the derived allele being the major allele in European populations. Single variant analysis 154 in the Hispanics subset identified 2 SNPs, both of which tagged previously reported T2D loci 155 (Supplemental Table 10). No novel variants were observed in the subjects of Asian ancestry 156 (Supplemental Table 11). 157 Chromosome-X analyses. 158 A total of 10 association signals for T2D were identified on chromosome X in a multi-ethnic 159 meta-analysis, of which 7 were novel (Table 2a- analysis identified 4 chromosome-X loci associated with T2D risk, all of which were identified in the 161 multi-ethnic meta-analysis. One novel chromosome-X locus was associated with T2D that is specific to 162 African Americans. Of interest is one novel, multi-ethnic locus near the androgen receptor gene that was 163 in strong LD with a previously reported variant (rs4509480) known to be associated with male-pattern 164 baldness (EUR r 2 =0.98, rs200644307). 165

Effect heterogeneity between Europeans and African Americans. 166
For most loci, we found no evidence of significant heterogeneity of effect estimates between Europeans 167 and African Americans, whereas 44 (7.9%) had significantly different effect estimates (Supplemental 168   Table 20). Overall, the strength of effect was found to be highest in Europeans, an expected result given 169 the abundance of individuals of that ancestry in our study. However, 4 loci nearby the genes SLC30A8, 170 PTPRQ, GRB10, and COLB showed higher effect sizes for T2D at stronger levels of significance in African 171 Americans compared with Europeans. Of these loci, associations with loss of function variants in the 172 SLC30A8 gene were previously reported in Europeans, African Americans and South Asians 12,13 . 173

Secondary signal analysis. 174
We detected a total of 233 conditionally independent SNPs that flank 49 novel and 108 previously 175 reported lead SNPs in Europeans (Supplementary Table 7 and 9), and identified 9 independent variants 176 in Europeans and 3 in African Americans flanking 3 novel and 6 previously reported sentinel SNPs. We 177 observed no novel conditionally independent variants in participants of South Asian, East Asian and 178 Hispanic ancestry. 179 Rare coding variant mapping. 180 To identify coding variants that may influence T2D risk by changing protein structures, we investigated 181 predicted loss of function (pLoF) and missense variants in LD with the identified T2D lead variants 182 (Supplementary Table 12). We identified 2 pLoF (LPL and ANKDD1B) and 45 missense variants in 47 183 genes that were in LD with at-least one of the T2D lead SNPs (r 2 > 0.5, MVP reference panel in 184 Europeans) and were associated at a P-value < 1.0x10 -4 . We also identified 9 missense variants that were 185 either within 5 Mb of sentinel SNPs or in moderate to low LD (r 2 < 0.5) and were associated with T2D at a 186 P-value < 1.0x10 -4 . Of the 56 pLoF and missense variants, 14 missense variants were found to be the 187 sentinel T2D SNPs and 19 variants were in LD with novel lead SNPs, and 37 variants were previously 188 reported. 189 We then performed a genome-wide screen of all pLoF's (not bound by proximity to sentinel T2D 190 lead variants) and missense variants available for analysis (Supplemental Table 13) and identified one 191 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint additional pLoF variant in the CCHCR1 gene, whereas 37 novel missense variants were associated with 192 T2D at a P-value threshold of 5x10 -8 . 193 Rare coding variant PheWAS. 194 To understand biological pleiotropy and underlying mechanisms, we performed a PheWAS of the 3 pLoF 195 variants associated with T2D in MVP participants of European ancestry (Table 5). These loci included 196 ANKDD1B p.Trp480*, CCHCR1 p.Trp78*, and LPL p.*474Ser in the MVP, and were significantly associated 197 with metabolic and inflammatory conditions. For example, ANKDD1B p.Trp480* was associated with 198 dyslipidemia, hypercholesterolemia, and diabetic neurological manifestations. CCHCR1 p.Trp78* was 199 associated with type 1 diabetes, epistaxis, celiac disease, microscopic hematuria, and psoriatic 200 arthropathy. Finally, the recent report by Klarin  Common variants from the European T2D GWAS meta-analysis were used to evaluate the association of 205 genetically predicted gene expression levels with T2D risk across 52 tissues including kidney and islet 206 cells using S-PrediXcan 15 (Supplemental Table 8, Supplemental Figure 3). We identified 4,468 statistically 207 significant gene-tissue combination pairs genetically predictive of T2D risk, of which 4,211 transcript 208 eQTLs were in LD (r 2 > 0.5) with T2D signals. We identified 873 genes in this analysis that would not have 209 been identified by nearest-gene annotation alone. The strongest gene-tissue combination signals were 210 for NRAP in the cerebellum and TCF7L2 in the aortic artery. 211 We additionally used COLOC to identify the subset of significant genes where there was a high 212 posterior probability that the set of model SNPs in the S-PrediXcan analysis for each gene were both 213 causal for the expression and T2D. This analysis refined the results of the TWAS and excluded some 214 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review)  results that might be the consequence of LD between causal SNPs for gene expression and T2D. We  215   detected 3,568 gene-tissue pairs where there was statistically significant association with T2D risk and  216 high posterior probability (P4>0.5) of colocalization. The colocalized and significant results included 804 217 genes. When comparing the 804 genes to the GWAS catalog mapped and reported genes for all prior 218 studies of diabetes or diabetes complications, 687 had not been identified previously. 219 Hypergeometric enrichment analysis showed that most enriched gene expression signals were 220 in the following tissues: cervical spinal cord, basal ganglia, glomerular kidney, tibial nerve, transformed 221 fibroblasts, and skeletal muscle (Supplemental Table 15 (anti-cancer drugs), F2 targeted by 11 compounds (anti-coagulants), and BLK targeted by 9 compounds 238 (protein kinase inhibitors). 239 The gene most significant in S-PrediXcan targeted by a non-antiglycemic medication was PPARG 240 in EBV-transformed lymphocytes (Supplemental Table 8). The gene most significant in S-PrediXcan 241 targeted by a drug with report of ADE in diabetic patients was TH in non-sun-exposed suprapubic skin. 242 The gene most significant in S-PrediXcan targeted by an established anti-glycemic medication was 243 KCNJ11 in sun-exposed skin from the lower leg. 244

Pathway and functional enrichment analysis. 245
To explore whether our results recapitulate the pathophysiology of T2D, we performed gene-set 246 enrichment analysis with all the variants using DEPICT (P < 1x10 -5 , Supplemental showed that the most significant gene-set involved the AKT2 subnetwork, lung cancer, the GAB1 250 signalosome, protein kinase binding, signal transduction, and EGFR signaling (Supplemental Table 17). 251

Genetic correlation between T2D and other phenotypes. 252
Genome-wide genetic correlations of T2D were calculated with a total of 774 complex traits and 253 diseases by comparing allelic effects using LD score regression (Online Methods). A total of 270 254 significant associations were observed (P < 5.0x10 -8 , Supplementary Table 19). The strongest positive 255 correlations were observed with waist circumference, overall health, BMI, and fat mass of arms, legs, 256 body and trunk, hypertension, coronary artery disease, dyslipidemia, alcohol intake, years of education, 257 wheezing, and cigarette smoking. There was also a strong negative correlation with years of education. 258 259 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint T2D-related micro-and macrovascular outcomes. 260 Using a genome-wide approach, we investigated SNP-T2D interaction effects associated with T2D-261 related vascular outcomes among European-descent MVP participants (P < 5x10 -8 , Online Methods, 262 Table 3a-b). The analysis included a total case count of 67,403 for CKD, 56,285 for CHD, 35,882 for PAD, 263 11,796 for acute ischemic stroke, 13,881 for retinopathy, and 40,475 for neuropathy. Several genome-264 wide significant interactions were identified where the genetic associations with T2D-related vascular 265 outcomes were modified by T2D (Table 3a-b). We identified 2 loci for CHD (rs1831733 in 9p21 and 266 rs602633 near SORT1) and 1 for CKD (rs34857077 in UMOD) for which the difference in the effect 267 estimates between T2D strata was genome-wide significant (P < 5.0x10 -08 ) and at least one T2D-stratum 268 was genome-wide significant as well. We additionally identified 1 locus for CHD (rs71039916 near 269 PDE3A), 1 for CKD (rs2177223 near TENM3), 1 for PAD (rs3104154 in PTDSS1), 1 for neuropathy 270 (rs78977169 near NRP2), 4 for retinopathy (rs76754787 nearby GJA8, rs10733997 in SVILP2, rs2255624 271 near SLC18A2, and rs4132670 in TCF7L2) and 2 for acute ischemic stroke (rs491203 near TMEM51, and 272 rs2134937 near TRIQK) that showed a genome-wide significance for difference in effect estimates 273 between the T2D strata and nominal significance (P < 0.001) for at least one T2D stratum. 274

Polygenic risk scores and T2D-related micro-and macrovascular outcomes 275
Genome-wide polygenic risk scores (gPRS) for T2D were calculated in Europeans based on the T2D effect 276 estimates from the previously reported DIAMANTE consortium 4 and then categorized into deciles (Table  277 4a-b). As expected, participants with the highest T2D gPRS scores (90-100% T2D gPRS percentile) 278 showed the highest risk for T2D (OR = 5.21, 95% CI 4.94-5.49, Supplemental Figure 5) when compared to 279 the reference group (0-10% T2D gPRS percentile) in a cross-sectional study design. 280 We evaluated whether the T2D gPRS was associated with the risk of micro-and macrovascular 281 outcomes in an analysis restricted to participants with T2D. The P-value are calculated using gPRS as a 282 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review)  Table 4a-b). The T2D gPRS was more strongly associated with microvascular than macrovascular 284 outcomes, in particular with retinopathy, but also with neuropathy and CKD. For macrovascular 285 outcomes, T2D gPRS was associated with the risk of PAD, but not with the risk of CHD or acute ischemic 286 stroke. 287

Discussion 288
We report the discovery of 318 novel autosomal and X-chromosome variants associated with T2D 289 susceptibility in a large multi-ethnic T2D study. Furthermore, we observed 13 variants associated with 290 differences in T2D-related micro-and macrovascular outcomes between diabetic and non-diabetics. The 291 substantial locus discovery was achieved by combining data from several large-scale biobanks and 292 consortia, where the MVP data constituted over 40% of all T2D cases. Furthermore, we present the 293 largest cohort of African Americans including over 56K subjects, substantially greater than any African-294 specific T2D study published to date. 295 Analyses of coding variants identified 44 variants for T2D, including three pLoF variants in LPL, 296 ANKDD1B and CCHCR1. 804 putative causal genes at both novel and previously reported loci were 297 identified and 54 genes were found to be possible targets for FDA-approved drugs and chemical 298 compounds. Our SNP-T2D interaction analyses identified several loci where the association between a 299 genetic variant and a vascular outcome differed between people with T2D as compared to those 300 without. We further found that a high polygenic risk for T2D strongly increased the risk for retinopathy 301 in patients with T2D, and also for CKD, neuropathy, and PAD. 302 T2D is highly prevalent in people of African ancestry; however there are a total of three 303 published T2D GWAS reports in this ancestral group with only 4 definitely detected loci 18,19,20 . In our 304 study with over 56K participants of recent African ancestry, we report 4 novel loci for T2D that are solely 305 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint observed in this ancestral group, including one that is located on chromosome X. Of the previously 306 reported loci, only rs3842770 (INS-IGF2) was replicated here. The reported HLA-B variant rs2244020 18 307 did not replicate in our study; but we did observe a significant association with another SNP in the HLA 308 region (rs10305420, OR 1.15, P = 8.5x10 -9 ). We did not replicate the association with the variant 309 rs7560163 20 . Finally, 1 locus (rs73284431) that was recently identified in a large study conducted in Sub-310 Saharan African countries 19 failed to replicate in our study consisting of African Americans despite 311 comparable minor allele frequencies (9% vs 10%). Finally, we observed that one SLC30A8 variant 312 showed a very prominent T2D effect in African Americans that was absent in other populations; mouse 313 studies have shown that the human R138X LoF variant in SLC30A8 results in increased insulin secretory 314 capacity 21 . 315 The sole presence of a coding variant near a tagging SNP does not constitute enough evidence 316 to infer a causal association. However, recent exome-array genotyping of over 350K subjects identified 317 40 coding variants associated with T2D, of which 26 mapped to near known risk-associated loci 22 . 318 Similarly, an exome sequencing study in over 40K participants reported 15 variants associated with T2D, 319 of which only 2 were not previously reported by GWAS 13 . Sequencing efforts are indispensable for 320 identifying causal variants and genes related to disease, as well as providing insight into the 321 contributions of ultra-rare alleles, but have also substantiated the value of array-based association 322 studies. 323 Our transcriptome-wide analyses identified 804 putatively causal genes, including 54 genes that 324 appear to be regulated by approved drugs and 687 genes that have not been previously reported. Some 325 of these genes are already well established for T2D etiology (e.g. KCNJ11). Except for skeletal muscle, 326 the tissues that showed strongest associations are not known to be of importance in T2D etiology. This 327 raises the hypothesis that the eQTL associations are universal and not tissue dependent; if the gene is 328 expressed in the respective tissue at reasonable levels an association will be detected. We did not 329 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint observe any significant association in the alpha and beta islet cells, which could be the result of the small 330 sample size (e.g. 30 alpha cells and 19 beta cells). In addition, whole islet transcriptomes are notoriously 331 variable due to the large differences in islet composition among humans, and a few transcripts make up 332 half the transcriptome 23 . 333 Of particular clinical importance, we identified several genes which are therapeutic targets for 334 medications in patients that are treated for cardiometabolic conditions. We identified two genes, SCN3A 335 and SV2A whose expression is modified by anti-epileptic agents, and evidence exists showing that anti-336 epileptic agents may influence glucose regulation. A randomized-controlled trial has reported that the 337 anticonvulsant valproic acid lowers blood glucose concentrations 24 . The information from the gene-drug 338 screen may facilitate future drug repurposing screens. 339 DEPICT enrichment analysis identified AKT2 subnetwork gene set to be show the highest 340 association with T2D. Of interest is that AKT2 is critical to insulin signaling and not beta-cell function. 341 The majority of the early era T2D GWAS studies 25 predominantly identified genetic variants that alter 342 T2D risk through reduced beta-cell function, such as KCNJ11, HHEX, SLC30A8, CDKAL1, TCF7L2, 343 CDKN2A/2B, and IGF2BP2. However, as the study sample size have substantially increased and the 344 number of T2D-associated variants is approaching the 1.000 number mark, the cumulative effect of 345 genetic variants on T2D is pointing towards a predominant role for insulin resistance. 346 When we evaluated polygenic risk scores and T2D-related outcomes, we observed that among 347 vascular outcomes, the T2D gPRS was most significantly associated with retinopathy. It is possible that 348 the use of the T2D gPRS provides an opportunity to identify patients who are at the highest risk of 349 developing microvascular complications, such as retinopathy. Using the T2D gPRS, we observed 350 significant associations with other T2D-related outcomes such as CKD, PAD, and neuropathy. Studies at 351 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint specific loci using both common and rare coding variants will be required to understand pathways 352 leading to T2D-related vascular outcomes. 353 In a SNP-T2D interactions analysis on T2D-related vascular outcomes we identified 13 loci where 354 the effect on outcome was different between the strata of T2D, of which 3 occurred at previously 355 established variants and 10 had not been previously reported. Our findings have clinical translational 356 potential for risk-stratification and identify diabetic patients who are predisposed to develop 357 subsequent vascular outcomes and present therapeutic opportunities to attenuate the risk of diabetes 358 progression in individuals with T2D. 359 For T2D-related retinopathy, four variants were found to have different effect sizes between 360 people with and without T2D. The strongest signal for interaction in relation to retinopathy was 361 observed for GJA8. Deletion of this gene has been associated with eye abnormalities and retinopathy of 362 prematurity in premature infants, inherited cataracts, visual impairment and cardiac defects and eye 363 abnormalities 26-28 . TCF7L2 is a known diabetes locus and its association with progression to retinopathy 364 has been previously established 29 . SLC18A2 is expressed in adult retina and retinal pigment epithelium 365 tissues 30 ; this gene is involved in the transport of monoamines into secretory vesicles for exocytosis 31 . 366 SVILP1 has been previously shown to be associated with thiamine (vitamin B1) prescription, which is 367 frequently prescribed to people with blurry vision 32 . 368 For chronic kidney disease, two loci, UMOD and TENM3, were identified for gene-T2D 369 interaction effects. UMOD encodes for uromodilin which is exclusively produced by the kidney tubule 370 where it plays an important role in kidney and urine function 33 . A large-scale study in over 133K 371 participants has shown that the serum creatinine-lowering allele in UMOD (rs12917707) is more 372 prevalent in diabetic individuals with CKD as compared to diabetic participants without CKD 34 . Variation 373 in TENM3 has been associated with cholangitis and kidney disorders in the UK Biobank 35 . 374 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint SNP-T2D interaction analysis of neuropathy identified one locus, NRP2. NRP2 encodes for 375 neuropilin-2 which is an essential cell surface receptor involved in VEGF-dependent angiogenesis and 376 sensory nerve regeneration 36,37 . 377 For coronary heart disease, we identified several SNP-T2D interactions. Variation at 9p21 has 378 been reported previously for CHD and T2D. SORT1 is a lipid-associated locus; in our analyses, allelic 379 variation at this locus that decreases CHD risk and decreases lipids conferred a stronger protection in 380 people with T2D compared to those without T2D. Coupled with findings in mice that identified SORT1 as 381 a novel target of insulin signaling 38 , our findings raise the hypothesis that SORT1 may contribute to 382 altered hepatic apoB metabolism under insulin-resistant conditions. 383 The SNP rs71039916 is located nearby PDE3A, and collocates with a SNP (rs3752728, D' = 0.867, 384 r 2 = 0.08) that is associated with diastolic blood pressure 39,40 . As a phosphodiesterase that reduces cAMP 385 levels, the PDE3A protein limits protein kinase A/cAMP signaling, and has been shown to affect We have conducted the largest discovery study for T2D to date, including over 220K T2D cases 393 and 1.2 million controls. Over 21% of our discovery sample comprised of non-European participants; our 394 African American sample included over 56K subjects. A unique strength in our study is the 395 comprehensive individual level phenotype data available for analyzing T2D and related micro-and 396 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint macrovascular outcomes. This allowed us to test for genetic interaction instead of evaluating effect 397 differences between strata using summary statistics. 398 In summary, we have identified 318 novel genetic variants associated with T2D risk and T2D-399   Phenotype classification: ICD-9-CM diagnosis codes from electronic health care records were available 471 for MVP participants from as early as 1998. Participants were classified as a T2D case if they had 2 or 472 more T2D-related diagnosis codes (ICD-9-CM 250.2x) from VA or fee basis inpatient stays or face-to-face 473 primary care outpatient visits in the 731 days before the enrollment date up to July 1 st of 2017, 474 excluding those with co-occurring diagnosis codes for T1D (250.1x), secondary or other diabetes or a 475 medical condition that may cause diabetes (249.xx). Participants were selected as controls if they had no 476 ICD-9-CM diagnosis code for type 1, type 2, or secondary diabetes mellitus up to July 2017. 477 For T2D-related vascular outcomes, the following definitions were used: CHD, at least one admission to 478 a VA hospital with discharge diagnosis of admission for myocardial information, or at least one 479 procedure code for revascularization (coronary artery bypass grafting, percutaneous coronary 480 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) Other studies: Summary statistics available from previous reports in the DIAMANTE Consortium 4 , 500 Biobank Japan 7 , Penn Medicine Biobank 5 , Malmö Diet and Cancer Study 8 , MedStar/PennCath studies 9 , 501 and the Pakistani Genomic Resource 6 were obtained for meta-analysis (Supplemental Table 2). All 502 cohort were imputed using the 1000 Genomes Project phase 3, version 5 reference panel with exception 503 of the DIAMANTE consortium where genotype calls were imputed using the Haplotype Reference 504 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint Consortium reference panel 54 . Briefly, following central and study-specific quality control protocols, an 505 additional 917,392 Europeans, 2,647 African Americans, and 213,834 Asians were available for T2D 506 meta-analysis. Only SNPs with ancestry-specific MAF > 1% in these studies were used. Within each study 507 a logistic regression model was used where T2D was set as the dependent variable, imputed SNP as the 508 independent variable, and analyses were adjusted for age, gender, and the top ten principal 509 components of genetic ancestry. 510 Meta-analysis: We aggregated association summary statistics from the above mentioned studies to 511 perform ancestry-specific and multi-ethnic meta-analysis. The association summary statistics for each 512 analysis were meta-analyzed in a fixed-effects model using METAL with inverse-variance weighting of 513 log odds ratios 55 . From the meta-analysis, all variants were extracted that passed quality control. 514 Between-study allelic effect size heterogeneity was assessed with Cochran's Q statistic as implemented 515 in METAL. Variants were considered genome-wide significant if they passed the conventional p-value 516 threshold of 5.0x10-8. We excluded variants with a high amount of heterogeneity (I 2 statistic > 75%) 517 across the four evaluated ancestral groups. 518 Novel loci: Novel loci were defined as those in which the lead variant was not in LD or physically nearby 519 (500 Kb) to a previously reported lead GWAS variant(s). For each locus, we first searched ± 500 Kb 520 surrounding the lead SNP to ensure that potential long-range genetic influences were assessed. 521 Chromosome-X analysis: X chromosome genotypes were processed separately from autosomal 522 genotypes. During prephasing and imputation an additional flag of -chrX was added. In post-imputation 523 stage, the XWAS QC pipeline was employed to remove variants in the pseudo-autosomal regions, 524 variants that were not in HWE in females (P > 1.0x10 -6 ), variants with differential allele frequencies, and 525 variants with differential missingness (p<10−7) between males and female controls (Supplemental 526 Figure 2) 56 . For each ancestry-specific subset, we performed sex-stratified analysis where dosages 527 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint (number of chromosome copies) for the X-chromosome in T2D cases are equivalent to controls within 528 each sex stratum. For the meta-analysis of sexes, differences in dosage compensation should be 529 considered between males (with one X chromosome) and females (with two X chromosomes). In a 530 separate verification analysis we performed a chromosome-X association analysis on dosage data using 531 PLINK (--xchr-model) which produced consistent results compared with analysis of best-guess data 532 where the X chromosome was coded as 0/2 for males instead of 0/1. The output from sex-stratified 533 chromosome-X analyses were first meta-analyzed into a multi-ethnic sex-stratified analysis. Then, the 534 multi-ethnic results from males and multi-ethnic results from females were meta-analyzed, where none 535 of the analyzed variants was detected using the Cochrane test for heterogeneity (P < 5.0x10 -8 ). Results 536 presented include overall effect estimates, male-specific effect estimates, female-specific estimates, and 537 the heterogeneity p-value (Supplementary Table 13 conditionally independent variants that reached locus-wide significance (P < 1.0x10 -5 ) were considered 546 as secondary signals of distinct association. If the minimum distance between any distinct signals from 547 two separate loci was less than 500 Kb, we performed additional conditional analysis including both 548 regions and reassessed the independence of each signal. Finally, the predicted conditionally 549 independent variants were tested in a logistic regression model in the MVP study only to empirically 550 validate the signal, and results are shown in Supplemental Tables 7 and 10. 551 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) LoF and missense variants were extracted. The LD was calculated with established variants, and in a 554 second round of analysis the effect of the missense variant was calculated conditioning on the lead SNP 555 to assess how much residual variance the SNP explains in T2D risk. A p-value of 0.05 was considered as 556 statistically significant. 557

S-PrediXcan and colocalization analyses. 558
Genetically predicted gene expression and its association with T2D risk was estimated using S-559 PrediXcan 15 . Input included meta-analyzed summary statistics from the European T2D GWAS for 560 common variants and reference eQTL summary statistics for 52 tissues including 48 tissues from GTEx 59 , 561 2 cell types in kidney tissue (glomerulus and tubulus) 60 , and 2 cell types in pancreatic islet tissue (alpha 562 and beta) 61 . Analyses incorporated genotype covariance matrices based on 1000 Genomes 46 European 563 populations to account for LD structure. Colocalization analysis was performed to address the issue of 564 LD-contamination in S-PrediXcan analyses 62 . Input data were identical to those evaluated by S-PrediXcan 565 and colocalization was restricted to only variants included in the S-PrediXcan gene 63 . The output is 566 shown in Supplemental Table 8. 567

Phenotypic variance explained by SNPs. 568
The MVP European cohort (69,869 T2D cases and 127,197 controls) was used to calculate the variance 569 explained by the 558 genome-wide-significant sentinel SNPs from the multi-ethnic meta-analysis. A 570 logistic regression model was performed assuming additive effects where T2D was set as the dependent 571 variable, and the sentinel SNPs as independent variables, adjusting for sex, age, and 10 principal 572 components. The baseline model included age, gender and 10 principal components. The variance 573 explained was calculated with the R 2 statistic, calculated as 1 -(residual deviance / null deviance). 574 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint

Phenome-wide association analysis. 575
For the three LoF variants that were identified using coding variant analysis, we performed a PheWAS 576 study 64 to fully leverage the diverse nature of the Million Veteran Program as well as the full catalog of 577 relevant ICD-9-CM diagnosis and CPT procedure codes (Supplementary Table 18). Of genotyped 578 veterans, participants were included in the PheWAS analysis if their respective electronic health record 579 reflected two or more separate encounters in the VA Healthcare System in each of the two years prior 580 to enrollment in MVP. Using this method, a total of 277,531 veterans spanning 21,209,658 available ICD-581 9 diagnosis codes were available for PheWAS analysis. We restricted our analysis on the subgroup of 582 197,066 European participants. Diagnosis and procedure codes were collapsed to clinical disease groups 583 and corresponding controls using predefined groupings 65 . Phenotypes were required to have a case 584 count over 25 in order to be included in the PheWAS analysis, and a multiple testing thresholds for 585 statistical significance was set to P < 2.8x10 -5 (Bonferroni method). Each of the previously unpublished 586 LoF variants (ANKDD1B p.Trp480* and CCHCR1 p.Trp78*) were tested using logistic regression adjusting 587 for age, sex, and 10 principal components in an additive effects model using the PheWAS R package in R 588 v3.2.0 66 . The results from these analyses are reported as odds ratios, in which the estimate is the 589 average change in odds of the PheWAS trait per weighted T2D-increasing allele (Supplementary Table  590 18, Supplementary Figure 4). 591

Analysis of T2D-related outcomes. 592
Genetic data on European participants was separately analyzed using outcomes as a binary outcome, 593 and T2D as an interaction variable with SNPs using interaction analysis with robust variance to reduce 594 effect heteroscedacity (e.g. unequal variance between strata) using SUGEN software (v8.8) 67 . We 595 evaluated the interaction between SNP and presence of T2D status using an interaction term for the two 596 independent variables. Due to the binary nature of the outcome, the standard output from the 597 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint interaction effect estimate are interpreted on a multiplicative scale, e.g., the departure from the 598 product of the two independent effects. To obtain interaction on an additive scale (e.g. departure from 599 the sum of the independent effects), we calculated the relative excess risk due to interaction (RERI) 600 metric, a well-known analytic method in classic epidemiologic literature 68 . In case-control studies using 601 the linear additive odds-ratio model as proposed by Richardson and Kaufman 68 in our study has the form 602 of: 603 In which the coefficient 3 measures the departure from additivity of exposure effect on an odds ratio 605 scale; that is 606 where ̅̅̅̅̅̅ and 2 ̅̅̅̅̅̅ note the absence of the respective covariates. 608 In contrast to the standard exponential odds model, where the effects of explanatory variables are 609 additive on an exponential scale, we performed analysis using a linear odds model, in order to quantify 610 the excess odds per unit of the given explanatory variables on the outcome. In this model, the excess 611 relative risk due to interaction is an estimate of the excess odds on a linear scale due to the interaction 612 between two explanatory variables (in our case, T2D state and genotype dosage of the tested variant). 613 In the SNPxT2D interaction analysis we used a significance threshold of P < 5.0x10 -08 to denote variants 614 that statistically different effect sizes between participants with T2D as compared those without. An 615 additional filter was applied, and variants for which the effect size in at least one of the two T2D strata 616 was nominally significant at P < 0.001 were included. Manhattan plots and the table are used to 617 represent the interaction coefficients on this scale. 618 619 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint Polygenic risk scores and risk of T2D and related outcomes. 620 We constructed a genome-wide polygenic risk score (gPRS) for T2D in the MVP participants of European 621 ancestry by calculating a linear combination of weights derived from the Europeans in the DIAMANTE 622 Consortium 4 using the prune and threshold method in PRSice software 69 . After an initial sensitivity 623 analysis, the r 2 threshold for pruning was set to 0.8, and the P-value for significance threshold was set to 624 0.05. The gPRSs were divided into deciles and the risk of T2D was assessed using a logistic regression 625 model using the lowest decile as a reference (e.g. the 10% of participants with lowest of T2D gPRS), 626 together with the potential confounding factors of age, gender, BMI, and the first 10 principal 627 components of European ancestry. An additional outcomes analysis was performed to evaluate to what 628 extent a T2D gPRS is predictive of T2D-induced morbidities. The dataset was restricted to subjects with 629 T2D, and a stratum-restricted T2D gPRS deciles were generated. Logistic regression models were applied 630 where the micro-and macrovascular conditions were modeled as outcomes, and independent variables 631 included strata-restricted gPRS deciles, age, gender, and the first 10 principal components of European 632 ancestry. The data were visualized using shape-plots. 633 Heritability estimates and genetic correlations with other complex traits and diseases. 634 LD-score regression was used to estimate the heritability coefficient, and subsequently population and 635 sample prevalence estimates were applied to estimate heritability on the liability scale 70 . A genome-636 wide genetic correlation analysis was performed to investigate possible co-regulation or a shared 637 genetic basis between T2D and other complex traits and diseases. Pairwise genetic correlation 638 coefficients were estimated between the meta-analyzed T2D GWAS summary output in Europeans and 639 each of 774 precomputed and publicly available GWAS summary statistics for complex traits and 640 diseases by using LD score regression through LD Hub v1.9.3 (http://ldsc.broadinstitute.org). Statistical 641 significance was set to a Bonferroni-corrected level of P < 6.5x10 -5 . 642 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) Bonferroni-corrected threshold for significance was P < 0.001 considering evaluation of 52 tissues. 646 Enrichment analyses in DEPICT 71 were conducted using genome-wide significant (P < 5.0x10 -8 ) T2D 647 GWAS lead SNPs. DEPICT is based on predefined phenotypic gene sets from multiple databases and 648 Affymetrix HGU133a2.0 expression microarray data from >37k subjects to build highly-expressed gene 649 sets for Medical Subject Heading (MeSH) tissue and cell type annotations. Output includes a P-value for 650 enrichment and a yes/no indicator of whether the FDR q-value is significant (P < 0.05). Tissue and gene-651 set enrichment features were considered. 652

Evaluation of drug classes for genes with associations with gene expression. 653
To identify drug-gene pairs that may be leads for repurposing or may be attractive leads for novel 654 inhibitory drugs, we identified drugs targeting genes whose predicted expression was significantly 655 associated with T2D risk in S-PrediXcan analyses and which we predicted would lower blood glucose 656 based on direction of effect on T2D risk with increasing gene expression and drug action (activator or 657 inhibitor). Medications with a primary indication for diabetes and medications with adverse drug events 658 for diabetic patients were evaluated using the SIDe Effect Resource (SIDER) 16  None of the sponsors of the following authors had a role in the design and conduct of the study; 686 collection, management, analysis, and interpretation of the data; and preparation, review, or approval 687 of the manuscript. D.S. has received support from the British Heart Foundation, Pfizer, Regeneron, 688 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review)  The full summary level association data from the trans-ancestry, European, African American, Hispanic, 727 and Asian meta-analysis from this report will be available through dbGAP (Accession codes will be 728 available before publication). 729 730 731 732 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint The graph represents a circos plot. The outer track corresponds to -log10 (P) for association with T2D in 884 the trans-ethnic meta-analysis (y axis truncated at 30), by chromosomal position. The red line indicates 885 genome-wide significance (P=5.0×10−08). Purple gene labels indicated genes identified in skeletal 886 muscle eQTLs by S-PrediXcan analysis, red-labeled gene names in adipose eQTLs, black-labeled gene 887 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint names in pancreas eQTLs, and blue-labeled gene names were identified in eQTLs from arteries. The 888 green band corresponds to measures of heterogeneity related to the index SNPs associated with T2D. 889 Dot sizes are proportional to I2 or ancestry-related heterogeneity. The inner track corresponds to -890 log10(P) for association with skeletal muscle, adipose, pancreas, and artery tissue eQTLs from S-891 PrediXcan analysis (y axis truncated at 20), by chromosomal position. The red line indicates genome-892 wide significance (P=5.0×10−08). Inset, effects of all 318 index SNPs on T2D by minor allele frequency, 893 stratified and colored by ancestral group. 894 895 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. was not certified by peer review) (which The copyright holder for this preprint this version posted November 25, 2019. ; https://doi.org/10.1101/19012690 doi: medRxiv preprint