Genome-wide association study of over 40,000 bipolar disorder cases provides novel biological insights

Bipolar disorder (BD) is a heritable mental illness with complex etiology. We performed a genome-wide association study (GWAS) of 41,917 BD cases and 371,549 controls, which identified 64 associated genomic loci. BD risk alleles were enriched in genes in synaptic and calcium signaling pathways and brain-expressed genes, particularly those with high specificity of expression in neurons of the prefrontal cortex and hippocampus. Significant signal enrichment was found in genes encoding targets of antipsychotics, calcium channel blockers and antiepileptics. Integrating eQTL data implicated 15 genes robustly linked to BD via gene expression, including druggable genes such as HTR6, MCHR1, DCLK3 and FURIN. This GWAS provides the best-powered BD polygenic scores to date, when applied in both European and diverse ancestry samples. Together, these results advance our understanding of the biological etiology of BD, identify novel therapeutic leads and prioritize genes for functional follow-up studies.


Introduction
Bipolar disorder (BD) is a complex mental disorder characterized by recurrent episodes of (hypo)mania and depression. It is a common condition affecting an estimated 40 to 50 million people worldwide 1 . This, combined with the typical onset in young adulthood, an often chronic course, and increased risk of suicide 2 , make BD a major public health concern and a major cause of global disability 1 . Clinically, BD is classified into two main subtypes: bipolar I disorder, in which manic episodes typically alternate with depressive episodes, and bipolar II disorder, characterized by the occurrence of at least one hypomanic and one depressive episode 3 . These subtypes have a lifetime prevalence of ~1% each in the population 4,5 . Family and molecular genetic studies provide convincing evidence that BD is a multifactorial disorder, with genetic and environmental factors contributing to its development 6 . On the basis of twin and family studies, the heritability of BD is estimated at 60-85% 7,8 . Genome-wide association studies (GWAS) [9][10][11][12][13][14][15][16][17][18][19][20][21][22][23] have led to valuable insights into the genetic etiology of BD. The largest such study has been conducted by the Psychiatric Genomics Consortium (PGC), in which genome-wide SNP data from 29,764 BD patients and 169,118 controls were analyzed and 30 genome-wide significant loci were identified (PGC2) 24 . SNP-based heritability (ℎ "#$ % ) estimation using the same data, suggested that common genetic variants genome-wide explain ~20% of BD's phenotypic variance 24 . Polygenic risk scores generated from the results of this study explained ~4% of phenotypic variance in independent samples. Across the genome, genetic associations with BD converged on specific biological pathways including regulation of insulin secretion 25,26 , retrograde endocannabinoid signaling 24 , glutamate receptor signaling 27 and calcium channel activity 9 .

Figure 1: Manhattan plot of genome-wide association meta-analysis of 41,917 bipolar disorder cases and 371,549 controls
The x-axis shows genomic position (chromosomes 1-22 and X) and the y-axis shows statistical significance as -log10(P value). The red line shows the genome-wide significance threshold (P<5E-08). SNPs in genomewide significant loci are colored green for loci previously associated with bipolar disorder and yellow for novel associations from this study. The genes labeled are those prioritized by integrative eQTL analyses or notable genes in novel loci (MHC, CACNB2, KCNB1).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint

Enrichment analyses
Genome-wide analyses using MAGMA 35 indicated significant enrichment of BD associations in 234 genes (Table S4) and 10 gene sets, all related to synaptic functioning, calcium signaling or neurogenesis (Table  S5). The BD association signal was enriched amongst genes expressed in different brain tissues (Table S6), especially genes with high specificity of gene expression in neurons (both excitatory and inhibitory) of cortical and subcortical brain regions in mice (Supplementary Figure 2) 36 . In human brain samples, signal enrichment was also observed in hippocampal pyramidal neurons and interneurons of the prefrontal cortex and hippocampus (Supplementary Figure 2).
In a gene-set analysis of the targets of individual drugs (from the Drug-Gene Interaction Database DGIdb v.2 37 and the Psychoactive Drug Screening Database Ki DB 38 ), no individual set was significantly enriched (Table S7). Grouping drugs according to their Anatomical Therapeutic Chemical (ATC) classes 39 , there was significant enrichment in the targets of four broad drug classes (Table S8): psycholeptics (drugs with a calming effect on behavior) (especially antipsychotics, hypnotics and sedatives, anxiolytics, and benzodiazepines), calcium channel blockers (especially those with mainly vascular effects and dihydropyridine derivatives), antiepileptics and drugs for functional gastrointestinal disorders (which include anticholinergics and serotonin receptor antagonists) (Table S8).

eQTL integrative analyses
A transcriptome-wide association study (TWAS) was conducted using FUSION 40 and eQTL data from the PsychENCODE Consortium (1,321 brain samples) 41 . BD-associated alleles significantly influenced expression of 77 genes in the brain (Table S9, Supplementary Figure 3). These genes encompassed 40 distinct regions. TWAS fine-mapping was performed using FOCUS 42 to model the correlation among the TWAS signals and prioritize the most likely causal gene(s) in each region. Within the 90%-credible set, FOCUS prioritised 22 genes with a posterior inclusion probability (PIP) > 0.9 (encompassing 20 distinct regions) and 32 genes with a PIP > 0.7 (29 distinct regions) (Table S10).
Summary data-based Mendelian randomization (SMR) 43,44 was used to identify putative causal relationships between SNPs and BD via gene expression by integrating the BD GWAS results with brain eQTL summary statistics from the PsychENCODE 41 Consortium and blood eQTL summary statistics from the eQTLGen Consortium (31,684 whole blood samples) 45 . The eQTLGen results represent the largest existing eQTL study and provide independent eQTL data. Of the 32 genes fine-mapped with PIP > 0.7, 15 were significantly associated with BD in the SMR analyses and passed the HEIDI (heterogeneity in dependent instruments) test 43,44 , suggesting that their effect on BD is mediated via gene expression in the brain and/or blood (Table S11). The genes located in genome-wide significant loci are labeled in Figure 1. Other significant genes included HTR6, DCLK3, HAPLN4 and PACSIN2.

MHC locus
Variants within and distal to the major histocompatibility complex (MHC) locus were associated with BD at genome-wide significance. The most highly associated SNP was rs13195402, 3.2 megabases distal to any HLA gene or the complement component 4 (C4) genes (Supplementary Figure 4). Imputation of C4 alleles using SNP data uncovered no association between the five most common structural forms of the C4A/C4B locus (BS, AL, AL-BS, AL-BL, and AL-AL) and BD, either before or after conditioning on rs13195402 (Supplementary Figure 5). While genetically predicted C4A expression initially showed a weak association with BD, this association was non-significant after controlling for rs13195402 (Supplementary Figure 6).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint

Polygenic risk scoring
The performance of polygenic risk scores (PRS) based on these GWAS results was assessed by excluding cohorts in turn from the meta-analysis to create independent test samples. PRS explained ~4.75% of phenotypic variance in BD on the liability scale (at GWAS P value threshold (pT) < 0.1, BD population prevalence 2%), based on the weighted mean R 2 across cohorts ( Figure 2, Table S12). This corresponds to a weighted mean area under the curve (AUC) of 66%. Results per cohort and per wave of recruitment to the PGC are in Tables S12-S13 and Supplementary Figure 7. At pT < 0.1, individuals in the top 10% of BD PRS had an odds ratio of 3.62 (95% CI 1.7-7.9) of being affected with the disorder compared with individuals in the middle decile (based on the weighted mean OR across PGC cohorts), and an odds ratio of 9.5 (95% CI 5.4-20) compared with individuals in the lowest decile. The generalizability of PRS from this meta-analysis was examined in several non-European cohorts. PRS explained up to 2.3% and 1.9% of variance in BD in two East Asian samples, and 1.2% and 0.4% in two admixed African American samples ( Figure 2, Table S14). The variance explained by the PRS increased in every cohort with increasing sample size of the PGC BD European discovery sample (Supplementary Figure 8, Table S14).

Figure 2: Phenotypic variance in bipolar disorder explained by polygenic risk scores
Variance explained is presented on the liability scale, assuming a BD population prevalence of 2%. For European ancestry, the results shown are the weighted mean R 2 values across all cohorts analyzed, calculated weighted by the effective N per cohort. The numbers of cases and controls are shown under the barplot for each study. GWAS pT -the color of the bars represents the P value threshold used to select SNPs from the discovery GWAS. GAIN-AA -Genetic Association Information Network African American cohort, AA-GPC -African American Genomic Psychiatry Cohort.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint

Genetic architecture of BD and other traits
The genome-wide genetic correlation (rg) of BD with a range of diseases and traits was assessed on LD Hub 46 . After correction for multiple testing, BD showed significant rg with 16 traits among 255 tested from published GWAS (Table S15). Genetic correlation was positive with all psychiatric disorders assessed, particularly schizophrenia (rg = 0.68) and major depression (rg=0.44), and to a lesser degree anorexia, attention deficit/hyperactivity disorder and autism spectrum disorder (rg≈0.2). We found evidence of positive rg between BD and smoking initiation, cigarettes per day, problematic alcohol use and drinks per week ( Figure 3). BD was also positively genetically correlated with measures of sleep quality (daytime sleepiness, insomnia, sleep duration) ( Figure 3). Among 514 traits measured in the general population of the UK Biobank, there was significant rg between BD and many psychiatric-relevant traits or symptoms, dissatisfaction with interpersonal relationships, poorer overall health rating and feelings of loneliness or isolation (Table S16).
Bivariate gaussian mixture models were applied to the GWAS summary statistics for BD and other complex traits using the MiXeR tool 47,48 to estimate the number of variants influencing each trait that explain 90% of ℎ "#$ % and their overlap between traits. MiXeR estimated that approximately 8.6 k (SE=0.2 k) variants influence BD, which is similar to the estimate for schizophrenia (9.7 k, SE=0.2 k) and somewhat lower than that for major depression (12.3 k, SE=0.6 k) (Table S17, Supplementary Figure 9). When considering the number of shared loci as a proportion of the total polygenicity of each trait, the vast majority of loci influencing BD were also estimated to influence major depression (97%) and schizophrenia (96%) (Table S17, Supplementary Figure 9). Interestingly, within these shared components, the variants that influenced both BD and SCZ had high concordance in direction of effect (80%, SE=2%), while the portion of concordant variants between BD and MDD was only 69% (SE=1%) ( Table S17).

Genetic and causal relationships between BD and modifiable risk factors
Ten traits associated with BD from clinical and epidemiological studies were investigated in detail for genetic and potentially causal relationships with BD via LDSC 33 , generalized summary statistics-based Mendelian randomization (GSMR) 49 and bivariate gaussian mixture modeling 47 . BD has been strongly linked with sleep disturbances 50 , alcohol use 51 and smoking 52 , higher educational attainment 53,54 and mood instability 55 (which is a defining characteristic of the disorder). Most of these traits had modest but significant genetic correlations with BD (rg -0.05-0.35) ( Figure 3). Examining the effects of these traits on BD via GSMR, smoking initiation was associated with BD, corresponding to an OR of 1.49 (95% CI 1.38-1.61) for developing the disorder (P=1.74E-22) ( Figure 3). Testing the effect of BD on the traits, BD was significantly associated with reduced likelihood of being a morning person and increased number of drinks per week (P<1.47E-03) ( Figure 3). Positive bi-directional relationships were identified between BD and longer sleep duration, problematic alcohol use, educational attainment (EA) and mood instability ( Figure  3). Notably, the instrumental variables for mood instability were selected from a GWAS conducted in the general population, excluding individuals with psychiatric disorders 56 . For all of the aforementioned BDtrait relationships, the effect size estimates from GSMR were consistent with those calculated using the inverse variance weighted regression method, and there was no evidence of bias from horizontal pleiotropy. Full MR results are in Tables S18-19. Bivariate gaussian mixture modeling using MiXeR, indicated large proportions of variants influencing both BD and all other traits tested, particularly educational attainment, where approximately 98% of variants influencing BD were estimated to also influence EA. While cigarettes per day was a trait of interest, MiXeR could not model these data due to low polygenicity and heritability, and the effect of cigarettes per day on BD was inconsistent between MR methods, suggesting a violation of MR assumptions (Tables S18-20).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint

Figure 3: Relationships between bipolar disorder and modifiable risk factors based on genetic correlations, generalized summary statistics-based Mendelian randomization and bivariate gaussian mixture modeling
Venn diagrams depict MiXeR results of the estimated number of influencing variants shared between BD and each trait of interest (grey), unique to BD (blue) and unique to the trait of interest (orange). The number of influencing variants and standard error are shown in thousands. The size of the circles reflects the polygenicity of each trait, with larger circles corresponding to greater polygenicity. The estimated genetic correlation (rg) between BD and each trait of interest and standard error from LD Score regression is shown below the corresponding Venn diagram, with an accompanying scale (-1 to +1). The arrows above and below the Venn diagrams indicate the results of generalized summary statistics-based Mendelian randomization (GSMR) of BD on the trait of interest, and the trait of interest on BD, respectively. The GSMR effect size and standard error is shown inside the corresponding arrow. Solid arrows indicate a significant relationship between the exposure and the outcome, after correction for multiple comparisons (P<1.47E-03) and dashed arrows indicate a non-significant relationship.

Discussion
In a GWAS of 41,917 BD cases, we identify 64 associated genomic loci, 33 of which are novel discoveries. With a 1.5-fold increase in effective sample size compared with the PGC2 BD GWAS, this study more than doubled the number of associated loci, representing an inflection point in the rate of risk variant discovery. We observed consistent replication of known BD loci, including 28/30 loci from the PGC2 GWAS 24 and several implicated by other BD GWAS 15,16,17 , including a study of East Asian cases 57 .
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint The 33 novel loci discovered here encompass genes of expected biological relevance in BD, such as the ion channels CACNB2 and KCNB1. Amongst the 64 BD loci, 17 have previously been implicated in GWAS of schizophrenia 58 , and seven in GWAS of major depression 59 , representing the first overlap of genomewide significant loci between the mood disorders. Bivariate gaussian mixture modeling estimated that across the entire genome, almost all variants influencing BD also influence major depression and schizophrenia, albeit with variable effects 60 . SNPs in and around the MHC locus reached genome-wide significance for BD for the first time. However, unlike in schizophrenia, we found no influence of C4 structural alleles or gene expression 61 . Rather the association was driven by variation outside the classical MHC locus, with the index SNP (rs13195402) being a missense variant in BTN2A1, a brain-expressed gene 62 encoding a plasma membrane protein.
The genetic correlation of bipolar disorder with other psychiatric disorders was consistent with previous reports 63,64 . Our results also corroborate previous genetic and clinical evidence of associations between BD and sleep disturbances 65 , problematic alcohol use 66 and smoking 67 . While the genome-wide genetic correlations with these traits were modest (rg -0.05-0.35), MiXeR estimated that for all traits, more than 55% of trait-influencing variants also influence BD ( Figure 3). Taken together, these results point to shared biology as one possible explanation for the high prevalence of substance use in BD. However, excluding genetic variants associated with both traits, MR analyses suggested that smoking is also a putatively "causal" risk factor for BD, while BD has no effect on smoking, consistent with a previous report 68 . [We use the word "causal" with caution here as we consider MR an exploratory analysis to identify potentially modifiable risk factors which warrant more detailed investigations to understand their complex relationship with BD.] In contrast, MR indicated that BD had bi-directional "causal" relationships with problematic alcohol use, longer sleep duration and mood instability. Insights into the relationship of such behavioral correlates with BD may have future impact on clinical decision making in the prophylaxis or management of the disorder. Higher educational attainment has previously been associated with BD in epidemiological studies 53,54 , while lower educational attainment has been associated with schizophrenia and major depression 69,70 . Here, educational attainment had a significant positive effect on risk of BD and vice versa. Interestingly, MiXeR estimated that almost all variants that influence BD also influence educational attainment.
The integration of eQTL data with our GWAS results yielded 15 high-confidence genes for which there was converging evidence that their association with BD is mediated via gene expression. Amongst these were HTR6, a serotonin receptor targeted by antipsychotics and antidepressants 71 and MCHR1 (melaninconcentrating hormone receptor 1), a target of the antipsychotic haloperidol 71 . We note that for both of these genes, their top eQTLs have opposite directions of effect on gene expression in the brain and blood, possibly playing a role in the tissue-specific gene regulation influencing BD 72 . BD was associated with decreased expression of FURIN, a gene with a neurodevelopmental function which has already been the subject of functional genomics experiments in neuronal cells, following its association with schizophrenia in GWAS 73 . The top association in our GWAS was in the TRANK1 locus on chromosome 3, which has previously been implicated in BD 12,18,57 . Although BD-associated SNPs in this locus are known to regulate TRANK1 expression 74 , our eQTL analyses support a stronger but correlated regulation of DCLK3, located 87 kb upstream of TRANK1 41,75 . Both FURIN and DCLK3 also encode druggable proteins (although they are not targets for any current psychiatric medications) 71,76 . These eQTL results provide promising BD candidate genes for functional follow-up experiments. While several of these are in genome-wide significant loci, many are not the closest gene to the index SNP, highlighting the value of probing underlying molecular mechanisms to prioritize the most likely causal genes in the loci.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint GWAS signals were enriched in the gene targets of existing BD pharmacological agents, such as antipsychotics, mood stabilizers, and antiepileptics. However, enrichment was also found in the targets of calcium channel blockers, which are used to treat hypertension, and drugs for treating functional gastrointestinal disorders, many of which target cholinergic and serotonergic receptors (Table S8). Indeed, calcium channel antagonists have long been used in the treatment of BD, without becoming an established therapeutic approach and there is evidence that some antiepileptics have calcium channelinhibiting effects 77,78 . These results underscore the opportunity for repurposing some classes of drugs, particularly calcium channel antagonists, as potential BD treatments 79 . While there was no significant enrichment in the targets of any individual drug tested, this analysis may currently be limited by statistical power.
BD associations were enriched in gene sets involving calcium signaling, neuronal and synaptic pathways. Neuronal and synaptic pathways have been described in cross-disorder GWAS of multiple psychiatric disorders including BD [80][81][82] . Dysregulation of such pathways has also been suggested by previous functional and animal studies 83 . Analysis of single-cell gene expression data revealed enrichment predominantly in neurons (both excitatory and inhibitory), of many brain regions, in particular the cortex and hippocampus. These findings are similar to those reported in GWAS data of schizophrenia 84 .
Polygenic risk scores (PRS) for BD explained on average 4.75% of phenotypic variance (liability scale) across European cohorts, although this varied in different waves of the BD GWAS, ranging from 6.6% in the PGC1 cohorts to 2.9% in the External biobank studies (Supplementary Figure 7, Table S12). These results are in line with the ℎ "#$ % of BD per wave, which ranged from 24.6% (SE=0.01) in PGC1 to 11.9% (SE=0.01) in External studies (Table S3). Some variability in ℎ "#$ % estimates may arise from the inclusion of cases from population biobanks ascertained using ICD codes, who may have more heterogeneous clinical presentations or less severe illness than BD patients ascertained via inpatient or outpatient psychiatric clinics. Across the waves of clinically ascertained samples within the PGC, ℎ "#$ % and the R 2 of PRS also varied, likely reflecting clinical and genetic heterogeneity in the type of BD cases ascertained; the PGC1 cohorts consisted mostly of BD I cases 9 , known to be the most heritable of the BD subtypes 11,24 , while later waves included more individuals with BD II 24 . Overall, the ℎ "#$ % of BD calculated from the meta-analysis summary statistics was 18% on the liability scale, a decrease of ~2% compared with the PGC2 GWAS 24 , which was driven by the addition of cohorts with lower ℎ "#$ % estimates (Table S3). However, despite differences in ℎ "#$ % and R 2 of PRS per wave, the genetic correlation of BD between all waves was high (weighted mean rg=0.94, SE=0.03), supporting our rationale for combining cases with different BD subtypes or ascertainment to increase power for discovery of risk variants. In Europeans, individuals in the top 10% of PRS had an OR of 3.6 for BD, compared with individuals with average PRS (middle decile), which translates into a modest absolute lifetime risk of the disorder (7.2% based on PRS alone). While PRS are invaluable tools in research settings, the current BD PRS lack sufficient power to separate individuals into clinically meaningful risk categories, and therefore have no clinical utility at present 85,86 . PRS from this European BD meta-analysis yield higher R 2 values in diverse ancestry samples than PRS based on any currently available BD GWAS within the same ancestry 57 . However performance still greatly lags behind that in Europeans, and there is a pressing need for more and larger studies in other ancestry groups to ensure that any future clinical utility is broadly applicable. Exploiting the differences in LD structure between diverse ancestry samples will also assist in the fine-mapping of risk loci for BD.
Our analyses confirmed that BD is a highly polygenic disorder, with an estimated 8.6 k variants explaining 90% of its ℎ "#$ % . Hence, many more SNPs than those identified here are expected to account for the common variant architecture underlying BD. This GWAS marks an inflection point in risk variant discovery and we expect that from this point forward, the addition of more samples will lead to a dramatic increase . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review) preprint
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint in genetic findings. Nevertheless, fewer genome-wide significant loci have been identified in BD than in a schizophrenia GWAS of comparable sample size 58 . This may be due to the clinical and genetic heterogeneity that exists in BD.
In summary, these new data advance our understanding of the biological etiology of bipolar disorder and prioritize a set of candidate genes for functional follow-up experiments. Several lines of evidence converge on the involvement of calcium channel signaling, providing a promising avenue for future therapeutic development.

Sample description
The meta-analysis sample comprises 57 cohorts collected in Europe, North America and Australia, totaling 41,917 bipolar disorder cases and 371,549 controls of European descent (Table S1). The total effective N, equivalent to an equal number of cases and controls in each cohort (4*Ncases*Ncontrols/(Ncases+Ncontrols)), is 101,962. For 52 cohorts, individual-level genotype and phenotype data were shared with the PGC. Cohorts have been added to the PGC in five waves (PGC1 9 , PGC2 24 , PGC PsychChip, PGC3 and External Studies); all cohorts from previous PGC BD GWAS were included. The source and inclusion/exclusion criteria for cases and controls for each cohort, are described in the Supplementary Note. Cases were required to meet international consensus criteria (DSM-IV, ICD-9 or ICD-10) for a lifetime diagnosis of BD, established using structured diagnostic instruments from assessments by trained interviewers, clinician-administered checklists or medical record review. In most cohorts, controls were screened for the absence of lifetime psychiatric disorders and randomly selected from the population. For five cohorts (iPSYCH 28 , deCODE genetics 29 , Estonian Biobank 30 , Trøndelag Health Study (HUNT) 31 and UK Biobank 32 ), GWAS summary statistics for BD were shared with the PGC. In these cohorts, BD cases were ascertained using ICD codes or self-report during a nurse interview, and the majority of controls were screened for the absence of psychiatric disorders via ICD codes. Follow-up analyses included four non-European BD case-control cohorts, two from East Asia (Japan 57 and Korea 87 ), and two admixed African American cohorts 22,88 , providing a total of 5,847 cases and 65,588 controls. These BD cases were ascertained through psychiatric interviews (Supplementary Note).
Genotyping, quality control and imputation PGC cohorts were genotyped following local protocols, after which standardized quality control, imputation and statistical analyses were performed centrally using RICOPILI (Rapid Imputation for COnsortias PIpeLIne) 89 , separately for each cohort. Briefly, the quality control parameters for retaining SNPs and subjects were: SNP missingness < 0.05 (before sample removal), subject missingness < 0.02, autosomal heterozygosity deviation (Fhet < 0.2), SNP missingness < 0.02 (after sample removal), difference in SNP missingness between cases and controls < 0.02, SNP Hardy-Weinberg equilibrium (P > 10 −10 in psychiatric cases and P > 10 -6 in controls). Relatedness was calculated across cohorts using identity by descent and one of each pair of related individuals (pi_hat > 0.2) was excluded. Genotype imputation was performed using the pre-phasing/ imputation stepwise approach implemented in Eagle v2.3.5 90 and Minimac3 91 to the Haplotype Reference Consortium (HRC) reference panel v1.0 92 . The five external cohorts were processed by the collaborating research teams using comparable procedures and imputed to the HRC or a custom reference panel as appropriate. Full details of the genotyping, quality control and imputation for each cohort are available in the Supplementary Note. Identical individuals between PGC cohorts and the Estonian Biobank and UK Biobank cohorts were detected using genotype-based checksums . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Genome-wide association study
For PGC cohorts, GWAS were conducted within each cohort using an additive logistic regression model in PLINK v1.90 93 . Ancestry informant principal components (PCs) were calculated within each cohort and included as covariates, as required. Association analyses of the X chromosome were conducted in males and females separately using the same procedures, with males coded as 0 or 2 for 0 or 1 copies of the reference allele. Results from males and females were then meta-analyzed within each cohort. For external cohorts, GWAS were conducted by the collaborating research teams using comparable procedures (Supplementary Note). To control test statistic inflation at SNPs with low minor allele frequency (MAF) in small cohorts, SNPs were retained only if cohort MAF was > 1% and minor allele count was > 10 in either cases or controls (whichever had smaller N). There was no evidence of stratification artifacts or uncontrolled inflation of test statistics in the results from any cohort (λGC 0.97-1.05). Metaanalysis of GWAS summary statistics was conducted using an inverse variance-weighted fixed effects model in METAL 94 . A genome-wide significant locus was defined as the region around a SNP with P<5E-08, with linkage disequilibrium (LD) r 2 > 0.1, within a 3000 kilobase (kb) window.

Overlap of loci with other psychiatric disorders
Genome-wide significant loci for BD were assessed for overlap with genome-wide significant loci for other psychiatric disorders, using the largest available GWAS results for major depression 59 , schizophrenia 58 , attention deficit/hyperactivity disorder 95 , post-traumatic stress disorder 96 , lifetime anxiety disorder 97 , Tourette's Syndrome 98 , anorexia 99 , alcohol use disorder or problematic alcohol use 66 , autism spectrum disorder 100 , mood disorders 101 and the cross-disorder GWAS of the Psychiatric Genomics Consortium 64 . The boundaries of the genome-wide significant loci were calculated in the original publications. Overlap of loci was calculated using bedtools 102 .

Enrichment analyses
P values quantifying the degree of association of genes and gene sets with BD were calculated using MAGMA v1.07 35 , implemented in FUMA v1.3.5d 62,103 . MAGMA uses Brown's method to combine SNP P values, while accounting for LD. Gene-based tests were performed for 18,721 genes (Bonferronicorrected P value threshold = 2.67E-06). A total of 15,481 curated gene sets from MSigDB V7.0 were tested for association with BD (Bonferroni-corrected P value threshold = 3.23E-06). Competitive gene-set tests were conducted correcting for gene size, variant density and LD within and between genes. Tissueset enrichment analyses were also performed using MAGMA implemented in FUMA, to test for enrichment of association signal in genes expressed in 54 tissue types from GTEx V8 (Bonferronicorrected P value threshold = 9.26E-04) 62,103 .
For single-cell enrichment analyses, publicly available single-cell RNA-seq data were compiled from five studies of the human and mouse brain 84,104-107 . Using a previously described method 36 , a metric of gene expression specificity was calculated by dividing the expression of each gene in each cell type by the total expression of that gene in all cell types, leading to values ranging from 0 to 1 for each gene (0 meaning that the gene is not expressed in that cell type and 1 meaning that all of the expression of the gene is in that cell type). MAGMAv1.06b 35 was used to test whether the 10% most specific genes for each cell type were enriched for genetic associations with BD, including relevant covariates. The P value threshold for significance was P < 9.1E-03, representing a 5% false discovery rate (FDR) across datasets. Full details are in the Supplementary Note. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint Further gene-set analyses were performed restricted to genes targeted by drugs, assessing individual drugs and grouping drugs with similar actions. This approach has been described previously 39 . Two groups of gene sets were defined. The first group comprised the targets of each drug in the Drug-Gene Interaction Database DGIdb v.2 37 and in the Psychoactive Drug Screening Database Ki DB 38 , both downloaded in June 2016 39 . The second set were drug sets grouped according to Anatomical Therapeutic Chemical (ATC) classes 39 . All analyses were performed using competitive gene-set analyses in MAGMA v1.06 35 . Multiple testing was controlled using a Bonferroni-corrected significance threshold of P < 2.86E-05 for drug-set analysis and P < 3.40E-04 for drug-class analysis, accounting for 1,746 drug-sets and 147 drug classes tested.

eQTL integrative analysis
A transcriptome-wide association study (TWAS) was conducted using the precomputed gene expression weights from PsychENCODE data (1,321 brain samples) 41 , available online with the FUSION software 40 . For genes with significant cis-SNP heritability (13,435 genes), FUSION software was used to test whether SNPs influencing gene expression are also associated with BD (Bonferroni-corrected P value threshold < 3.72E-06). For regions including a TWAS significant gene, TWAS fine-mapping of the region was conducted using FOCUS (fine-mapping of causal gene sets) 42 . A posterior inclusion probability (PIP) was assigned to each gene for being causal for the observed TWAS association signal and used to generate the 90%-credible gene set for each region 42 .
Summary data-based Mendelian randomization (SMR) 43,44 was applied to further investigate putative causal relationships between SNPs and BD via gene expression. SMR was performed using eQTL summary statistics from the eQTLGen (31,684 blood samples) 45 and PsychENCODE 41 consortia. SMR analysis is limited to transcripts with at least one significant cis-eQTL (P < 5E-08) in each dataset (15,610 in eQTLGen; 10,871 in PsychENCODE). The Bonferroni-corrected significance threshold was P < 3.20E-06 and P < 4.60E-06 for eQTLGen and PsychENCODE respectively. The significance threshold for the HEIDI test (heterogeneity in dependent instruments) was PHEIDI ≥ 0.01 44 . While the results of TWAS and SMR indicate an association between BD and gene expression, a non-significant HEIDI test additionally indicates either a direct causal role or a pleiotropic effect of the BD-associated SNPs on gene expression.

C4 imputation
To investigate the major histocompatibility complex (MHC; chr6:24-34 Mb on hg19), the alleles of complement component 4 genes (C4A and C4B) were imputed in 47 PGC cohorts for which individuallevel genotype data were accessible, totaling 32,749 BD cases and 53,370 controls. The imputation reference panel comprised 2,530 reference haplotypes of MHC SNPs and C4 alleles, generated using a sample of 1,265 individuals with whole-genome sequence data, from the Genomic Psychiatry cohort 108 . Briefly, imputation of C4 as a multi-allelic variant was performed using Beagle v4.1 109,110 , using SNPs from the MHC region that were also in the haplotype reference panel. The output consisted of dosage estimates for each of the common C4 structural haplotypes (e.g., AL-BS, AL-AL, etc.) for each individual. The five most common structural forms of the C4A/C4B locus (BS, AL, AL-BS, AL-BL, and AL-AL) could be inferred with reasonably high accuracy (generally 0.70 < r2 < 1.00). The imputed C4 alleles were tested for association with BD in a joint logistic regression that included (i) terms for dosages of the five most common C4 structural haplotypes (AL-BS, AL-BL, AL-AL, BS, and AL), (ii) rs13195402 genotype (top lead SNP in the MHC) and (iii) PCs as per the GWAS. The genetically regulated expression of C4A was predicted from the imputed C4 alleles using a model previously described 61 . Predicted C4A expression was tested for association with BD in a joint logistic regression that included (i) predicted C4A expression, (ii) rs13195402 genotype (top lead SNP in the MHC) and (iii) PCs as per the GWAS. Full details are in the Supplementary Note. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint Polygenic risk scoring PRS from our GWAS meta-analysis were tested for association with BD in individual cohorts, using a discovery GWAS where the target cohort was left out of the meta-analysis. Briefly, the GWAS results from each discovery GWAS were pruned for LD using the P value informed clumping method in PLINK v1.90 93 (r 2 0.1 within a 500 kb window) based on the LD structure of the HRC reference panel 92 . Subsets of SNPs were selected from the results below nine increasingly liberal P value thresholds (pT) (5E-08, 1E-04, 1E-03, 0.01, 0.05, 0.1, 0.2, 0.5, 1). Sets of alleles, weighted by their log odds ratios from the discovery GWAS, were summed into PRS for each individual in the target datasets, using PLINK v1.90 implemented via RICOPILI 89,93 . PRS were tested for association with BD in the target dataset using logistic regression, covarying for PCs as per the GWAS in each cohort. PRS were tested in the external cohorts by the collaborating research teams using comparable procedures. The variance explained by the PRS (R 2 ) was converted to the liability scale to account for the proportion of cases in each target dataset, using a BD population prevalence of 2% and 1% 111 . The weighted average R 2 values were calculated using the effective N for each cohort. To assess cross-ancestry performance, PRS generated from the meta-analysis results were tested for association with BD using similar methods in a Japanese sample 57 , a Korean sample 87 and two admixed African American samples 22,88 . Full details of the QC, imputation and analysis of these samples are in the Supplementary Note. LD score regression LD Score regression (LDSC) 33 was used to estimate the ℎ "#$ % of BD from GWAS summary statistics. ℎ "#$ % was converted to the liability scale, using a lifetime BD prevalence of 2% and 1%. LDSC bivariate genetic correlations attributable to genome-wide SNPs (rg) were estimated with 255 human diseases and traits from published GWAS and 514 GWAS of phenotypes in the UK Biobank from LD Hub (http://ldsc.broadinstitute.org) 46 . Adjusting for the number of traits tested, the Bonferroni-corrected P value thresholds were P < 1.96E-04 and P < 9.73E-05 respectively.

MiXeR
We applied bivariate gaussian mixture models to the GWAS summary statistics for BD and other complex traits, using MiXeR v1.3 47,48 . MiXeR provides univariate estimates of the proportion of non-null SNPs ("polygenicity") and the variance of effect sizes of non-null SNPs ("discoverability") in each phenotype. In the cross-trait analysis, MiXeR models additive genetic effects as a mixture of four components, representing SNPs not influencing either trait, SNPs influencing only one of the two traits, and SNPs influencing both traits. These components are then plotted in Venn diagrams. After fitting parameters of the model, the dice coefficient, which represents the proportion of overlapping variants, is calculated. Further details are provided in the Supplementary Note.

Mendelian randomization
Seventeen traits associated with BD in clinical or epidemiological studies were selected for Mendelian randomization (MR) to dissect their relationship with BD (Supplementary Note). Bi-directional generalized summary statistics-based Mendelian randomization (GSMR) 49 analyses were performed between BD and the traits of interest using GWAS summary statistics. The instrumental variables (IVs) were selected by a clumping procedure internal to the GSMR software with parameters: --gwas-thresh 5e-8 --clump-r2 0.01. Traits with less than 10 IVs available were excluded from the GSMR analyses, resulting in 10 traits tested (Bonferroni-corrected P value threshold < 2.5E-03). The HEIDI-outlier test (heterogeneity in dependent instruments) was applied to test for horizontal pleiotropy (PHEIDI < 0.01) 49 . For comparison, the MR analyses were also performed using the inverse variance weighted regression method, implemented via the TwoSampleMR R package, using the IVs selected by GSMR 112,113 . To further investigate horizontal . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint pleiotropy, the MR Egger intercept test was conducted using the TwoSampleMR package 112,113 and MR-PRESSO software was used to perform the Global Test and Distortion Test 114 .

Data availability
The PGC's policy is to make genome-wide summary results public. Summary statistics will be made available through the PGC website upon publication (https://www.med.unc.edu/pgc/results-anddownloads). Data are accessible with collaborative analysis proposals through the Bipolar Disorder Working Group of the PGC (https://www.med.unc.edu/pgc/shared-methods/how-to/). This study included some publicly available datasets accessed through dbGaP -PGC bundle phs001254.v1.p1. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.09.17.20187054 doi: medRxiv preprint