Abstract
Genetics have nominated many schizophrenia risk genes that lack functional interpretation. To empower such interpretation, we executed interaction proteomics for six risk genes in human induced neurons and found the resulting protein network to be enriched for common variant risk of schizophrenia in Europeans and East Asians. The network is down-regulated in layer 5/6 cortical neurons of patients and can complement fine-mapping and eQTL data to prioritize additional genes in GWAS loci. A sub-network centered on HCN1 is enriched for common variant risk and also contains proteins (HCN4 and AKAP11) enriched for rare protein-truncating mutations in patients with schizophrenia and bipolar disease. Our findings establish brain cell-type-specific interactomes as an organizing framework to facilitate interpretation of genetic and transcriptomic data in schizophrenia and psychiatric diseases.
One Sentence Summary Neuronal protein interactomes is an organizing framework for integrating genetic and transcriptomic data in schizophrenia.
Main Text
Schizophrenia is a debilitating psychiatric disorder with a strong genetic component occurring in ∼0.3% of the global population with huge implications for patients, families, and society (1, 2). The last years have seen large advances in mapping the genetic architecture of disease and in generating catalogs of risk variants and genes. These efforts have revealed a highly polygenic disorder with contributions from hundreds of genes and rare and common variants across a wide spectrum of allele frequencies (3–7). Genetic risk of schizophrenia correlates between diverse populations such as Europeans and East Asians (8), suggesting that a discrete and shared set of cellular networks and pathways at the protein level (we will collectively refer to these as networks, hereafter) in cell types of the human brain are involved in its biology. Developing paradigms for systematically using genetic information as an entry point to investigate these brain cell-type-specific networks is therefore a key goal of psychiatry and genetics. Here, we aimed to bring together the newest developments in genetics, single-cell RNA sequencing, neuronal cell modeling, and proteomics to better understand schizophrenia at the level of brain cell-type-specific protein interactomes.
Analyses of postmortem brains from patients with schizophrenia (9) and integration of genetic information with RNA sequencing data from human and mouse brains (10–12) have converged on cortical excitatory neurons as a key biological conduit of genetically encoded risk. While other brain cell types likely also play a role, cortical excitatory neurons have therefore emerged as a pivotal cell type in which to initiate systematic mapping of cellular protein networks implicated by genetics and transcriptomics in schizophrenia. A recent study showed that adding extrinsic neuronal patterning to pluripotent stem cells (PSCs) expressing NGN2 generates glutamatergic patterned induced neurons (iNs) that behave like cortical excitatory neurons at the molecular, morphological, and functional levels (13, 14). By adapting this protocol, we were able to develop a scalable workflow to routinely produce billions of iNs in homogeneous cell populations. Combined with the availability of robust and reproducible genetic datasets in schizophrenia, this creates a unique opportunity to systematically investigate intracellular protein interaction networks of schizophrenia risk genes in a relevant human cellular context. While other sources of neurons (e.g., murine brains, human postmortem brains, spheroids, or organoids) should be considered in the future, we started with iNs because this design choice minimizes experimental noise and biochemical artefacts introduced by a mixture of different cell types and produces a dataset that is maximally biologically and genetically interpretable.
To identify schizophrenia risk genes as the basis of our interactome experiments we designed and executed a three-step procedure (Fig. 1A and Data S1). First, we identified 445 genes in previously reported genome-wide significant loci from the Psychiatric Genomics Consortium (PGC) genome-wide association study (GWAS; phase 2) of 37K schizophrenia cases and 113K controls (3) (Set 1). Second, we filtered this set to 37 genes within single protein-coding gene loci (Set 2). Third, we integrated data from orthogonal genetic studies (e.g., high-density genotyping, exome sequencing, and earlier targeted studies of individual genes; Materials and Methods) to identify a subset of 10 genes supported by multiple independent lines of evidence (Set 3): CACNA1C, CACNB2, CSMD1, CUL3, GRIN2A, HCN1, RIMS1, SATB2, TCF4, and ZNF804A. We additionally included SYNGAP1 in the major histocompatibility complex (MHC) region in all three sets due to strong orthogonal evidence for its involvement in schizophrenia.
(A) Three-step procedure to identify Sets 1-3 by refining schizophrenia GWAS [PGC phase 2] data, where Set 3 was defined as ‘index genes’ and used as the basis for downstream experiments. (B) Cumulative density of gnomAD pLI scores for different gene sets. ‘Genome’ indicates genes in the pLI dataset [excluding Sets 1-3]; ‘Sets 1-3’ indicate genes in Sets 1-3 with available pLI scores; ‘SCHEMA 5% FDR’ and ‘SCHEMA EWS’ indicate genes with FDR < 0.05 or 3.7e-3 [exome-wide significance] in the SCHEMA exome sequencing analysis, respectively. (C) Frontal cortex RNA expression of gene sets across ten developmental stages. Median expression and standard error [SE] of each gene set were derived from the BrainSpan exon microarray dataset. ‘Random’ indicates genes randomly sampled from the BrainSpan dataset; ‘Sets 1-3’ indicate genes in Sets 1-3 with available BrainSpan data; ‘SCHEMA’ indicates exome-wide significant genes from SCHEMA. Shaded regions indicate median expression of genes with FDR < 3.7e-3 [exome-wide significance], 0.05, 0.25, or 0.5 in SCHEMA with darker grey indicating greater significance. (D) Western blot analysis of index proteins in iPSCs, iNs [at differentiation time points day 3, week 2, week 3, week 4, and week 7], three cancer cell lines [TF-1, K562, U937], HEK293 cells, and mouse cortex. SATB2 and ZNF804A are excluded from this panel due to lack of detectable expression in iNs.
Genes implicated in schizophrenia are under strong genetic selection and have elevated expression in the frontal cortex (15). Therefore, in order to assess the enrichment of schizophrenia risk genes in Sets 1-3, we compared their gnomAD (16) pLI scores (i.e., the probability of being loss-of-function intolerant, where genes under strong genetic selection have higher pLI scores) and BrainSpan (17) expression to schizophrenia risk genes from the recent Schizophrenia Exome Sequencing Meta-Analysis (SCHEMA) study (15). In terms of pLI scores, Sets 1-3 all have increasingly higher scores compared to other genes in the genome (one-tailed KS test P = 1.1e-3, 1.7e-3, and 1.3e-5 for Sets 1-3, respectively; Fig. 1B and Data S2). Set 3 scores are significantly higher than that of Set 2 (one-tailed KS test P = 5.8e-4), which in turn are higher than that of Set 1 (one-tailed KS test P = 0.019). Furthermore, Set 3 is under the same degree of genetic selection as the exome-wide significant (FDR < 3.7e-3) SCHEMA genes (two-tailed KS test P = 0.23). In terms of gene expression, the expression profile of Set 1 mirrors that of random genes during frontal cortical development up until adulthood; Set 2 has a postnatal expression profile that resembles SCHEMA genes with FDR of 0.25-0.5; and Set 3 has a postnatal expression profile that strongly mirrors the exome-wide significant SCHEMA genes (Fig. 1C and Data S3). Overall, these patterns support our three-step approach to gene selection based on refining GWAS data and indicate that Set 3 is enriched for bona fide schizophrenia risk genes. We proceeded to use Set 3 as the starting point of our experiments and refer to these 11 genes and their encoded proteins as ‘index genes’ and ‘index proteins’, respectively.
To study the expression patterns of the index proteins across different stages of iN maturation, we tested 58 commercially available antibodies and identified 31 with competency to detect the 11 index proteins (Data S4). We differentiated iPSCs to day three and weeks two, three, four, and seven, and confirmed protein expression of CACNA1C, CACNB2, CSMD1, CUL3, GRIN2A, HCN1, RIMS1, SYNGAP1, and TCF4 in neuron lysates by western blot (Fig. 1D and Fig. S1); SATB2 and ZNF804A lacked detectable neuronal expression or high-quality reagents and were excluded from further experiments. When index protein expression is compared between iNs, three cancer cell lines (i.e., TF-1, K562, and U937), HEK293 cells, and mouse cortex homogenate, CACNA1C, CSMD1, GRIN2A, HCN1, RIMS1, and SYNGAP1 display a neuron-specific expression profile in iNs compared to all other cell types.
Next, we tested 42 antibodies for their ability to immunoprecipitate (IP) the index proteins (Data S4) and were able to IP seven index proteins (CACNA1C, CACNB2, CUL3, HCN1, RIMS1, SYNGAP1, TCF4) in ∼seven billion iNs for subsequent mass spectrometry (MS) analyses. In total, we carried out 23 IP-MS experiments at five neuron differentiation time points (weeks one, two, three, four, and seven). We performed quality control (QC) and analyzed each experiment using Genoppi (18), calculating the log2 fold change (FC) and corresponding statistical significance for each protein identified in the index protein IPs compared to the controls, and then defining proteins with log2 FC > 0 and FDR ≤ 0.1 as the significant interaction partners of the index protein. We disregarded four IP experiments that did not meet our QC criteria (i.e., the log2 FC correlation between replicates is < 0.5 or the index protein is not positively enriched at FDR ≤ 0.1). The remaining 19 high-quality IPs of CACNA1C, CUL3, HCN1, RIMS1, SYNGAP1, and TCF4 have a median replicate log2 FC correlation of 0.87, with the index proteins enriched at a median FDR of 8.2e-4 (Fig. 2A-C, Fig. S2, Data S5 and S6). In previous work we have established that the validation rate of interactions identified using a similar IP-MS approach, cell model, and QC thresholds is 70-90% (18). Ten of the 19 IPs are further supported by the observation that they are enriched for known protein interactors derived from the InWeb_InBioMap (InWeb, hereafter) resource (19) (Data S5).
(A) Scatter plot showing log2 FC correlation between replicate 1 [x-axis] and replicate 2 [y-axis] of an IP of CACNA1C at week 3 of neuron differentiation [Pearson’s r = 0.74]. (B) Volcano plot showing log2 FC [x-axis] and -log10 P-value [y-axis] of the CACNA1C IP from (A). For (A) and (B), the index protein [CACNA1C] is shown in red, significant interaction partners [log2 FC > 0 and FDR ≤ 0.1] in green, and other detected proteins in blue. Known InWeb interactors are indicated by black border circles, with the subset that are significant in the IP highlighted in yellow [overlap P = 1.8e-2]. Calcium channel components [alpha, beta, and alpha2delta subunits] are in orange. (C) Representative volcano plots of IPs for six index proteins. The same coloring scheme from (A) and (B) is applied here. (D) The combined interaction network of six index proteins resulting from 19 individual IPs. Index proteins are indicated as red nodes and interaction partners are indicated as purple nodes. Size and color of the interaction partner nodes indicate the number of index proteins linked to each partner with larger and darker nodes representing more recurrent partners [distribution shown in upper right pie chart]. Edges represent protein interactions with colors indicating whether each interaction is known in InWeb [blue] or potentially novel [grey; distribution shown in lower right pie chart].
As an example, in a CACNA1C IP experiment in iNs at week 3 of differentiation, CACNA1C itself is the most enriched protein (log2 FC = 2.0 and FDR = 8.2e-4; Fig. 2A,B). The replicate log2 FC correlation is 0.74 and the significant interaction partners are enriched for known CACNA1C interactors in InWeb (P = 1.8e-2). Notably, the interaction partners include all known L-type calcium channel subunits: the extracellular CACNA2D1 and the intracellular CACNB1 and CACNB3 (20), strongly supporting the biological validity of our IP data. On the other hand, > 94% of the interaction partners are not found in InWeb nor in an IP of CACNA1C executed in mouse heart tissue with the same antibody (21), illustrating the potential for new discoveries using our data. We observed analogous results in an HCN1 IP performed at week 7 of iN differentiation, in which we also identified all known subunits of the channel (22) and an enrichment of known InWeb interactors among the significant interaction partners (P = 0.045; Fig. S3). In addition, we compared multiple IPs of the same index protein across neuron differentiation time points and cell lines (for CACNA1C, HCN1, and SYNGAP1), and observed agreement between these IPs in terms of their protein log2 FC correlations (the median correlation between time points is 0.75; Fig. S4). Together, these results illustrate the quality and consistency of the interactome data we have generated for schizophrenia-related index proteins in human iNs.
In addition to analyzing the 19 individual IPs, we integrated IP data across index proteins and time points to create nine additional consolidated datasets (Data S7). These datasets represent the combined interaction network of a single index protein across multiple experiments or time points (i.e., CACNA1C, HCN1, RIMS1 and SYNGAP1), the combined network of multiple index proteins at one time point (i.e., week 2, 3, 4, and 7), and the combined network of all six index proteins across all time points (i.e., ‘all combined’). The all combined network contains 1,238 interaction partners of the six index proteins and, similar to the CACNA1C IP highlighted above, > 96 % of the interactions in this network are potentially novel interactions not found in InWeb (Fig. 2D). Genes encoding these interaction partners have relatively high expression in the frontal cortex throughout brain development similar to schizophrenia risk genes reaching 5% FDR in SCHEMA (Fig. S5A and Data S3). SynGO (23) gene set analysis also found the network to be enriched for genes involved in various biological processes in the synapse (Fig. S5B and Data S8). Overall, we successfully mapped the neuronal protein interactomes of six proteins that are transmembrane (CACNA1C, HCN1), cytosolic (CUL3, SYNGAP1), and involved in multiple neuronal signaling processes (RIMS1, SYNGAP1, TCF4). The resulting interaction networks are high-quality, include a high percentage of newly reported interactions, and span many areas of the cell biology of cortical excitatory neurons.
To test the networks for association to schizophrenia, we assessed the enrichment of common variant risk across the networks using PGC GWAS data containing schizophrenia cases and controls of European (EUR) or East Asian (EAS) ancestry (3, 8). For these analyses we created a more conservative version of the combined networks to reduce false positive interactions. This was achieved by excluding proteins that showed up as non-interactors (i.e., detected proteins with log2 FC ≤ 0 or FDR > 0.1) in any of the source IPs that went into a corresponding network (see ‘stringent interactors’ defined in Data S7). In total, we performed genetic analyses for 11 interaction networks, including the nine combined networks and two individual IP networks for CUL3 and TCF4. Importantly, our analyses are conditional on the ‘non-interactors’ detected in our IP experiments, meaning that we tested whether interaction partners in the different networks are enriched for genetic risk compared to other proteins expressed in iNs. If so, this would suggest that our networks are relevant to schizophrenia over and above the background proteome of the neuronal cell model.
Using MAGMA (24), we found that many of the interaction networks are indeed enriched for schizophrenia risk when conditioned on other proteins expressed in iNs. Notably, the enrichment is overall consistent across EUR and EAS ancestries (Fig. 3A and Data S9). We observed nominal significance (P < 0.05) in both ancestries for the all combined network, the week 4 network (includes IPs of CACNA1C, HCN1, RIMS1, and SYNGAP1), the week 7 network (includes IPs of CACNA1C, CUL3, HCN1, and SYNGAP1), and the HCN1 network (includes IPs at weeks 2, 3, 4, and 7). Additionally, the week 3 and CACNA1C networks show nominal significance in EUR and the same direction of effect in EAS; and vice versa for the week 2 and SYNGAP1 networks. At a Bonferroni-corrected P-value threshold (P < 0.05/22, adjusting for 11 networks and two ancestries), many of the networks remain significant, including the week 4 network in both EUR and EAS ancestries and the all combined, week 2, week 7, and HCN1 networks in EAS. We also performed a cross-ancestry meta-analysis, in which the all combined (nominal P = 5.7e-7, adj. P = 1.3e-5), week 2 (nominal P = 1.5e-3, adj. P = 0.033), week 4 (nominal P = 4.5e-6, adj. P = 9.9e-5), week 7 (nominal P = 4.2e-6, adj. P = 9.3e-5), and HCN1 (nominal P = 1.6e-5, adj. P = 3.5e-5) networks are all significant at the same Bonferroni-corrected threshold. In parallel to the MAGMA analysis, we further validated these findings using a genetic risk score enrichment analysis method that estimates the genetic risk on holdout samples not included in GWAS and therefore is less sensitive to outliers (Materials and Methods). Most of the findings were replicated, including the significant results for the all combined, week 4, and HCN1 networks in both ancestries and their meta-analysis (Fig. S6B and Data S9).
(A) Common variant enrichment of schizophrenia [SCZ] or height in Europeans [EUR], East Asians [EAS], or their meta-analysis. Enrichment coefficients, standard errors [SE], and P-values were calculated using MAGMA. Nominal [P < 0.05] or Bonferroni [P < 0.05/22, adjusting for 11 networks and two ancestries] significance is highlighted in orange or red, respectively. (B) Common variant enrichment of SCZ, attention deficit hyperactivity disorder [ADHD], autism spectrum disorders [ASD], bipolar disorder [BIP], or major depressive disorder [MDD] calculated using MAGMA. Cross-ancestry meta-analysis results are shown for SCZ; EUR ancestry results are shown for all other traits. Nominal or Bonferroni [P < 0.05/22] significance is indicated by single or double asterisks, respectively. (C) Enrichment of cell-type-specific differentially expressed genes in the prefrontal cortex of schizophrenia patients compared to controls. P-values were calculated using one-tailed hypergeometric tests. Nominal or Bonferroni [P < 0.05/220, adjusting for 11 networks and 20 cell types] significance is indicated by single or double asterisks, respectively. Networks tested are the combined network of all IPs [All combined], the combined networks at each time point [Week 2 to Week 7], the combined networks for CACNA1C, HCN1, RIMS1, and SYNGAP1, and the individual IP networks for CUL3 and TCF4.
To further explore whether the genetic risk enrichment we observed in the interaction networks is specific to schizophrenia, we repeated the same analyses using GWAS data of other psychiatric disorders, including attention deficit hyperactivity disorder (ADHD) (25), autism spectrum disorders (ASD) (26), bipolar disorder (BIP) (27), and major depressive disorder (MDD) (28), as well as height (29, 30) as a negative control (Fig. 3A,B, Fig. S6A,B, and Data S9). Across these phenotypes, we only observed significant enrichment for ASD in the CUL3 network and the week 7 network, which contains IP data of CUL3 and SYNGAP1, both of which have been previously linked to ASD (31, 32). Importantly, the networks that show the most robust enrichment for schizophrenia (i.e., the all combined, week 4, and HCN1 networks) do not show enrichment for the other phenotypes. This supports that most of the index protein interaction partners we identified in iNs are specifically concentrating common variant risk of schizophrenia.
Besides analyzing data from common variants, we also tested whether the interaction networks are enriched for rare variant risk of schizophrenia using gene-based association statistics from SCHEMA, and only observed nominal significance for the CUL3 network (Fig. S6C and Data S9). This is in contrast with the robust common variant enrichment results we observed for these networks, and repeating the analysis using larger sequencing datasets in the future will be needed to determine whether the lack of agreement is due to true biological differences between common and rare variants associated with schizophrenia or due to current power limitations in the genetic data. In parallel, we performed analogous analyses using association statistics from sequencing studies of developmental disorders (DD) (33) and ASD (32), as well as gnomAD pLI scores. At a Bonferroni-corrected P-value threshold (P < 0.05/11, adjusting for 11 networks), we found the CUL3 network to be enriched for DD genes, which is in agreement with previously implicated role of CUL3 in developmental delay (31, 34). In addition, the week 4 and RIMS1 networks are significantly enriched for genes with high pLI scores, indicating that some members of these networks are likely intolerant to loss-of-function mutations and may be involved in essential cell functions.
Next, we explored whether the interaction networks are enriched for brain-layer-specific transcriptional perturbations observed in patients with schizophrenia. We analyzed data from a recent single-cell RNA sequencing study (12), which identified differentially expressed genes (DEGs) in schizophrenia patients versus controls in 20 annotated cell types in the prefrontal cortex. Many of our networks show nominally significant overlaps with the cell-type-specific DEGs in neuronal cell types, which include both upper and deep layer excitatory neurons and inhibitory interneurons (Fig. 3C and Data S10). A population of layer 5/6 cortico-cortical projection neurons (‘Ex-L5-6CCb’) shows the most robust enrichment, reaching Bonferroni significance (P < 0.05/220, adjusting for 11 networks and 20 cell types) for the all combined (nominal P = 3.5e-5, adj. P = 7.7e-4), week 4 (nominal P = 3.2e-5, adj. P = 7.0e-4), and week 7 (nominal P = 9.5e-5, adj. P = 2.1e-3) networks. Subsequently, we separately analyzed the up- and down-regulated DEGs in this cell type, showing that the enrichment signals are strongly driven by the down-regulated DEGs (Fig. S7 and Data S10). As DEGs in ‘Ex-L5-6CCb’ were also independently found to be enriched for genes implicated by schizophrenia GWAS (12), there is an intriguing convergence between our results and findings in patients that converge on deep layer cortical excitatory neurons as a relevant cell type for studying cellular networks involved in schizophrenia.
Functionally interpreting GWAS data to identify causal genes based on genome-wide significant SNPs is a major challenge in the field of genetics. Since our analyses indicate that the combined interaction network of all IPs is genetically and transcriptionally relevant in schizophrenia, we exploited this insight and used the network to prioritize genes in schizophrenia GWAS loci. We created a ‘social Manhattan plot’ by integrating our interaction data with the most recent PGC schizophrenia GWAS (phase 3) of 69K cases and 237K controls (35). We highlighted observed interactions between the index proteins and other proteins (locus proteins, hereafter) encoded by genes in the 270 reported genome-wide significant loci (Fig. 4A and Data S11). In total, we identified 114 locus proteins in 72 loci that are linked to ≥ 1 index protein in the social Manhattan plot. We further overlap these locus proteins with those prioritized by fine-mapping (FINEMAP) or eQTL co-localization (summary-based Mendelian randomization, SMR) analysis (35), pinpointing seven proteins that were also prioritized by FINEMAP (ACTR1B, EPN2, GABBR2, KIAA1549, MSI2, NEGR1, PDE4B) and three proteins that were also prioritized by SMR (SF3B1, PCDHA2, PCDHA8). In addition, our network was able to nominate candidate genes in 44 distinct loci that lack prioritization results from FINEMAP or SMR analysis.
(A) Social Manhattan plot of genes encoding the index proteins [red] and their interaction partners [purple] in genome-wide significant loci in PGC schizophrenia GWAS [phase 3]. Interaction partners prioritized by FINEMAP or SMR analysis are highlighted in magenta. Grey lines indicate observed protein-protein interactions in our data; interactions validated in (D) and (E) are highlighted in blue. (B) Cumulative density of gnomAD pLI scores for different gene sets. ‘Genome’ indicates genes in the pLI dataset [excluding PGC3 genes]; ‘PGC3’ indicates genes in PGC GWAS [phase 3] loci; ‘FINEMAP’, ‘SMR’, and ‘Network’ indicate PGC3 genes prioritized by FINEMAP, SMR, or our interactome data, respectively; ‘Overlap’ indicates genes prioritized by all three approaches; ‘SCHEMA 5% FDR’ and ‘SCHEMA EWS’ indicate genes with FDR < 0.05 or 3.7e-3 [exome-wide significance] in SCHEMA, respectively. (C) Frontal cortex RNA expression of gene sets across ten developmental stages. Median expression and standard error [SE] of each gene set were derived from the BrainSpan exon microarray dataset. ‘Random’ indicates genes randomly sampled from the BrainSpan dataset; ‘PGC3’, ‘FINEMAP’, ‘SMR’, ‘Network’, and ‘Overlap’ indicate gene sets as described in (B); ‘SCHEMA’ indicates exome-wide significant genes from SCHEMA. Shaded regions indicate median expression of genes with FDR < 3.7e-3 [exome-wide significance], 0.05, 0.25, or 0.5 in SCHEMA with darker grey indicating greater significance. (D) Western blot analysis on IPs of index proteins [named on the left] to detect the presence of selected locus proteins [Detection]. (E) Western blot analysis on reverse IPs of selected locus proteins [named on the left] to detect the presence of index proteins [Detection]. For (D) and (E), the input [IN] represents 10% of the protein lysate utilized for each IP; enrichment of the Detection protein is quantified as fold change [FC] of western blot signal in the IP lane over the IgG control lane.
In order to compare our network prioritization approach to FINEMAP or SMR prioritization, we looked at the pLI score distributions and BrainSpan expression profiles of several gene sets, including all genes in the GWAS loci (PGC3), genes prioritized by our network (Network), FINEMAP, or SMR, and genes prioritized by all three approaches (Overlap). We observed that the Network and FINEMAP genes have comparable pLI scores (two-tailed KS test P = 0.92), and both sets have significantly higher scores compared to all PGC3 genes (one-tailed KS test P = 1.0e-6 and 3.6e-5, respectively), PGC3 genes in the same loci as the prioritized genes (one-tailed KS test P = 5.7e-7 and 3.9e-6, respectively), and the SMR genes (two-tailed KS test P = 3.8e-2 and 8.9e-3, respectively; Fig. 4B and Data S2). The Overlap genes have even higher pLI scores that are comparable to the SCHEMA genes with FDR < 0.05 (two-tailed KS test P = 0.27), although the differences between the Overlap genes and other genes in the Network, FINEMAP, or SMR supersets are not statistically significant (one-tailed KS test P = 0.10-0.60). When looking at gene expression throughout frontal cortical development, the Network, FINEMAP, SMR, and Overlap genes all have elevated expression compared to all PGC3 genes. In particular, the Network and Overlap genes have higher expression than the FINEMAP and SMR genes, and show postnatal expression profiles close to that of SCHEMA genes with FDR < 0.05 (Fig. 4C and Data S3).
Finally, we compared the Network genes to other PGC3 genes encoding proteins detected in our neuronal cell model during IP-MS (i.e., non-interactors in Data S7) and observed that these two gene sets have comparable pLI scores and BrainSpan expression (Data S2 and S3). These results indicate that the proteome of the neuronal cell model may contribute to the pLI score and expression enrichment we observed for the Network genes. However, by first showing that our interaction network is significantly enriched for schizophrenia genetic risk and transcriptional perturbations over and above the non-interactors, and then showing that the network genes in GWAS loci have higher pLI scores and frontal cortical expression throughout development compared to other genes in the loci, we demonstrate how our interactome data can complement other methods such as statistical fine-mapping and eQTL co-localization analyses to nominate schizophrenia risk genes from GWAS data.
To demonstrate that many of the locus proteins we prioritized could be true interaction partners of the index proteins, we validated selected interaction partners with available antibodies by western blot (Fig. 4D and Fig. S8). We also executed reverse IPs for selected locus proteins followed by western blots for the index proteins, establishing that index-locus protein-protein interactions can be detected bidirectionally (Fig. 4E and Fig. S9). In summary, we confirmed interactions involving eight locus proteins (CKB, CTNND1, FXR1, INA, MARK3, MEF2C, SEMA4C, SRPK2), including multiple interactions for FXR1, MARK3, SEMA4C, and SRPK2. We note that the proteins we validated through western blots are representative of the rest of the network-prioritized proteins in terms of pLI scores and BrainSpan expression patterns (Data S2 and S3). Furthermore, we validated an interaction between two of the index proteins, HCN1 and SYNGAP1. These results illustrate the overall robustness and reproducibility of the interaction data we have generated and support the observed convergence between the index protein interactomes. Follow-up investigation on the locus proteins we have prioritized here can be informative for determining whether their corresponding genes are true schizophrenia risk genes responsible for the genetic signals observed in GWAS.
The last years have seen enormous progress in generating catalogs of genetic risk variants in schizophrenia, developing protocols to produce faithful stem cell models of human neurons relevant to disease, high-resolution proteomic approaches, and powerful computational frameworks to QC, validate, and integrate proteomic and genetic data. Here, we bring together these advances in a general framework to study cell-type-specific pathway relationships of schizophrenia risk genes in human cortical excitatory neurons. Our data illustrate the vast, and nearly entirely untapped, potential for biological pathway discovery based on high-quality protein-protein interaction networks generated from human brain cell types. The network datasets are validated by the convergence of common risk variants from both European and East Asian patients, which implicates the networks in schizophrenia across populations and indicates that they will be a rich substrate for targeted follow-up mechanistic discoveries. This convergence is specific to schizophrenia, as the networks are generally not enriched for common variant risk of other psychiatric disorders. In addition, we integrated the networks with single-cell RNA sequencing data to pinpoint deep layer cortical excitatory neurons as a key cell type for schizophrenia, and with GWAS data to complement statistical fine-mapping and eQTL data for systematic interpretation of candidate genes in GWAS loci.
In our analyses, the HCN1 network emerges as a particularly promising lead for follow-up investigation. This network is enriched for schizophrenia common variant risk in both Europeans and East Asians, suggesting that perturbed signaling through the hyperpolarization-activated cyclic nucleotide-gated potassium channel, a heterotetrameric complex consisting of HCN1-4 (22), could play a role in schizophrenia across populations. Indeed, two HCN1 interactor genes we identified, HCN4 and AKAP11, are also enriched for schizophrenia-associated protein-truncating variants (PTVs) in SCHEMA (FDR = 4.2e-3 and 1.3e-2, respectively) (15). In a recent meta-analysis of schizophrenia and bipolar disorder cases (36), AKAP11 further emerged as an exome-wide significant (P = 2.8e-9) gene enriched for ultra-rare PTVs. PTVs are among the most interpretable genetic variants as their effects on disease most commonly track with decreased function and expression of the gene. The observation that the HCN1 network is nominally enriched for down-regulated DEGs in layer 5/6 cortico-cortical projection neurons (‘Ex-L5-6CCb’) from schizophrenia patients (Fig. S7) also supports the hypothesis that members of the network may be involved in schizophrenia through loss-of-function or decreased expression. Together, strong support from common variants and suggestive convergence of PTVs and transcriptional perturbations in patients with schizophrenia highlight a network involved in neuronal potassium signaling, which also contains numerous drug targets that can be explored in follow-up studies (Fig. S10 and Data S12).
Another intriguing finding from our results is the recurrent interaction between CACNA1C and C4A (observed in four out of five CACNA1C IP-MS experiments; Data S6), suggesting that the L-type calcium channel may be a functional binding site of the complement cascade in synaptic pruning of the developing prefrontal cortex. This interaction is consistent with the emerging hypothesis that complement-mediated modulation of synapse stability or function contributes significantly to risk for schizophrenia (37, 38). As additional support, we also identified C3, another component of the complement cascade, as an interaction partner of CACNA1C in one of the CACNA1C IPs (Data S6). In order to follow up on our findings, we performed substantial QC to identify suitable immunoreagents for IP of C4A (Data S4). However, due to the complex post-translational modifications of C4A, we were unable to generate high-quality IP-MS data with the available antibodies in iNs. Thus, alternative approaches will be needed to further validate and explore the CACNA1C-C4A interaction.
In the future, with larger genetic datasets to identify disease risk genes and the ability to create stem cell models of other relevant brain cell types (e.g., inhibitory interneurons) at the scale required for systematic protein interactome experiments, we expect that the approach described here can be applied to uncover additional insights into the biology of schizophrenia and provide rich orthogonal information that is not captured by other approaches such as GWAS, exome sequencing, single-cell RNA sequencing, and whole-proteome analyses. More generally, our study establishes brain cell-type-specific protein interaction data as a key data type for studying psychiatric diseases, and provides an organizing framework to bring together genomic, transcriptomic, and proteomic data to model intracellular biochemical networks implicated in disease pathology. In fact, we have applied the same framework in a parallel study (co-submitted manuscript) to generate an iN-derived protein interaction network for ASD risk genes nominated by exome sequencing. We found the network to be enriched for rare variant risk of ASD and performed functional and mechanistic follow-up experiments to uncover ASD-related biology captured by either individual protein interactions or convergent signals in the network. Together, our schizophrenia and ASD studies showcase a framework that strongly empowers interpretation of data generated by both common and rare variant genetics across two different groups of disorders. This framework can therefore contribute to laying the foundation for functional discovery and, eventually, rationally designed medicines in psychiatric diseases.
Data Availability
Raw IP-MS data will be available through MassIVE upon publication of the manuscript.
Funding
This work was supported by grants from the Stanley Center for Psychiatric Research, the US National Institute of Mental Health (R01 MH109903 and U01 MH121499), the Simons Foundation Autism Research Initiative (awards 515064 and 735604), the Lundbeck Foundation (R223-2016-721 and R350-2020-963), the US National Institute of Diabetes and Digestive and Kidney Diseases (U01 DK078616 and T32 DK110919), and a Broad Next10 grant.
Author contributions
EN, NP, WC, JMS, GP, JR and JMM carried out tissue culture, tested antibodies for WB/IP experiments and ran WBs. EN, NP, WC, JMS and BT executed IP experiments. SBE, BT, CRS, AMA, MS and JJ ran MS experiments and analysis. SAC supervised the MS experiments. AK and EM analyzed and consolidated IP-MS data. YHH, KT and TL consolidated IP-MS data and ran BrainSpan analysis. RL ran common variant enrichment analysis. YHH ran the rest of the analyses. TS, TG and HH designed genetic risk enrichment analyses. NF supervised and managed the study. KCE and KL designed and supervised the study. KL initiated and led the study. YHH, EN, RL, KT, NF, HH, KCE and KL wrote the manuscript with input from co-authors.
Competing interests
KCE is a co-founder of Q-State Biosciences, Quralis, and Enclear, and currently employed at BioMarin Pharmaceutical. All the other authors declare no competing interests.
Data availability
Raw IP-MS data will be available through MassIVE upon publication of the manuscript.
Supplementary Materials
Materials and Methods
Text S1 to S3
Figs. S1 to S10
Data S1 to S12
References (39-59)
Acknowledgments
We thank Carole Manneville, Matthias Mueller, Katie Worringer, and Ajamete Kaykas at Novartis for sharing the hDFn cell line; Ellen Beauchamp, Andrew Guirguis, Rebecca Gorelov, and Zuzana Tothova for sharing cancer cell lines; Karl Clauser for managing and uploading the proteomics data; and Steve Hyman, Mark Daly, and Ben Neale for insightful scientific discussions.