Exome sequencing of 20,979 individuals with epilepsy reveals shared and distinct ultra-rare genetic risk across disorder subtypes

Abstract Identifying genetic risk factors for highly heterogeneous disorders like epilepsy remains challenging. Here, we present the largest whole-exome sequencing study of epilepsy to date, with >54,000 human exomes, comprising 20,979 deeply phenotyped patients from multiple genetic ancestry groups with diverse epilepsy subtypes and 33,444 controls, to investigate rare variants that confer disease risk. These analyses implicate seven individual genes, three gene sets, and four copy number variants at exome-wide significance. Genes encoding ion channels show strong association with multiple epilepsy subtypes, including epileptic encephalopathies, generalized and focal epilepsies, while most other gene discoveries are subtype-specific, highlighting distinct genetic contributions to different epilepsies. Combining results from rare single nucleotide/short indel-, copy number-, and common variants, we offer an expanded view of the genetic architecture of epilepsy, with growing evidence of convergence among different genetic risk loci on the same genes. Top candidate genes are enriched for roles in synaptic transmission and neuronal excitability, particularly postnatally and in the neocortex. We also identify shared rare variant risk between epilepsy and other neurodevelopmental disorders. Our data can be accessed via an interactive browser, hopefully facilitating diagnostic efforts and accelerating the development of follow-up studies.


Introduction
Epilepsy is a group of heterogeneous disorders, characterized by an enduring predisposition to generate epileptic seizures. 1Epilepsy has a prevalence of 4-10 per 1,000 individuals worldwide, making it one of the most common neurological conditions. 2 The role of genetic contributions to epilepsy causality has been long recognized, [3][4][5] yet delineating the full range of genetic causes of the epilepsies remains a core challenge.
Whole-exome sequencing (WES) has proven effective in gene discovery for Mendelian disorders, including familial and severe epilepsy syndromes.2][13] Most discoveries have been based on hypothesisdriven approaches, which are restricted to one or a few predefined candidate gene(s). 12,14Hypothesisfree, WES analyses are still in their relative early stages and are not yet powered to produce exome-wide significant results. 11,13,15]15 In this study, we present the largest WES analysis of epilepsy to date, from the Epi25 Collaborative, a global collaboration committed to sequencing and deep-phenotyping up to 25K individuals with epilepsy.Our previous data collection and analysis of ~17K and ~29K individuals in case-control cohorts have revealed the extent of rare coding variant risk for all three major types of non-lesional epilepsies (DEEs, GGE, and NAFE).Here, we expand the evaluation to ~54K individuals, comprising 20,979 cases and 33,444 matched controls spanning six genetic ancestries, with improved power for detecting 'ultra-rare' variant (URV) association.We apply a hypothesis-free approach to evaluate the excess of URVs (single nucleotide variants [SNVs] and short insertions/deletions [indels]) in cases versus controls, at both an individual-gene and a gene-set level, across the entire exome, and separately for each type of epilepsy.With the enlarged sample size, we discover exome-wide significant genes for different types of epilepsies, implicating both shared and distinct rare variant risk factors.Integrating these findings with associations implicated by copy number variants (CNVs) and genome-wide association study (GWAS), we identify convergence of different types of genetic risk factors in the same genes.More broadly, comparing results to other large-scale WES studies, we provide significant evidence for an overlapping rare variant risk between epilepsy and other neurodevelopmental disorders, although different variant effects may be implicated in a shared gene.Together, our WES analysis at the unprecedented scale makes an important step forward in discovering rare variant risk underlying a spectrum of epilepsy syndromes and offers a valuable resource for generating hypotheses about syndrome-specific etiologies.

Study overview
We performed WES and harmonized variant detection of an initial dataset of over 70,000 epilepsyaffected and control individuals recruited across 59 sites globally.After stringent quality control (QC; Methods), we included a total of 20,979 individuals with epilepsy and 33,444 controls without known neurological or neuropsychiatric conditions in our URV association analysis, roughly doubling the sample size in our last release of Epi25 WES study. 15The samples were predominantly of European genetic ancestries (76.6% non-Finnish and 2.7% Finnish), with smaller proportions of African (7.7%),East Asian (5.3%),South Asian (1.1%), and Admixed American (6.6%) genetic ancestries.Epilepsy cases were matched with controls of the same genetic ancestry as estimated by principal component analysis and samples were pooled for a joint burden analysis of URVs.In the primary analysis, we evaluated the excess of ultra-rare, deleterious SNVs and indels -protein-truncating/damaging missense (MPC [missense badness, PolyPhen-2, and regional constraint] 16 score≥2) variants observed at no more than five copies among the entire dataset (corresponding to a MAF<0.005%)-in individuals with epilepsy compared to controls, using a Firth logistic regression model with adjustment for sex and genetic ancestry (Methods).We performed the burden analysis at both an individual-gene and a gene-set level, across the entire exome, and separately for each epilepsy type -where 1,938 individuals were diagnosed with DEEs, 5,499 with GGE, and 9,219 with NAFE -as well as for all epilepsy-affected individuals combined (including an additional 4,323 with other epilepsy syndromes).Stringent Bonferroni correction was applied to adjust for 18,531 consensus coding sequence (CCDS) genes and 5,373 gene sets (in the gene-based and geneset-based burden analysis, respectively) and eight case-control comparisons across four epilepsy groups and two variant classes.To ensure our model was well calibrated, we used ultra-rare synonymous variants as a negative control for all tests (Extended Data Fig. 1).In parallel, we performed CNV discovery and burden analysis on the same dataset (see Methods for details), with a particular focus on the joint burden of rare CNV deletions and protein-truncating URVs.
In comparison to protein-truncating URVs, burden analysis of damaging missense URVs (MPC ≥2; Methods) did not identify individual genes at exome-wide significance.Nevertheless, the top associations captured known epilepsy genes -notably the SLC6A1 (MIM: 137165) and GABRB3 (MIM: 137192) genes, both involved in the GABAergic pathway 13 and showing enrichment across multiple epilepsy types (Fig. 1b).Most of the previously implicated variants in these two genes were also missense, 27,28 and our study discovered an additional 24 and 26 damaging missense URVs in SLC6A1 and GABRB3, respectively, increasing the existing candidates by ~50% (Supplementary Data 3).Another top hit -KDM4B (MIM: 609765) -was found specifically associated with DEEs, which has not been previously reported.

Gene-set-based burden analysis facilitates understanding of epilepsy etiology
To further investigate genes and biologically relevant pathways associated with epilepsy, we performed burden tests at a gene-set level, which essentially expanded the test from a single gene to a set of genes that share a particular function.Different from our prior Epi25 analyses, which focused on a few previously implicated gene sets, we systematically tested collections of gene entities that belong to a gene family 29 or encode a protein complex 30 (Supplementary Data 4; Methods), in search for novel associations.
The most pronounced signal, as described in the gene-based burden of protein-truncating URVs, was from the GATOR1 complex in NAFE (log[OR]=2.7,P<2.2×10 -16 ; Fig. 2a).We identified a total of 56 distinct protein-truncating URVs in the three GATOR1-encoding genes (38 in DEPDC5, 11 in NPRL3, and 7 in NPRL2; Supplementary Data 5), among which 45 appeared novel according to the most recent study of epilepsyrelated GATOR1 variants by Baldassari et al. 14 In contrast to Baldassari et al, where most (>70%) GATOR1 protein-truncating variant carriers were familial, only 20% of the carriers in our study cohort had a known family history of epilepsy.Both familial and non-familial cases showed significant burden of GATOR1 protein-truncating URVs (log[OR]=3.4 and 2.6, P=3.7×10 -14 and P<2.2×10 -16 , respectively; Supplementary Data 2), which reinforces the increasingly important role of GATOR1 genes in the etiology of focal epilepsy.
Several strong signals emerged in the analysis of damaging missense URVs, led by well-established ion channel protein complexes and gene families (Fig. 2b).The top association was the GABAA receptor complex, encoded by GABRA1 (MIM: 137160), GABRB2 (MIM: 600232), and GABRG2 (MIM: 137164; [α1]2[β2]2[γ2], the most abundantly expressed isoform in the brain), 31 which controls the majority of inhibitory signaling in the central nervous system.The complex showed extensive enrichment across all epilepsy types (DEEs: log[OR]=2.0,P=2.2×10 -7 , GGE: log[OR]=1.3,P=5.5×10 -5 , NAFE: log[OR]=1.1,P=1.1×10 -4 ), recapturing the pervasive role of GABAA receptors across the spectrum of severities in epilepsy (reviewed in 32 ).Further dissecting the signals with respect to the structural domain of the complex, we observed stronger enrichment in the transmembrane domains (TMDs) than the extracellular domain (ECD), particularly for DEEs and GGE; and DEEs presented a unique signal in the second TM helix lining the ion channel pore of the receptor 33 (Fig. 2c; Supplementary Data 6).To avoid potential bias introduced by MPC prioritizing domains of regional missense constraint, we recapitulated the results using other missense deleteriousness metrics (PolyPhen-2 34 and SIFT 35 ; Supplementary Data 6).These patterns collectively point to an association of damaging missense URVs in the pore-forming domain with a more severe form of epilepsy.

Protein structural analysis characterizes missense URVs in ion channel genes
The strong burden of damaging missense, but not protein-truncating, URVs in genes encoding ion channels suggests a pathophysiological mechanism of protein alteration (e.g., gain-of-function or dominant-negative effects) rather than haploinsufficiency.Given the specialized structure of ion channels, we sought to further characterize missense URVs at a protein structure level.Specifically, we leveraged experimentally resolved three-dimensional structures of ion channel proteins, most of which were cocrystallized heterotrimeric subunits, and applied Rosetta 36 to assess the energy changes (change in Gibbs free energy ΔΔG/ddG in abbreviation) of protein folding upon a particular missense URV; a decrease in Gibbs free energy of unfolding, i.e., a positive ddG value suggests a destabilizing effect of the variant on protein and a negative value suggests a stabilizing effect.We computed ddG for a total of 1,782 missense URVs -independent of MPC deleteriousness -across 16 ion channel protein complexes (Supplementary Data 7; Methods).There was, as expected, a positive correlation between ddG and MPC (=0.15,P=8.3×10 -11 ; Fig. 3a), indicating that missense URVs in missense-constrained regions are more likely to cause a change in protein stability.
Even with MPC being a strong indicator of damaging missense burden in epilepsies, incorporating ddG further stratified the association signals (Fig. 3b; Supplementary Data 8).Significant enrichment was found for both destabilizing (ddG≥1 kcal/mol) and stabilizing (ddG≤-1 kcal/mol) URVs, which suggests a diverging molecular basis for these missense URVs.To explore potential structural properties that are associated with the protein stability change, we again dissected the signals by the protein structural domains.Divergent distributions were found for destabilizing and stabilizing missense URVs, with the former enriched in the ECD of the complex and the latter in the TMD (Fig. 3c; Supplementary Data 9).While only functional tests can further elucidate the underlying molecular mechanisms, our analyses provide a set of missense URVs to test in epilepsy model systems, which might reveal the variable effects on protein function such mutations are creating and add to the explanation of how ion channel dysfunction could produce a broad spectrum of epilepsy syndromes.

Burden of CNV deletions converges with protein-truncating URVs
In parallel with SNVs and indels, we performed variant calling of CNVs on the same dataset (Methods).After sample QC, we examined gene burden of rare CNVs in 18,963 epilepsy cases -including 1,743 DEEs, 4,980 GGE, and 8,425 NAFE -versus 29,804 controls (~90% of initial; Methods).A gene was considered affected by a CNV if ≥10% of its coding exons were deleted or ≥75% were duplicated.
The most significant signal was from CNV deletions in the NPRL3 gene, with 11 deletions found in NAFE cases versus 0 in controls (log[OR]=4.1,P=9.4×10 -7 ; Supplementary Data 10).Notably, NPRL3 was also one of the top hits implicated by protein-truncating URVs in NAFE, and jointly analyzing the two rendered NPRL3 exome-wide significant (log[OR]=3.8,P=8.1×10 -12 ; Fig. 4a and Supplementary Data 10).Among the top ten genes with protein-truncating URV burden, about one-third (14/[10×4 epilepsy groups]) were found affected by a CNV deletion, and the vast majority (11/14) showed enrichment in epilepsy cases (log[OR]>0; Fig. 4b).These included DEPDC5, which together with NPRL3 reinforces a haploinsufficiency mechanism for GATOR1-related focal epilepsies (Fig. 4c).8][39][40][41][42][43] Analysis of CNV duplications did not show any individual genes close to exome-wide significance (Supplementary Data 10).Collectively, the joint burden analysis suggests at least partial convergence in the protein-truncating-effect caused by SNVs/indels and CNVs, and therefore, it may provide a strategy for improving the power of detecting rare, large-effect genetic risk factors for epilepsy.

Burden of URVs reveals shared genetic risk between common and rare variation for GGE
Similar to other common neurodevelopmental disorders, the common forms of epilepsy -GGE and NAFE -have both common and rare genetic risk factors.In partnership with the International League Against Epilepsy (ILAE) consortium, we performed GWAS meta-analysis of over 29K individuals with common epilepsies, 44 which revealed 26 genome-wide significant loci with markedly different genetic architectures between GGE (22 loci) and NAFE (0 loci).To investigate the overlap of epilepsy association between common and rare variation, we tested the burden of URVs in 23 genes that were prioritized as the likely causal genes underlying the 22 GGE loci.The analysis identified significant enrichment for proteintruncating URVs from GGE in the 23 GWAS genes, while in contrast, none for URVs from NAFE (Fig. 5a; Supplementary Data 11).This result has two-fold implication: first, there is emerging evidence of convergent common and rare variant risk in the same genes for epilepsy, and second, the convergence tends to be epilepsy type-specific.
At the individual gene level, 13 of the 23 GGE GWAS genes showed an excess of protein-truncating URVs (log[OR] 0.2-2.6;Supplementary Data 11).The lead gene was RYR2 (MIM: 180902), in which 14 proteintruncating URVs were observed in our GGE cohort (log[OR]=1.8,P=8.6×10 -6 ), and the reported GWAS hit was located in the intronic region (rs876793; Fig. 5b).RYR2 encodes a ryanodine receptor that mediates the release of Ca(2+) from endoplasmic/sarcoplasmic reticulum into cytoplasm for excitation-contraction coupling.Mutations in RYR2 have been well-known in the etiology of arrhythmogenic disorders, [45][46][47][48][49][50] while more recent studies reported that the same mutation can cause GGE independent of arrhythmias. 51,52Our finding, together with the GWAS result, adds weight to the hypothesis that RYR2 mutations likely constitute a neuro-cardiac calcium channelopathy, 51,52 where mutant receptors may induce either arrhythmias or GGE depending on their selective expression in the heart or in the brain.

Burden analysis implicates shared rare variant risk between epilepsy and other neurodevelopmental disorders
Recent WES studies have revealed substantial rare variant risk for neurodevelopmental disorders (NDDs).Analysis of de novo mutations in severe developmental disorders (DDs) has discovered 285 genes at exome-wide significance, 53 and more recent rare variant associations in autism spectrum disorder (ASD) 54 and schizophrenia (SCZ) 55 have implicated 185 and 32 genes at a false discovery rate of 5%, respectively.To explore how these and our findings may point to common genetic etiologies, we examined the burden of URVs from epilepsy in the established NDD genes (Supplementary Data 12).Significant enrichment was found for all three gene sets associated with DD, ASD, and SCZ (Fig. 6a), suggesting that there is shared genetic risk of rare variation among the broader spectrum of NDDs.DD and ASD presented stronger signals than SCZ, across all epilepsy types (being strongest in DEEs) and for both protein-truncating and damaging missense URVs.This pattern implies a larger overlapping genetic component between epilepsy and DD/ASD than SCZ, which is in line with the high comorbidity of DD/ASD and epilepsy, in particular DEEs.Meanwhile, given the known genetic overlapping between DD and ASD, we repeated the analyses on the subsets of mutually exclusive NDD genes (i.e., 196 DD-only, 99 ASD-only, and 22 SCZ-only genes, respectively).Although attenuated, there remained clear rare variant signals shared by epilepsy and other NDDs (Supplementary Data 12).About one-third (136/409) NDD genes showed nominally significant enrichment of deleterious URVs in at least one epilepsy type (Supplementary Data 12).The vast majority (128/136=94.1%)were DD/ASD genes, and only one gene -KDM6B (MIM: 611577) -was shared by all three NDD gene sets.7][58][59][60] Interestingly, URVs in KDM6B associated with epilepsy were exclusively missense (MPC≥2), whereas KDM6B variants implicated in DDs were predominately protein-truncating (Fig. 6b; Supplementary Data 13).All missense variants were clustered at the KDM6B catalytic domain (JmjC) and C-terminal helix/zinc motifs, which are important for enzyme-cofactor binding and protein stability. 61Protein structural analysis predicted that most of the damaging missense variants tend to have a destabilizing effect on the KDM6B protein (ddG>0; Supplementary Data 13), especially those in DDs, while diverging effects were observed for epilepsy and SCZ (Fig. 6b).These results suggest that, even converging in the same gene, rare variant risk may differ in its severity and/or the molecular mechanism that underlies specific phenotypes of NDDs.

Discussion
In the largest WES study of epilepsies to date, we characterize the contribution of ultra-rare genetic variation to a severity spectrum of epilepsy syndromes.This work, from the Epi25 Collaborative, involves global efforts in aggregating sequence data, deep-phenotyping epilepsy cohorts, harmonizing variant detection and quality control, and finally analyzing and interpreting the genetic data for etiological and clinical implications.
Our exome-wide burden analyses redemonstrated the role of known epilepsy genes with improved power and discovered potential novel rare variant risk factors for different types of epilepsies.Most associations were identified in a particular epilepsy type, implicating distinct genetic etiologies underlying different epilepsies.Protein-truncating URVs presented the strongest signal, with six individual genes surpassing the stringent exome-wide significance threshold.Five genes (NEXMIF, SCN1A, SYNGAP1, STX1B, and WDR45) were associated with the severe group of DEEs, while notably, the most significantly-associated gene -DEPDC5 -was found in NAFE.The implication of GATOR1 with the enlarged sample size has particular clinical applications -given that GATOR1 functions as a negative regulator of the mTORC1 pathway, mTORC1 inhibitors may offer a promising treatment strategy for patients carrying deleterious GATOR1 variants. 62In comparison to protein-truncating URVs, analysis of damaging missense URVs remained underpowered to identify individual genes at exome-wide significance.Yet, strong associations emerged when aggregating sets of genes that share common functions.The top associations were predominantly genes encoding ion channel complexes, such as Nav/Kv channels and GABAA receptors.These gene sets did not show significant enrichment of protein-truncating URVs, which suggests a more diverse pathophysiological mechanism than haploinsufficiency.We further explored this through protein structural analysis and indeed observed diverging effects of missense URVs on ion channel protein stability.In particular, the enrichment of stabilizing URVs in the pore-forming domain for inhibitory neurotransmission appeared intriguing given that the pathophysiological condition of epilepsy is hyperexcitability.One hypothesis fitting with functional studies in channelopathies is that the overstabilization of a particular structural conformation would interfere with the conformational dynamics required for ion channel gating -for instance, recent structural studies have established a 'dual-gate' model: 63 upon sustained agonist binding the ion channel will gradually transit from the active agonistbound conformation to an agonist-bound shut state refractory to activation (i.e., desensitization); mutant ion channels favoring a desensitized conformation may consequently reduce the efficacy of GABAergic inhibition and lead to an elevated excitability.As both loss-and gain-of-function mechanisms underlying conformational changes in ligand/voltage-gated ion channels being increasingly described in rare epilepsies with distinguishable clinical features, 28,[64][65][66] our results may add to the molecular mechanisms that explain the varying types of epilepsies associated with ion channel dysfunction.Meanwhile, we emphasize the necessity of dedicated functional investigation for specific missense variants.In this study, we deliberately separated the analysis of protein-truncating and damaging missense URVs with a view to delineating the differing mechanisms; assuming a protein-truncating-like effect for all damaging missense URVs identified no additional significant genes but weakened our analytical power -most (~90%) genes enriched for protein-truncating URVs had either no damaging missense URV or decreased enrichment when the two variant classes were combined (Extended Data Fig. 2).
Potential novel associations were identified or strengthened in several genes and gene sets.Top candidates were predominately implicated in DEEs, including ANKRD11 gene and the NSL complex with protein-truncating URVs and KDM4B gene with damaging missense URVs.Numerous experiments have demonstrated the importance of NSL complex in regulating core transcriptional and signaling networks required for normal development (reviewed in 67 ), and mutations or deregulation of NSL complex genes has been associated with neurodevelopmental disorders. 671][72] The KDM4B gene encodes a demethylase enzyme that regulates gene expression in the brain from embryonic stages. 73Neuron-specific kdm4b-deficient mice were shown to display spontaneous epileptic-like seizures, and more recent data implicated KDM4B rare variants in global developmental delay. 74Collectively, these genes have an already established role in neurodevelopmental disorders that present shared clinical characteristics with DEEs.This clinical overlap lends support to these newly implicated associations, while requiring advanced understanding of the crosstalk between epilepsy and developmental encephalopathies within DEEs (e.g. for a specific case, whether developmental encephalopathy is a direct sequential consequence of epileptic seizures or, they share a common genetic etiology but different pathological pathways and occur in parallel [reviewed in 75 ]).Such complexity was also reflected by the substantial excess of DEEs-URVs in DD-associated genes.Another new candidatethe PDE gene family -was found associated with NAFE.PDE enzymes catalyze the hydrolysis of cAMP and cGMP, two key second messengers modulating a variety of neuronal pathways [76][77][78] , in particular with a dual regulatory function to increase the strength of excitatory neural circuits and decrease inhibitory synaptic plasticity; 79,80 it is thus plausible that loss of PDE catalytic activity may result in a net excess of neural excitation and an increased susceptibility to epilepsy.In support of this, previous studies have reported that administration of PDE10A inhibitors induced epileptic seizures. 81,82Lastly, a particularly noteworthy finding was the RYR2 gene associated with GGE, which was prioritized from combining evidence of rare and common genetic variation.This result provides an example of convergent epilepsy generic risk across the allele frequency spectrum and also represents an instance of epilepsy subtypespecific association, motivating the generation of more specific mechanistic hypotheses.While we only highlighted GGE in the present analysis, we note that we have previously observed the effect in both GGE and NAFE, using a relatively lenient inclusion criterion (by aggregating URVs across the top 100 genes from GWAS). 83Together, we would suggest that the convergence of rare and common variant risk may also exist in NAFE, though being much stronger in GGE.
Besides nominating new genes, identifying new candidate variants in known epilepsy genes will also facilitate the characterization of specific mechanisms, especially given the highly heterogeneous nature of epilepsy.Over the past five-year efforts from Epi25 WES, there has been a steady increase in the number of deleterious URVs discovered in epilepsy-associated genes (Extended Data Fig. 3a); almost all (130/134) genes with a known monogenic cause have been identified with at least one deleterious URVs (Extended Data Fig. 3b), providing a valuable resource for downstream functional analysis.Interestingly, while the number of damaging missense URVs increases at a higher rate than protein-truncating URVs, the number of additional genes identified with a missense URV grows more slowly (Extended Data Fig. 3b).Such a pattern reflects an accumulation of candidate missense URVs in the same set of genes, which highlights the need of effective approaches to characterize and/or categorize the function of missense variants.[86][87][88] The global collaborative efforts of large-scale sequencing and deep-phenotyping have been gaining power to discover ultra-rare genetic risk factors underlying specific epilepsy syndromes.Compared to our prior URV results, 15 the top genes that maintained or obtained stronger association in this enlarged study are all known epilepsy genes (Extended Data Fig. 3c).This trend demonstrates a high replicability of existing gene findings, and likewise, calls for larger sample sizes to confirm the present results.Substantial sample sizes will be needed for the common complex forms of epilepsies; as projected in our initial WES study, 13 with >9,000 cases and >20,000 controls we now begin to identify the first exome-wide significant gene for NAFE.The challenge comes from both the heterogeneity in the electroclinical syndromes within each epilepsy subtype (e.g., childhood/juvenile absence epilepsy and juvenile myoclonic epilepsy in GGE) and the heterogeneity in their genetic etiologies, for which there is inevitably a compromise between larger sample size and finer sample classification.A promising strategy to accelerate gene discovery is to integrate results of URVs with other types of genetic variation (e.g., CNVs, common variants); as there is growing evidence that different genetic risk factors converge at least partially in the same genes, an extended model that jointly analyzes these variants would likely provide the most powerful and informative results beyond any single approach.Overall, the ongoing sequencing and genotyping efforts, together with the ever-increasing scale of genetic association studies, will continue to expand and/or refine our understanding of the genetic architecture of epilepsy, continue to delineate specific underlying pathophysiological processes, and hopefully enable a move towards more targeted treatment approaches through both precision diagnosis and the development of precision, or gene-based, therapies.

Study design and participants
We collected DNA and detailed phenotyping data of individuals with epilepsy from 59 participating Epi25 sites in Europe, North America, Australasia, and Asia (Supplementary Information).In total, we analyzed 20,979 epilepsy cases -including 1,938, 5,499, and 9,219 individuals with DDEs, GGE, and NAFE, respectively, and 4,323 with other epilepsies (mostly lesional focal epilepsy [2,495] and febrile seizures [FS]/FS+ [327]) -and 33,444 controls.Control individuals were aggregated from a subset of Epi25 sites, local collections at the Broad Institute, or dbGaP and were not screened for neurological or neuropsychiatric conditions (see Supplementary Table 2).

Phenotyping procedures
Epilepsies were diagnosed by epileptologists on clinical grounds (see below for specific criteria for DEEs, GGE, and NAFE) in accordance with the International League Against Epilepsy (ILAE) classification at the time of diagnosis and recruitment. 1,13,15Phenotyping data were entered into the Epi25 Data repository (https://github.com/Epi25/epi25-edc)via case record forms hosted on the REDCap platform 89 .The data fields do not contain protected health information (PHI).Data collected from previous coordinated efforts with phenotyping on databases (e.g., the Epilepsy Phenome/Genome Project 90 and the EpiPGX project [Web Resources]) were integrated via scripted transformations.All phenotyping data underwent review for uniformity among sites and quality control (QC) by automated data checking and manual review as required; the process was overseen by a phenotyping committee with clinical expertise.

Epilepsy case definitions
Epilepsy diagnoses and classification for Epi25 have been described previously. 13,15In brief, diagnosis of DEEs required severe refractory epilepsy of unknown etiology, with developmental plateau or regression, and with epileptiform features on electroencephalogram (EEG).Diagnosis of GGE required a history of generalized seizure types (generalized tonic-clonic, absence, or myoclonic seizures) with generalized epileptiform discharges on EEG; exclusion criteria included focal seizures, moderate-to-severe intellectual disability, and epileptogenic lesions on neuroimaging if available.Diagnosis of NAFE required a history of focal seizures with either focal epileptiform discharges or normal findings on EEG; exclusion criteria included primary generalized seizures, moderate-to-severe intellectual disability, and neuroimaging lesions (except hippocampal sclerosis).

Informed consent
Adult participants, or the legal guardian of child participants, provided signed informed consent at participating centers based on the local ethical requirements at the time of collection.The consent was required not to exclude data sharing to be included in the study.Consent forms for samples collected after January 25, 2015 required specific language according to the National Institutes of Health's Genomic Data Sharing Policy (see web resources).

Whole-exome sequencing data generation
All samples were sequenced at the Broad Institute of MIT and Harvard on the Illumina HiSeq X or NovaSeq 6000 platforms with 150 bp paired-end reads.Exome capture was performed using multiple kits: the Illumina Nextera Rapid Capture Exomes or TruSeq Rapid Exome enrichment kit (target size 38 Mb) and the Twist Custom Capture (target size 37 Mb).Sequence data in the form of BAM files were generated via the Picard data-processing pipeline and well-calibrated reads were aligned to the human reference GRCh38.Variants were jointly called across all samples via the Genome Analysis Toolkit (GATK) bestpractice pipeline 91 and were annotated using Variant Effect Predictor (VEP) 92 with custom annotations, including LOFTEE (Loss-Of-Function Transcript Effect Estimator) 93 and MPC (missense badness, PolyPhen-2, and regional constraint), 16 using Hail. 94riant and sample QC Initial variant QC criteria included: (1) genotype quality (GQ) ≥20, (2) read depth (DP) ≥20, (3) allele balance (AB) ≥0.2 and ≤0.8, (4) passing the GATK Variant Quality Score Recalibration (VQSR) filter, (5)   residing in GENCODE coding regions that were well-covered by both capture platforms, where 80% of the Illumina or Twist sequenced samples had at least 10x coverage, and ( 6) outside of the low-complexity (LCR) regions.95 Additional variant QC were applied after sample QC (see below for details): (1) call rate ≥0.98, (2) case-control call rate difference ≤0.02, and (3) Hardy-Weinberg Equilibrium (HWE) test p value ≥10 -6 .
Sample QC criteria, on the basis of all sequenced samples and the initial QC-ed variants, included: (1)   mean call rate ≥0.90, (2) mean GQ ≥57, (3) mean DP ≥25, (4) freemix contamination estimate ≤2.5%, (5) percent chimeric reads ≤2%, and ( 6) the genetically imputed sex matching with self-reported sex.We performed principal component analysis (PCA) to classify samples into genetic ancestral groups, using a random forest model trained on the 1000 Genomes data; samples with a probability ≥0.9 to be one of the six populations -Non-Finnish European (NFE), Finnish (FIN), African (AFR), East Asian (EAS), South Asian (SAS), Ad Mixed American (AMR) -were retained.Within each ancestral group, we examined cryptic relatedness based on identity-by-descent (IBD) estimates and excluded one sample from each pair of related individuals with an IBD>0.2.Additional sample QC were applied on a population-and cohortspecific basis, which excluded outliers with >4 standard deviations from the mean of (1) transition/transversion ratio, (2) heterozygous/homozygous ratio, and (3) insertion/deletion ratio.To control for residual population stratification, we further excluded samples and/or cohorts that show extreme counts of synonymous singletons.The number of samples passed QC at each step is detailed in Supplementary Information.

Exome-wide burden analysis
To evaluate the excess of rare, deleterious protein-coding variants in individuals with epilepsy, we performed burden analysis across the entire exome, at both an individual-gene and a gene-set level."Ultra-rare" variants (URVs) were defined as variants observed no more than five copies among the combined case-control cohort, which corresponded to a minor allele frequency (MAF) <0.005%.Deleterious variants were defined and categorized into two classes: (1) protein-truncating annotated by LOFTEE and (2) damaging missense with an MPC score ≥2.We tested the burden of each URV class by regressing the case-control status on the URVs aggregated across a target gene or gene set in an individual, using a Firth regression model adjusting for sex and ancestry (the PCA-predicted genetic ancestral group and the top ten PCs).We further included the exome-wide count of synonymous singletons as an additional covariate to better control for residual population stratification not captured by PCs. 13 We performed the burden analyses for each of the three major epilepsy types -DEEs, GGE, and NAFEand for all epilepsy-affected individuals combined.At the individual-gene level, we tested all proteincoding genes with at least one epilepsy or control carrier (protein-truncating: N=15,083, 15,236, 15,398, and 15,903 for the analysis of DEEs, GGE, NAFE, and all-epilepsy combined, respectively; damaging missense: N=4,013, 4,057, 4,105, and 4,194; synonymous: N=17,460, 17,463, 17,465, and 17,472).At the gene-set level, we tested collections of gene entities that belong to the same gene family 29 or encode a particular protein complex 30 and have at least one epilepsy or control carrier (protein-truncating: N=5,080, 5,070, 5,091, and 5,126 for the analysis of DEEs, GGE, NAFE, and all-epilepsy combined, respectively; damaging missense: N=3,256, 3,279, 3,298, and 3,343; synonymous: N=5,209).Exome-wide significance was determined by Bonferroni correction accounting for 18,531 consensus coding sequence (CCDS) genes or 5,373 gene sets -across four epilepsy groups and two variant classes -at P=3.4×10 -7 and P=1.2×10 -6 for the gene-and gene-set-based burden analysis, respectively.

Protein structural analysis
We applied a metric 36 that assesses the change in Gibbs free energy (ΔΔG/ddG in abbreviation) of protein folding induced by a mutation to characterize missense URVs identified in ion channel genes.In total, we computed ddG for 1,782 missense URVs on 16 ion channel protein complexes with experimentally resolved three-dimensional structures available (Supplementary Data 7).A positive ddG value suggests a decrease in Gibbs free energy of protein unfolding, i.e., a destabilizing effect of the mutation on protein, and a negative ddG value suggests a stabilizing effect.In the relevant burden analysis, we used |ddG|≥1 kcal/mol to prioritize variants that are likely to cause a change in protein stability.

Copy number variant (CNV) calling and burden analysis
To call CNVs from the raw exome data, GATK-gCNV 96 was used.In brief, GATK-gCNV is a Bayesian CNV caller, which adjusts for biases (i.e.GC content) introduced through capture kits and sequencing, while simultaneously accounting for systematic and technical differences.The raw sequencing files were compressed into counts and used as input across the annotated exons, and a subsequent principal component analysis-based method was used on the observed read counts to differentiate capture kits.This was followed by a hybrid distance-and density-based clustering approach to curate batches of samples to process in parallel.After, the caller was iteratively run for each batch and metrics produced by the Bayesian model were used to account for positive predictive value and sensitivity.GATK-gCNV exome QC filters were previously benchmarked in 8,439 matching genome and exome samples, as described in 54 .
Samples where GATK-gCNV made more than 100 unfiltered calls or more than 10 filtered calls were considered outlier samples and were removed.This resulted in 48,767 samples (~90% of initial) for the downstream burden analysis, which comprises 18,963 epilepsy cases (including 1,743 DEEs, 4,980 GGE, and 8,425 NAFE) and 29,804 controls.To mitigate false positives, we used previously benchmarked filtering thresholds, where CNVs had to span >4 callable exons and had a site frequency <0.1% and a quality score >200.In the gene-based burden analysis of CNVs, we considered CNVs to affect a gene if ≥ 10% of the non-redundant exon-basepairs overlapped with the deletion (Ngene= 4,213, 4,417, 4,733, and 6,045 for the analysis of DEEs, GGE, NAFE, and all-epilepsy combined, respectively), or if ≥75% of the nonredundant exon-basepairs overlapped with the duplication (Ngene= 7,064, 7,282, 7,564, and 8,793 for the analysis of DEEs, GGE, NAFE, and all-epilepsy combined, respectively).When evaluating the joint burden of CNV deletions and protein-truncating SNVs/indels, only the subset of samples passing CNV calling QC were considered.

Figure legends
Fig. 1: Results from gene-based burden analysis of URVs.a,b, Burden of protein-truncating (a) and damaging missense (b) URVs in each protein-coding gene with at least one epilepsy or control carrier.The observed −log10-transformed P values are plotted against the expectation given a uniform distribution.For each variant class, burden analyses are performed across four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.P values are computed using a Firth logistic regression model with adjustment for sex and ancestry; the red dashed line indicates exome-wide significance P=3.4×10 -7 after Bonferroni correction (see Methods).Top ten genes with URV burden in epilepsy are labeled.Fig. 2: Results from gene-set-based burden analysis of URVs.a,b, Burden of protein-truncating (a) and damaging missense (b) URVs in each gene set (gene family/protein complex) with at least one epilepsy or control carrier.The observed −log10-transformed P values are plotted against the expectation given a uniform distribution.For each variant class, burden analyses are performed across four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.P values are computed using a Firth logistic regression model with adjustment for sex and ancestry; the red dashed line indicates exome-wide significance P=1.2×10 -6 after Bonferroni correction (see Methods).Top five gene sets with URV burden in epilepsy are labeled.c, Burden of damaging missense URVs in the (α1)2(β2)2(γ2) GABAA receptor complex with respect to its structural domain.Left, forest plots showing the stronger enrichment of damaging missense URVs in the transmembrane domain (TMD) than the extracellular domain (ECD), and the unique signal from DEEs in the second TMD (TMD-2) that forms the ion channel pore.The dot represents the log odds ratio and the bar represents the 95% confidence intervals of the point estimates.Right, a co-crystal structure (PDB ID: 6X3Z) showing the pentameric subunits of the receptor and highlighting the two protein-truncating URVs from DEEs located in the pore-forming domain.
Fig. 3: Protein structural analysis of missense URVs in ion channel genes.a, Correlation between ddG and MPC in measuring the deleteriousness of missense URVs.ddG values are computed for 1,782 missense URVs across 16 ion channel protein complexes (see Methods).A higher absolute ddG value suggests a more deleterious effect on protein stability; positive (orange) and negative (blue) values suggest destabilizing and stabilizing effects, respectively.b, Burden of damaging missense URVs stratified by ddG.Stronger enrichment is observed when applying |ddG|≥1 to further prioritize damaging missense URVs with MPC≥2.c, Burden and distribution of destabilizing (ddG≥1) and stabilizing (ddG≤-1) missense URVs on the (α1)2(β2)2(γ2) GABAA receptor complex with respect to its structural domain.Top, forest plots showing the stronger enrichment of destabilizing missense URVs (orange) in the extracellular domain (ECD) and stabilizing missense URVs (blue) in the transmembrane domain (TMD).Bottom, schematic plots displaying the distribution of destabilizing and stabilizing missense URVs on GABAA receptor proteins.URVs found in epilepsy cases are plotted above the protein and those from controls are plotted below the protein.The number of epilepsy and control carriers are listed in the table above.In the forest plots in c and d, the dot represents the log odds ratio and the bar represents the 95% confidence intervals of the point estimates.calling QC (see Methods), across four groups -1,743 DEEs, 4,980 GGE, 8,425 NAFE, and 18,963 epilepsy-affected individuals combined -versus 29,804controls; for genes that do not have a CNV deletion called, results from the burden analysis of protein-truncating URVs on the full sample set are shown.P values are computed using a Firth logistic regression model with adjustment for sex and ancestry; the red dashed line indicates exome-wide significance P=3.4×10 -7 after Bonferroni correction (see Methods).Top ten genes with variant burden in epilepsy are labeled.b, Joint burden of CNV deletions and proteintruncating URVs in the top ten genes ranked by protein-truncating URV burden.For comparison, the burden of protein-truncating URVs (SNVs/indels; red), CNV deletions (gray), and the joint (purple) are analyzed on the same sample subset as described in a.The dot represents the log odds ratio and the bar represents the 95% confidence intervals of the point estimates; only enrichment in epilepsy (log odds ratio>0) are shown.c, Genomic location and distribution of CNV deletions and protein-truncating URVs with respect to the NPRL3 and DEPDC5 genes.Variants found in epilepsy cases (red) are plotted above the schematic gene plots and those from controls (gray) are plotted below the gene.The number of epilepsy and control carriers are listed in the table above.Fig. 5: Shared genetic risk between common and rare variation for GGE.a, Burden of URVs in genes implicated by GGE GWAS loci.Burden analyses are performed across four variant classes (colored by the inferred consequence) and two epilepsy groups -5,499 GGE and 9,219 NAFE -versus 33,444 controls.Significant enrichment is only observed for protein-truncating URVs from GGE but not any from NAFE.The dot represents the log odds ratio and the bar represents the 95% confidence intervals of the point estimates.b, Genomic location and distribution of common variant (GWAS) association and proteintruncating URVs on the RYR2 gene.Top, a LocusZoom plot displaying the GGE GWAS hit (rs876793) located in the intron of RYR2.Bottom, a schematic protein plot displaying the distribution of protein-truncating URVs on RYR2.URVs found in epilepsy cases (red) are plotted above the protein and those from controls (gray) are plotted below the protein.Burden analyses are performed across four variant classes (colored by the inferred consequence) and four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.Overall, DD/ASD-associated genes show stronger enrichment of epilepsy URVs than SCZ.b, Distribution of rare variants from GGE and other NDDs on the KDM6B gene.Top, a schematic protein plot displaying the distribution of protein-truncating (darker red) and damaging missense (lighter red) variants on KDM6B.Bottom, a schematic protein plot displaying the distribution of damaging missense variants with a likely destabilizing (ddG>0; orange) and stabilizing (ddG<0; blue) effect on KDM6B.In both plots, variants found in GGE are plotted above the protein and those from other NDDs are plotted below the protein (in the order of DD, ASD, and SCZ as labeled); the number of variant carriers are listed accordingly on the right.Data Fig. 2: Results from burden analysis of protein-truncating and damaging missense URVs combined.a, Joint burden of protein-truncating and damaging missense URVs at the individual-gene level.The observed −log10-transformed P values are plotted against the expectation given a uniform distribution.Burden analyses are performed across four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.P values are computed using a Firth logistic regression model with adjustment for sex and ancestry; the red dashed line indicates exome-wide significance P=3.4×10-7 after Bonferroni correction (see Methods).b, Comparison of the joint burden in a with the burden of protein-truncating URVs.The odds ratio (OR) of protein-truncating plus damaging missense URVs (y-axis) and that of protein-truncating URVs alone (x-axis) are compared.Each dot represents a gene with nominally significant enrichment (OR>0 and P<0.05) of either proteintruncating URVs or the two variant classes combined.

Extended Data
Extended Data Fig. 3: URV discovery and burden results across Epi25 data collection.a, Increase in the number of protein-truncating and damaging missense URVs discovered in epilepsy genes with a known monogenic cause.b, Increase in the number of monogenic epilepsy genes identified with a proteintruncating or damaging missense URV.In a and b, variant/gene count is plotted against the year of Epi25 data collection; the total number of epilepsy cases analyzed in each year is indicated in parenthesis.c, URV burden of previously top-ranked genes in this study.The odds ratio of protein-truncating URVs in genes from this study (y-axis) and the prior Epi25 publication (x-axis) are compared.Each dot represents one of the top ten genes implicated by our previous burden analysis (across three epilepsy types).Genes with a known monogenic/X-linked cause are labeled and colored in purple.Burden of protein-truncating URVs Burden of protein-truncating URVs Burden of protein-truncating URVs Burden of protein-truncating URVs Gene burden in this study (Year5) Epilepsy gene

Fig. 4 :
Fig.4: Convergence of CNV deletions and protein-truncating URVs in gene-based burden.a, Joint burden of CNV deletions and protein-truncating URVs in each protein-coding gene with at least one epilepsy or control carrier.The observed −log10-transformed P values are plotted against the expectation given a uniform distribution.Joint burden analyses are performed on the subset of samples that passed CNV

Fig. 6 :
Fig.6: Shared rare variant risk between epilepsy and other NDDs.a, Burden of URVs in genes implicated by WES of severe developmental disorders (DD), autism spectrum disorder (ASD), and schizophrenia (SCZ).Burden analyses are performed across four variant classes (colored by the inferred consequence) and four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.Overall, DD/ASD-associated genes show stronger enrichment of epilepsy URVs than SCZ.b, Distribution of rare variants from GGE and other NDDs on the KDM6B gene.Top, a schematic protein plot displaying the distribution of protein-truncating (darker red) and damaging missense (lighter red) variants on KDM6B.Bottom, a schematic protein plot displaying the distribution of damaging missense variants with a likely destabilizing (ddG>0; orange) and stabilizing (ddG<0; blue) effect on KDM6B.In both plots, variants found in GGE are plotted above the protein and those from other NDDs are plotted below the protein (in the order of DD, ASD, and SCZ as labeled); the number of variant carriers are listed accordingly on the right.

Fig. 1 :
Results from burden analysis of synonymous URVs.a,b, Burden of synonymous URVs at the individual-gene (a) and the gene-set (b) level.The observed −log10-transformed P values are plotted against the expectation given a uniform distribution.Burden analyses are performed across four epilepsy groups -1,938 DEEs, 5,499 GGE, 9,219 NAFE, and 20,979 epilepsy-affected individuals combined -versus 33,444 controls.P values are computed using a Firth logistic regression model with adjustment for sex and ancestry; the red dashed line indicates exome-wide significance P=3.4×10 -7 after Bonferroni correction (see Methods).