Abstract
Epileptic encephalopathy is a devastating epilepsy with etiologies that remain largely elusive, despite recent whole-gene/exon sequencing of large cohorts. This study targeted childhood epileptic encephalopathy, typically Lennox-Gastaut syndrome, which is characterized by age-dependent onset and characteristic clinical manifestations. Trio-based whole-exome sequencing was performed in 235 unrelated cases with individualized analysis by explainable inheritance origin with stratified frequency filtration and specified statistical analysis, including setting controls for compound heterozygous variants. We identified three novel causative genes, including SBF1 with de novo variants in three cases, CELSR2 with recessive variants in eight cases, and TENM1 with X-linked recessive variants in six cases. Significantly higher excesses of de novo variants in SBF1 and biallelic variants in CELSR2 and aggregated frequencies of variants in SBF1, CELSR2, and TENM1 were detected. The frequency of compound heterozygous/homozygous CELSR2 variants in the cases was significantly higher than that in 1942 asymptomatic parent controls. In Drosophila, knockdown of the three genes showed increased seizure-like behavior and increased firing of excitatory neurons. Sbf1 knockout zebrafish showed seizure-like behavior, premature death, and increased firing of excitatory neurons. Celsr2 knockout mice showed spontaneous seizures with epileptiform discharges. These results indicate that SBF1, CELSR2, and TENM1 are pathogenic genes of Lennox-Gastaut syndrome.
Introduction
Epileptic encephalopathies, also referred to as developmental and/or epileptic encephalopathies (DEEs)1, represent a clinically and genetically heterogeneous group of devastating epilepsies characterized by refractory seizures, severe electroencephalography (EEG) abnormalities, and neurodevelopment delay or decline. Epileptic encephalopathy is a common clinical entity and accounts for approximately 40% of newly diagnosed epilepsies before the age of three years2. According to onset age, epileptic encephalopathies can generally be divided into two major groups1: early infantile epileptic encephalopathy (EIEE) with onset in neonates or infants, such as West syndrome and Dravet syndrome, and childhood-onset epileptic encephalopathy, typically Lennox-Gastaut syndrome (LGS). LGS is estimated to account for 1%-2% of all patients with epilepsy and up to 10% of patients with childhood epilepsy3–5. It usually begins between 1 and 8 years old (before age 18 years) and is characterized by multiple seizure types that must include tonic seizures and/or consistent paroxysmal fast activity/polyspike on EEG5. The etiologies of epileptic encephalopathies remain largely elusive. Recent studies have demonstrated an increased number of genes associated with epileptic encephalopathies. To date, more than 100 genes have been reported to be associated with EIEE/DEEs (https://omim.org), explaining 22% to 40% of cases with EIEE6,7. In contrast, only a few genes, such as CHD2, DNM1, GABRB3, and SYNGAP1, have been identified in patients with LGS, accounting for a small proportion of cases8–11. Furthermore, recent studies with large cohorts have encountered challenges in discovering novel EIEE/DEE genes12–14. However, the age-dependent feature, specific clinical characteristics, and distinct EEG pattern highly suggest an underlying genetic mechanism in the epileptogenesis of LGS, which remains to be determined.
In the present study, we performed trio-based whole-exome sequencing (WES) in a cohort of patients with LGS without acquired etiologies. The potential pathogenic variants were screened by an individualized analysis protocol, including individualized analysis on each trio by explainable inheritance origin with stratified frequency filtration and on each gene from four aspects that include gene expression in the brain, previously reported phenotypes, probability of being intolerant to heterozygous/homozygous variants of loss-of-function (pLI/pRec), and phenotypes produced by knockout/knockdown. Novel genes with repetitively identified de novo variants, biallelic variants, or hemizygous variants were selected for specified statistical analysis, including comparing the compound heterozygous variants with those in asymptomatic parent controls. Three genes were identified as novel candidate genes in this cohort, including SBF1 with an autosomal dominant inheritance pattern (AD, de novo variants), CELSR2 with autosomal recessive inheritance (AR), and TENM1 with X-linked recessive inheritance (XLR). Animal models were used to validate the roles of the novel candidate genes in epilepsy. This study suggests that SBF1, CELSR2, and TENM1 are pathogenic genes of LGS and highlights the implication of phenotype subclassification, individualized analysis of trios in combination with specified statistical analysis, and setting controls for compound heterozygous variants.
Methods
Subjects
A total of 235 unrelated cases (234 trios and 1 singleton) with Lennox-Gastaut syndrome (LGS) were enrolled from 2005 to 2021. Among these cases, 194 were long-term follow-up patients in the Epilepsy Center of the Second Affiliated Hospital of Guangzhou Medical University. The other 41 patients were from 20 epilepsy centers of the First Hospital of Jilin University, Affiliated Foshan Maternity and Child Healthcare Hospital, Guangdong 999 Brain Hospital, Henan Province People’s Hospital, Shenzhen Children’s Hospital, Hunan Provincial Children’s Hospital, The Second Affiliated Hospital of Xi’an Jiaotong University, Foresea Life Insurance Shaoguan Hospital, Children’s Hospital of Nanjing Medical University, People’s Hospital of Xinjiang Uygur Autonomous Region, The Second Hospital of Hebei Medical University, Affiliated Hospital of Guangdong Medical University, Guangzhou First People’s Hospital, The First Affiliated Hospital of Jinan University, Xuzhou Center Hospital, Qingdao Women and Children’s Hospital, The First Affiliated Hospital of Zhengzhou University, The Second Hospital of Shandong University, Shantou Chaonan Minsheng Hospital, and Qilu Children’s Hospital of Shandong University.
Detailed clinical information was collected, including sex, seizure onset age, seizure type and frequency, course of seizure, response to antiepileptic drugs, family history, and findings from general and neurological examinations. Brain MRI scans were performed to identify abnormalities in brain structure. Patients with acquired etiologies were excluded. Video EEG was examined, and the results were reviewed by two qualified electroencephalographers. Epileptic seizures and LGS were diagnosed and classified in accordance with the Commission on Classification and Terminology of the International League Against Epilepsy criteria (1989, 2001, 2010, 2017, and 2022). All patients were diagnosed with LGS with frequent seizures characterized by two or more of the following features: (1) multiple seizure types including tonic seizures, (2) generalized polyspikes or fast rhythms during sleep (especially required when daily tonic seizures are obscure), and (3) generalized slow (< 2.5 Hz) spike-wave complex on EEG. The patients were followed up for at least one year. All subjects were Han Chinese with four Han Chinese grandparents and were born to nonconsanguineous Chinese parents.
This study adhered to the guidelines of the International Committee of Medical Journal Editors regarding patient consent for research or participation and received approval from the ethics committee of the Second Affiliated Hospital of Guangzhou Medical University. Written informed consent was obtained from the patients or their legal guardians.
Whole-exome sequencing
Genomic DNA was extracted from the peripheral blood obtained from the patients and their parents (trios) using the QIARamp Blood Mini Kit (Qiagen, Hilden, Germany). Trio-based whole-exome sequencing (WES) was performed on a HiSeq 2000 system (Illumina, San Diego, CA, USA) as previously reported11,15. The sequencing data were generated by massively parallel sequencing with > 100 times average depth and > 98% coverage of the capture regions, and the high-quality reads were mapped to the Genome Reference Consortium Human Genome build 37 (GRCh37) by Burrows-Wheeler alignment. Single-nucleotide variants and insert/deletion variants were called with the Genome Analysis Tool Kit.
Variant filtration and evaluation
To identify candidate causative variants in each trio, we adopted an individualized analytical framework in the following five steps (Fig. 1).
General filtration. The potentially pathogenic variants, including nonsense, frameshift, canonical splice site, initiation codon, in-frame insert/deletion, missense, and synonymous variants predicted to impact splicing, were retained and then filtered with minor allele frequency (MAF) < 0.005 in the Genome Aggregation Database (gnomAD).
Variant filtration by explainable inheritance origin in each trio. The origin of variants presents the genetic difference between the affected child and the parents and thus explains the occurrence of phenotype in a given family (trio), i.e., de novo, recessive from the two asymptomatic parents each, or hemizygous. Variants with an explainable inheritance origin in each trio were selected.
The variants were then filtered with stratified MAF criteria. The MAF of de novo variants, hemizygotes, and homozygotes is set as absent in the control populations in gnomAD, and for compound heterozygous variants, the product of multiplying the frequencies of two alleles in gnomAD is < 1×10-6, which is 7 times less than the probability of one individual in the current population of gnomAD (1/141456 = 7 × 10-6).
Variant filtration with criteria on the gene profile. The genes with selected variants were classified into epilepsy-associated genes (977 genes by Wang et al.16 and updated from OMIM) and genes with undefined gene-disease/epilepsy/seizures associations. For variants in epilepsy-associated genes, their pathogenicities were evaluated by the American College of Medical Genetics and Genomics Standards and Guidelines (ACMG)17. For those with undefined gene-disease/epilepsy/seizure associations, the following four aspects were assessed: (1) Tissue-specific expression: a candidate pathogenic gene should be expressed in the brain (inclusion criterion), with careful consideration of the possibility of other explainable pathogenic mechanisms, such as ectopic expression and remote toxic effects of abnormal metabolic products. (2) Excluding the disease-causing genes with previously defined gene-disease associations (including genotype-phenotype correlation) against the possibility of epilepsy as a phenotype (exclusion criterion). (3) The probability of being intolerant to heterozygous/homozygous variants of loss-of-function (pLI/pRec), genes of pLI ≥ 0.9 with repetitive de novo variants, and genes of pLI ≥ 0.9/pRec ≥ 0.9/pNull ≤ 0.1 with repetitive recessive variants were considered. (4) Whether the genetic knock-out/knock-down produce phenotypes of the brain.
The genes with repetitively identified (in two or more cases) de novo, recessive, and hemizygous variants were selected as novel candidate genes and were subjected to further statistical analysis, including excesses of de novo variants8 and recessive variants18, aggregate frequency of variants19, and the frequency of the compound heterozygous variants compared with that in an asymptomatic parent control. In brief, for excess de novo variants, the P-value was calculated as [1-Poisson cumulative distribution function (χ-1, λ)], where χ is the observed de novo variant number for a specific gene, and λ is calculated as 2×234×1.2×10-8×effective transcript length of the gene on the autosome. For excess biallelic variants, the P-value was calculated as [1-cumulative binomial probability [(n-1, N, R)], where n is the observed biallelic variant number for a specific gene, N is the number of trios (234 in this cohort), and R is the rate of variants in the gene by chance in the population. The variants were filtered by different MAF cutoffs based on the class of variants, i.e., biallelic loss-of-function (LOF), one LOF and one damaging missense, and biallelic damaging missense. Considering that all patients were Han Chinese, the cutoff was set according to the MAF in the ExAC-East Asian population. For the aggregated frequency of variants, the controls include a cohort of 296 healthy Chinese volunteers as in our previous report15 and the control populations in gnomAD. For compound heterozygous variants (two variants in trans), we established a control cohort that included 1942 asymptomatic parents from trios, in whom the compound heterozygous variants were identified by detecting one of the paired variants in the child, given that one of the paired variants in a parent would transmit to the child.
Abbreviations: ACMG, American College of Medical Genetics and Genomics guideline; KO/KD, knockout/knockdown; MAF, minor allele frequency.
All candidate variants were validated by Sanger sequencing. Qualified novel candidate genes were subjected to further bioinformatics and experimental studies to define the gene-disease associations.
Protein structure modeling
The tertiary structure of the protein encoded by the novel candidate genes was modeled using trRosetta (https://yanglab.nankai.edu.cn/trRosetta/) to predict the effect of missense variants on protein structure. Hydrogen bonding was analyzed and visualized using PyMOL 2.3 software (https://pymol.org/2/).
Functional studies on SBF1, CELSR2, and TENM1
The use of animals and all experimental procedures were approved by the Committee for Animal Care of the Second Affiliated Hospital of Guangzhou Medical University.
Fly stocks
Flies were fed standard cornmeal and maintained in an incubator at 25 ℃ and 60%-70% humidity on a 12:12-hr light/dark cycle as previously described15. The Drosophila homologs of the human SBF1, CELSR2, and TENM1 genes are Sbf (CG6939), Flamingo (CG11895), and Ten-m (CG5723), respectively. UAS-Sbf-RNAi (FBgn0025802), UAS-Flamingo-RNAi (FBgn0024836), and UAS-Ten-m-RNAi (FBgn0004449) were purchased from the Tsing Hua Fly Center (Tsinghua University, Beijing, China). The Gal4 driver lines tub-Gal4 and Canton-S were gifts from Prof. LIU Ji-Yong (Guangzhou Medical University, Guangzhou, China). Virgin tub-GAL4 females were mated with males of each RNAi line to establish flies with specific gene knockdown. The RNAi efficiency of the three genes in flies was detected by reverse transcription quantitative PCR (RT-qPCR). Canton-S was used as the WT line in this study.
Seizure behavior test in flies
The tub-Gal4 line was crossed with Sbf-, Flamingo-, and Ten-m-RNAi to establish globally corresponding gene knockdown flies (tub-Gal4>Sbf-RNAi, tub-Gal4>Flamingo-RNAi, and tub-Gal4>Ten-m-RNAi, respectively)20. The bang-sensitivity (BS) test was carried out on flies 3-5 days after eclosion, and seizure-like behavior was evaluated as described in previous studies15,21. Briefly, flies were anesthetized with CO2 and transferred into a new clean food vial one day before testing. Three to six flies were placed in one vial and mechanically stimulated with a vortex mixer (VWR, Radnor, PA, USA) at maximum speed for 20 sec. The percentage and duration of BS paralysis in flies were recorded.
Electrophysiological assay on flies
Attached recording was performed in the knockdown and control flies using an established electrophysiological method22. Fly brains were dissected as previously described23, transferred to a recording chamber with fly external solution, and immobilized with a C-sharp holder. The anterior side of the brain was positioned facing upward to enable observation of the soma of projection neurons (PNs) on the brain surface. The standard external solution contained 101 mM NaCl, 1 mM CaCl2, 4 mM MgCl2, 3 mM KCl, 5 mM glucose, 1.25 mM NaH2PO4, and 20.7 mM NaHCO3 (pH 7.2, 250 mOsm). Attached recording was done by the Axon MultiClamp 700B amplifier with internal solution (contained 102 mM K-gluconate, 0.085 mM CaCl2, 1.7 mM MgCl2, 17 mM NaCl, 0.94 mM EGTA, and 8.5 mM HEPES (pH 7.2, 235 Osm)) filled glass pipettes (10-14 MΩ). Data were acquired with a Digidata 1440B digital–analog converter (Molecular Devices, San Jose, CA, USA) and pClamp v10.5 software (Molecular Devices, San Jose, CA, USA).
Generation of sbf1 knockout zebrafish using CRISPR-Cas9
The AB strain zebrafish (provided by Dr. Yang Xin-Ping of Southern Medical University) embryos and larvae were maintained on a 14:10-hr light/dark cycle and raised as previously described24. The larvae were allowed to develop at 28.5 ℃ and were staged by hours or days after fertilization under standard conditions. CRISPR-Cas9 genomic editing was used to generate sbf1 knockout zebrafish as described previously25. The CRISPR target sequence (5’-AGTGGATGGCAGCTGGCTCC-3’) and oligonucleotide were designed using ZiFit software26. The sbf1-targeting single-guide RNA (sgRNA) template plasmid was generated by annealing oligonucleotides and ligation of the double-stranded DNA in the plasmid PMD-18T (Takara, 6011) and then transcribed in vitro with T7 RNA polymerase (Thermo Fisher, # EP0111). A 1 nL mixture containing approximately 300 ng/µL Cas9 protein (New England Biolabs, Beverly, MA) and 15 ng/µL sgRNA was injected into one-cell stage embryos using a Picospritzer III pressure ejector. Germline mutations with a 4-nt deletion were introduced in the third coding exon of the sbf1 gene. The mutant was verified by PCR-Sanger sequencing with dF1/dR1 primers.
The expression level of sbf1 was detected in 5 days postfertilization (d.p.f) larvae by quantitative real-time PCR (RT-qPCR) using primers gF1/gR1 and dF2/dR2 and SYBR Premix Ex Taq (Takara, Japan) on an Mx3005P qPCR System (Agilent, USA). The primer sequences are listed in Supplementary Table 13.
Morphological, survival, and behavioral analysis of sbf1 knockout zebrafish
The morphological and survival features of zebrafish from 0 to 30 d.p.f. were observed and photographed using an Olympus microscope (BX51, Olympus, Japan). Abnormal morphological phenotypes, including curved body and abdominal swelling, were observed. The survival percentage of zebrafish of each genotype was calculated using Kaplan-Meier survival analysis. For locomotion tracking, 6-9 d.p.f. larvae were placed in individual wells of a 96-well flat-bottomed culture dish (Corning, #07-200-565). Each well contained approximately 200 µL Ringer’s solution and one larva. Behavior was monitored at 28.5 °C using a ZebraBox system (ViewPoint Behavior Technology) and was then analyzed using Zebralab locomotion tracking software (ViewPoint Behavior Technology). After 10 min of habituation, each larva was recorded for a total of 5 min with 5 light/dark cycles (each consisting of 10 sec on light and 50 sec off light), followed by 60 min of recording. We then calculated the traveled distance, duration, and numbers of rapid whirlpool-like circling swimming with high speed (> 20 mm/sec, characterized as seizure behavior)27. Videos were analyzed blindly for the classification of seizure behavior.
Electrophysiological assay on zebrafish
Electrophysiological recordings in zebrafish were obtained using the methods described in the literatures27,28. Briefly, 7-11 d.p.f. zebrafish larvae were embedded in a 1.2% agarose gel prepared by dissolving low-melting temperature agarose in artificial cerebrospinal fluid (aCSF) (NaCl 134 mM, KCl 2.9 mM, MgCl2 1.2 mM, CaCl2 2.1 mM, glucose 10 mM, and HEPES 10 mM) with anesthetic agent (α-bungarotoxin, 1 mg/mL, Sigma). Then, fish were placed on the upright stage of an Olympus microscope and perfused with standard recording solution. A glass microelectrode (∼1 µm tip diameter, 2-7 MΩ) was placed in the optic tectum of the fish under visual guidance. Electrodes were filled with 2 M NaCl, and electrical activity was recorded using a 700B amplifier. Traces were bandpass filtered (0.2-1 kHz). A detection threshold of three times background noise was applied to extract the events on a 5-minute-long window. Detected events were manually sorted. Two or more consecutive spikes with a threshold of three times the background noise were considered seizure-like events in a 20-minute-long window. The time and duration of each event were extracted to evaluate seizure-like electrophysiology.
Behavior monitoring and electroencephalography (EEG) recording in Celsr2-/- mice
Celsr2-/- mice were generated as previously described29 and provided by Dr. Yibo Qu (Guangdong-Hong Kong-Macau Institute of CNS Regeneration, Jinan University). All mice were obtained by breeding Celsr2+/- males with Celsr2+/- females and maintained on a 12-hr light/dark cycle (lights on at 8:00 am, off at 8:00 pm) with food and water ad libitum. At three weeks old, the mice were placed into opaque home cages under 24-hr video monitoring for 7 days.
EEGs were recorded in Celsr2-/- and wild-type mice with free behavior for analysis of epileptic seizures, according to a previous study30. Briefly, the mice were anesthetized by 2% isoflurane (vol/vol) and implanted with a Teflon-coated silver wire electrode in the cortex (1.5 mm posterior and 1.2 mm lateral to the bregma) for signal capture. The EEG signal was synchronized with a USB video camera to analyze epileptic seizure-like behavior. EEG signals were amplified with a preamplifier and primary amplifier (EX1, Dagan) and sampled at 10000 Hz (gain, 1000; bandpass filter, 0.1 to 500 Hz). Epileptic seizures were identified by checking both the EEG signal and the behavior of the animals. The incidence of seizures was calculated as previously described31.
Statistical analysis
The two-tailed Fisher’s exact test was used to compare the allele frequencies between the case and control groups. Data from experiments were first tested for Gaussian distribution. For normally distributed data, Student’s t test and one-way analysis of variance (ANOVA) with Tukey’s post hoc test were used to compare differences among two or more groups. For the data that were nonnormally distributed, the Mann-Whitney test and Kruskal-Wallis test were used to compare differences among two and multiple groups, respectively. Kaplan-Meier survival curves were compared using log-rank statistics. All testing was conducted by experimenters blinded to animal genotypes and analyzed by using GraphPad Prism 8.2 (La Jolla, CA, USA). P values less than 0.05 were considered statistically significant. Analysis of the excess of de novo and biallelic variants in a specific gene was performed according to previous reports8,18 (detailed above in Variants filtration and evaluation). Bonferroni correction for P-values was applied for analysis of the excesses of variants. Given the general number of encoding genes in the human genome (19,711), a corrected P value (Pc) less than 2.54 × 106 (0.05/19,711) indicated a significant genome-wide difference.
Results
General information from WES
We recruited 234 unrelated trios and a singleton (adopted) from 21 hospitals (Supplementary Table 1). The 235 patients included 164 boys and 71 girls, with an average onset age of 4.6 years old (4.6 ± 3.8, mean ± SD). On average, 5.9 Gb of the sequence was produced within the exome-targeted regions in each individual, with an average coverage of 120 depths (Supplementary Table 2). To identify potential novel pathogenic variants, we employed an individualized analytical framework that includes analyses on each trio by explainable inheritance origin and stratified minor allele frequency (MAF) filtration and analyses on the gene profile in four aspects (Fig. 1). After variant filtration by inheritance origin and MAF (steps 2 and 3 in Fig. 1), we obtained 1112 qualified variants in 930 genes, including 177 single de novo (15.9%), 77 homozygous (6.9%), 725 compound heterozygous (65.2%), and 133 hemizygous variants (12%) (Supplementary Table 3 and Extended Data Fig. 1a). The average number of variants per case was 4.8, including 0.8 de novo (ranging from 0 to 4), 0.3 homozygous (ranging from 0 to 5), and 3.1 compound heterozygous variants (ranging from 0 to 10); the number of hemizygous variants per male case was 0.8 (133/163, ranging from 0 to 5) (Extended Data Fig. 1b).
a, Pie chart of the inheritance origin of the 1112 qualified variants. b, The distribution of the number of de novo, homozygous, compound heterozygous, and hemizygous variants per case.
Potential pathogenic variants in epilepsy-associated genes
For known epilepsy-associated genes, variants were identified in 53 cases with 31 genes involved (Table 1 and Supplementary Table 4). Among these variants, 47 single de novo variants in 26 genes were evaluated as pathogenic or likely pathogenic by using American College of Medical Genetics and Genomics (ACMG) standards, explaining 20% of cases (47/235) in this cohort. These genes are associated with epilepsy/DEE or developmental disorders with seizures (Table 1). The CHD2 gene appeared to be the top gene, with 10 patients affected (4.3%, 10/235).
It was noted that the pathogenic/likely pathogenic variants estimated by ACMG were all de novo. In contrast, 6 recessive variants were evaluated as uncertain significance by ACMG (Table 1), including 2 homozygous, 1 compound heterozygous, and 3 hemizygous variants. Given that ACMG evaluates each of the paired biallelic variants separately and does not consider the specific dose effect of hemizygous variants, the pathogenicity of these variants was re-evaluated. These variants were considered to be possibly pathogenic by highly consistent clinical phenotype, as previously reported, absence or very low MAF in the general population (criteria in step 3 in Fig. 1), and explainable inheritance origin (Supplementary Table 4).
Repetitiveness of potential pathogenic variants in novel candidate genes
For variants in genes that have not been previously reported to be associated with epilepsy/seizures, further filtration of the gene profile was performed (step 4 in Fig. 1). Repetitive variants were identified in 29 genes with 73 cases involved (Table 1 and Supplementary Table 5). These genes met the inclusion criterion and the exclusion criterion as novel candidate genes, among which 12 genes met the additional two criteria of a high probability of being intolerant to heterozygous/homozygous variants of loss-of-function (pLI/pRec) and abnormal phenotypes produced by knockout/knockdown (KO/KD). In addition, variants were identified in 26 genes that also met the four criteria on the gene profile but appeared in only a single case in this cohort (Supplementary Table 6). These patients had no other pathogenic or likely pathogenic variants in known or novel candidate epilepsy genes.
For genes with de novo variants, the Poisson cumulative distribution function was calculated8. SBF1 presented a significant excess of de novo variants in this cohort, which was significant after Bonferroni corrections (Pc = 0.012, Table 1 and Supplementary Table 7). In addition, among the 26 epilepsy-associated genes with de novo variants, significant excesses were detected in CHD2 and SETD1B (Pc < 0.001 and Pc = 1.34 × 10-5, respectively, Table 1 and Supplementary Table 7).
For genes with biallelic variants, the cumulative binomial probability was calculated18. Biallelic variants in CELSR2 were identified in 8 cases, which was significantly more than expected by chance after Bonferroni correction (1% MAF cutoff for biallelic damaging missense genotypes in CELSR2, 8 versus 0.0029, Pc = 0.011). Biallelic SPTBN5 variants were identified in 5 cases, which was not significantly more than expected by chance after Bonferroni correction (0.5% MAF cutoff for biallelic damaging missense genotypes in SPTBN5, 5 versus 0.0016, Pc = 0.87).
To analyze the significance of biallelic variants in the cases, we established a control cohort that included 1942 asymptomatic parents from trios. The frequencies of biallelic variants in 8 genes (CELSR2, ADGRA1, CCDC66, AKAP13, CRMP1, MICAL2, SYNJ2, and TOGARAM2) were significantly higher than those in asymptomatic parent controls (Fig. 2 and Supplementary Table 8), among which 4 genes (CELSR2, CRMP1, ADGRA1, and AKAP13) met all four criteria on the gene profile (step 4 in Fig. 1).
The orange dots indicate the frequencies of genes with biallelic variants in cases. The blue dots indicate the frequencies of genes with biallelic variants in asymptomatic parent controls. The genes in red font indicate that the frequencies of biallelic variants in cases were significantly higher than those in the asymptomatic parent controls. Data were analyzed by two-tailed Fisher’s exact test.
Additionally, 5 hemizygous variants in TENM1 were identified in 6 unrelated cases (Table 1). These variants did not present as hemizygotes or homozygotes in the controls of the general population.
Genetic-clinical features of the patients with variants in novel candidate genes
SBF1
Four missense variants in SBF1 (MIM*603560) were identified in four unrelated cases, including three de novo variants (c.337C>A/p.Gln113Lys, c.4459G>A/p.Gly1487Ser, and c.5424G>C/p.Trp1808Cys) and one with unknown origin (adopted, c.2272A>G/p.Ile758Val) (Fig. 3a and the DNA sequencing chromatograms in Extended Data Fig. 2a). The four variants were not present in the controls of the gnomAD-all population or in our 296 healthy controls. In addition to the significant excesses of de novo variants (Supplementary Table 7), the aggregate frequencies of the variants in the cases were significantly higher than those in the controls (4/468 vs. 0/109408 in the controls of the gnomAD-all population, P = 3.25 × 10-10; vs. 0/9046 in the controls of the gnomAD-East Asian population, P = 5.78 × 10-6; vs. 0/592 in our 296 normal controls, P = 0.038, Supplementary Table 9).
a, Sanger sequencing verification of four cases with SBF1 variants. b, Modeling of SBF1 missense variants indicated abnormal hydrogen bonds. Wild-type and mutant residues are colored green and light magenta, respectively. Hydrogen bonds are colored orange. The variant Lys113 rebuilds two hydrogen bonds with Asp100 and Glu99. Ser1487 rebuilds one with His1488. The variant 1808Cys destroyed one hydrogen bond with His1823. Variants that altered the stability of the global conformation of the SBF1 protein were indicated by the value of free energy stability changes (ΔΔG, Kcal/mol) using online tools (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi). An absolute value of ΔΔG > 0.5 indicates a large decrease in stability. The variants in red font (the top) indicate that the variant changed hydrogen bonds with surrounding amino acid residues. The ΔΔG in red font (bottom) indicates that the variant significantly changed the protein stability.
Pedigrees of the unrelated cases with SBF1 (a), CELSR2 (b), and TENM1 (c) variants (top) and the localization of the variants in the protein domains (bottom). The gray dotted lines in (b) indicate a pair of compound heterozygous variants of CELSR2. Abbreviations: CELSR, cadherin EGF LAG seven-pass G-type receptor; DENN, differentially expressed in normal and neoplastic cells; EGF, epidermal growth factor; GPS, G-protein-coupled receptor proteolytic site; GRAM, glucosyltransferases, Rab-like GTPase activators and myotubularins; NHL, ncl-1, HT2A and lin-41; PH, pleckstrin homology; YD, Tyr and Asp dipeptide.
Three de novomissense variants (p.Gln113Lys, p.Gly1487Ser, and p.Trp1808Cys) were located at the functional domains, i.e., the tripartite differentially expressed in normal and neoplastic cells (DENN) domain, the myotubularin phosphatase domain, and the pleckstrin homology (PH) domain, respectively (Fig. 3a). They were predicted to alter hydrogen bonding with surrounding residues (Extended Data Fig. 2b). The variant p.Ile758Val was located in the region between DENN and Glucosyltransferases, Rab-like GTPase activators and Myotubularins (GRAM) domain (Fig. 3a) without hydrogen bond changes by protein modeling but was predicted to decrease the protein stability (Extended Data Fig. 2b).
The clinical features of the four cases are summarized in Table 2. The three patients (LG90, LG57, and LG241) with de novo variants located in the functional domains exhibited intractable and frequent seizures even under a combination of three or four anti-epileptic drugs (AEDs). They presented both generalized and multifocal discharges in EEGs. The patient (LG206) with the variant in the region between the DENN and GRAM domains (p.Ile758Val) presented a relatively late onset age (10-14 years) with only generalized discharges in EEG. He achieved seizure-free with monotherapy of lamotrigine, despite previous daily myoclonic seizures.
CELSR2
Biallelic variants in CELSR2 (MIM*604265) were identified in eight unrelated cases, including a homozygous missense variant (c.7227C>A/p.His2409Gln) in two cases and six compound heterozygous missense variants (c.1763C>G/p.Pro588Arg & c.8545G>A/p.Glu2849Lys, c.3640A>T/p.Ser1214Cys & c.6108C>G/p.Phe2036Leu, c.4411C>A/p.Pro1471Thr & c.5969G>A/p.Arg1990His, c.6986G>A/p.Ser2329Asn & c.3640A>T/p.Ser1214Cys, c.8551A>G/p.Ser2851Gly & c.3329C>T/p.Thr1110Ile, and c.8624G>A/p.Arg2875Gln & c.6313A>T/p.Thr2105Ser) (Fig. 3b and the DNA sequencing chromatograms in Extended Data Fig. 3a). The variant p.Ser1214Cys was also recurrently identified in two unrelated cases. All heterozygous variants presented no or low allele frequencies in the gnomAD database (criteria in step 3 in Fig. 1). In addition to the significant excess of recessive variants, the frequency of biallelic variants in the cases were significantly higher than that in the asymptomatic parent controls (Fig 2). Additionally, the aggregate frequencies of the variants in the cases were significantly higher than those in the controls (16/468 vs. 496/81210 in the controls of the gnomAD-all population, P = 6.61 × 10-8; vs. 113/8124 in the controls of the gnomAD-East Asian population, P = 2.35 × 10-3; vs. 3/592 in our 296 normal controls, P = 5.91 × 10-4, Supplementary Table 10).
a, Pedigrees and Sanger sequencing verification of the eight cases with biallelic CELSR2 variants. b, Modelling of CELSR2 missense variants indicated abnormal hydrogen bonds. Wild-type and mutant residues are colored green and light magenta, respectively. Hydrogen bonds are colored orange. Variants that destabilized the global conformation of the CELSR2 protein were indicated by the value of free energy stability changes (ΔΔG, Kcal/mol) using online tools (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi). An absolute value of ΔΔG > 0.5 indicates a large decrease in stability. The variants in red font (the top) indicate that the variant changed hydrogen bonds with surrounding amino acid residues. The ΔΔG in red font (bottom) indicates that the variant significantly changed the protein stability.
Four of the twelve missense variants were located at the functional domains, including two (p.Pro588Arg and p.Thr1110Ile) in the cadherin EGF LAG seven-pass G-type receptor (CELSR) repeat domains, one (p.Pro1471Thr) in the laminin G-like domain, and one (p.Ser2329Asn) in the G-protein-coupled receptor proteolytic site (GPS) domain. The other eight variants were located in the nonfunctional regions, including one (p.Ser1214Cys) in the region between the CELSR repeats and EGF-like domain, three (p.Arg1990His, p.Phe2036Leu, and p.Thr2105Ser) in the region between the laminin G-like and GPS domains, one (p.His2409Gln) located in the transmembrane region, and three (p.Glu2849Lys, p.Ser2851Gly, and p.Arg2875Gln) located in the C-terminal intracellular region (Fig. 3b). Among the eight pairs of biallelic variants, at least one of the paired missense variants was predicted to affect hydrogen bonding or decrease protein stability (Extended Data Fig. 3b).
The clinical features of the eight cases are listed in Table 2. The two cases (LG38 and LG231) with homozygous variant p.His2409Gln exhibited intractable seizures. In contrast, the two cases (LG242 and LG40) with compound heterozygous variants that had both variants located at nonfunctional regions (p.Ser1214Cys & p.Phe2036Leu and p.Thr2105Ser & p.Arg2875Gln) achieved seizure-free with monotherapy of valproate/lamotrigine. Among the other four cases with one of the paired variants located in the functional domains, two cases (LG150 and LG238, who had variants p.Thr1110Ile and p.Ser2329Asn in functional domains, respectively) presented refractory seizures; one case (LG230, with p.Pro1471Thr in the laminin G-like domain) achieved seizure-free but had intellectual disability; and the other case (LG54) achieved seizure-free without intellectual disability and had compound heterozygous variants with two variants located furthest apart (p.Pro588Arg and p.Glu2849Lys, a distance of 2261 amino acids), which have been suggested to be associated with phenotype severity32,33.
TENM1
Five hemizygous missense variants in TENM1 (MIM*10178) were identified in six unrelated cases, including c.467A>G/p.Asp156Gly, c.503G>A/p.Cys168Tyr, c.638C>T/p.Ala213Val, c.3326C>T/p.Thr1109Met, and c.5246T>C/p.Val1756Ala (Fig. 3c and the DNA sequencing chromatograms in Extended Data Fig. 4a). All TENM1 hemizygous variants were inherited from their asymptomatic mothers, consistent with an XLR inheritance pattern. No hemizygotes/homozygotes of the variants were present in the controls of the gnomAD-all population. The aggregate frequencies of the variants in this cohort were significantly higher than those in the male controls (6/163 vs. 0/2549 in the hemizygote controls of the gnomAD-all population, P = 4.32 × 10-8; vs. 0/251 in the hemizygote controls of the gnomAD-East Asian population, P = 3.5 × 10-3, Supplementary Table 11).
a, Pedigrees and Sanger sequencing verification of the six cases with TENM1 variants. b, Modelling of TENM1 missense variants predicted abnormal hydrogen bonds. Wild-type and mutant residues are colored green and light magenta, respectively. Hydrogen bonds are colored orange. The variant Tyr168 rebuilds a hydrogen bond with His151. Variants that destabilized the global conformation of the TENM1 protein were indicated by the value of free energy stability changes (ΔΔG, Kcal/mol) using online tools (http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant3.0/I-Mutant3.0.cgi). An absolute value of ΔΔG > 0.5 indicates a large decrease in stability. The variants in red font (the top) indicate that the variant changed hydrogen bonds with surrounding amino acid residues. The ΔΔG in red font (bottom) indicates that the variant significantly changed the protein stability.
Three variants (p.Asp156Gly, p.Cys168Tyr, and p.Ala213Val) were located at the N-terminal intracellular teneurin domain, one variant (p.Thr1109Met) was located in the extracellular region between EGF-like repeats and ncl-1, HT2A and lin-41 (NHL) repeats, and one variant (p.Val1756Ala) was located in the extracellular region between Tyr and Asp dipeptide (YD) 5 and YD 6 domains (Fig. 3c).
Among the three cases with variants in the N-terminal intracellular teneurin domain, two cases (LG239 and LG60) suffered from refractory seizures even under three AEDs. In contrast, the three cases (LG28, LG189, and LG270) with variants located in the nonfunctional regions achieved seizure-free under the combination therapy of valproate and lamotrigine.
Seizure activity in Sbf, Flamingo, and Ten-m knockdown Drosophila
To validate the association between the three novel candidate genes and epilepsy, we established a gene knockdown model in Drosophila by RNA interference (RNAi). The Drosophila homologs of the human SBF1, CELSR2, and TENM1 genes are Sbf, Flamingo, and Ten-m, respectively. The RNAi efficiency of the three genes in flies was detected by reverse transcription quantitative PCR (RT-qPCR). Compared to WT flies, mRNA expression of the three genes in the knockdown flies was reduced by 47% (Sbf), 75% (Flamingo), and 79% (Ten-m) (Extended Data Fig. 5). We examined bang-sensitivity (BS) seizure-like behavior in tub-Gal4>Sbf-, Flamingo-, and Ten-m-RNAi flies. Three phases of typical seizure-like behavior in the BS test, including seizure, paralysis, and recovery, were observed in the knockdown flies (Fig. 4a). Seizure-like behaviors were detected in 40.2% (Sbf), 24.8% (Flamingo), and 33.0% (Ten-m) of the knockdown flies, which was 8, 3, and 4 times higher than the control flies, respectively (Fig. 4b). The gene knockdown flies also showed a longer duration of seizure-like behavior than the controls. The knockdown flies recovered from seizures within 30 (Sbf), 7 (Flamingo), and 20 (Ten-m) seconds, while the control flies recovered within 3, 5, and 5 seconds, respectively (Fig. 4c).
The relative mRNA levels of Sbf, Flamingo, and Ten-m in knockdown flies were detected by RT-qPCR. The mRNA levels of wild-type flies were normalized to 1.0. The Gapdh was used as an internal control. Data were analyzed by Student’s t test.
a, Three stages of seizure-like behaviours in the (bang-sensitivity) BS paralysis test, including seizure (manifesting as vibrating wings), paralysis, and recovery, were observed in Sbf, Flamingo, and Ten-m knockdown flies. b, Seizure-like behaviors occurred at a higher rate in the knockdown flies (tub-Gal4>Sbf-/Flamingo-/Ten-m-RNAi) than in WT flies (Canton-S), tub-Gal4 (Control), and the respective RNAi control flies. c, The recovery time from seizures in the knockdown flies (tub-Gal4>Sbf-/Flamingo-/Ten-m-RNAi) and in the control groups. Data were collected from 4-7 groups, 4-6 flies in each group, for BS paralysis and recovery tests. All data are represented as the mean ± s.e.m. One-way ANOVA and Tukey’s post hoc test were used for multiple comparisons.
Projection neurons (PNs) are important excitatory neurons in the central nervous system of Drosophila34. We examined the effect of gene deficiencies on the electrophysiological activity of PNs by using whole-cell recording (Fig. 5a). The Sbf, Flamingo, and Ten-m knockdown flies showed regular bursts of firing in the PNs (Fig. 5b). The frequencies of spontaneous excitatory postsynaptic potentials (sEPSPs) in knockdown flies were significantly higher than those in WT flies (Fig. 5c). There was no significant difference in the amplitude of sEPSPs between the knockdown flies and WT flies (Fig. 5d). These results suggested that knockdown of Sbf, Flamingo, and Ten-m resulted in significantly increased seizure susceptibility and epileptic electrophysiological activity in Drosophila.
a, Whole-cell recording in the fly brain. Left, a schematic diagram showing the insert position of a glass microelectrode in an immobilized isolated brain (scale bar, 100 μm). Right, cell bodies of the squared region in the left panel are indicated at higher magnification (scale bar, 10 μm) and were targeted for whole-cell recording in the living isolated brain preparation. b, Representative traces of electrical activity in projection neurons (PNs) showing spontaneous excitatory postsynaptic potentials (sEPSPs) of WT and tub-Gal4>Sbf-/Flamingo-/Ten-m-RNAi flies. c, Frequencies of sEPSPs in PNs of WT and knockdown flies. d, Amplitude of sEPSPs between WT and knockdown flies. The box plots show all data points from minimum to maximum. Boxes represent data from the lower (25th percentile) to the upper (75th percentile) quartiles. The box center corresponds to the 50th percentile. A horizontal line indicates the median. Data are represented as the mean ± s.e.m. (Canton-S, n = 7; tub-Gal4>Sbf-RNAi, n = 8; tub-Gal4>Flamingo-RNAi, n = 7, tub-Gal4>Ten-m-RNAi, n = 8), from three independent experiments; one-way ANOVA test.
Seizure activity in sbf1 knockout zebrafish
To further determine the potential pathogenic roles of SBF1, we generated a sbf1 (the zebrafish homolog of SBF1) knockout model using CRISPR/Cas9 technology in zebrafish. SET binding factor 1 (encoded by SBF1) in humans and zebrafish is 70% identical. A germline mutation with a 4-nt (TGGA) deletion was introduced in exon 3 of the primary sbf1 transcript (RefSeq: NM_001045158.1), resulting in a premature stop codon in exon 5 (Extended Data Fig. 6a). Using RT-qPCR, we confirmed that there was an absence of 88% of total sbf1 mRNA in sbf1-/- and 41% of total sbf1 mRNA in sbf1+/- zebrafish, proving the efficiency of the sbf1 knockout (Extended Data Fig. 6b). At 5 days postfertilization (d.p.f.), morphological abnormalities, including lack of a swim bladder, severe body curvature, and small body size, were observed in 1.1% of sbf1-/- and 0.9% of sbf1+/- larvae (Fig. 6a, b). At 17 d.p.f., all sbf1-/- zebrafish were dead, while sbf1+/- and sbf1+/+ siblings remained alive. Approximately 62.5% of sbf1+/- and 83.3% of sbf1+/+ zebrafish survived to 30 d.p.f., suggesting an essential role of sbf1 in survival (Fig. 6c).
a, The CRISPR/Cas9 gRNA targeted exon 3 (E3) of sbf1, resulting in a 4-nt (TGGA) deletion in E3. The frameshift mutation led to a stop codon in exon 5 (E5). b, The relative mRNA expression of sbf1 was detected by RT-qPCR in the sbf1+/+, sbf1+/-, and sbf1-/- fish. The mRNA levels of sbf1+/+ were normalized to 1.0. The gapdh was used as an internal control. Data were analyzed by Student’s t test.
a, Malformation was observed in 5 days post-fertilization (d.p.f.) larvae of sbf1+/- and sbf1-/- but not sbf1+/+ (WT). b, The malformation rates in the sbf1+/- and sbf1-/- larvae were significantly higher than those in sbf1+/+ larvae (n = 8, one-way ANOVA); sbf1+/+ (grey), sbf1+/- (blue), and sbf1-/- (orange). c, Kaplan–Meier survival analysis for sbf1+/+ (n = 18), sbf1+/- (n = 24), and sbf1-/- (n = 15). d, Representative swimming tracks of 7 d.p.f. larvae within 60 minutes. Red, green, and black lines represent high- (≥ 20 mm/s), intermediate- (8 - 20 mm/s), and slow-speed movements (< 8 mm/s), respectively. e, f, Distance (e) and duration (f) of high-speed movement in each genotype. Data are represented as the mean ± s.e.m. (sbf1+/+, n = 10; sbf1+/-, n = 20; sbf1-/-, n = 10), from three independent experiments; one-way ANOVA test. g, Extracellular field recordings were performed in immobilized and agar-embedded zebrafish larvae forebrains. Left panel, schematic showing the insert position of a glass microelectrode. Right panel, representative field recording trace of epileptiform discharge from sbf1+/+ (top), sbf1+/- (middle), and sbf1-/- (bottom) larvae at 7 d.p.f. h, i, Total number of electrographic seizure-like bursts (including interictal and ictal discharges) (h) and duration of the discharges (i). Data were from 15 min of recording in 7-11 d.p.f. larvae of sbf1+/+ (n = 22), sbf1+/- (n = 27), and sbf1-/- (n = 13); Kruskal-Wallis test.
We next examined seizure-like behavior in sbf1 knockout zebrafish. Seizure-like behavior characterized by high-speed (> 20 mm/sec) “whirlpool-like” circling swimming was observed in the sbf1+/- and sbf1-/- larvae at 7 d.p.f. (Fig. 6d). It was noted that the sbf1+/- larvae exhibited more hyperactive/seizure-like behavior with a longer distance of high-speed movement than the sbf1+/+ larvae (Fig. 6e), with significant differences between sbf1+/- and sbf1+/+ larvae in the 20 to 60 min observations. In contrast, the sbf1-/- larvae exhibited more hyperactive/seizure-like behavior with a significantly longer distance of high-speed movement than the sbf1+/+ larvae at only the 60 min observation (Fig. 6e). Similarly, the sbf1+/- larvae exhibited a longer duration of high-speed movement than the sbf1+/+ larvae (Fig. 6f). We further performed electrophysiological recordings in the forebrain of larvae at 7-10 d.p.f. as reported previously28. Spontaneous epileptiform discharges were recorded in sbf1-/- and sbf1+/- larvae but not in sbf1+/+ larvae (Fig. 6g). Both the sbf1-/- and sbf1+/- larvae exhibited a greater number and duration of discharges than the sbf1+/+ larvae (Fig. 6h, i). These results suggest that deficiency of sbf1 is potentially associated with seizure-like electrophysiological activity in zebrafish.
Spontaneous seizure and electroencephalography recording in Celsr2 knockout mice
Celsr2, a core planar cell polarity (PCP) component, plays versatile roles in neural development, including cilia organization, neuron migration, and axon navigation35. To validate the potential role of CELSR2 in seizures, we generated a Celsr2 knockout model by inserting a targeting vector containing an internal ribosomal entry site (IRES) and the LacZ gene into exon 23 of the Celsr2 gene to interrupt its transcription as described previously29. The Celsr2+/+ mice did not show seizures or epileptiform discharges on video-EEG recordings (Fig. 7a). In contrast, spontaneous seizures with typical EEG alterations were observed in 4 of the Celsr2-/- mice (4/15, 26.7%, Fig. 7b, c, and Supplementary Video). The average duration of seizure activity in the Celsr2-/- mice was 194.9 ± 168.0 seconds (mean ± SD, Fig. 7d). Among the four Celsr2-/- mice with spontaneous seizures, two mice exhibited seizures to stage 3. The other two mice displayed seizures up to stage 5, followed by death (Fig. 7e). Analysis of the power spectra of the background EEGs showed that the Celsr2-/- mice had higher spectral power, characterized by high-amplitude and low-frequency oscillations (Fig. 7f).
a, Normal EEGs were detected from the WT mice. b, A representative trace shows ictal epileptiform discharges in Celsr2-/- mice. The dotted boxes indicate the zoomed-in traces of the onset and ictal epileptiform discharges. The green arrows indicate the time point of behavior of the mouse from video recording. c, The proportion of Celsr2+/+ and Celsr2-/- mice exhibiting spontaneous seizures. d, Seizure duration in Celsr2+/+ (n = 10) and Celsr2-/- (n = 4) mice; Mann-Whitney test. e, Seizure behaviors were rated with a modified Racine scale in Celsr2+/+ and Celsr2-/- mice. f, Power spectrum analysis of EEG signals in Celsr2+/+ and Celsr2-/- mice.
Discussion
The development of high-throughput sequencing techniques has enabled genetic studies with large cohorts. However, these studies have encountered challenges in discovering novel pathogenic genes12–14 in recent years. The gene-disease associations in more than three-fourths of the human genome remain undetermined (https://omim.org). The present study focused on LGS, an epileptic encephalopathy of childhood onset, and used trio-based WES with individualized analysis before statistical studies, which includes individualized analysis of each trio by explainable inheritance origin with stratified MAF filtration and individualized analysis of each gene from four aspects. Three novel candidate genes for LGS were identified, including SBF1 with de novo origin, CELSR2 with biallelic recessive inheritance, and TENM1 with X-linked recessive inheritance. These genes presented significant repetitiveness, including significantly higher excesses of de novo variants in SBF1 and biallelic variants in CELSR2, and aggregated frequencies of variants in SBF1, CELSR2, and TENM1 compared with that in the controls of the gnomAD population. The frequency of compound heterozygous/homozygous CELSR2 variants in the cases was significantly higher than that in the asymptomatic parent controls. Clinically, the phenotype severity was correlated with the genotype. Animal models with knockdown or knockout of these genes exhibited increased seizure-like behavior and increased firing of excitatory neurons. This study suggested that SBF1, CELSR2, and TENM1 were pathogenic genes of LGS and highlights the implication of phenotype subclassification and individualized analysis of trios in combination with specified statistical analysis in identifying novel pathogenic genes.
Epilepsy syndromes are characterized by age-dependent clinical manifestations, especially the onset age1, which are potentially associated with distinct pathogenic genetic causes16. The present study focused on LGS, the epileptic encephalopathy of childhood onset with characteristic clinical manifestations, differing from most of the DEEs that are commonly of early onset and used to be generally named EIEE. Consistently, this study identified three novel potential pathogenic genes, i.e., SBF1, CELSR2, and TENM1, which accounted for 7.7% (18/235) of cases, in addition to CHD2, which was ranked at the top of the pathogenic genes (10/235) in this cohort, differing from the DEEs in which SCN1A was the most common pathogenic gene12–14. Recent studies with large cohorts have encountered challenges in identifying novel pathogenic genes for epileptic encephalopathies12–14. The present study identified three pathogenic genes of LGS from 235 cases, which is the largest cohort of LGS so far, to our knowledge. More importantly, targeting LGS, a characteristic epilepsy syndrome with distinct clinical features, potentially decreased the heterogeneity of the subjects and improved the efficiency of detection. Therefore, this study reinforces the implication of phenotype subclassification by fining clinical features, which is potentially suggestive for genetic studies of large cohorts.
Statistical analysis is generally used to detect potential pathogenic variants/genes in genetic studies with large cohorts12–14. This study employed an individualized analysis procedure before statistical burden analyses, which potentially improves the specificity and sensitivity. After general filtrations, variants with explainable inheritance origin in each trio were selected, which potentially improved the specificity. The variants were then filtered with MAF criteria according to the inheritance pattern. The MAF of de novo variants, hemizygotes, and homozygotes was set as absent in the control populations in gnomAD, and the filtration MAF for compound heterozygotes was set at < 1×10-6 for the product of multiplying the frequencies of two alleles in gnomAD. This measure aimed to avoid omission of a single low-MAF variant in biallelic variants and potentially improved the sensitivity. Such a single heterozygous variant with low MAF alone is not pathogenic but is potentially pathogenic when it appears as one of the paired biallelic variants, including those in homozygotes/hemizygotes. For gene profile filtration, the four criteria for genes potentially improve specificity in identifying novel candidate genes. Specific statistical analyses were used for variants of different inheritance, including the Poisson cumulative distribution function for de novo variants and the cumulative binomial probability for recessive variants. For the compound heterozygous variants, it is currently unknown how frequently compound heterozygous variants are present in normal populations. We therefore established a control cohort that included 1942 asymptomatic parents from trios for analysis of compound heterozygous variants. SBF1, CELSR2, and TENM1 were then selected as novel potentially pathogenic genes for LGS and subjected to further studies.
SBF1 (MIM*603560) encodes a member of the protein-tyrosine phosphatase family, which plays a vital role in cell growth and differentiation36. Previously, SBF1 recessive variants were reported to be associated with Charcot-Marie-Tooth disease type 4B3 (CMT4B3, OMIM*615284), which is characterized by progressive limb muscle weakness and distal sensory impairment37. SBF1 is expressed in the brain and muscle at similar levels, suggesting functional roles in the brain. In mice, homozygous knockout of Sbf1 causes postnatal lethality with incomplete penetrance38. In this study, three de novo heterozygous SBF1 missense variants were identified, which showed a significant excess of de novo variants and significantly higher aggregated frequencies of variants in the LGS cohort than in the controls of gnomAD populations. In animal experiments, Sbf1 knockdown flies and knock-out zebrafish showed seizure-like behavior and increased firing of excitatory neurons. Clinically, SBF1 variants in the functional domains were associated with a relatively severe phenotype, whereas the variant between domains resulted in late-onset seizures with a good prognosis, indicating a genotype-phenotype correlation. These findings suggested that SBF1 is potentially associated with epilepsy. It is noted that the heterozygous Sbf1 knockout zebrafish presented more hyperactive/seizure-like behavior than the homozygous knockout larvae. Additionally, the probability of being loss of function intolerant (pLI) of SBF1 is 1.00, indicating that SBF1 is highly intolerant to heterozygous loss-of-function variants. These evidences potentially suggest an association between epilepsy and heterozygous SBF1 variants.
Previously, the CMT4B3-associated variants were all biallelic recessive variants, including biallelic missense variants and biallelic truncation variants (Extended Data Fig. 7). The present study showed that Sbf1 homozygous knockout in zebrafish causes premature death. Similarly, Sbf1 homozygous knockout in mice causes postnatal lethality with incomplete penetrance38. The Sbf1 homozygous knockout mice did not present overt motor or gait deficiencies, although an approximately 10% reduction in the number of myelinated axons in the sciatic nerve was observed39. Therefore, the mechanism underlying the pathogenesis of CMT4B3 caused by SBF1 biallelic variants is unclear. Gene expression across multiple tissues is common, and subsequently, a gene is potentially associated with multiple phenotypes. An example is DYNC1H1 (MIM*600112), which is associated with both CMT and abnormalities of the brain40, including epilepsy. Among the patients with phenotypes caused by DYNC1H1 variants, 36% (59/164) presented abnormalities of the brain, including seizures in 36 patients (22%); 26% (43/164) of patients presented only CMT; and 38% (62/164) of patients presented both CMT and abnormalities of the brain (data from the HGMD, https://www.hgmd.cf.ac.uk). Although the location and damage effect of variants are suspected to play roles40,41, the distinct mechanism underlying the phenotype variation is unknown. Our recent study on PCDH15, which is associated with deafness and epilepsy, showed that the distinct phenotypes are associated with the tissue-specific gene expression stage42. Further studies are required to determine the specific mechanisms underlying the distinct phenotypes of SBF1.
Variants identified in this study are shown in red (the top). Variants associated with Charcot-Marie-Tooth disease are shown in blue (bottom). The gray dotted lines indicate the pairs of compound heterozygous variants of SBF1.
CELSR2 (MIM*604265) encodes cadherin EGF LAG seven-pass G-type receptor 2 (CELSR2), a transmembrane protein of the flamingo subfamily. Our previous studies have shown that two CELSR subfamily genes, CELSR1 and CELSR3, are associated with epilepsy43,44. CELSR2 is highly expressed in the brain with predominance in early life and plays a vital role in cell/cell signaling during nervous system formation35. The loss of function observed/expected upper bound fraction (LOEUF) of CELSR2 is 0.25 (LOEUF < 0.35), and the pLI is 1.00, indicating that CELSR2 variants with loss of function are potentially pathogenic. In this study, we identified eight pairs of biallelic CELSR2 variants in the LGS cohort, which showed a significant excess of biallelic variants genome-wide, significantly higher aggregated frequencies of variants than that in the controls of gnomAD populations, and a significantly higher frequency of compound heterozygous/homozygous variants in the cases than that in asymptomatic parent controls. The CELSR2 homozygous variants and compound heterozygous variants with one of the paired variants located at the functional domains were associated with a severe phenotype, whereas compound heterozygous variants with both variants located at nonfunctional regions were associated with a relatively milder phenotype, suggesting a genotype-phenotype correlation. The Celsr2 knock-down flies and knock-out mice showed seizure-like behavior and increased firing of excitatory neurons. These findings suggest that CELSR2 is potentially associated with epilepsy.
TENM1 (MIM*10178) encodes a transmembrane teneurin subfamily protein that is highly expressed in the brain, predominantly during the prenatal and mature stages. TENM1 is involved in neural development by regulating the establishment of proper connectivity within the nervous system45,46. The LOEUF of the TENM1 gene was 0.19 (LOEUF < 0.35), and the pLI was 1.00, indicating that loss of function of TENM1 is potentially pathogenic. This study identified hemizygous TENM1 variants in six unrelated cases. No hemizygotes/homozygotes of the variants were present in the controls of gnomAD populations, and the aggregate frequencies of the hemizygotes were significantly higher than those in the male controls. Two patients with hemizygous variants located in the N-terminal intracellular teneurin domain presented refractory seizures, while the patients with variants located in nonfunctional regions achieved seizure-free, suggesting a genotype-phenotype correlation. Tenm1 knockdown flies exhibited an increased seizure-like behavior rate and increased excitability of neurons in the present study. These findings indicated that TENM1 was potentially associated with epilepsy.
Additionally, 35 novel genes (in 44 cases) met the six criteria in individualized analysis, including inheritance pattern and stratified MAF, gene expression in the brain, defined genotype not against epilepsy, high pLI/pRec, and abnormal phenotypes produced by KO/KD. The majority (82.9%, 29/35) of the genes showed a recessive inheritance pattern, among which three genes (ADGRA1, AKAP13, and CRMP1) presented higher frequencies of recessive variants compared to the asymptomatic parent controls. These genes are potential candidate genes of epilepsy, for which further studies are required.
Among the novel candidate genes, SBF1 presents dominant inheritance (heterozygous de novo variants in three cases), and the remaining majority of candidate genes present recessive inheritance, including XLR. This could be basically understood by the quantitative dependent feature of genes47,48. We defined the lowest limit of the required quantity of genetic function of the gene as the genetic dependent quantity (GDQ). If the full scale of GDQ was set at 100% with contributions from two copies of a gene, the genes with GDQ < 50% are much more abundant (https://gdap.org.cn/)47,48, since the human genome is diploid and usually a portion of functioning from one copy of a gene is sufficient for biophysiological function. Subsequently, pathogenic recessive (including X-linked recessive) variants are potentially common. By utilizing trio-based sequencing, this study was designed to detect variants of different origin, including those of recessive origin, which occur in genes with GDQ < 50%. The subsequent stratified MAF filtration, instead of a clear cut-off of MAF as “absent from populations”, ensured the inclusion of heterozygous variants of low MAF, of which a single variant alone is not pathogenic but is potentially pathogenic when it appears as one of the paired biallelic variants. In fact, a single heterozygous variant of “damaging” in recessive disorders is not “pathogenic” and thus may be prevalent with low MAF in the general population, typically p. Arg208Ter (MAF of 0.00025) and c.509-1G>C (MAF of 0.00039) in TPP1, which were identified in 89% of patients with ceroid lipofuscinosis49, a severe neurodegenerative disease with epilepsy. Therefore, special attention is required to recessive variants in designing protocols to identify the underlying causes of genetic disorders, including the application of controls for compound heterozygous variants.
In conclusion, the present study identified three novel pathogenic genes of childhood epileptic encephalopathy, SBF1 with de novo dominance inheritance, CELSR2 with biallelic recessive inheritance, and TENM1 with X-linked recessive inheritance, which were validated by genetic knockdown/knockout experiments. This study highlights the implication of phenotype subclassification, individualized analysis of trios in combination with specified statistical analysis, and setting controls for compound heterozygous variants.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author Contributions
WPL and YWS conceptualized the study, analyzed and interpreted the data, and drafted and revised the manuscript. WPL had full access to all data in the study and takes responsibility for the integrity of the data and accuracy of the data analysis. JGZ, NH, ZLY, NXS, and HKL performed the clinical data and whole-exome sequencing analysis. WBL, XCQ, and CXF performed seizure activity studies. WPL, YWS, JGZ, NH, ZLY, NXS, HKL, WBL, XCQ, and CXF drafted and revised the manuscript and contributed to the statistical analysis. All authors collected data, revised the manuscript, and contributed to the writing.
Competing interests
All authors claim that there are no conflicts of interest.
Ethics statement
All procedures performed were in accordance with the ethical standards of the institutional committee. Ethical approval was approved by the ethics committee of the Second Affiliated Hospital of Guangzhou Medical University (approval ethics number 2020-hs-49). We declared that all patients’ IDs were not identified to anyone outside the research group.
Acknowledgements
We thank the affected patients and their families for participating in this study. This work was funded by the National Natural Science Foundation of China (grant nos. 82171439, 82271505, and 81971216) and the Guangdong Basic and Applied Basic Research Foundation (grant no. 2021A1515010986). The funders had no role in the study design, data collection, and analysis or in the decision to publish or the preparation of the manuscript.
Footnotes
The article has been polished; Section on discussion has been modified; Figure 2 revised.