Abstract
An early genetic diagnosis can guide the time-sensitive treatment and care of individuals with genetic epilepsies. However, identification of a genetic cause often occurs long after onset of these disorders. Here, we aimed to identify early clinical features suggestive of genetic diagnoses in individuals with epilepsy by systematic large-scale analysis of clinical information from full-text patient notes in the electronic medical records (EMR).
From the EMR of 32,112 individuals with childhood epilepsy, we retrieved 4,572,783 clinical notes spanning 203,369 total patient-years. A subcohort of 1,925 individuals had a known or presumed genetic epilepsy with 738 genetic diagnoses spanning 271 genes. We employed a customized natural language processing (NLP) pipeline to extract 89 million time-stamped standardized clinical annotations from free text of the retrieved clinical notes. Our analyses identified 47,641 clinical associations with a genetic cause at distinct ages prior to diagnosis. Notable among these associations were: SCN1A with status epilepticus between 9 and 12 months of age (P<0.0001, 95% CI=8.10-133); STXBP1 with muscular hypotonia between 6 and 9 months (P=3.4×10⁻⁴, 95% CI=3.08-102); SCN2A with autism between 1.5 and 1.75 years (P<0.0001, 95% CI=11.1-Inf); DEPDC5 with focal-onset seizure between 5.75 and 6 years (P<0.0001, 95% CI=12.8-Inf); and IQSEC2 with myoclonic seizure between 2.75 and 3 years (P=2.5×10⁻⁴, 95% CI=11.3-1.15×10⁴). We also identified associations between clinical terms and gene groups. Variants in ion channel gating mechanisms were associated with myoclonus between 3 and 6 months of age (P<0.0001, 95% CI=5.23-24.2), and variants in calcium channel genes were associated with neurodevelopmental delay between 1.75 and 2 years (P<0.0001, 95% CI=4.8-Inf). Cumulative longitudinal analysis revealed further associations, including KCNT1 with migrating focal seizures from 0 to 1.75 years (P<0.0001, 95% CI=96.8-4.50×10¹⁵). A neurodevelopmental abnormality presenting between 6 and 9 months of age was strongly associated with an individual having any genetic diagnosis (P<0.0001, 95% CI=3.55-7.42). The earliest features associated with genetic diagnosis occurred a median of 3.6 years prior to the median age of diagnosis. Latency to diagnosis was greater in older individuals (P<0.0001) and those who initially underwent less comprehensive genetic testing (P=5.5×10⁻³, 95% CI=1.23-3.35).
In summary, we identified key clinical features that precede genetic diagnosis, leveraging EMR data at scale from a large cohort of individuals with genetic epilepsies. Our findings demonstrate that automated EMR analysis may assist clinical decision making, leading to earlier diagnosis and more precise prognostication and treatment of genetic epilepsies in the precision medicine era.
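The age-windowed associations reported above can be illustrated with a minimal sketch. The abstract does not name the statistical test, so the sketch assumes a two-sided Fisher's exact test on a 2x2 table (gene present or absent versus clinical term present or absent within one age window) and 0.25-year windows; the function names and all counts are hypothetical and are not taken from the study.

```python
from math import comb


def age_bin(age_years, width=0.25):
    """Return the (start, end) age window, in years, containing age_years.

    The 0.25-year default is an assumption based on the window sizes
    quoted in the abstract (e.g. 1.5 to 1.75 years).
    """
    start = int(age_years // width) * width
    return (start, start + width)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    a: gene carriers with the term      b: gene carriers without it
    c: non-carriers with the term       d: non-carriers without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers having the term.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def odds_ratio(a, b, c, d):
    """Sample odds ratio; infinite when no discordant cell is populated."""
    if b * c == 0:
        return float("inf")
    return (a * d) / (b * c)


if __name__ == "__main__":
    # Hypothetical toy counts for one gene, one term, one age window.
    a, b, c, d = 8, 2, 10, 80
    print("window:", age_bin(0.8))
    print("odds ratio:", odds_ratio(a, b, c, d))
    print("p-value:", fisher_exact_two_sided(a, b, c, d))
```

An upper confidence bound of "Inf", as in several intervals above, is what one would expect when a discordant cell of such a table is empty; the sketch returns an infinite odds ratio in that case but does not compute confidence intervals.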
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
IH is supported by an NINDS K award (K02 NS112600) and the Hartwell Foundation (Individual Biomedical Research Award). BL is supported by the National Institute of Neurological Disorders and Stroke (DP1NS122038) and The Jonathan Rothberg Family Fund. DLS is supported by the Wellcome Trust [203914/Z/16/Z]. For the purpose of Open Access, the author has applied a CC BY public copyright license to any Author Accepted Manuscript version arising from this submission.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The IRB of Children's Hospital of Philadelphia gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
Abbreviations
- CI: Confidence Interval
- EGRP: Epilepsy Genetics Research Project
- EMR: Electronic Medical Records
- HPO: Human Phenotype Ontology
- NLP: Natural Language Processing
- OR: Odds Ratio
- PELHS: Pediatric Epilepsy Learning Health System
- PPV: Positive Predictive Value