Abstract
Background Many thousands of patients with a suspected Mendelian disease have their exomes/genomes sequenced every year, but only about 30% receive a definitive diagnosis. Since a novel Mendelian gene-disease association is published on average every business day, thousands of undiagnosed patient cases could receive a diagnosis each year if their genomes were regularly compared to the latest literature. With millions of genomes expected to be sequenced for rare disease analysis by 2025, and considering the current publication rate of 1.1 million new articles per annum in PubMed, manually reanalyzing the growing number of undiagnosed cases is not sustainable.
Methods We describe a fully automated reanalysis framework for patients with suspected, but undiagnosed, Mendelian disorders. The presented framework was tested by automatically parsing all ∼100,000 newly published peer-reviewed papers every month and matching them on genotype and phenotype with all stored undiagnosed patients. If a new article contains a possible diagnosis for an undiagnosed patient, the system sends a notification. We test the accuracy of the automatic reanalysis system on 110 patients, including 61 with available trio data.
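The matching step described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `Article` and `Patient` records, the `find_alerts` function, the `min_shared_phenotypes` threshold, and the placeholder gene and phenotype names are all hypothetical. A plain set overlap stands in here for whatever genotype and phenotype matching the real system performs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Article:
    # Hypothetical record of what was parsed from one new paper.
    article_id: str
    genes: FrozenSet[str]        # genes the article links to a disease
    phenotypes: FrozenSet[str]   # phenotype terms extracted from the text


@dataclass(frozen=True)
class Patient:
    # Hypothetical record of one stored undiagnosed patient.
    patient_id: str
    candidate_genes: FrozenSet[str]  # genes carrying candidate variants
    phenotypes: FrozenSet[str]       # recorded phenotype terms


def find_alerts(
    articles: List[Article],
    patients: List[Patient],
    min_shared_phenotypes: int = 1,  # assumed threshold, for illustration only
) -> List[Tuple[str, str, str]]:
    """Return (patient_id, article_id, gene) triples for clinician review."""
    alerts = []
    for patient in patients:
        for article in articles:
            shared_genes = patient.candidate_genes & article.genes
            shared_phenotypes = patient.phenotypes & article.phenotypes
            # Alert only when both the genotype and the phenotype match.
            if shared_genes and len(shared_phenotypes) >= min_shared_phenotypes:
                for gene in sorted(shared_genes):
                    alerts.append((patient.patient_id, article.article_id, gene))
    return alerts


# Placeholder data: one patient, two newly published articles.
patients = [
    Patient("P1", frozenset({"GENE_A", "GENE_B"}), frozenset({"seizures", "hypotonia"})),
]
articles = [
    Article("A1", frozenset({"GENE_A"}), frozenset({"seizures"})),
    Article("A2", frozenset({"GENE_C"}), frozenset({"seizures"})),
]
alerts = find_alerts(articles, patients)
```

In this toy run only article `A1` triggers an alert for patient `P1`, because `A2` shares a phenotype but no candidate gene. Requiring both kinds of evidence is one simple way to keep the alert volume per patient low.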
Results Even when trained only on older data, our system identifies 80% of reanalysis diagnoses, while sending only 0.5-1 alerts per patient per year, a 100-1,000-fold efficiency gain over manual literature surveillance of equivalent yield.
Conclusion We show that automatic reanalysis of patients with suspected Mendelian disease is feasible and has the potential to greatly streamline diagnosis. Our system is not intended to replace clinical judgment. Rather, clinical diagnostic services could greatly benefit from a modest re-allocation of time from manual literature exploration to review of automated reanalysis alerts. Our system additionally supports a new paradigm for medical IT systems: proactive, continuously learning and consequently able to autonomously identify valuable insights as they emerge in digital health records. We have launched automated patient reanalysis, trained on the latest data, with user accounts and daily literature updates at https://AMELIE.stanford.edu.
Competing Interest Statement
DNC and PDS acknowledge the receipt of financial support from Qiagen Inc through a License Agreement with Cardiff University. The Department of Molecular and Human Genetics at Baylor College of Medicine receives revenue from clinical genetic testing completed at Baylor Genetics. EAA is an advisor to Apple and a co-founder of Personalis Inc. and DeepCell Inc. MTW is a stockholder of Personalis. The remaining authors declare no conflict of interest.
Funding Statement
All computational work was funded only by a Bio-X SIGF fellowship (JB), the Stanford Department of Pediatrics (JAB, GB), a Packard Foundation Fellowship (GB), and a Microsoft Faculty Fellowship (GB). The UDN-curated data used in this manuscript was supported by the NIH.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
De-identified data from the DDD project were accessed via the European Genome-Phenome Archive (study EGAS00001000775). As applicable to the participating patients, the study protocol was reviewed and approved by the Stanford University Institutional Review Board (IRB-32036) and the central IRB at the NIH National Human Genome Research Institute for the Undiagnosed Diseases Network. Written informed consent was obtained from all participants.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
A portion of the data we use is available from EGA. Another portion comes from consented Stanford or UDN patients. Some of the latter can be shared while respecting consent conditions.