PT - JOURNAL ARTICLE
AU - Francisco M. De La Vega
AU - Shimul Chowdhury
AU - Barry Moore
AU - Erwin Frise
AU - Jeanette McCarthy
AU - Edgar Javier Hernandez
AU - Terrence Wong
AU - Kiely James
AU - Lucia Guidugli
AU - Pankaj B Agrawal
AU - Casie A Genetti
AU - Catherine A Brownstein
AU - Alan H Beggs
AU - Britt-Sabina Löscher
AU - Andre Franke
AU - Braden Boone
AU - Shawn E. Levy
AU - Katrin Õunap
AU - Sander Pajusalu
AU - Matt Huentelman
AU - Keri Ramsey
AU - Marcus Naymik
AU - Vinodh Narayanan
AU - Narayanan Veeraraghavan
AU - Paul Billings
AU - Martin G. Reese
AU - Mark Yandell
AU - Stephen F. Kingsmore
TI - Artificial intelligence enables comprehensive genome interpretation and nomination of candidate diagnoses for rare genetic diseases
AID - 10.1101/2021.02.09.21251456
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.02.09.21251456
4099 - http://medrxiv.org/content/early/2021/02/12/2021.02.09.21251456.short
4100 - http://medrxiv.org/content/early/2021/02/12/2021.02.09.21251456.full
AB - Background: Clinical interpretation of genetic variants in the context of the patient’s phenotype is becoming the largest component of cost and time expenditure for genome-based diagnosis of rare genetic diseases. Artificial intelligence (AI) holds promise to greatly simplify and speed interpretation by comprehensively evaluating genetic variants for pathogenicity in the context of the growing knowledge of genetic disease. We assess the diagnostic performance of GEM, a new, AI-based, clinical decision support tool, compared with expert manual interpretation. Methods: We benchmarked GEM in a retrospective cohort of 119 probands, mostly NICU infants, diagnosed with rare genetic diseases, who received whole genome sequencing (WGS) at Rady Children’s Hospital. We also performed a replication study in a separate cohort of 60 cases diagnosed at five additional academic medical centers.
For comparison, we also analyzed these cases with commonly used variant prioritization tools (Phevor, Exomiser, and VAAST). Included in the comparisons were WGS and whole exome sequencing (WES) as trios, duos, and singletons. Variants underpinning diagnoses spanned diverse modes of inheritance and types, including structural variants (SVs). Patient phenotypes were extracted either manually or by automated clinical natural language processing (CNLP) from clinical notes. Finally, 14 previously unsolved cases were re-analyzed. Results: GEM ranked >90% of causal genes among the top or second candidate, using manually curated or CNLP-derived phenotypes, and prioritized a median of 3 genes for review per case. Ranking of trios and duos was unchanged when analyzed as singletons. In 17 of 20 cases with diagnostic SVs, GEM identified the causal SVs as the top or second candidate irrespective of whether SV calls were provided or inferred ab initio by GEM when absent. Analysis of 14 previously unsolved cases provided novel findings in one, candidates ultimately not advanced in 3, and no new findings in 10, demonstrating the utility of GEM for reanalysis. Conclusions: GEM enables automated diagnostic interpretation of WES and WGS for all types of variants, including SVs, nominating a very short list of candidate genes and disorders for final review and reporting. In combination with deep phenotyping by CNLP, GEM enables substantial automation of genetic disease diagnosis, potentially decreasing the cost and speeding case review. Competing Interest Statement: FV, EF, JM, and MGR were employees of Fabric Genomics Inc. during the performance of this work. EF, FV, JM, MGR, and MY are stockholders or have received stock option awards from Fabric Genomics Inc. BM and MY have received consulting fees from Fabric Genomics Inc.
The other authors declare no competing interests. Funding Statement: MH, KR, MN and VN were supported in part by The Center for Rare Childhood Disorders, funded through donations made to the TGen Foundation. AF and BSL were supported by the DFG Cluster of Excellence "Precision Medicine in Chronic Inflammation". KO and SP were supported by Estonian Research Council grants PUT355, PRG471, MOBTP175 and PUTJD827. Sequencing and analysis were partially provided by the Broad Institute of MIT and Broad Center for Mendelian Genomics (Broad CMG) and were funded by the National Human Genome Research Institute, the National Eye Institute, and the National Heart, Lung and Blood Institute grant UM1 HG008900 and in part by National Human Genome Research Institute grant R01 HG009141. The phenotyping and analysis of patients at Boston Children's Hospital was funded by MDA602235 from the Muscular Dystrophy Association, and the Tommy Fuss Foundation, and the Yale Center for Mendelian Genomics. Sanger sequencing confirmations utilized the resources of the Boston Children's Hospital IDDRC Molecular Genetics Core Facility supported by U54HD090255 from the National Institutes of Health.
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Rady Children's Hospital: The studies from which cases were derived were approved by the institutional review board at Rady Children's Hospital, San Diego, CA, USA. The studies were designated to be of "nonsignificant risk" by the FDA in response to an investigational device exemption presubmission inquiry in April 2014. The studies were performed in accordance with the Declaration of Helsinki. Informed consent was obtained from at least one parent or guardian. Patients were consented to clinical research studies approved by the Institutional Review Board. Boston Children's Hospital: This study was approved by the Institutional Review Board of Boston Children's Hospital. Christian-Albrechts University of Kiel: Proper informed consent and ethical approvals were obtained on all patients included in this analysis. HudsonAlpha Institute for Biotechnology: Proper informed consent and ethical approvals were obtained on all patients included in this analysis. Translational Genomics Research Institute: Participants were consented and enrolled in the WIRB Protocol #20120789.
Tartu University Hospital: This study was approved by the Research Ethics Committee of the University of Tartu (approval date 17/10/2016 and number 263/M‐16). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The datasets supporting the conclusions of this article are included within the article and its additional files. Due to patient privacy, data sharing consent, and HIPAA regulations, our raw data cannot be submitted to publicly available databases. GEM, PHEVOR and VAAST software implementations for versions used in this analysis are part of the Fabric Enterprise analysis platform and are commercially available (https://www.fabricgenomics.com). Exomiser source code (version 12.1.0) is available on GitHub at: https://github.com/exomiser/Exomiser.