PT - JOURNAL ARTICLE
AU - M.A. Bouzinier
AU - D. Etin
AU - S.I. Trifonov
AU - V.N. Evdokimova
AU - V. Ulitin
AU - J. Shen
AU - A. Kokorev
AU - A. A. Ghazani
AU - Y. Chekaluk
AU - Z. Albertyn
AU - A. Giersch
AU - C.C. Morton
AU - F. Abraamyan
AU - P.K. Bendapudi
AU - S. Sunyaev
AU - Undiagnosed Diseases Network
AU - Brigham Genomic Medicine
AU - SEQuencing a Baby for an Optimal Outcome, Quantori
AU - J.B. Krier
TI - AnFiSA: An open-source computational platform for the analysis of sequencing data for rare genetic disease
AID - 10.1101/2021.09.26.21263358
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.09.26.21263358
4099 - http://medrxiv.org/content/early/2021/09/29/2021.09.26.21263358.1.short
4100 - http://medrxiv.org/content/early/2021/09/29/2021.09.26.21263358.1.full
AB - Although genomic sequencing is rapidly transforming from a bench-side tool into a routine hospital procedure, there is a noticeable lack of genomic analysis software that supports both clinical and research workflows as well as crowdsourcing. Furthermore, most existing software packages are not forward-compatible with regard to supporting the ever-changing diagnostic rules adopted by the genetics community. Regular updates of genomics databases pose challenges for reproducible and traceable automated genetic diagnostics tools. Lastly, most of the software tools score low on explainability among clinicians. We have created a fully open-source variant curation tool, AnFiSA, with the intention to invite and accept contributions from clinicians, researchers and professional software developers. The design of AnFiSA addresses the aforementioned issues via the following architectural principles: using a multidimensional database management system (DBMS) for genomic data to address reproducibility, curated decision trees adaptable to changing clinical rules, and a crowdsourcing-friendly interface to address difficult-to-diagnose cases.
We discuss how we chose our technology stack and describe the design and implementation of the software. Finally, we show in detail how a medical geneticist can implement selected workflows using the current version of AnFiSA. All software is available at https://github.com/ForomePlatform under the Apache 2.0 license. The public demo instance with the public data sets can be accessed via https://github.com/ForomePlatform/AnFiSA#public-demo

Author Summary
We describe Anfisa, a new application to facilitate the analysis of large-scale data from genomic sequencing for the purpose of identifying genetic causes of disease. Anfisa includes functionality for clinical geneticists to rapidly identify disease-causing genetic variants in genes that are already known to cause genetic disease, as well as tools for genomics researchers to identify novel causes of genetic disease. While tools exist to support these activities, Anfisa has multiple distinguishing characteristics. With respect to technological innovation, Anfisa is the first annotation and filtration tool to offer a decision tree solution for the step-by-step design of clinical guidelines, and it is among the few tools to successfully implement a multidimensional data model for rapid filtering computation. In addition, Anfisa was developed in keeping with open-source community standards and is well documented and maintained, which differentiates it from commercial or academic tools with related functionality. We have successfully deployed Anfisa to diagnose genetic causes of congenital hearing loss in participants previously refractory to molecular diagnosis, and to establish evidence for an association between a multifactorial disease (purpura fulminans) and a set of related pathways.
Finally, we have demonstrated the successful implementation of Anfisa by a bioinformatics team independent of the initial developers.

Competing Interest Statement
The authors have declared no competing interest.

Funding Statement
Grant support: SEQuencing a Baby for an Optimal Outcome, NIH 5R01DC015052-03; Center for Integrated Approaches to Undiagnosed Disease, NIH 1U01HG007690-01. Computational resources to support some aspects of this work were provided by IBM.

Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This work was conducted under the protocol titled Genomic Evaluation of Disease (Krier is a co-PI on this protocol), which was approved in 2015 by the MassGeneral Brigham/Partners IRB, is currently active (last annual review in 2021, Protocol 2015P000904/MGH) and broadly covers the development and application of bioinformatics tools for the purpose of identifying genetic etiologies of rare disease. The initial approval letter is attached. The primary data used for development are publicly available and were available before this work began (e.g. Genome in a Bottle data). We also reference two publications regarding the data: M. P. Ball et al., J. Wagner et al. The data used in the case description are subject to exemption category 4 (previously collected and fully anonymized data).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Source code is available on GitHub.
https://github.com/ForomePlatform/AnFiSA#public-demo
https://github.com/ForomePlatform