PT - JOURNAL ARTICLE
AU - Xiao Chen
AU - Alba Sanchis-Juan
AU - Courtney E French
AU - Andrew J Connell
AU - Isabelle Delon
AU - Zoya Kingsbury
AU - Aditi Chawla
AU - Aaron L Halpern
AU - Ryan J Taft
AU - NIHR BioResource
AU - David R Bentley
AU - Matthew ER Butchbach
AU - F Lucy Raymond
AU - Michael A Eberle
TI - Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data
AID - 10.1101/19006635
DP - 2019 Jan 01
TA - medRxiv
PG - 19006635
4099 - http://medrxiv.org/content/early/2019/12/18/19006635.short
4100 - http://medrxiv.org/content/early/2019/12/18/19006635.full
AB - Purpose: Spinal muscular atrophy (SMA), caused by loss of the SMN1 gene, is a leading cause of early childhood death. Due to the near identical sequences of SMN1 and SMN2, analysis of this region is challenging. Population-wide SMA screening to quantify the SMN1 copy number (CN) is recommended by the American College of Medical Genetics. Methods: We developed a method that accurately identifies the CN of SMN1 and SMN2 using genome sequencing (GS) data by analyzing read depth and eight informative reference genome differences between SMN1/2. Results: We characterized SMN1/2 in 12,747 genomes, identified 1,568 samples with SMN1 gains or losses and 6,615 samples with SMN2 gains or losses, and calculated a pan-ethnic carrier frequency of 2%, consistent with previous studies.
Additionally, 99.8% of our SMN1 and 99.7% of our SMN2 CN calls agreed with orthogonal methods, with a recall of 100% for SMA and 97.8% for carriers, and a precision of 100% for both SMA and carriers. Conclusion: This SMN copy number caller can be used to identify both carrier and affected status of SMA, enabling SMA testing to be offered as a comprehensive test in neonatal care and an accurate carrier screening tool in GS projects.
Competing Interest Statement: Xiao Chen, Aditi Chawla, Aaron L Halpern, Ryan J Taft, David R Bentley, and Michael A Eberle are all employed by Illumina, a maker of genome sequencing instruments.
Funding Statement: This work was supported by the Cambridge Biomedical Research Centre and the National Institute for Health Research (NIHR) for the NIHR BioResource (grant number RG65966), the National Institute of General Medical Sciences of the National Institutes of Health (P30GM114736 and P20GM103446; to MERB) and the Nemours Foundation (to MERB). We thank the New York Genome Center (supported by NHGRI Grant 3UM1HG008901-03S1) and the Coriell Institute for Medical Research for generating and releasing the 1kGP WGS data.
Author Declarations:
All relevant ethical guidelines have been followed and any necessary IRB and/or ethics committee approvals have been obtained: Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived: Yes
Any clinical trials involved have been registered with an ICMJE-approved registry such as ClinicalTrials.gov and the trial ID is included in the manuscript: Not Applicable
I have followed all appropriate research reporting guidelines and uploaded the relevant Equator, ICMJE or other checklist(s) as supplementary files, if applicable: Not Applicable
The 1kGP data can be downloaded from https://www.ncbi.nlm.nih.gov/bioproject/PRJEB31736/.
Data from the NIHR BioResource participants have been deposited in the European Genome-phenome Archive (EGA) at the EMBL European Bioinformatics Institute. Data from those NIHR BioResource participants who enrolled in the 100,000 Genomes Project-Rare Diseases Pilot can be accessed via Genomics England Limited, following the procedure outlined at: https://www.genomicsengland.co.uk/about-gecip/joining-research-community. The BAM files from the NGC individuals have been deposited in EGA under accession number EGAD00001004357.