RT Journal Article SR Electronic T1 Nanopore sequencing of 1000 Genomes Project samples to build a comprehensive catalog of human genetic variation JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2024.03.05.24303792 DO 10.1101/2024.03.05.24303792 A1 Gustafson, Jonas A. A1 Gibson, Sophia B. A1 Damaraju, Nikhita A1 Zalusky, Miranda PG A1 Hoekzema, Kendra A1 Twesigomwe, David A1 Yang, Lei A1 Snead, Anthony A. A1 Richmond, Phillip A. A1 Coster, Wouter De A1 Olson, Nathan D. A1 Guarracino, Andrea A1 Li, Qiuhui A1 Miller, Angela L. A1 Goffena, Joy A1 Anderson, Zachery A1 Storz, Sophie HR A1 Ward, Sydney A. A1 Sinha, Maisha A1 Gonzaga-Jauregui, Claudia A1 Clarke, Wayne E. A1 Basile, Anna O. A1 Corvelo, André A1 Reeves, Catherine A1 Helland, Adrienne A1 Musunuri, Rajeeva Lochan A1 Revsine, Mahler A1 Patterson, Karynne E. A1 Paschal, Cate R. A1 Zakarian, Christina A1 Goodwin, Sara A1 Jensen, Tanner D. A1 Robb, Esther A1 The 1000 Genomes ONT Sequencing Consortium A1 University of Washington Center for Rare Disease Research (UW-CRDR) A1 Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium A1 McCombie, W. Richard A1 Sedlazeck, Fritz J. A1 Zook, Justin M. A1 Montgomery, Stephen B. A1 Garrison, Erik A1 Kolmogorov, Mikhail A1 Schatz, Michael C. A1 McLaughlin, Richard N. A1 Dashnow, Harriet A1 Zody, Michael C. A1 Loose, Matt A1 Jain, Miten A1 Eichler, Evan E. A1 Miller, Danny E. YR 2024 UL http://medrxiv.org/content/early/2024/03/07/2024.03.05.24303792.abstract AB Less than half of individuals with a suspected Mendelian condition receive a precise molecular diagnosis after comprehensive clinical genetic testing. Improvements in data quality and costs have heightened interest in using long-read sequencing (LRS) to streamline clinical genomic testing, but the absence of control datasets for variant filtering and prioritization has made tertiary analysis of LRS data challenging. 
To address this, the 1000 Genomes Project ONT Sequencing Consortium aims to generate LRS data from at least 800 of the 1000 Genomes Project samples. Our goal is to use LRS to identify a broader spectrum of variation so we may improve our understanding of normal patterns of human variation. Here, we present data from analysis of the first 100 samples, representing all 5 superpopulations and 19 subpopulations. These samples, sequenced to an average depth of coverage of 37x and sequence read N50 of 54 kbp, have high concordance with previous studies for identifying single nucleotide and indel variants outside of homopolymer regions. Using multiple structural variant (SV) callers, we identify an average of 24,543 high-confidence SVs per genome, including shared and private SVs likely to disrupt gene function as well as pathogenic expansions within disease-associated repeats that were not detected using short reads. Evaluation of methylation signatures revealed expected patterns at known imprinted loci, samples with skewed X-inactivation patterns, and novel differentially methylated regions. All raw sequencing data, processed data, and summary statistics are publicly available, providing a valuable resource for the clinical genetics community to discover pathogenic SVs. Competing Interest Statement: WDC, ML, FS, and DEM have received research support and/or consumables from ONT. WDC, JG, FS, and DEM have received travel funding to speak on behalf of ONT. DEM is on a scientific advisory board at ONT. FS has received research support from Illumina, Genetech, and PacBio. SBM is an advisor to BioMarin, MyOme, and Tenaya Therapeutics. EEE is a scientific advisory board (SAB) member of Variant Bio, Inc.
DEM holds stock options in MyOme. Funding Statement: SBG is supported by NIH grant 5T32HG000035-29; WDC is a recipient of a postdoctoral fellowship from FWO [12ASR24N]; EG and AG are supported by NIH grants R01HG013017 and U01DA057530 and NSF grant 2118744; SG is supported by NIH grant 5R50CA243890; TDJ is supported by NIH grant T32HG000044; MK is supported by Intramural NIH funding; SBM, TDJ, and ER are supported by NIH grant U01HG011762; MCS is supported by NIH grants U24HG010263, R03CA272952, and U01CA253481 and the Lustgarten Foundation grant 90101412; FJS is supported by NIH grants 1U01HG011758-01, 1UG3NS132105-01, and U01AG058589; AAS is supported by an NSF postdoctoral research fellowship in biology [NSF 22-623]; RNM and LY are supported by NIH grants 5R35GM142733-03 and 5R21AI174130-02; EEE is supported by NIH grant HG010169 and is an investigator of the Howard Hughes Medical Institute; DEM is supported by the NIH Director's Early Independence Award DP5OD033357. The GREGoR Consortium is funded by the National Human Genome Research Institute of the National Institutes of Health, through the following grants: U01HG011758, U01HG011755, U01HG011745, U01HG011762, U01HG011744, and U24HG011746. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This study uses only publicly available cell lines from the 1000 Genomes Project available at Coriell and data available at public sources such as at https://www.internationalgenome.org/data/. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot
be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes. Data for all samples sequenced as part of the 1000 Genomes Project ONT Sequencing Consortium are publicly available at https://s3.amazonaws.com/1000g-ont/index.html. Data from the 100 samples reported here, as well as summary analysis data, are available at https://s3.amazonaws.com/1000g-ont/index.html?prefix=FIRST_100_FREEZE/. Data and code related to pangenome analyses are available at https://github.com/AndreaGuarracino/1000G-ONT-F100-PGGB.