Mitochondrial DNA variation across 56,434 individuals in gnomAD

  Sarah E. Calvo (1, 2, 9)

  1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA;
  2. Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
  3. Yale School of Medicine, New Haven, Connecticut 06510, USA;
  4. Murdoch Children's Research Institute, Melbourne, Victoria 3052, Australia;
  5. Garvan Institute of Medical Research and UNSW Sydney, Sydney, New South Wales 2010, Australia;
  6. Howard Hughes Medical Institute and Massachusetts General Hospital, Boston, Massachusetts 02114, USA
  Corresponding author: scalvo@broadinstitute.org

  Abstract

    Genomic databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, databases such as the Genome Aggregation Database (gnomAD) have focused on nuclear DNA and have ignored the mitochondrial genome (mtDNA). Here, we present a pipeline to call mtDNA variants that addresses three technical challenges: (1) detecting homoplasmic and heteroplasmic variants, present, respectively, in all or a fraction of mtDNA molecules; (2) the circular structure of the mtDNA genome; and (3) misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded contaminated samples, cell lines, and samples prone to NUMT misalignment because of low mtDNA copy number. Furthermore, we report only variants with heteroplasmy ≥10%. We applied this pipeline to 56,434 whole-genome sequences in the gnomAD v3.1 database, which includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and most unique variants (85%). We observed that 1 in 250 individuals carries a pathogenic mtDNA variant with heteroplasmy above 10%. These mtDNA population allele frequencies are freely accessible and will aid in diagnostic interpretation and research studies.
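    The sample-filtering and reporting rules summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. Only the 10% reporting threshold and the exclusion criteria (contamination, cell lines, low mtDNA copy number) come from the text; the 95% homoplasmy cutoff, the minimum copy number of 50, the 8,000-bp rotation of the reference used to handle the circular genome, and all function and field names are hypothetical assumptions. Copy number is estimated as twice the ratio of mtDNA to nuclear mean coverage, a common approximation for a diploid nuclear genome.

```python
from dataclasses import dataclass

# From the text: only variants with heteroplasmy >= 10% are reported.
MIN_HETEROPLASMY = 0.10
# ASSUMPTION: cutoff above which a call is treated as homoplasmic (not stated in the text).
HOMOPLASMY_CUTOFF = 0.95
# ASSUMPTION: hypothetical minimum mtDNA copy number; low-copy samples are prone to NUMT reads.
MIN_MT_COPY_NUMBER = 50.0
# Length of the revised Cambridge Reference Sequence (rCRS) in bp.
MT_LENGTH = 16569


@dataclass
class Sample:
    sample_id: str
    mt_mean_coverage: float
    nuclear_mean_coverage: float
    contaminated: bool
    is_cell_line: bool


@dataclass
class Call:
    sample_id: str
    variant: str         # e.g. "m.3243A>G"
    heteroplasmy: float  # fraction of mtDNA molecules carrying the allele (0-1)


def mt_copy_number(s: Sample) -> float:
    """Estimate mtDNA copies per cell; the factor 2 reflects the diploid nuclear genome."""
    return 2 * s.mt_mean_coverage / s.nuclear_mean_coverage


def keep_sample(s: Sample) -> bool:
    """Drop contaminated samples, cell lines, and low-copy-number samples."""
    return (not s.contaminated and not s.is_cell_line
            and mt_copy_number(s) >= MIN_MT_COPY_NUMBER)


def unshift(pos: int, shift: int = 8000) -> int:
    """Map a 1-based position on a rotated reference back to rCRS coordinates.

    ASSUMPTION: position 1 of the rotated reference is rCRS position shift + 1,
    so reads spanning the rCRS origin align contiguously on the rotated copy.
    """
    return (pos + shift - 1) % MT_LENGTH + 1


def classify(heteroplasmy: float) -> str:
    """Label a call as homoplasmic, heteroplasmic, or below the reporting threshold."""
    if heteroplasmy >= HOMOPLASMY_CUTOFF:
        return "homoplasmic"
    if heteroplasmy >= MIN_HETEROPLASMY:
        return "heteroplasmic"
    return "not_reported"


def allele_frequencies(samples, calls):
    """Return {variant: (hom_freq, het_freq)} over the samples that pass filtering."""
    kept = {s.sample_id for s in samples if keep_sample(s)}
    hom, het = {}, {}
    for c in calls:
        if c.sample_id not in kept:
            continue  # calls from excluded samples never reach the counts
        label = classify(c.heteroplasmy)
        if label == "homoplasmic":
            hom.setdefault(c.variant, set()).add(c.sample_id)
        elif label == "heteroplasmic":
            het.setdefault(c.variant, set()).add(c.sample_id)
    n = len(kept)
    return {v: (len(hom.get(v, ())) / n, len(het.get(v, ())) / n)
            for v in set(hom) | set(het)}
```

    With toy inputs, a sample whose estimated copy number falls below the assumed minimum is removed before counting, so none of its calls contribute to either frequency, and calls below 10% heteroplasmy are not reported at all. Keeping homoplasmic and heteroplasmic frequencies separate mirrors the abstract's distinction between the two classes of variants.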

    Footnotes

    • 7 Full lists of Consortium authors and affiliations are located in the Supplemental Material.

    • 8 These authors contributed equally to this work.

    • 9 Co-senior authors.

    • [Supplemental material is available for this article.]

    • Article published online before print. Article, supplemental material, and publication date are at https://www.genome.org/cgi/doi/10.1101/gr.276013.121.

    • Freely available online through the Genome Research Open Access option.

    • Received July 23, 2021.
    • Accepted January 19, 2022.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
