Whole population, genome-wide mapping of hidden relatedness

  1. Alexander Gusev (1, 11)
  2. Jennifer K. Lowe (2, 3, 4)
  3. Markus Stoffel (5)
  4. Mark J. Daly (3, 6, 7)
  5. David Altshuler (3, 4, 7)
  6. Jan L. Breslow (2)
  7. Jeffrey M. Friedman (2, 8, 9)
  8. Itsik Pe'er (1, 10, 11)

  1. Department of Computer Science, Columbia University, New York, New York 10027, USA;
  2. The Rockefeller University, New York, New York 10065, USA;
  3. Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA;
  4. Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
  5. ETH Zurich, Zurich 8093, Switzerland;
  6. Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
  7. Department of Medicine, Harvard Medical School, Boston, Massachusetts 02115, USA;
  8. Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA;
  9. Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA;
  10. Center for Computational Biology and Bioinformatics, New York, New York 10032, USA

    Abstract

    We present GERMLINE, a robust algorithm for identifying segmental sharing indicative of recent common ancestry between pairs of individuals. Unlike methods with comparable objectives, GERMLINE scales linearly with the number of samples, enabling analysis of whole-genome data in large cohorts. Our approach is based on a dictionary of haplotypes that is used to efficiently discover short exact matches between individuals. We then expand these matches using dynamic programming to identify long, nearly identical shared segments that are indicative of relatedness. We use GERMLINE to comprehensively survey hidden relatedness both in the HapMap and in a densely typed island population of 3000 individuals. We verify that GERMLINE is in concordance with other methods when they can process the data, and that it also facilitates analysis of larger-scale studies. We bolster these results by demonstrating novel applications of precise analysis of hidden relatedness for (1) identification and resolution of phasing errors and (2) exposing polymorphic deletions that are otherwise challenging to detect. The deletion findings are supported by concordance of the detected deletions with evidence from independent databases and with statistical analyses of fluorescence intensity not used by GERMLINE.
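The seed-and-extend idea described in the abstract can be illustrated with a minimal sketch. It is not the GERMLINE implementation: it assumes phased haplotypes encoded as equal-length strings of `0`/`1` alleles, hashes fixed-size windows into a dictionary to find exact seed matches, and then extends each seed greedily through neighbouring windows that differ at only a few sites. The greedy extension is a simplification swapped in for the dynamic programming step the abstract mentions, and the names `seed_matches`, `shared_segments`, `window`, `max_mismatches`, and `min_windows` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def seed_matches(haplotypes, window):
    """Find exact window matches by hashing each window's alleles.

    Returns a dict mapping a sample pair to the set of window indices
    where both samples carry the identical allele string.
    """
    n_windows = min(len(h) for h in haplotypes.values()) // window
    seeds = defaultdict(set)
    for w in range(n_windows):
        # One dictionary pass per window: cost is linear in the number of samples.
        buckets = defaultdict(list)
        for name, hap in haplotypes.items():
            buckets[hap[w * window:(w + 1) * window]].append(name)
        # Only samples that landed in the same bucket are ever paired.
        for names in buckets.values():
            for a, b in combinations(sorted(names), 2):
                seeds[(a, b)].add(w)
    return seeds, n_windows


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def shared_segments(haplotypes, window=4, max_mismatches=1, min_windows=3):
    """Extend seed windows into longer, nearly identical segments.

    Assumption: a neighbouring window is absorbed if it is a seed or has
    at most `max_mismatches` differing sites. Returns tuples of
    (sample_a, sample_b, start_site, end_site) with end_site exclusive.
    """
    seeds, n_windows = seed_matches(haplotypes, window)
    results = []
    for (a, b), matched in sorted(seeds.items()):
        ha, hb = haplotypes[a], haplotypes[b]

        def compatible(w):
            if w in matched:
                return True
            lo, hi = w * window, (w + 1) * window
            return hamming(ha[lo:hi], hb[lo:hi]) <= max_mismatches

        visited = set()
        for w in sorted(matched):
            if w in visited:
                continue
            start = end = w
            while start > 0 and compatible(start - 1):
                start -= 1
            while end < n_windows - 1 and compatible(end + 1):
                end += 1
            visited.update(range(start, end + 1))
            if end - start + 1 >= min_windows:
                results.append((a, b, start * window, (end + 1) * window))
    return results


# Toy data: A and B differ at a single site in the second window; C matches neither.
example = {
    "A": "0101110000111010",
    "B": "0101110100111010",
    "C": "1010001111000101",
}
print(shared_segments(example))  # [('A', 'B', 0, 16)]
```

In this toy example the single differing site between A and B breaks the exact match in one window, but the extension step bridges it and reports one segment spanning all 16 sites. Enumerating pairs inside a bucket costs time proportional to the number of matching pairs, so the sketch stays cheap only when most samples do not share a given window.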
