Variable prediction accuracy of polygenic scores within an ancestry group

Elife. 2020 Jan 30;9:e48376. doi: 10.7554/eLife.48376.

Abstract

Fields as diverse as human genetics and sociology are increasingly using polygenic scores based on genome-wide association studies (GWAS) for phenotypic prediction. However, recent work has shown that polygenic scores have limited portability across groups of different genetic ancestries, restricting the contexts in which they can be used reliably and potentially creating serious inequities in future clinical applications. Using UK Biobank data, we demonstrate that even within a single ancestry group (i.e., when there are negligible differences in linkage disequilibrium or in causal allele frequencies), the prediction accuracy of polygenic scores can depend on characteristics such as the socio-economic status, age or sex of the individuals in whom the GWAS and the prediction were conducted, as well as on the GWAS design. Our findings highlight both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use.

Keywords: GWAS; genetics; genomics; human; human genetics; polygenic scores; portability; trait prediction.

Plain language summary

Complex diseases like cancer and heart disease are caused by the interplay of many factors: the variants of genes we inherit, the lifestyles we lead and the environments we inhabit, plus the interaction of all these factors. In fact, almost every trait, even how many years we will spend studying, is influenced both by our environment and our genes. To identify some of the genetic factors at play, scientists perform analyses known as genome-wide association studies, or GWAS for short. In these studies, the genomes of many different people are scanned to look for genetic differences associated with differences in traits. By summing up all the small genetic differences, so-called “polygenic scores” can be calculated. When there is a large genetic component to a trait, polygenic scores can be useful predictive tools. But there is a catch: polygenic scores make less accurate predictions for individuals of a different ancestry than those involved in the GWAS, which limits the use of these tools around the world. Mostafavi, Harpak et al. set out to understand whether factors other than ancestry could influence the performance of polygenic scores. Using data from the UK Biobank, an international health resource that pairs genomic data and clinical information, Mostafavi, Harpak et al. examined polygenic scores among individuals who share a single, common ancestry. These polygenic scores were used to predict three traits (blood pressure, body mass index and educational attainment) in individuals, and the predictions were then compared to the actual trait values to see how accurate they were. The analysis revealed that even within a group of people with similar ancestry, the accuracy of polygenic scores can vary, depending on characteristics such as the sex, age or socioeconomic status of the individuals. This analysis emphasises how variable GWAS and their predictive value can be even within seemingly similar population groups. It further highlights both the complexities of interpreting polygenic scores and underappreciated obstacles to their broad use in medical and social sciences.
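
The summary above describes a polygenic score as a sum of many small genetic differences, and accuracy as the agreement between predicted and actual trait values. The sketch below illustrates that idea on simulated data only. It is not the authors' analysis and uses no UK Biobank data. The function names, the effect-size weights, the two group labels and the choice to let the groups differ in non-genetic noise are all assumptions made for illustration.

```python
import random

def polygenic_score(genotype, weights):
    """Weighted sum of allele counts (0, 1 or 2) across variants."""
    return sum(g * w for g, w in zip(genotype, weights))

def r_squared(xs, ys):
    """Squared Pearson correlation between predictions and observed values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def simulate_group(rng, weights, noise_sd, n=500):
    """Simulate scores and trait values for one hypothetical group."""
    scores, traits = [], []
    for _ in range(n):
        genotype = [rng.choice((0, 1, 2)) for _ in weights]
        score = polygenic_score(genotype, weights)
        scores.append(score)
        # Trait = genetic score plus non-genetic noise (an assumed model).
        traits.append(score + rng.gauss(0.0, noise_sd))
    return scores, traits

rng = random.Random(0)
# Hypothetical per-variant effect estimates, standing in for GWAS output.
weights = [rng.gauss(0.0, 0.1) for _ in range(50)]

# Two hypothetical groups that share the same weights and genotype
# distribution and differ only in how noisy the trait is.
accuracy = {
    label: r_squared(*simulate_group(rng, weights, noise_sd))
    for label, noise_sd in (("group_a", 0.5), ("group_b", 2.0))
}
print(accuracy)
```

In this toy setting the same score predicts the trait less well in the noisier group, even though the genetic model is identical in both. That is one simple way accuracy could differ between groups of the same ancestry; the source does not say that this mechanism explains the reported results.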

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Age Factors
  • Aged
  • Female
  • Gene Frequency / genetics
  • Genetics, Population / methods*
  • Genome-Wide Association Study / methods*
  • Humans
  • Male
  • Middle Aged
  • Multifactorial Inheritance / genetics*
  • Polymorphism, Single Nucleotide / genetics
  • Sex Factors
  • Socioeconomic Factors
  • United Kingdom

Associated data

  • Dryad/10.5061/dryad.66t1g1jxs