medRxiv

Global biobank analyses provide lessons for computing polygenic risk scores across diverse cohorts

Ying Wang, Shinichi Namba, Esteban Lopera, Sini Kerminen, Kristin Tsuo, Kristi Läll, Masahiro Kanai, Wei Zhou, Kuan-Han Wu, Marie-Julie Favé, Laxmi Bhatta, Philip Awadalla, Ben Brumpton, Patrick Deelen, Kristian Hveem, Valeria Lo Faro, Reedik Mägi, Yoshinori Murakami, Serena Sanna, Jordan W. Smoller, Jasmina Uzunovic, Brooke N. Wolford, Global Biobank Meta-analysis Initiative, Cristen Willer, Eric R. Gamazon, Nancy J. Cox, Ida Surakka, Yukinori Okada, Alicia R. Martin, Jibril Hirbo
doi: https://doi.org/10.1101/2021.11.18.21266545
Ying Wang
1Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  • For correspondence: yiwang@broadinstitute.org armartin@broadinstitute.org jibril.hirbo@vumc.org
Shinichi Namba
4Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
Esteban Lopera
5University of Groningen, UMCG, Department of Genetics, Groningen, the Netherlands
Sini Kerminen
6Institute for Molecular Medicine Finland, FIMM, HiLIFE, University of Helsinki, Helsinki, Finland
Kristin Tsuo
1Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Kristi Läll
7Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Masahiro Kanai
1Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
8Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
9Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Wei Zhou
1Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
Kuan-Han Wu
10Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI, 48103, USA
Marie-Julie Favé
11Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Laxmi Bhatta
12K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, 7030, Norway
Philip Awadalla
11Ontario Institute for Cancer Research, Toronto, Ontario, Canada
13Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
Ben Brumpton
12K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, 7030, Norway
14HUNT Research Centre, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Levanger, 7600, Norway
15Clinic of Medicine, St. Olavs Hospital, Trondheim University Hospital, Trondheim, 7030, Norway
Patrick Deelen
5University of Groningen, UMCG, Department of Genetics, Groningen, the Netherlands
16Oncode Institute, Utrecht, The Netherlands
Kristian Hveem
12K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, 7030, Norway
14HUNT Research Centre, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Levanger, 7600, Norway
Valeria Lo Faro
17Department of Ophthalmology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
18Department of Clinical Genetics, Amsterdam University Medical Center (AMC), Amsterdam, The Netherlands
19Department of Immunology, Genetics and Pathology, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
Reedik Mägi
7Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
Yoshinori Murakami
20Division of Molecular Pathology, Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Serena Sanna
5University of Groningen, UMCG, Department of Genetics, Groningen, the Netherlands
21Institute for Genetics and Biomedical Research (IRGB), National Research Council (CNR), Cagliari 09100, Italy
Jordan W. Smoller
22Psychiatric and Neurodevelopmental Genetics Unit, Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA 02114, USA
Jasmina Uzunovic
11Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Brooke N. Wolford
10Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor MI, 48103, USA
12K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, 7030, Norway
Cristen Willer
12K.G. Jebsen Center for Genetic Epidemiology, Department of Public Health and Nursing, NTNU, Norwegian University of Science and Technology, Trondheim, 7030, Norway
23Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
24Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
25Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA
Eric R. Gamazon
26Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
27MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
28Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Nancy J. Cox
26Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
28Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
Ida Surakka
23Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
Yukinori Okada
4Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
29Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
30Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita 565-0871, Japan
31Integrated Frontier Research for Medical Science Division, Institute for Open and Transdisciplinary Research Initiatives, Osaka University, Suita 565-0871, Japan
32Center for Infectious Disease Education and Research (CiDER), Osaka University, Suita 565-0871, Japan
Alicia R. Martin
1Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
2Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
3Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  • For correspondence: yiwang@broadinstitute.org armartin@broadinstitute.org jibril.hirbo@vumc.org
Jibril Hirbo
26Department of Medicine, Division of Genetic Medicine, Vanderbilt University School of Medicine, Nashville, TN, USA
28Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
  • For correspondence: yiwang@broadinstitute.org armartin@broadinstitute.org jibril.hirbo@vumc.org

Summary

With the increasing availability of biobank-scale datasets that incorporate both genomic data and electronic health records, many associations between genetic variants and phenotypes of interest have been discovered. Polygenic risk scores (PRS), which are being widely explored in precision medicine, use the results of association studies to predict the genetic component of disease risk by accumulating risk alleles weighted by their effect sizes. However, few studies have thoroughly investigated best practices for PRS in global populations across different diseases. In this study, we utilize data from the Global Biobank Meta-analysis Initiative (GBMI), which includes individuals of diverse ancestries from multiple continents, to explore methodological considerations and PRS prediction performance in 9 different biobanks for 14 disease endpoints. Specifically, we constructed PRS using heuristic (pruning and thresholding, P+T) and Bayesian (PRS-CS) methods. We found that genetic architecture, such as SNP-based heritability and polygenicity, varied greatly among endpoints. For both PRS construction methods, using a European ancestry LD reference panel resulted in comparable or higher prediction accuracy compared with several non-European-based panels; this is largely attributable to European descent populations still comprising the majority of GBMI participants. PRS-CS overall outperformed the classic P+T method, especially for endpoints with higher SNP-based heritability. For example, substantial improvements were observed in individuals of East Asian ancestry (EAS) using PRS-CS compared with P+T for heart failure (HF) and chronic obstructive pulmonary disease (COPD). Notably, prediction accuracy was heterogeneous across endpoints, biobanks, and ancestries, especially for asthma, which has known variation in disease prevalence across global populations. Overall, we provide lessons for PRS construction, evaluation, and interpretation using the GBMI and highlight the importance of best practices for PRS in the biobank-scale genomics era.
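
The description of a PRS as risk alleles accumulated and weighted by their effect sizes can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `polygenic_score`, its parameters, and the toy numbers are all hypothetical. Only the thresholding half of P+T is shown, as an optional p-value filter; LD pruning or clumping against a reference panel is omitted, as is the PRS-CS posterior effect-size estimation.

```python
import numpy as np

def polygenic_score(dosages, betas, pvalues=None, p_threshold=1.0):
    """Return one score per individual: sum over variants of dosage * effect size.

    dosages: (n_individuals, n_variants) effect-allele counts, each in [0, 2]
    betas:   (n_variants,) per-allele effect sizes from a GWAS (e.g. log odds ratios)
    pvalues: optional (n_variants,) GWAS p-values; variants with p > p_threshold
             are excluded (thresholding only; no LD pruning is performed here)
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    if dosages.ndim != 2 or dosages.shape[1] != betas.shape[0]:
        raise ValueError("dosages must be (n_individuals, n_variants) matching betas")
    if pvalues is not None:
        keep = np.asarray(pvalues, dtype=float) <= p_threshold
        dosages, betas = dosages[:, keep], betas[keep]
    # Matrix-vector product = weighted sum of allele counts per individual
    return dosages @ betas

# Toy data (made up for illustration): 2 individuals, 3 variants
dosages = np.array([[0, 1, 2],
                    [2, 0, 1]])
betas = np.array([0.10, -0.05, 0.20])
pvalues = np.array([1e-9, 0.3, 1e-6])

all_variants = polygenic_score(dosages, betas)
genome_wide_significant = polygenic_score(dosages, betas, pvalues, p_threshold=5e-8)
print(all_variants, genome_wide_significant)
```

In practice, scores like these are typically standardized within the target cohort before accuracy is evaluated, and the choice of p-value threshold is tuned rather than fixed.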

Competing Interest Statement

E.R.G. receives an honorarium from the journal Circulation Research of the American Heart Association as a member of the Editorial Board.

Funding Statement

A.R.M. is funded by K99/R00MH117229. E.L. is funded by the Colciencias fellowship ed.783. S.N. was supported by the Takeda Science Foundation. Y.O. was supported by JSPS KAKENHI (19H01021, 20K21834), AMED (JP21km0405211, JP21ek0109413, JP21ek0410075, JP21gm4010006, and JP21km0405217), JST Moonshot R&D (JPMJMS2021, JPMJMS2024), the Takeda Science Foundation, and the Bioinformatics Initiative of Osaka University Graduate School of Medicine, Osaka University. E.R.G. is supported by the National Institutes of Health (NIH) Awards R35HG010718, R01HG011138, R01GM140287, and NIH/NIA AG068026. V.L.F. was supported by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 675033 (EGRET plus). L.B. and B.B. receive support from the K.G. Jebsen Center for Genetic Epidemiology funded by Stiftelsen Kristian Gerhard Jebsen; Faculty of Medicine and Health Sciences, NTNU; The Liaison Committee for education, research and innovation in Central Norway; and the Joint Research Committee between St Olavs Hospital and the Faculty of Medicine and Health Sciences, NTNU. K.L. and R.M. were supported by the Estonian Research Council grant PUT (PRG687) and by INTERVENE, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 101016775. W.Z. was supported by the National Human Genome Research Institute of the National Institutes of Health under award number T32HG010464. The work of the contributing biobanks was supported by numerous grants from governmental and charitable bodies. The biobank-specific acknowledgements and full author list for GBMI are included in the Supplementary Notes.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Lead Contact: Ying Wang (yiwang@broadinstitute.org)

  • author affiliations updated; Supplemental files updated

Data Availability

All data produced in the present work are contained in the manuscript

https://www.globalbiobankmeta.org/resources

http://results.globalbiobankmeta.org/

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/data

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted December 02, 2021.
medRxiv 2021.11.18.21266545; doi: https://doi.org/10.1101/2021.11.18.21266545
Subject Area

  • Genetic and Genomic Medicine