medRxiv

Leveraging fine-mapping and non-European training data to improve cross-population polygenic risk scores

Omer Weissbrod, Masahiro Kanai, Huwenbo Shi, Steven Gazal, Wouter J. Peyrot, Amit V. Khera, Yukinori Okada, The Biobank Japan Project, Alicia R. Martin, Hilary Finucane, Alkes L. Price
doi: https://doi.org/10.1101/2021.01.19.21249483
Omer Weissbrod
¹Harvard School of Public Health, Epidemiology Department, Boston, MA
For correspondence: oweissbrod@hsph.harvard.edu
Masahiro Kanai
²Broad Institute of MIT and Harvard, Cambridge, MA
³Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
Huwenbo Shi
¹Harvard School of Public Health, Epidemiology Department, Boston, MA
Steven Gazal
¹Harvard School of Public Health, Epidemiology Department, Boston, MA
⁴Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
⁵Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Wouter J. Peyrot
¹Harvard School of Public Health, Epidemiology Department, Boston, MA
⁶Department of Psychiatry, Amsterdam UMC, Vrije Universiteit, Amsterdam, the Netherlands
Amit V. Khera
²Broad Institute of MIT and Harvard, Cambridge, MA
Yukinori Okada
³Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, Japan
The Biobank Japan Project
⁷Institute of Medical Science, The University of Tokyo, Tokyo, Japan
Alicia R. Martin
²Broad Institute of MIT and Harvard, Cambridge, MA
Hilary Finucane
²Broad Institute of MIT and Harvard, Cambridge, MA
⁸Department of Medicine, Massachusetts General Hospital, Boston, MA, USA
Alkes L. Price
¹Harvard School of Public Health, Epidemiology Department, Boston, MA
²Broad Institute of MIT and Harvard, Cambridge, MA

Abstract

Polygenic risk scores (PRS) based on European training data have reduced accuracy in non-European target populations, exacerbating health disparities. This loss of accuracy stems predominantly from differences in linkage disequilibrium (LD), minor allele frequency (MAF; including population-specific SNPs), and/or causal effect sizes. PRS based on training data from the non-European target population itself do not suffer from these limitations, but are currently limited by much smaller training sample sizes. Here, we propose PolyPred, a method that improves cross-population polygenic prediction by combining two complementary predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing LD differences; and BOLT-LMM, a previously published predictor. In the special case where a large training sample is available in the non-European target population (or a closely related population), we propose PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. PolyPred and PolyPred+ require individual-level training data (for their BOLT-LMM component), but we also propose analogous methods that replace the BOLT-LMM component with summary statistic-based components when only summary statistics are available. We applied PolyPred to 49 diseases and complex traits in 4 UK Biobank populations using UK Biobank British training data (average N=325K), and observed statistically significant average relative improvements in prediction accuracy vs. BOLT-LMM ranging from +7% in South Asians to +32% in Africans (and vs. LD-pruning + P-value thresholding (P+T), ranging from +77% to +164%), consistent with simulations.
We applied PolyPred+ to 23 diseases and complex traits in UK Biobank East Asians using both UK Biobank British (average N=325K) and Biobank Japan (average N=124K) training data, and observed statistically significant average relative improvements in prediction accuracy of +24% vs. BOLT-LMM and +12% vs. PolyPred. The summary statistic-based analogues of PolyPred and PolyPred+ attained similar improvements. In conclusion, PolyPred and PolyPred+ improve cross-population polygenic prediction accuracy, ameliorating health disparities.
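
The abstract describes PolyPred as combining two predictors (and PolyPred+ as adding a third, trained on the non-European data) but does not spell out the weighting scheme here. The following is a minimal sketch under stated assumptions: the predictors are combined linearly, the non-negative mixing weights are fit by least squares on a tuning sample from the target population, and prediction accuracy is measured as R² (squared correlation between score and phenotype). The function names (`nnls_small`, `fit_mixing_weights`, `r2`, `relative_improvement`) are hypothetical illustrations, not part of PolyFun.

```python
from itertools import combinations

import numpy as np


def nnls_small(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Non-negative least squares for a handful of columns (hypothetical helper).

    Enumerates every support set, solves ordinary least squares on it, and keeps
    the lowest-residual solution whose coefficients are all non-negative. This is
    exact when the columns are linearly independent, because the NNLS optimum is
    the unconstrained least-squares fit on its own support. It costs 2**k solves,
    so it is only meant for the 2-3 predictors discussed here.
    """
    k = X.shape[1]
    best_w, best_rss = np.zeros(k), float(y @ y)  # empty support: all weights 0
    for size in range(1, k + 1):
        for support in combinations(range(k), size):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            if np.any(coef < 0):
                continue  # violates the non-negativity constraint
            w = np.zeros(k)
            w[cols] = coef
            resid = y - X @ w
            rss = float(resid @ resid)
            if rss < best_rss:
                best_w, best_rss = w, rss
    return best_w


def fit_mixing_weights(prs_tuning: np.ndarray, y_tuning: np.ndarray) -> np.ndarray:
    """Fit one non-negative weight per predictor column on a tuning sample."""
    # Center scores and phenotype so that no intercept term is needed
    Xc = prs_tuning - prs_tuning.mean(axis=0)
    yc = y_tuning - y_tuning.mean()
    return nnls_small(Xc, yc)


def r2(pred: np.ndarray, y: np.ndarray) -> float:
    """Accuracy as the squared Pearson correlation between score and phenotype."""
    return float(np.corrcoef(pred, y)[0, 1] ** 2)


def relative_improvement(r2_base: float, r2_new: float) -> float:
    """Relative change in R², e.g. 0.32 corresponds to a +32% improvement."""
    return (r2_new - r2_base) / r2_base
```

In this sketch, PolyPred would pass a two-column `prs_tuning` (fine-mapping-based and BOLT-LMM scores) and PolyPred+ a third column from the non-European training data; the combined score for new individuals is `prs_target @ w`. Accuracy should be assessed on individuals not used to fit `w`, since in-sample R² is optimistic. For the last function, a +32% relative improvement means R²_new = 1.32 × R²_base (for example, 0.10 versus 0.132).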

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This research was funded by NIH grants U01 HG009379, R37 MH107649, R01 MH101244 and R01 HG006399. MK is supported by a Nakajima Foundation Fellowship and the Masason Foundation. WJP is supported by an NWO Veni grant (91619152). ARM is supported by NIMH K99/R00MH117229. HKF is supported by Eric and Wendy Schmidt. AVK is supported by grants 1K08HG010155 and 1U01HG011719 from the National Human Genome Research Institute and a sponsored research agreement from IBM Research.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

UK Biobank: Collection of the UK Biobank (UKBB) data was approved by the UKBB's Research Ethics Committee. Approval to use UKBB individual-level data in this work was obtained under application #16549. Biobank Japan: All participants provided written informed consent approved by the ethics committees of the RIKEN Center for Integrative Medical Sciences and the Institute of Medical Science at the University of Tokyo. Uganda-APCDR: As described previously (Asiki et al. 2013), before all survey procedures, including interviews, blood tests, and sample storage for future use, written consent (or, for participants younger than 18 years, written assent together with parental/guardian consent) is obtained following Uganda National Council of Science and Technology (UNCST) guidelines. Written consent/assent is also obtained from participants for the use of their clinical records for research purposes. All study procedures, including material transfer agreements, are approved annually by the Uganda Virus Research Institute Science and Ethics Committee and the UNCST. A request to use these deidentified data for this work (genetic data from EGAD00010000965; phenotype data via sftp with reference: DD_PK_050716 gwas_phenotypes_28Oct14.txt), submitted via a Data Access Application for External Investigators, was approved by the Data Access Committee for APCDR via, and the data accessed through, the European Genome-Phenome Archive.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • * Co-first authors

  • Added new figures, tables, secondary analyses, and an analysis of meta-analyzed summary statistics from the ENGAGE consortium. Also added PolyPred-S, PRS-CS, and PolyPred-P as new main methods in all of the experiments, figures, and tables.

Data Availability

PolyPred and PolyPred+ are provided as part of the open-source software package PolyFun, which is freely available at https://github.com/omerwe/polyfun. Access to the UK Biobank resource is available via application (http://www.ukbiobank.ac.uk/). PRS coefficients generated in this study are available for public download at http://data.broadinstitute.org/alkesgroup/polypred_results.
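
The downloadable PRS coefficients are per-SNP weights. In the standard formulation, scoring individual i with such weights is a weighted sum of effect-allele dosages, PRS_i = Σ_j β_j x_ij. The following is a minimal sketch, assuming biallelic SNPs, genotypes keyed by SNP ID, and a hypothetical (SNP ID, effect allele, beta) layout; this layout is an illustration and not necessarily the format of the released files. In practice, tools such as PLINK's --score perform the same computation directly on genotype files.

```python
import numpy as np


def compute_prs(genotypes, counted_allele, weights):
    """Weighted sum of effect-allele dosages over SNPs present in the target data.

    genotypes:      dict SNP ID -> 1-D array of dosages (0-2) of counted_allele[SNP]
    counted_allele: dict SNP ID -> allele whose copies the dosages count
    weights:        iterable of (SNP ID, effect allele, beta) tuples (hypothetical layout)
    Returns the per-individual scores and the number of SNPs actually used.
    """
    n_individuals = len(next(iter(genotypes.values())))
    scores = np.zeros(n_individuals)
    n_used = 0
    for snp, effect_allele, beta in weights:
        if snp not in genotypes:
            continue  # SNP absent from the target genotypes: skipped, not imputed
        dosage = np.asarray(genotypes[snp], dtype=float)
        if counted_allele[snp] != effect_allele:
            # Assumes a biallelic SNP: copies of the other allele = 2 - dosage.
            # Strand flips or non-matching alleles are NOT detected here.
            dosage = 2.0 - dosage
        scores += beta * dosage
        n_used += 1
    return scores, n_used
```

Returning `n_used` alongside the scores makes it easy to check how many of the published SNPs were actually found in the target genotypes, since missing SNPs silently reduce the score's coverage.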


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 20, 2021.
medRxiv 2021.01.19.21249483; doi: https://doi.org/10.1101/2021.01.19.21249483
Subject Area

  • Genetic and Genomic Medicine