Significant Sparse Polygenic Risk Scores across 813 traits in UK Biobank

Yosuke Tanigawa, Junyang Qian, Guhan Venkataraman, Johanne Marie Justesen, Ruilin Li, Robert Tibshirani, Trevor Hastie, Manuel A. Rivas
doi: https://doi.org/10.1101/2021.09.02.21262942
Author affiliations: Yosuke Tanigawa (1, 4), Junyang Qian (2), Guhan Venkataraman (1), Johanne Marie Justesen (1), Ruilin Li (3), Robert Tibshirani (1, 2), Trevor Hastie (1, 2), Manuel A. Rivas (1)
  • 1 Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, United States
  • 2 Department of Statistics, Stanford University, Stanford, CA 94305, United States
  • 3 Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, United States
  • 4 Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
  • For correspondence: mrivas{at}stanford.edu tanigawa{at}mit.edu

Abstract

We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance when compared against the covariate-only model that considers age, sex, types of genotyping arrays, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman’s ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits, ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS model trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).
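
As an illustration of the rank-based correlation reported above, the following is a minimal sketch of Spearman's ρ computed as the Pearson correlation of tie-averaged ranks. The function names and the five data points are hypothetical and chosen only for illustration; they are not values from the study, and the sketch does not reproduce the authors' analysis code.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-trait values (NOT results from the study):
# model size (number of variants) and incremental predictive performance.
n_variants = [12, 150, 900, 4300, 21000]
incremental_r2 = [0.001, 0.004, 0.003, 0.030, 0.110]
rho = spearman_rho(n_variants, incremental_r2)
```

Because only the ranks enter the calculation, the wide range of model sizes does not dominate the statistic the way it could in a Pearson correlation on raw values.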

Author summary

Polygenic risk scores (PRSs), an approach to estimating genetic predisposition to disease liability by aggregating effects across multiple genetic variants, have attracted increasing research interest. While there have been improvements in the predictive performance of PRSs for some traits, the applicability of PRS models across a wide range of human traits has not been clear. Here, applying penalized regression using the Batch Screening Iterative Lasso (BASIL) algorithm to more than 269,000 individuals of white British ancestry in UK Biobank, we systematically characterize PRS models across more than 1,500 traits. We report 813 traits with PRS models of statistically significant predictive performance. While statistical significance does not necessarily translate directly into clinical relevance, we investigate the properties of the 813 significant PRS models and report a significant correlation between predictive performance and estimated SNP-based heritability. We find that the number of genetic variants selected in our sparse PRS model is significantly correlated with the incremental predictive performance for both quantitative and binary traits. Our transferability assessment of PRS models in UK Biobank revealed that the sparse PRS models trained on individuals of European ancestry had lower predictive performance for individuals of African and Asian ancestry groups.
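
To make the idea of aggregating effects across variants concrete, here is a minimal sketch of how a sparse PRS could be applied to one individual: a weighted sum of allele dosages over the variants with non-zero weights. The variant IDs, weights, and the choice to treat a missing genotype as dosage zero are assumptions made for illustration; this is not the snpnet implementation and not the format of the published weight files.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages over the variants in a sparse model.

    dosages: dict of variant ID -> allele dosage (0, 1, or 2) for one person.
    weights: dict of variant ID -> non-zero effect weight; variants that the
             sparse model did not select are simply absent from this dict.
    Assumption: a variant missing from `dosages` contributes zero.
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


# Hypothetical variant IDs and weights, for illustration only.
weights = {"rsA": 0.12, "rsB": -0.05, "rsC": 0.30}
person = {"rsA": 2, "rsB": 1, "rsC": 0, "rsD": 2}  # rsD is not in the model

score = polygenic_score(person, weights)
```

Iterating over the model's weights rather than over the genotypes keeps the cost proportional to the number of selected variants, which is the practical appeal of a sparse model.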

Competing Interest Statement

M.A.R. is a consultant at MazeTx and is currently on leave at HiBio. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Funding Statement

This work has been supported by the Funai Foundation for Information Technology [to Y.T.]; Stanford University School of Medicine [to Y.T., R.L., and M.A.R.]; National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) [R01HG010140 to M.A.R.]; NIH Center for Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases [5U01 HG009080 to M.A.R.]; NIH [5R01 EB001988-21 to T.H., 5R01 EB001988-16 to R.T.]; and National Science Foundation (NSF) [DMS-1407548 to T.H., DMS-1208164 to R.T.]. The authors of this manuscript have received the following salary support: NHGRI of NIH [R01HG010140 to Y.T. and M.A.R., R01HG008155 to Y.T.], NIH [5U01 HG009080 to M.A.R.], and the National Institute on Aging of NIH [R01AG067151 to Y.T.]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding agencies; funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Based on the information provided in Protocol 44532, the Stanford IRB has determined that the research does not involve human subjects as defined in 45 CFR 46.102(f) or 21 CFR 50.3(g).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • We revised the manuscript based on feedback from colleagues. This revision makes three major changes. 1) We removed the sentences that inappropriately mentioned genetic architecture in binary traits and instead clarified the power difference between quantitative and binary traits. 2) Given concerns about the lack of a theoretical basis for using incremental ROC-AUC to assess linear relationships (estimated SNP-based heritabilities and transferability assessment), we now use Nagelkerke's pseudo-R² as the primary evaluation metric of predictive performance for binary traits. 3) Because we changed the evaluation metric for binary traits, we now observe a significant rank-based correlation between the effect size (incremental Nagelkerke's pseudo-R²) and the model size (number of genetic variants with non-zero coefficients) of the sparse PRS model.
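
For readers unfamiliar with the metric named in this footnote, the following is a minimal sketch of Nagelkerke's pseudo-R² in its textbook form (the Cox-Snell R² divided by its maximum attainable value) and of an incremental value taken as the difference between a model with the PRS and a covariate-only model. The helper names and the toy probabilities are hypothetical, and the sketch assumes an intercept-only null model; the authors' exact computation may differ.

```python
import math


def bernoulli_loglik(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))


def nagelkerke_r2(loglik_model, loglik_null, n):
    """Nagelkerke's pseudo-R2 from log-likelihoods and the sample size n."""
    cox_snell = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_null / n)  # upper bound
    return cox_snell / max_cox_snell


# Toy data, for illustration only (NOT from the study).
y = [1, 0, 1, 0]
p_null = [0.5, 0.5, 0.5, 0.5]        # intercept-only: the case fraction
p_covariates = [0.6, 0.4, 0.6, 0.4]  # hypothetical covariate-only model
p_full = [0.8, 0.2, 0.8, 0.2]        # hypothetical covariates + PRS model

ll_null = bernoulli_loglik(y, p_null)
r2_covariates = nagelkerke_r2(bernoulli_loglik(y, p_covariates), ll_null, len(y))
r2_full = nagelkerke_r2(bernoulli_loglik(y, p_full), ll_null, len(y))

# Incremental predictive performance attributed to the PRS.
incremental = r2_full - r2_covariates
```

Dividing by the maximum attainable Cox-Snell value rescales the statistic so that a perfectly predicting model reaches 1, which makes values easier to compare across traits with different case fractions.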

Data Availability

The sparse PRS model weights generated from this study are available on the Global Biobank Engine (https://biobankengine.stanford.edu/prs). The significant PRS models are also available at the PGS catalog (https://www.pgscatalog.org/publication/PGP000244/ and https://www.pgscatalog.org/publication/PGP000128/, score IDs are listed in S1 Table). The BASIL algorithm implemented in the R snpnet package was used in the PRS analysis, which is available at https://github.com/rivas-lab/snpnet. The analyses presented in this study were based on data accessed through the UK Biobank: https://www.ukbiobank.ac.uk.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted January 27, 2022.

Subject Area

  • Genetic and Genomic Medicine