Abstract
Methods to estimate polygenic scores (PGS) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived using seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling and target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β-coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results (https://methodscomparison.intervenegeneticscores.org/) and open-source workflow prspipe (https://github.com/intervene-EU-H2020/prspipe) provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Competing Interest Statement
M.I. is a trustee of the Public Health Genomics (PHG) Foundation, a member of the Scientific Advisory Board of Open Targets, and has a research collaboration with AstraZeneca PLC which is unrelated to this study. O.P. provides consultancy services for UCB pharma company.
Funding Statement
This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 101016775, the Hasso Plattner Foundation (HPF) and EMBL-EBI Core Funds. M.I. is supported by core funding from the British Heart Foundation (RG/18/13/33946) and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014; NIHR203312). Genes & Health is/has recently been core-funded by Wellcome (WT102627, WT210561), the Medical Research Council (UK) (M009017, MR/X009777/1, MR/X009920/1), Higher Education Funding Council for England Catalyst, Barts Charity (845/1796), Health Data Research UK (for London substantive site), and research delivery support from the NHS National Institute for Health Research Clinical Research Network (North Thames). Genes & Health is/has recently been funded by Alnylam Pharmaceuticals, Genomics PLC; and a Life Sciences Industry Consortium of Astra Zeneca PLC, Bristol-Myers Squibb Company, GlaxoSmithKline Research and Development Limited, Maze Therapeutics Inc, Merck Sharp & Dohme LLC, Novo Nordisk A/S, Pfizer Inc, Takeda Development Centre Americas Inc.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
This research has been conducted using the UK Biobank Resource under Application Number 78537. Ethics approval for the UK Biobank study was obtained from the North West Centre for Research Ethics Committee (11/NW/0382). The genotyping in the Trondelag Health Study (HUNT) and the work presented here were approved by the Regional Committee for Ethics in Medical Research, Central Norway (2014/144, 2018/1622, 2018/411492). The activities of the Estonian Biobank are regulated by the Human Genes Research Act, which was adopted in 2000 specifically for the operations of the Estonian Biobank. Individual-level data analysis in the Estonian Biobank was carried out under ethical approval 1.1-12/624 from the Estonian Committee on Bioethics and Human Research (Estonian Ministry of Social Affairs), using data according to release application S22, document number 6-7/GI/16259 from the Estonian Biobank. Patients and control subjects in FinnGen provided informed consent for biobank research, based on the Finnish Biobank Act. Alternatively, separate research cohorts, collected before the Finnish Biobank Act came into effect (in September 2013) and the start of FinnGen (August 2017), were collected based on study-specific consents and later transferred to the Finnish biobanks after approval by Fimea (Finnish Medicines Agency), the National Supervisory Authority for Welfare and Health. Recruitment protocols followed the biobank protocols approved by Fimea. The Coordinating Ethics Committee of the Hospital District of Helsinki and Uusimaa (HUS) statement number for the FinnGen study is Nr HUS/990/2017. Ethics approval for Genes & Health was obtained from the London South East Research Ethics Committee (IRAS 146051).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The prspipe workflow used to generate polygenic score weights and to perform polygenic scoring and ancestry matching is available on GitHub (https://github.com/intervene-EU-H2020/prspipe). Genotypes and linked healthcare records held in biobanks are controlled-access data and are not publicly available. An application must be made to each biobank to gain access to the data. The 1000 Genomes processed genotype data (HapMap3-1KG) are available on figshare (10.6084/m9.figshare.20802700). Non-sensitive experimental data exported from the biobanks are permissively licensed and deposited in an open data repository (https://zenodo.org/doi/10.5281/zenodo.10012995). Processed summary statistics are permissively licensed, hosted on GitHub, and accessible through an R data package (https://github.com/intervene-EU-H2020/pgsCompaR). A website containing an interactive results browser is permissively licensed, available on GitHub (https://github.com/intervene-EU-H2020/pgs-method-compare), and hosted at https://methodscomparison.intervenegeneticscores.org/. Polygenic score weight files have been deposited in the PGS Catalog under publication ID PGP000517 (https://www.pgscatalog.org/publication/PGP000517/).