The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation

We present the Polygenic Score (PGS) Catalog (https://www.PGSCatalog.org), an open resource of published scores (including variants, alleles and weights) and consistently curated metadata required for reproducibility and independent applications. The PGS Catalog has capabilities for user deposition, expert curation and programmatic access, thus providing the community with a platform for PGS dissemination, research and translation.

By aggregating the effects of many genetic variants into a single number, PGSs have emerged as a method to predict an individual's genetic predisposition to a phenotype [1][2][3][4] . Early studies indicated that combining allelic counts of genome-wide association study (GWAS)-significant variants in individuals is predictive of the phenotype [5][6][7][8] . Owing to larger and more powerful GWAS, recent PGSs typically comprise hundreds to millions of trait-associated genetic variants, which are combined by using a weighted sum of allele doses multiplied by their corresponding effect sizes.
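The weighted sum described above can be illustrated with a minimal sketch. It assumes genotypes are coded as effect-allele dosages (0, 1 or 2, or fractional imputed dosages) and uses hypothetical variant identifiers and weights. Variants absent from the genotype data are simply skipped here; real scoring tools handle missing genotypes in different ways.

```python
def polygenic_score(dosages, weights):
    """Return the sum of dosage * weight over variants present in both mappings.

    dosages: dict mapping variant ID -> effect-allele dosage (0..2)
    weights: dict mapping variant ID -> effect size (e.g. log odds ratio or beta)
    Variants missing from the genotype data are skipped (they contribute 0).
    """
    return sum(weights[v] * dosages[v] for v in weights if v in dosages)


# Hypothetical three-variant score and one individual's dosages.
weights = {"var1": 0.12, "var2": -0.05, "var3": 0.30}
dosages = {"var1": 2, "var2": 1, "var3": 0.5}
print(round(polygenic_score(dosages, weights), 4))  # 0.34
```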
Many PGSs have been developed and demonstrated to be predictive of common complex traits (for example, body mass index (BMI) 9 , blood lipids 10 and educational attainment 11 ). Similarly, PGSs for various diseases have been shown to be predictive of disease incidence, defining marked increases in risk over the life course or at earlier ages for people with high PGSs (for example, coronary artery disease 12,13 , breast cancer 14 and schizophrenia 15 ). Existing risk-prediction models using traditional risk factors can be improved by incorporating PGSs 12,16,17 . In some cases, PGSs may be the most informative risk factors in presymptomatic individuals 1,18 and, for some diseases, may be independent of a family history of the condition [19][20][21][22] . Other potential clinical uses of PGSs include prediction of prognosis, etiology and disease subtypes 23 ; stratification of patients according to therapeutic benefit; and identification of new disease biomarkers and drug targets 24 . Given their multiple applications, many PGSs have been developed, and more than 1,000 related articles have been indexed in PubMed since 2009.
There is widespread variability in PGS research, even regarding nomenclature: the scores can be referred to as genetic or genomic scores, and as polygenic risk scores (PRSs) or genomic risk scores (GRSs) if they predict a discrete phenotype (such as a disease) 25 . Many approaches also exist to derive PGSs by using individual-level genotype data or GWAS summary statistics 26 . The goals of most computational methods are to select the most predictive set of variants in the score, and to adjust their weights to maximize the predictive ability and account for linkage disequilibrium between variants.

The need for an open resource of polygenic scores
Multiple barriers hinder progress in PGS research and the translation of PGSs into healthcare settings. The lack of best practices and standards, particularly regarding PGS reporting, is a major issue identified by our group and others 25,27 . Reproducibility has been hampered by underreporting of key PGS information: approximately 40% of 231 publications developing new PGSs that we reviewed during our curation efforts did not include adequate variant information (for example, chromosomal location, effect allele and weight) to calculate the PGSs for new samples, thus limiting the utility and reusability of the scores.
Beyond the information necessary for PGS calculation, a complete understanding of a score's ability to accurately predict its target trait (also known as analytic validity) is necessary to help evaluate clinical utility and enable other applications of PGSs. However, the performance metrics reported for existing PGSs are conditional on study design, participant demographics, case definitions and the covariates adjusted for in the original studies' models. Although few direct evaluations of PGSs have been conducted, benchmarking of multiple PGSs for the same trait in external data provides directly comparable performance metrics 28 needed to decide which PGS has the best performance for a particular task and how its predictive ability varies in response to changes in important factors, such as ancestry 29 . Because PGSs are based on data and cohorts composed largely of European-ancestry individuals, there is a well-characterized underperformance of PGSs when they are applied to non-European-ancestry individuals; thus, the transferability of PGS performance is a particularly important challenge that could lead to health disparities [30][31][32] .
Here, we present the PGS Catalog, an open resource of published PGSs, including full scoring information annotated with expertly curated metadata required for accurate application and evaluation. The PGS Catalog promotes PGS reproducibility by providing a venue to annotate and distribute scores according to current exemplar reporting standards. As such, it allows users to reuse and evaluate PGSs, to firmly establish their predictive ability and facilitate further investigations of clinical utility.

Development of the PGS Catalog
The aim of the PGS Catalog is to index and distribute the key aspects of each PGS (underlying variants, results and experimental design) in a standardized representation, to facilitate evaluation of analytic validity. To maximize usability, the data representation and database were designed to be findable, accessible, interoperable and reusable (FAIR) according to established principles for scientific data management 33 (Supplementary Table 1).
Comment | Nature Genetics | Vol 53 | April 2021 | 416-425 | www.nature.com/naturegenetics
To define the key information that would need to be captured in the PGS Catalog, we undertook an initial literature review of publications that developed PGSs for the following traits and diseases, selected according to their potential clinical utility and public-health burden of disease: coronary artery disease, diabetes (types 1 and 2), obesity/BMI, breast cancer, prostate cancer and Alzheimer's disease. During our review, we took note of how the PGSs were described, how they differed between studies and traits, and the most common study designs and PGS evaluation scenarios. To capture common aspects of PGS studies, we built upon the NHGRI-EBI GWAS Catalog's established frameworks to catalog published data from genomic studies, by using accepted conventions for representing sample ancestry 34 and variant and trait information 35 . Using our survey and established frameworks, we defined four major data objects: scores, samples, performance metrics and publications (Box 1 and Supplementary Table 2). These objects describe the common PGS development and evaluation processes (Fig. 1a), and can be used to capture the detailed data elements necessary to evaluate PGS development and performance. High consistency and accuracy across curated data were ensured by developing detailed curation guidelines, inclusion criteria and data acquisition methods (outlined in the Supplementary Note).
To ensure that the PGS Catalog contains the information necessary to describe and evaluate PGSs, we collaborated with the ClinGen Complex Disease working group, composed of experts in epidemiology, statistics, implementation science and the actionability of genetic results, as well as those with disease-domain-specific knowledge and interests in PRS application. Together, we developed the Polygenic Risk Score Reporting Standards (PRS-RS) 25 , a joint statement describing a set of reporting items that should be described in studies developing and evaluating PRSs. The PGS Catalog captures the data required by the PRS-RS to assess PGS validity while also being flexible enough to capture multiple different study designs and evaluation scenarios in a structured database. The PGS Catalog therefore provides a venue to index PGS analyses and maximize uptake of these reporting standards.

The PGS Catalog: data content, access and expansion
Any published or preprinted PGS can be added to the PGS Catalog, provided that it has (1) established analytic validity in external samples not used for score development and (2) the information necessary to calculate the score (additional details in Supplementary Note). To populate the PGS Catalog, we screened more than 275 publications for eligibility, 162 of which presented sufficient data for curation and inclusion in the Catalog. As of December 2020, the PGS Catalog contains 657 consistently annotated PGSs curated from 119 publications (with the earliest published in 2008). These PGSs predict a wide variety of diseases (for example, cardiovascular diseases, different types of cancer, schizophrenia and major depressive disorder), as well as anatomical (for example, BMI and bone density), cellular (for example, blood cell phenotypes and counts) and molecular (for example, serum urate, cholesterol and triglyceride levels) traits and measurements, encompassing 156 unique mapped ontology terms. Currently, most PGSs included in the Catalog were developed in European-ancestry individuals; however, 11 PGSs were both developed and evaluated in individuals of non-European ancestry. To assess external validity, the Catalog also indexes the results of evaluations of existing PGSs in new contexts (for example, direct comparisons of multiple PGSs on the same sample); 13 of these benchmarking publications evaluating 14 existing PGSs are also included in the current release of the PGS Catalog. Of the 119 publications,

Box 1 | Descriptions of PGS Catalog objects and metadata
Individual reporting items are described field by field in Supplementary Table 2. Scores (for example, PGSs, PRSs or GRSs) are the main data object type in the PGS Catalog, are linked to all other objects internally, and can be cited or externally linked to through a persistent identifier (for example, PGS000018). Each PGS has a PGS scoring file, a flat text file in a consistent format (Supplementary Note), which contains the variant-level information necessary to calculate the score on new data (minimally the genome build, rsID or chromosomal positions, effect alleles and their weights). The PGS is also annotated with information about the phenotype that it predicts (reported trait) and is mapped to Experimental Factor Ontology terms 41,42 to consistently annotate related scores and facilitate data linkage and searching. Information describing the computational algorithms (for example, independent GWAS variants, pruning/clumping and thresholding, or LDpred) and parameters (for example, P-value and linkage-disequilibrium (r²) thresholds) used during score development is also recorded for each score. The GWAS summary statistics used to derive the PGS, if any, are linked as sample objects and further linked to the GWAS Catalog if applicable 35 , and any other datasets used for training are also linked as sample objects.
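Pruning/clumping and thresholding is one of the development algorithms whose parameters are recorded for each score. The sketch below shows the general idea under simplifying assumptions: GWAS P values and a pairwise r² lookup are supplied directly (all values hypothetical), variants passing the P-value threshold are considered in order of increasing P value, and a variant is dropped if its r² with an already kept variant exceeds the r² threshold. Real implementations also restrict comparisons to a physical window and estimate LD from a reference panel.

```python
def clump_and_threshold(pvalues, r2, p_threshold=5e-8, r2_threshold=0.1):
    """Greedy clumping plus P-value thresholding.

    pvalues: dict variant -> GWAS P value
    r2: dict frozenset({a, b}) -> pairwise LD r^2 (missing pairs treated as 0)
    Returns the index variants kept, most significant first.
    """
    # Thresholding: keep only variants at or below the P-value cutoff,
    # ordered from most to least significant.
    candidates = sorted(
        (v for v, p in pvalues.items() if p <= p_threshold),
        key=lambda v: pvalues[v],
    )
    kept = []
    for v in candidates:
        # Clumping: skip v if it is in LD with any already selected index variant.
        if all(r2.get(frozenset((v, k)), 0.0) <= r2_threshold for k in kept):
            kept.append(v)
    return kept


# Hypothetical summary statistics and LD values.
pvalues = {"var1": 1e-12, "var2": 1e-10, "var3": 1e-9, "var4": 0.01}
r2 = {frozenset(("var1", "var2")): 0.8, frozenset(("var1", "var3")): 0.05}
print(clump_and_threshold(pvalues, r2))  # ['var1', 'var3']
```

In this approach the retained variants typically keep their GWAS effect sizes as weights; methods such as LDpred instead reweight variants to account for LD.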
Samples are described with detailed information to enable the interpretation and assessment of the validity of a PGS. Sample size (stratified by cases and controls if dichotomous) and participant ancestry are described by using frameworks identical to those in the GWAS Catalog 34 to enable the systematic tracking of participant diversity in PGSs 32 . To facilitate reproducible analyses, phenotyping descriptions (for example, case definition, International Classification of Diseases 9/10 codes and measurement methods), the sex distribution, and the distributions of participant ages and follow-up times for prospective study designs can also be recorded. To ensure that PGSs are not evaluated on individuals who contributed to the original GWAS or PGS training cohorts, samples can be annotated with existing cohort names 43 . Groups of samples used to evaluate a PGS are given a Sample Set ID.
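Because samples can be annotated with cohort names, a first-pass check for overlap between development and evaluation samples reduces to a set intersection. A minimal sketch, assuming each sample is represented as a hypothetical dictionary with a `cohorts` list (the cohort labels are illustrative). A shared name flags possible overlap rather than proving it, and an empty result is only as reliable as the cohort annotations.

```python
def overlapping_cohorts(development_samples, evaluation_samples):
    """Return cohort names shared by development and evaluation samples.

    Each sample is a dict with an optional "cohorts" list of cohort names.
    Names are compared case-insensitively (upper-cased); a non-empty
    result flags potential participant overlap.
    """
    dev = {c.upper() for s in development_samples for c in s.get("cohorts", [])}
    ev = {c.upper() for s in evaluation_samples for c in s.get("cohorts", [])}
    return sorted(dev & ev)


# Hypothetical GWAS (development) and evaluation samples.
gwas = [{"cohorts": ["CohortA", "CohortB"]}]
evaluation = [{"cohorts": ["CohortB", "CohortC"]}]
print(overlapping_cohorts(gwas, evaluation))  # ['COHORTB']
```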
Performance metrics assess the validity of a PGS in a Sample Set independently of the samples used for score development. Common metrics include standardized effect sizes (odds ratios or hazard ratios, and regression coefficients (β)) and classification accuracy metrics (for example, area under the receiver operating characteristic curve, C-index and area under the precision-recall curve); other relevant metrics (for example, calibration (χ²)) can also be recorded. The covariates used in the model (most commonly age, sex and genetic principal components to account for population structure) are also recorded for each set of metrics. Multiple PGSs can be evaluated on the same Sample Set and further indexed as directly comparable performance metrics.
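Of the classification metrics listed, the area under the receiver operating characteristic curve has a convenient interpretation: it is the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. A minimal pure-Python sketch with hypothetical PGS values; this pairwise version is quadratic in sample size, so a biobank-scale analysis would use a rank-based implementation instead.

```python
def auroc(case_scores, control_scores):
    """AUROC as the Mann-Whitney probability P(case score > control score).

    Ties contribute 0.5. Runs in O(n_cases * n_controls); for illustration only.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for x in case_scores:
        for y in control_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PGS values for cases and controls.
cases = [1.2, 0.8, 0.3]
controls = [0.1, 0.5, -0.4, 0.3]
print(round(auroc(cases, controls), 4))  # 0.875
```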
Publications provide provenance information for scores and performance metrics (including those from external evaluations of existing PGSs). Both journal articles and preprints can be indexed through either the DOI or the PubMed ID.

Fig. 1 | Common aspects of PGS analyses that are captured and displayed in the PGS Catalog. a, PGS analyses can broadly be described in two stages: determining the set of variants and weights that will predict a trait of interest (score development) and evaluating how predictive the PGS is in an external set of samples (PGS evaluation). Major data items (Box 1) that can be queried and browsed in the PGS Catalog are highlighted as colored boxes and linked to the metadata items that are recorded. b,c, Examples of how PGS metadata are displayed for each score on https://www.PGSCatalog.org (example score PGS000013; ref. 13 ), including score details, contributing samples and score development/training (b) and performance metrics and evaluated samples (c). Sections are highlighted with colored bars corresponding to the data objects that they display in a.
[Fig. 1a panel labels: predicted trait (mapped to EFO terms; linked to phenotyping definition and methods); PGS scoring file (rsID/location, effect alleles and weights); covariates included in the model.]
on each score's page (annotated example in Fig. 1b). Pages describing traits with available PGSs, and pages listing the scores developed and evaluated within each publication, can also be viewed (Supplementary Fig. 1). Trait pages display any PGSs associated with subtraits by default (for example, scores for all breast cancer subtypes are displayed on the breast cancer trait page), and higher-level disease and trait categories are accessible via our ontology-enriched search (Supplementary Fig. 2). Links to relevant study pages in the GWAS Catalog are included for any score developed by using cataloged GWAS data. Navigation from the GWAS Catalog to relevant data in the PGS Catalog is supported through links on the publication, study and trait pages in the GWAS Catalog.
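Showing subtrait scores on a trait page amounts to collecting all descendants of an ontology term and gathering the scores mapped to any of them. A minimal sketch using a small hypothetical parent-to-child mapping; the trait labels and score identifiers are illustrative placeholders, not real EFO terms or PGS IDs. A visited set is kept because ontologies can have terms with multiple parents.

```python
from collections import deque


def descendants(term, children):
    """Breadth-first collection of a term and all of its ontology descendants."""
    seen = {term}
    queue = deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def scores_for_trait(term, children, scores_by_term):
    """Return sorted score IDs mapped to the term or to any of its subtraits."""
    terms = descendants(term, children)
    return sorted({s for t in terms for s in scores_by_term.get(t, [])})


# Hypothetical ontology fragment and score mappings.
children = {"breast cancer": ["ER-positive breast cancer", "ER-negative breast cancer"]}
scores_by_term = {
    "breast cancer": ["SCORE_A"],
    "ER-positive breast cancer": ["SCORE_B"],
    "ER-negative breast cancer": ["SCORE_C"],
}
print(scores_for_trait("breast cancer", children, scores_by_term))
# ['SCORE_A', 'SCORE_B', 'SCORE_C']
```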
Each PGS in the Catalog is provided as a scoring file containing a header describing the provenance of the score, and consistently formatted columns describing the variants, alleles and weights. The scoring file can be used in conjunction with common tools to calculate the PGS (for example, PLINK 36 ). The metadata and scoring files can be downloaded individually or in bulk from our website and FTP server; programmatic access to the database is also available through a RESTful API (complete implementation and scoring file details in the Supplementary Note). Importantly, the PGS Catalog provides users with a source of existing published PGSs that can be directly applied to their own data, thus making results obtained by using the same score comparable across users and use cases, and circumventing the need to develop a new PGS for every application.
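A scoring file laid out as described above, with a provenance header followed by tab-separated variant columns, can be read with the standard library alone. The following is a minimal sketch under stated assumptions: header lines are assumed to begin with `#` and to carry `key=value` entries, and the column names `rsID`, `effect_allele`, `other_allele` and `effect_weight` are assumptions for illustration. The Supplementary Note and the Catalog's own documentation define the authoritative format, which may include further columns such as chromosome and position. The file content is hypothetical.

```python
import csv
import io

REQUIRED = ("effect_allele", "effect_weight")  # assumed column names


def read_scoring_file(handle):
    """Parse a scoring file: '#' header lines, then a tab-separated table.

    Returns (header, variants): header maps 'key=value' header entries to values;
    variants is a list of row dicts with effect_weight converted to float.
    """
    header, table_lines = {}, []
    for line in handle:
        if line.startswith("#"):
            key, sep, value = line.lstrip("#").strip().partition("=")
            if sep:  # keep only key=value metadata lines
                header[key] = value
        elif line.strip():
            table_lines.append(line)
    reader = csv.DictReader(table_lines, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"scoring file lacks columns: {missing}")
    variants = []
    for row in reader:
        row["effect_weight"] = float(row["effect_weight"])
        variants.append(row)
    return header, variants


# Hypothetical scoring file content.
example = io.StringIO(
    "### Hypothetical example scoring file\n"
    "#pgs_id=PGS_EXAMPLE\n"
    "#genome_build=GRCh37\n"
    "rsID\teffect_allele\tother_allele\teffect_weight\n"
    "rs_hypo1\tA\tG\t0.12\n"
    "rs_hypo2\tT\tC\t-0.05\n"
)
header, variants = read_scoring_file(example)
print(header["genome_build"], len(variants), variants[1]["effect_weight"])
# GRCh37 2 -0.05
```

Checking for required columns up front mirrors the curation concern raised earlier: a score without effect alleles and weights cannot be recalculated on new samples.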
The Catalog identifies new articles from a manual literature search and user submissions, which subsequently undergo curation before their inclusion (Supplementary Note). Data curation and submission have been designed around a flexible template that allows common PGS development and evaluation details and results to be described according to our reporting items, and the template and PGS can be submitted directly to the Catalog for inclusion after validation by curators. Authors of PGS studies are encouraged to submit new PGSs as well as the results of subsequent PGS validations for indexing (by e-mail to pgs-info@ebi.ac.uk; more information at https://www.PGSCatalog.org/submit), to grow the Catalog for the community, maximize the utility of their PGSs and support reproducibility.

Generating comparable PGS performance metrics
Where multiple PGSs have been cataloged for a trait of interest, a complete understanding of the predictive ability of each PGS would be useful for deciding which score is best for a user's particular application. However, the performance metrics of PGSs are not directly comparable (owing to differences in samples or cohorts, covariates and study design) and have usually been measured in only a single ancestry group. To demonstrate how the PGS Catalog can be used to systematically compare PGS performance, we measured the performance of nine PGSs for colorectal cancer in people of European, South Asian and African ancestries in the UK Biobank (UKB) 37 , a dataset external to the development of all scores (methods described in Supplementary Note; cohort described in Supplementary Table 3). For each ancestry group, each PGS was evaluated by using the standardized effect size of the PGS (odds ratio/hazard ratio per s.d. increase in PGS) and changes in classification accuracy (area under the receiver operating characteristic curve and C-index) as performance metrics (Fig. 2 and Supplementary Fig. 3). Eight of the nine scores were predictive of colorectal cancer in European-ancestry participants in the UKB to varying degrees, and the magnitudes of the effect sizes for two PGSs were similar to those previously reported (Supplementary Fig. 3). The score not significantly predictive of colorectal cancer in European-ancestry participants (PGS000151) comprised only 14 variants, and its predictive ability in Europeans had not previously been evaluated.
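The standardized effect size used here, the odds ratio per standard deviation increase in PGS, can be illustrated by converting the score to z-scores and fitting a one-predictor logistic regression. This is a minimal sketch using Newton-Raphson updates with no covariates and hypothetical data; the analysis described above also adjusted for covariates and reported hazard ratios from survival models, which this sketch does not attempt. Perfectly separated data would make the estimate diverge, so a production analysis would use an established statistics library.

```python
import math
import statistics


def standardize(values):
    """Convert raw PGS values to z-scores (mean 0, population s.d. 1)."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def odds_ratio_per_sd(pgs, outcome, iterations=25):
    """Fit logit P(y=1) = b0 + b1 * z by Newton-Raphson and return exp(b1).

    pgs: raw scores; outcome: 0/1 case status. No covariates are included.
    """
    z = standardize(pgs)
    b0 = b1 = 0.0
    for _ in range(iterations):
        # Gradient (g0, g1) of the log-likelihood and entries of X'WX.
        g0 = g1 = h00 = h01 = h11 = 0.0
        for zi, yi in zip(z, outcome):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * zi
            h00 += w
            h01 += w * zi
            h11 += w * zi * zi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += (X'WX)^-1 * gradient, using the 2x2 inverse.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return math.exp(b1)


# Hypothetical scores and case status (1 = case).
pgs = [0.1, 0.4, 0.2, 0.9, 0.7, 0.3, 1.1, 0.5, 0.8, 0.6]
case = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(odds_ratio_per_sd(pgs, case), 2))
```

Standardizing first is what makes the resulting odds ratio comparable across scores whose raw values are on different scales.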
Although the majority of scores were predictive of colorectal cancer in the 409,253 European-ancestry participants, the PGSs were largely not predictive of disease risk in the 6,086 participants of South Asian ancestry and 5,984 participants of African ancestry (together composing approximately 8% of all UKB participants; Supplementary Table 3); these data further illustrate that some PGSs developed by using European-biased GWAS data have lesser predictive ability and may not be valid in people of non-European ancestry 30,32 . The PGS Catalog will continue to curate and generate PGS benchmarking data from participants with diverse ancestries to provide a more comprehensive understanding of PGS performance, which is necessary to prioritize the best PGS for a particular application.

Conclusions and future developments
The PGS Catalog is a publicly available resource of published PGSs. The Catalog  39 , and it serves the community by providing a platform for PGS distribution and research. We hope to facilitate reproducible PGS analyses by working with others toward developing standard formats and content of scoring files, and providing new tools (such as for validation and scoring) to support this aim. For instance, to address a common user request, we will harmonize variants in PGS scoring files to frequently used genome builds (GRCh37 and GRCh38). Reproducible PGS analyses also require that calculations be valid and consistent, with minimal variability across users. According to community need, we intend to provide reference-sample calculations and population distributions similar to those for clinical tests. These enhancements will facilitate systematic and external PGS benchmarking studies, which are key to evaluating the validity of existing PGSs.
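Interpreting an individual's score against a reference population distribution, as is done for clinical tests, can be sketched as an empirical percentile. This is an illustration of the general idea only, not the Catalog's planned method; it assumes a hypothetical reference sample of PGS values computed with the same scoring file and genotyping pipeline. Given the ancestry-dependent performance described above, such a reference would need to match the individual's genetic ancestry to be meaningful.

```python
from bisect import bisect_left, bisect_right


def empirical_percentile(score, reference):
    """Percentage of reference scores below `score`, counting ties as half."""
    if not reference:
        raise ValueError("reference distribution is empty")
    ref = sorted(reference)
    below = bisect_left(ref, score)
    ties = bisect_right(ref, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ref)


# Hypothetical reference-sample PGS values.
reference = [-1.2, -0.5, 0.0, 0.3, 0.8, 1.1, 1.9, 2.4]
print(empirical_percentile(1.1, reference))  # 68.75
```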
As the number of PGSs and the diversity of the phenotypes that they predict increase, we will continue to grow the Catalog. We will be developing an interface to support author submission of developed and evaluated PGSs, providing an accession ID at the point of submission to enable citation. We are also working toward an improved literature search to reliably identify published PGSs. This search will be used to provide a regularly updated, comprehensive index of PGS publications in the Catalog, with a page for each publication regardless of data availability; a more systematic resource will better highlight gaps in the field (for example, a lack of PGSs developed for participants of non-European ancestry) 32 . Our evaluation of PGS performance across ancestries further underscores the need to develop, share and evaluate PGSs in diverse populations to avoid subsequent health disparities and inaccurate interpretations from studies using PGSs to evaluate differences among populations 30,40 . Methodological improvements and more diverse participation in biobanks are likely to overcome these limitations and, ideally, empower future applications of PGSs in research and clinical settings.
While curating publications for the Catalog, we encountered numerous barriers to making PGS data available. These included restrictions on sharing the variants and weights in a PGS owing to commercial interests; the need to accept terms and conditions to access the underlying GWAS summary statistics; and researchers not sharing the PGS, indicating that the full summary statistics were sufficient even if the PGS was based on filtered or reweighted variants. We hope that researchers developing PGSs in the future will consider the need to share their data and PGSs, and adopt reporting standards (PRS-RS 25 ) to enable reproducibility as well as subsequent applications and translation of the scores that they have developed. We encourage all researchers, funders and publishers to promote data sharing and submission so that the PGS Catalog can provide a comprehensive resource for the community.