PT  - JOURNAL ARTICLE
AU  - Smail, Craig
AU  - Ferraro, Nicole M.
AU  - Durrant, Matthew G.
AU  - Rao, Abhiram S.
AU  - Aguirre, Matthew
AU  - Li, Xin
AU  - Gloudemans, Michael J.
AU  - Assimes, Themistocles L.
AU  - Kooperberg, Charles
AU  - Reiner, Alexander P.
AU  - Hui, Qin
AU  - Huang, Jie
AU  - O’Donnell, Christopher J.
AU  - Sun, Yan V.
AU  - Rivas, Manuel A.
AU  - Montgomery, Stephen B.
TI  - Integration of rare large-effect expression variants improves polygenic risk prediction
AID  - 10.1101/2020.12.02.20242990
DP  - 2020 Jan 01
TA  - medRxiv
PG  - 2020.12.02.20242990
4099  - http://medrxiv.org/content/early/2020/12/11/2020.12.02.20242990.short
4100  - http://medrxiv.org/content/early/2020/12/11/2020.12.02.20242990.full
AB  - Polygenic risk scores (PRS) aim to quantify the contribution of multiple genetic loci to an individual’s likelihood of a complex trait or disease. However, existing PRS estimate genetic liability using common genetic variants, excluding the impact of rare variants. We identified rare, large-effect variants in individuals with outlier gene expression from the GTEx project and then assessed their impact on PRS predictions in the UK Biobank (UKB). We observed large deviations from the PRS-predicted phenotypes for carriers of multiple outlier rare variants; for example, individuals classified as “low-risk” but in the top 1% of outlier rare variant burden had a 6-fold higher rate of severe obesity. We replicated these findings using data from the NHLBI Trans-Omics for Precision Medicine (TOPMed) biobank and the Million Veteran Program, and demonstrated that PRS across multiple traits will significantly benefit from the inclusion of rare genetic variants. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: CS is supported by NIH grant T32LM012409. NMF is supported by a National Science Foundation Graduate Research Fellowship (grant number DGE 1656518) and a graduate fellowship from the Stanford Center for Computational, Evolutionary and Human Genomics. MGD is supported by a National Science Foundation Graduate Research Fellowship. MA is supported by the National Library of Medicine under training grant T15LM007033. XL is supported by the National Natural Science Foundation of China (grant number 31970554), National Key R&D Program of China (grant number 2019YFC1315804) and Shanghai Municipal Science and Technology Major Project (grant number 2017SHZDZX01). MJG is supported by a Stanford Graduate Fellowship.
MAR is partially supported by Stanford University and a National Institutes of Health center for Multi- and Trans-ethnic Mapping of Mendelian and Complex Diseases grant (5U01 HG009080) and partially supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) under award R01HG010140. SBM is supported by NIH grants U01HG009431, R01HL142015, R01HG008150, R01AG066490 and U01HG009080. The WHI program is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, U.S. Department of Health and Human Services through contracts HHSN268201600018C, HHSN268201600001C, HHSN268201600002C, HHSN268201600003C, and HHSN268201600004C. Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering, was provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination was provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I).
This research is also supported by funding from the Department of Veterans Affairs Office of Research and Development, Million Veteran Program (MVP) Grant I01-BX003340 and I01-BX003362. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Based on the information provided in Protocol 44532 the Stanford University IRB has determined that the research does not involve human subjects as defined in 45 CFR 46.102(f) or 21 CFR 50.3(g). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. GTEx (v7) RNA-seq and WGS data is available from dbGaP (dbGaP Accession phs000424.v7.p2). GTEx (v7) eQTL summary statistics were downloaded from the GTEx Portal available at https://gtexportal.org/home/datasets. Data from the TOPMed Women's Health Initiative is available from dbGaP (dbGaP Accession phs001237). UK Biobank (UKB) data was obtained under application number 24983 (PI: Dr. Manuel Rivas).
UKB Phase 1 GWAS summary statistics were downloaded from the Neale Lab server available at http://www.nealelab.is/uk-biobank. Polygenic risk scores (PRS) for body mass index and type-2 diabetes were downloaded from the Cardiovascular Disease Knowledge Portal available at http://kp4cd.org/dataset_downloads/mi. Gene annotation data was obtained from GENCODE (version 19) available at https://www.gencodegenes.org/human/release_19.html. Allele frequency data was obtained from gnomAD (version r2.0.2) available at https://console.cloud.google.com/storage/browser/gnomad-public/release/2.0.2/. hg19 coordinates were converted to hg38 using the chain file available at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/. Custom scripts to conduct all analyses not performed using existing software can be found at https://github.com/csmail/outlier_prs