A combined polygenic score of 21,293 rare and 22 common variants significantly improves diabetes diagnosis based on hemoglobin A1C levels

Peter Dornbos; Ryan Koesterer; Andrew Ruttenburg; Joanne B. Cole; AMP-T2D-GENES Consortia; Aaron Leong; James B. Meigs; Jose C. Florez; Jerome I. Rotter; Miriam S. Udler; Jason Flannick

doi:10.1101/2021.11.04.21265868

Abstract

Polygenic scores (PS), constructed from the combined effects of many genetic variants¹, have been shown to predict risk of, or guide treatment strategies for, certain common diseases²⁻⁶. Because most PS to date are based on common variants⁷, the benefit of adding rare variation to PS remains largely unknown, and incorporating it is methodologically challenging. We developed and validated a method for constructing a rare variant PS and applied it to a previously identified clinical scenario in which genetic variants modify the hemoglobin A1C (HbA1C) threshold recommended for type 2 diabetes (T2D) diagnosis⁶,⁸⁻¹⁰. The resultant rare variant PS is highly polygenic (21,293 variants across 144 genes), depends on ultra-rare variants (72.7% of variants observed in fewer than three people), and identifies significantly more undiagnosed T2D cases than expected by chance (OR=2.71, p=1.51×10⁻⁶). A model combining the rare variant PS with a previously published common variant PS⁶ is expected to identify 4.9 million misdiagnosed T2D cases in the USA, nearly 1.5-fold more than the common variant PS alone. These results provide a method for constructing complex trait PS from rare variants and suggest that rare variants will augment common variants in precision medicine approaches for common disease.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This project was supported by R01DK125490 and UM1DK105554. JBC is supported by a NIDDK Pathway to Independence Award (K99DK127196). AL was supported by Grant 2020096 from the Doris Duke Charitable Foundation. JBM was supported by NIH R01 DK078616 and R01HL151855. JCF was supported by NHLBI K24 HL157960. JIR was supported by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Diseases Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. Infrastructure for the CHARGE Consortium is supported in part by the National Heart, Lung, and Blood Institute (NHLBI) grant R01HL105756. This work was also supported in part by the National Institutes of Health, National Heart, Lung, and Blood Institute (NHLBI) contract 1R01HL151855 and the National Institute of Diabetes and Digestive and Kidney Diseases contract UM1DK078616. MSU was supported by K23DK114551.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study involves only openly available human data, which can be obtained from (a) the NCBI database of genotypes and phenotypes (dbGaP) or (b) the UK Biobank.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

All data analyzed are available via (a) the NCBI database of genotypes and phenotypes (dbGaP) or (b) the UK Biobank. The genetic association summary statistics produced in this study are available through the Common Metabolic Diseases Knowledge Portal.

The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.