medRxiv

Integrating large scale genetic and clinical information to predict cases of heart failure

Kuan-Han H. Wu, Nicholas J. Douville, Xianshi Yu, Michael R. Mathis, Sarah E. Graham, Global Biobank Meta-analysis Initiative (GBMI), Ida Surakka, Whitney E. Hornsby, Cristen J. Willer, Xu Shi
doi: https://doi.org/10.1101/2022.07.19.22277830
Kuan-Han H. Wu
1Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
MS
Nicholas J. Douville
1Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
2Department of Anesthesiology, Michigan Medicine, Ann Arbor, Michigan, USA
3Institute of Healthcare Policy & Innovation, University of Michigan, Ann Arbor, Michigan, USA
MD, PhD
Xianshi Yu
4Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
PhD
Michael R. Mathis
1Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
2Department of Anesthesiology, Michigan Medicine, Ann Arbor, Michigan, USA
3Institute of Healthcare Policy & Innovation, University of Michigan, Ann Arbor, Michigan, USA
MD
Sarah E. Graham
5Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan, USA
PhD
Ida Surakka
5Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan, USA
PhD
Whitney E. Hornsby
5Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan, USA
PhD
Cristen J. Willer
1Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
5Department of Internal Medicine, Michigan Medicine, Ann Arbor, Michigan, USA
6Department of Human Genetics, University of Michigan, Ann Arbor, Michigan, USA
PhD
Xu Shi
4Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
PhD
  • For correspondence: shixu@umich.edu

ABSTRACT

Background Heart failure (HF) is a major cause of death globally, and earlier initiation of treatment could mitigate disease progression. Multiple efforts have used genome-wide association studies (GWAS) or electronic health records (EHR) to identify individuals at high risk of HF. However, integrating both sources into HF prediction models, using novel natural language processing (NLP) techniques and large-scale global genetic predictors, has not been evaluated.

Objectives The study aimed to improve the accuracy of HF prediction by integrating GWAS- and EHR-derived risk scores.

Methods To create a polygenic risk score (PRS), we used the largest HF GWAS to date, which we previously performed within the Global Biobank Meta-analysis Initiative and which includes 974,174 samples (51,274 cases; 5%) from 9 biobanks across the world. Next, to extract information from the Michigan Medicine high-dimensional EHR (N=61,849 subjects), we treated diagnosis codes as ‘words’ and applied NLP to the data. NLP was used to learn code co-occurrence patterns and extract 350 latent phenotypes (low-dimensional features) representing 29,346 EHR codes. We then regressed HF on the latent phenotypes in an independent cohort and used the coefficients as weights to calculate a clinical risk score (ClinRS). Model performance was compared between a baseline model (age and sex) and three models with risk scores added: 1) PRS, 2) ClinRS, and 3) PRS+ClinRS, using the 10-fold cross-validated area under the receiver operating characteristic curve (AUC).
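
The last two steps of the Methods (regressing the outcome on latent phenotypes, then reusing the coefficients as weights for a risk score scored by AUC) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are simulated, the three "latent phenotypes" and their effect sizes are invented for illustration, the NLP embedding step is skipped entirely, and a single train/evaluation split stands in for the 10-fold cross-validation and confidence intervals described above. All function names (fit_logistic, risk_score, auc, simulate) are hypothetical.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent.
    Returns (intercept, weights)."""
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w

def risk_score(weights, x):
    """Weighted sum of latent phenotypes (a ClinRS-style score)."""
    return sum(wj * xj for wj, xj in zip(weights, x))

def auc(scores, labels):
    """AUC as the probability that a random case outranks a random control
    (ties count as one half)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def simulate(n, true_w, rng):
    """Simulated cohort: Gaussian latent phenotypes and a binary outcome.
    The intercept of -1.0 and true_w are arbitrary illustrative values."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in true_w]
        z = -1.0 + sum(w * v for w, v in zip(true_w, x))
        p = 1.0 / (1.0 + math.exp(-z))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

rng = random.Random(0)
true_w = [1.0, -0.5, 0.0]                     # hypothetical effect sizes
X_train, y_train = simulate(400, true_w, rng)  # cohort used to learn weights
X_eval, y_eval = simulate(400, true_w, rng)    # separate cohort to be scored

_, weights = fit_logistic(X_train, y_train)
clinrs = [risk_score(weights, x) for x in X_eval]
print("AUC of the sketch risk score:", round(auc(clinrs, y_eval), 3))
```

In the study itself the score would enter a model alongside age, sex, and the PRS; here the score is evaluated on its own only to keep the example short.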

Results PRS and ClinRS each predict HF outcomes significantly better than the baseline model, up to eight years prior to HF diagnosis. Higher AUCs (95% CI) were observed in the PRS model (0.76 [0.74-0.78]) and the ClinRS model (0.77 [0.74-0.79]) than in the baseline model (0.71 [0.68-0.73]). Moreover, by including both PRS and ClinRS in the model, we achieved superior performance in predicting HF up to ten years prior to HF diagnosis (AUC: 0.79 [0.77-0.82]), 2-3 years earlier than using either risk predictor alone.

Conclusions We demonstrate the additive power of integrating GWAS- and EHR-derived risk scores to predict HF cases prior to diagnosis. Clinical application of this approach may allow identification of patients with higher susceptibility to HF and enable preventive therapies to be initiated at an earlier stage.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the National Institutes of Health grants R35-HL135824 and R01-GM139926.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

IRB of University of Michigan gave ethical approval for this work.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • * Senior author

Data Availability

All data produced in the present study are available upon reasonable request to the authors.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted July 22, 2022.

Subject Area

  • Cardiovascular Medicine