ABSTRACT
Background Heart failure (HF) is a major cause of death globally, and earlier initiation of treatment could mitigate disease progression. Multiple efforts have used genome-wide association studies (GWAS) or electronic health records (EHR) to identify individuals at high risk of HF. However, integrating both sources into HF prediction models, using novel natural language processing (NLP) techniques and large-scale global genetic predictors, has not been evaluated.
Objectives The study aimed to improve the accuracy of HF prediction by integrating GWAS- and EHR-derived risk scores.
Methods We previously performed the largest HF GWAS to date within the Global Biobank Meta-analysis Initiative, which includes 974,174 samples (51,274 cases; 5%) from 9 biobanks across the world, to create a polygenic risk score (PRS). Next, to extract information from the Michigan Medicine high-dimensional EHR (N=61,849 subjects), we treated diagnosis codes as ‘words’ and applied NLP to the data. NLP was used to learn code co-occurrence patterns and extract 350 latent phenotypes (low-dimensional features) representing 29,346 EHR codes. We then regressed HF on the latent phenotypes in an independent cohort and used the coefficients as weights to calculate a clinical risk score (ClinRS). Model performance was compared between a baseline model (age and sex) and three models with risk scores added: 1) PRS, 2) ClinRS, and 3) PRS+ClinRS, using the 10-fold cross-validated area under the receiver operating characteristic curve (AUC).
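As a rough illustration of the last two steps described in the Methods (a weighted risk score followed by an AUC comparison), the sketch below computes a ClinRS-style score as a weighted sum of latent phenotype values and evaluates it with a rank-based AUC. It is not the study's pipeline: the three latent phenotypes, the weights, and the four subjects are invented toy values, the function names `risk_score` and `auc` are hypothetical, and the NLP embedding step, the regression fit, and the 10-fold cross-validation are all omitted.

```python
def risk_score(latent, weights):
    """Weighted sum of latent phenotype values (a linear predictor)."""
    return sum(w * x for w, x in zip(weights, latent))


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical regression coefficients for three latent phenotypes
# (the study uses 350; these numbers are made up for illustration).
weights = [0.8, -0.3, 0.5]

# Toy latent phenotype values for four subjects, with HF status (1 = case).
latent_phenotypes = [
    [1.0, 0.2, 0.5],
    [0.1, 0.9, 0.0],
    [0.7, 0.1, 0.9],
    [0.0, 0.5, 0.2],
]
hf_status = [1, 0, 0, 1]

clin_rs = [risk_score(row, weights) for row in latent_phenotypes]
toy_auc = auc(clin_rs, hf_status)
print(toy_auc)  # 0.75 on this toy data
```

In a setting like the one described, the weights would come from a regression fitted in a separate cohort, and the AUC would be averaged over held-out cross-validation folds rather than computed once on the same subjects.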
Results PRS and ClinRS were each, separately, able to predict HF outcomes significantly better than the baseline model, up to eight years prior to HF diagnosis. Higher AUCs (95% CI) were observed for the PRS model (0.76 [0.74-0.78]) and the ClinRS model (0.77 [0.74-0.79]) than for the baseline model (0.71 [0.68-0.73]). Moreover, including both PRS and ClinRS in the model achieved superior performance in predicting HF up to ten years prior to HF diagnosis (AUC: 0.79 [0.77-0.82]), 2-3 years earlier than using either single risk predictor alone.
Conclusions We demonstrate the additive power of integrating GWAS- and EHR-derived risk scores to predict HF cases prior to diagnosis. Clinical application of this approach may allow identification of patients with higher susceptibility to HF and enable preventive therapies to be initiated at an earlier stage.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the National Institutes of Health grants R35-HL135824 and R01-GM139926.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB of University of Michigan gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
* Senior author
Data Availability
All data produced in the present study are available upon reasonable request to the authors.