Abstract
Background Early diagnosis of structural heart disease improves patient outcomes, yet many patients remain undiagnosed. While population screening with echocardiography is impractical, electrocardiogram (ECG)-based prediction models can help target high-risk patients. We developed a novel ECG-based machine learning approach to predict multiple structural heart conditions, hypothesizing that a composite model would have a higher target prevalence and therefore yield higher positive predictive values (PPVs), facilitating meaningful recommendations for echocardiography.
Methods Using 2,232,130 ECGs linked to electronic health records and echocardiography reports from 484,765 adults between 1984 and 2021, we trained machine learning models to predict the presence of any of seven echocardiography-confirmed diseases within one year. This composite label included moderate or severe valvular disease (aortic or mitral stenosis or regurgitation, or tricuspid regurgitation), reduced ejection fraction (<50%), or interventricular septal thickness >15 mm. We tested various combinations of input features (demographics, labs, structured ECG data, ECG traces) and evaluated model performance in three ways: 5-fold cross-validation; multi-site validation, in which models were trained on one clinical site and tested on 11 other independent sites; and simulated retrospective deployment, in which models were trained on pre-2010 data and deployed in 2010.
Findings Our composite “rECHOmmend” model using age, sex and ECG traces had an area under the receiver operating characteristic curve (AUROC) of 0.91 and a PPV of 42% at 90% sensitivity, at a prevalence of 17.9% for our composite label. Individual disease models had AUROCs ranging from 0.86 to 0.93 and lower PPVs, ranging from 1% to 31%. The AUROC for models using different input features ranged from 0.80 to 0.93, increasing with additional features. Multi-site validation showed results similar to those of cross-validation, with an aggregate AUROC of 0.91 across our independent test set of 11 clinical sites after training on a separate site. In our simulated retrospective deployment, 11% of the ECGs acquired in a single year (2010) from patients without pre-existing known structural heart disease were classified as high-risk, and 41% of these patients developed true, echocardiography-confirmed disease within one year.
Interpretation An ECG-based machine learning model using a composite endpoint can predict previously undiagnosed, clinically significant structural heart disease while outperforming single-disease models and improving practical utility through higher PPVs. This approach can facilitate targeted screening with echocardiography to reduce under-diagnosis of structural heart disease.
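The link between prevalence and PPV that motivates the composite label follows from Bayes' rule and can be illustrated with a short calculation. The sketch below is not the authors' code. It assumes that the reported 42% PPV, 90% sensitivity, and 17.9% prevalence describe a single operating point, and it derives the specificity this would imply (a value not stated in the abstract). The function names and the 2% comparison prevalence are hypothetical, chosen only for illustration.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def implied_specificity(sensitivity: float, prevalence: float, target_ppv: float) -> float:
    """Specificity that yields target_ppv at the given sensitivity and prevalence."""
    true_pos = sensitivity * prevalence
    # Rearranged from PPV = TP / (TP + FP)
    false_pos = true_pos * (1.0 - target_ppv) / target_ppv
    return 1.0 - false_pos / (1.0 - prevalence)


# Reported composite-label operating point (assumed to be a single point).
spec = implied_specificity(sensitivity=0.90, prevalence=0.179, target_ppv=0.42)

# Same sensitivity and specificity at a hypothetical 2% prevalence,
# standing in for a rarer single-disease label.
ppv_composite = ppv(0.90, spec, 0.179)
ppv_rare = ppv(0.90, spec, 0.02)

print(f"implied specificity: {spec:.3f}")
print(f"PPV at 17.9% prevalence: {ppv_composite:.3f}")
print(f"PPV at 2% prevalence (hypothetical): {ppv_rare:.3f}")
```

With sensitivity and specificity held fixed, the PPV falls as prevalence falls, which is consistent with the lower PPVs reported for the individual disease models.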
Competing Interest Statement
Geisinger investigators (AUC, LJ, SR, JAR, DBR, JBL, CMH, RC) receive funding from Tempus for ongoing development of predictive modeling technology. Tempus and Geisinger have jointly applied for predictive modeling patents. None of the Geisinger investigators have ownership interest in any of the intellectual property resulting from the partnership. Tempus did not have any input in the design, execution, interpretation of results or decision to publish. JMP, NZ, GL, and BFK are Tempus employees. SRS is a consultant for Tempus. SRS is also an employee of physIQ and reports personal fees from Otsuka and Janssen, outside the submitted work. BKF reports personal fees from Novartis, outside the submitted work.
Funding Statement
This work is supported by a grant from Tempus.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of Geisinger approved this study with a waiver of consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
All intermediate, subgroup, and aggregate results are publicly available online as a searchable dashboard. Patient-level data are not available for the Geisinger data set. Requests for code or data can be made to the corresponding author.