TY  - JOUR
T1  - Dense phenotyping from electronic health records enables machine-learning-based prediction of preterm birth
JF  - medRxiv
DO  - 10.1101/2020.07.15.20154864
SP  - 2020.07.15.20154864
AU  - Abin Abraham
AU  - Brian Le
AU  - Idit Kosti
AU  - Peter Straub
AU  - Digna R. Velez-Edwards
AU  - Lea K. Davis
AU  - J. M. Newton
AU  - Louis J. Muglia
AU  - Antonis Rokas
AU  - Cosmin A. Bejan
AU  - Marina Sirota
AU  - John A. Capra
Y1  - 2022/01/01
UR  - http://medrxiv.org/content/early/2022/03/07/2020.07.15.20154864.abstract
N2  - Identifying pregnancies at risk for preterm birth, one of the leading causes of worldwide infant mortality, has the potential to improve prenatal care. However, we lack broadly applicable methods to accurately predict preterm birth risk. The dense longitudinal information present in electronic health records (EHRs) is enabling scalable and cost-efficient risk modeling of many diseases, but EHR resources have been largely untapped in the study of pregnancy. Here, we apply machine learning to diverse data from EHRs to predict singleton preterm birth. Leveraging a large cohort of 35,282 deliveries, we find that machine learning models based on billing codes alone can predict preterm birth risk at various gestational ages (e.g., ROC-AUC=0.75, PR-AUC=0.40 at 28 weeks of gestation) and outperform comparable models trained using known risk factors (e.g., ROC-AUC=0.65, PR-AUC=0.25 at 28 weeks). Examining the patterns learned by the model reveals it stratifies deliveries into interpretable groups, including high-risk preterm birth sub-types enriched for distinct comorbidities. Our machine learning approach also predicts preterm birth sub-types (spontaneous vs. indicated), mode of delivery, and recurrent preterm birth. Finally, we demonstrate the portability of our approach by showing that the prediction models maintain their accuracy on a large, independent cohort (5,978 deliveries) from a different healthcare system. By leveraging rich phenotypic and genetic features derived from EHRs, we suggest that machine learning algorithms have great potential to improve medical care during pregnancy.
N1  - Competing Interest Statement: LJM is a consultant for Mirvie, Inc.
N1  - Funding Statement: For funding, please see pdf of manuscript.
N1  - Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
N1  - The details of the IRB/oversight body that provided approval or exemption for the research described are given below: A designee of the Vanderbilt Institutional Review Board reviewed the research study identified above. The designee determined the study does not qualify as "human subject" research per Section 46.102(f)(2).
N1  - I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
N1  - I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
N1  - I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
N1  - All code and models in this study are available at https://github.com/abraham-abin13/ptb_predict_ml.
ER  - 