medRxiv

Differential Predictability of Preterm Birth Types: Strong Signals for Indicated Cases versus Limited Success in Spontaneous Preterm Birth

Yun Chao Lin, Andrea Clark-Sevilla, Mahdi A. Loodaricheh, Itsik Pe’er, Anita Raja, Ronald Wapner, Ansaf Salleb-Aouissi
doi: https://doi.org/10.1101/2025.07.09.25329712

Yun Chao Lin, MPhil
Department of Computer Science, Columbia University, New York, NY
For correspondence: ycl2112{at}columbia.edu

Andrea Clark-Sevilla, MS
Department of Computer Science, Columbia University, New York, NY

Mahdi A. Loodaricheh
Department of Computer Science, CUNY Hunter College, New York, NY

Itsik Pe’er, PhD
Department of Computer Science, Columbia University, New York, NY

Anita Raja, PhD
Department of Computer Science, CUNY Hunter College, New York, NY

Ronald Wapner, MD
Department of Obstetrics and Gynecology, Columbia University, New York, NY

Ansaf Salleb-Aouissi, PhD
Department of Computer Science, Columbia University, New York, NY

Summary

Background Preterm birth, defined as birth occurring before 37 weeks of gestation, poses a significant and enduring public health challenge, with substantial emotional and financial burdens on families and society. To identify preterm births early in pregnancy, we investigated the predictive ability of machine learning models in a nulliparous (first-time pregnancy) study cohort. Preterm births are categorized into two major types: indicated preterm birth, which occurs due to medical conditions such as preeclampsia or other maternal/fetal complications requiring early delivery, and spontaneous preterm birth, which involves the natural onset of preterm labor. Our research aims to develop predictive tools that could enable earlier intervention and improved outcomes for these vulnerable pregnancies.
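The outcome definitions above can be written as a simple labeling rule. The following is a minimal sketch, assuming each delivery record carries a gestational age in weeks and a hypothetical `provider_initiated` flag marking deliveries started by a clinician for a maternal or fetal indication. These field names are illustrative assumptions, not nuMoM2b variables, and the study's actual adjudication criteria may be more detailed.

```python
from enum import Enum

# Preterm birth: delivery before 37 weeks of gestation.
PRETERM_CUTOFF_WEEKS = 37.0


class BirthOutcome(Enum):
    TERM = "term"
    SPONTANEOUS_PTB = "spontaneous preterm birth"
    INDICATED_PTB = "indicated preterm birth"


def label_outcome(gestational_age_weeks: float, provider_initiated: bool) -> BirthOutcome:
    """Map one delivery to term, spontaneous PTB, or indicated PTB.

    `provider_initiated` is a hypothetical field (not a nuMoM2b variable name):
    True when delivery was initiated because of a medical condition such as
    preeclampsia, False when preterm labor began on its own.
    """
    if gestational_age_weeks <= 0:
        raise ValueError("gestational age must be positive")
    if gestational_age_weeks >= PRETERM_CUTOFF_WEEKS:
        return BirthOutcome.TERM
    # Below the cutoff, the subtype depends only on how delivery began.
    if provider_initiated:
        return BirthOutcome.INDICATED_PTB
    return BirthOutcome.SPONTANEOUS_PTB


if __name__ == "__main__":
    print(label_outcome(34.5, provider_initiated=True).value)   # indicated preterm birth
    print(label_outcome(36.0, provider_initiated=False).value)  # spontaneous preterm birth
    print(label_outcome(39.2, provider_initiated=True).value)   # term
```

Keeping the subtype as its own enum value, rather than a single preterm flag, lets one labeled dataset feed separate subtype-specific models.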

Methods Our study analyzed the Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b) cohort, which comprises data collected at eight clinical sites across the United States from October 2010 to May 2014, including treatment, psychological, physiological, medical history, demographic, ultrasound, activity, toxicology, family history, pre-pregnancy diet, and genetic race data. We distinguished between spontaneous and indicated preterm births to develop targeted predictive models for each subtype. We also applied a novel approach to preterm birth (PTB) prediction, learning with privileged information, in which some information is available during training but typically unavailable at evaluation time. Specifically, the privileged information we used for PTB prediction comprises the occurrence of adverse pregnancy outcomes (APOs), post-delivery physiological information, and maternal outcomes. We developed an enhanced model, XGBoost+, which incorporates this privileged information to improve predictive performance over traditional machine learning approaches.
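The summary does not specify how XGBoost+ incorporates privileged information. As one illustration of learning with privileged information, the sketch below uses generalized distillation, a different, published technique swapped in here: a teacher model is trained with the privileged features, and a student that sees only the standard features is fit to a mix of the true labels and the teacher's softened predictions. Logistic regression stands in for XGBoost to keep the example dependency-free, and all function names and parameter values (`temperature`, `imitation`, learning rate, epochs) are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(w: Sequence[float], row: Sequence[float]) -> float:
    """Linear score; the last weight is the bias term (zip stops before it)."""
    return sum(wi * xi for wi, xi in zip(w, row)) + w[-1]


def fit_logistic(X, targets, lr: float = 0.1, epochs: int = 500) -> List[float]:
    """Full-batch gradient descent on cross-entropy; targets may be soft labels in [0, 1]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for row, t in zip(X, targets):
            err = sigmoid(logit(w, row)) - t  # dLoss/dscore, also valid for soft t
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[d] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def distill_with_privileged(X, X_priv, y, temperature: float = 2.0, imitation: float = 0.5):
    """Generalized-distillation-style LUPI: privileged features are used only in training."""
    if temperature <= 0 or not 0.0 <= imitation <= 1.0:
        raise ValueError("need temperature > 0 and 0 <= imitation <= 1")
    # 1) Teacher: standard + privileged features (e.g. post-delivery data).
    X_teacher = [list(x) + list(xp) for x, xp in zip(X, X_priv)]
    teacher = fit_logistic(X_teacher, y)
    # 2) Soften the teacher's scores with a temperature.
    soft = [sigmoid(logit(teacher, row) / temperature) for row in X_teacher]
    # 3) Student target: convex mix of hard labels and teacher soft labels.
    targets = [(1 - imitation) * yi + imitation * si for yi, si in zip(y, soft)]
    # 4) Student: standard features only, so it can score new pregnancies.
    return fit_logistic(X, targets)


if __name__ == "__main__":
    rng = random.Random(0)
    X, X_priv, y = [], [], []
    for _ in range(200):
        label = int(rng.random() < 0.3)
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.0, 1.0)])
        X_priv.append([rng.gauss(2.0 * label, 0.5)])  # synthetic privileged feature
        y.append(label)
    student = distill_with_privileged(X, X_priv, y)
    print([round(w, 3) for w in student])
```

Setting `imitation=0` reduces the student to ordinary training on the hard labels, so the privileged features affect the deployed model only through the teacher's soft labels.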

Results We selected XGBoost as our base model because it performs robustly on tabular data and its ensemble approach mitigates overfitting while capturing complex relationships between clinical variables, making it well suited to the heterogeneous risk factors associated with preterm birth. XGBoost-based models achieved higher AUCs than all other models tested, including decision tree, random forest, logistic regression, and SVM, at every visit. Our XGBoost+ model, which uses privileged information, achieved an AUC of 0.72. Across the preterm birth subtypes, XGBoost+ performed similarly to XGBoost for spontaneous preterm birth (AUC 0.68 versus 0.67), but the improvement was larger for indicated preterm birth (0.78 versus 0.74). These results show how information not typically used by traditional machine learning models can help build better models.
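The model comparisons above are reported as AUC (area under the ROC curve). As a reminder of what that number measures, here is a minimal rank-based sketch: AUC equals the probability that a randomly chosen preterm case receives a higher risk score than a randomly chosen term case, with ties counted as one half. The function name `roc_auc` is illustrative; in practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the Mann-Whitney U statistic, with average ranks for tied scores."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(lab not in (0, 1) for lab in labels):
        raise ValueError("labels must be 0 or 1")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")

    # Assign 1-based ranks by score; tied scores share their average rank.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1

    # U = (sum of positive ranks) - n_pos(n_pos + 1)/2; AUC = U / (n_pos * n_neg).
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    # 1 = preterm, 0 = term; scores are hypothetical model risk estimates.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the 0.67 to 0.78 results above should be read.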

Conclusion Our extensive analysis of this comprehensive set of risk factors revealed preterm birth to be a multifaceted problem, with different risk factors associated with its two subcategories: spontaneous and indicated preterm birth. Notably, our XGBoost+ model achieved strong predictive performance for indicated preterm birth (AUC 0.78). This finding is an important advance, as indicated preterm birth is driven mainly by conditions related to hypertension and preeclampsia, which our model effectively captured. While spontaneous preterm birth remains difficult to predict from clinical data alone, especially in early pregnancy, our research differentiates between these subtypes and provides a valuable predictive tool for indicated preterm birth. The complexity of spontaneous preterm birth suggests that future research should gather more proximal biological data, such as vaginal microbiota or raw cervical images, to complement our approach for indicated preterm birth prediction.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study was supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) under grant R01LM013327. The funder played no role in study design, data collection, data analysis and interpretation, or the writing of this manuscript.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This study, titled “SCH: Prediction of Preterm Birth in Nulliparous Women,” was conducted with full ethical oversight and approval. The research protocol underwent comprehensive review and received approval from both the Columbia University Human Subjects Institutional Review Board and the City University of New York (CUNY) Institutional Review Board. All research activities were performed in accordance with relevant guidelines and regulations for human subjects research. Informed consent was obtained from all participants prior to their inclusion in the study.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

The data are available from NICHD DASH, subject to the access restrictions described below: Nulliparous Pregnancy Outcomes Study: Monitoring Mothers-to-Be (nuMoM2b)

https://dash.nichd.nih.gov/study/226675

The datasets generated and analyzed during the current study were obtained from the NIH Data and Specimen Hub (DASH). Access restrictions apply to these data in accordance with data use agreements and participant privacy protections. While the raw data are not publicly available due to these restrictions, researchers may request access through formal application to the NIH DASH repository. Qualified researchers can obtain the data by submitting a request to the repository administrators and securing appropriate institutional permissions. The authors will facilitate data access requests when possible, subject to compliance with all applicable regulations and the original data use agreement terms.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted July 10, 2025.

Subject Area

  • Obstetrics and Gynecology