Abstract
Globally, about 11% of infants are born preterm each year, defined as birth before 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals. From 318 DREAM challenge participants we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
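As an illustration of the "bootstrapped AUROC" metric named above, the following is a minimal sketch, not the Challenge's actual scoring code. The function names, the number of resamples, the percentile interval, and the choice to resample individual predictions (rather than pregnant individuals) are all assumptions made for this example.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrapped_auroc(labels, scores, n_boot=1000, seed=0):
    """Hypothetical helper: mean AUROC over bootstrap resamples, plus an
    approximate 95% percentile interval. Resampling unit is assumed."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined when a resample has one class
        values.append(auroc(ys, [scores[i] for i in idx]))
    values.sort()
    mean = sum(values) / n_boot
    lower = values[int(0.025 * (n_boot - 1))]
    upper = values[int(0.975 * (n_boot - 1))]
    return mean, lower, upper


if __name__ == "__main__":
    # Toy data only: 1 = preterm, 0 = term; scores are made-up probabilities.
    y = [0, 0, 1, 1, 0, 1, 0, 1]
    p = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.6]
    print(bootstrapped_auroc(y, p, n_boot=200))
```

Because a cohort like this contributes several samples per pregnant individual, a scoring pipeline might instead resample whole individuals so that correlated samples stay together; this sketch does not do that.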
Competing Interest Statement
Antonio Parraga-Leo and Patricia Diaz-Gimeno are receiving honoraria from the IVI Foundation.
Funding Statement
This study was supported by the March of Dimes (JLG, TTO, AR, AST, VC, CWYH, RJW, KF, GA, IK, JB, AN, JG, ZW, PN, AK, IB, EK, SJ, SN, YSLL, PRB, DAM, SVL, JA, DKS, NA, JCC, MS), R35GM138353 (NA), 1R01HL139844 (NA), 3P30AG066515 (NA), 1R61NS114926 (NA), 1R01AG058417 (NA), R01HD105256 (NA, MS), P01HD106414 (NA), the Burroughs Wellcome Fund (NA), the Alfred E. Mann Foundation (NA), the Robertson Foundation (NA), the Spanish Ministry of Science, Innovation and Universities through FPU program FPU18/0177 and EST22/00170 (ALP), and the Instituto de Salud Carlos III (Spanish Ministry of Science and Innovation) through Miguel Servet program CP20/00118, co-funded by the European Union (PGD).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Collection, generation, and analysis of vaginal microbiome data were approved by the National Heart, Lung, and Blood Institute (NHLBI) Clinical Data Science Institutional Review Board (CDS-IRB) in study number 2021-040, and reliance was granted to the NHLBI CDS-IRB by the University of California, San Francisco Institutional Review Board in study number 21-35274.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Adjusted author list to reflect full participation, funding, and COI
Data availability
Sequence data and associated metadata for Study SDY465 were downloaded from ImmPort57 via the March of Dimes Preterm Birth database38. Sequence data and associated metadata for BioProjects PRJNA242473, PRJNA294119, PRJNA393472, and PRJNA430482 were downloaded from the NCBI Sequence Read Archive55. Additional associated metadata for PRJNA430482 were requested through and obtained from the RAMS Registry (https://ramsregistry.vcu.edu).
Sequence data and associated metadata for Projects PRJEB11895, PRJEB12577, PRJEB21325, and PRJEB30642 were downloaded from the Sequence Read Archive of the European Nucleotide Archive56, with associated metadata for PRJEB11895 and PRJEB12577 downloaded from Additional Files 4 and 6 of the paper by Kindinger et al.60. Additional associated metadata for Projects PRJEB11895, PRJEB12577, PRJEB21325, and PRJEB30642 were requested from the senior author.
Sequence data and associated metadata for accession number phs001739.v1.p1 were downloaded from the database of Genotypes and Phenotypes (dbGaP)37.
The training dataset representing 7 of the 9 aggregated studies and the validation dataset for our Challenge are available under Study ID SDY2187 from the MOD Preterm Birth Research Database (https://pretermbirthdb.org/mod/studydata). Two of the nine training datasets (PRJNA430482 and phs001739.v1.p1) are available exclusively via dbGaP after following the application procedures there.