RT Journal Article
SR Electronic
T1 Microbiome Preterm Birth DREAM Challenge: Crowdsourcing Machine Learning Approaches to Advance Preterm Birth Research
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.03.07.23286920
DO 10.1101/2023.03.07.23286920
A1 Jonathan L. Golob
A1 Tomiko T. Oskotsky
A1 Alice S. Tang
A1 Alennie Roldan
A1 Verena Chung
A1 Connie W.Y. Ha
A1 Ronald J. Wong
A1 Kaitlin J. Flynn
A1 Antonio Parraga-Leo
A1 Camilla Wibrand
A1 Samuel S. Minot
A1 Gaia Andreoletti
A1 Idit Kosti
A1 Julie Bletz
A1 Amber Nelson
A1 Jifan Gao
A1 Zhoujingpeng Wei
A1 Guanhua Chen
A1 Zheng-Zheng Tang
A1 Pierfrancesco Novielli
A1 Donato Romano
A1 Ester Pantaleo
A1 Nicola Amoroso
A1 Alfonso Monaco
A1 Mirco Vacca
A1 Maria De Angelis
A1 Roberto Bellotti
A1 Sabina Tangaro
A1 Abigail Kuntzleman
A1 Isaac Bigcraft
A1 Stephen Techtmann
A1 Daehun Bae
A1 Eunyoung Kim
A1 Jongbum Jeon
A1 Soobok Joe
A1 The Preterm Birth DREAM Community
A1 Kevin R. Theis
A1 Sherrianne Ng
A1 Yun S. Lee Li
A1 Patricia Diaz-Gimeno
A1 Phillip R. Bennett
A1 David A. MacIntyre
A1 Gustavo Stolovitzky
A1 Susan V. Lynch
A1 Jake Albrecht
A1 Nardhy Gomez-Lopez
A1 Roberto Romero
A1 David K. Stevenson
A1 Nima Aghaeepour
A1 Adi L. Tarca
A1 James C. Costello
A1 Marina Sirota
YR 2023
UL http://medrxiv.org/content/early/2023/04/11/2023.03.07.23286920.abstract
AB Globally, every year about 11% of infants are born preterm, defined as a birth prior to 37 weeks of gestation, with significant and lingering health consequences. Multiple studies have related the vaginal microbiome to preterm birth. We present a crowdsourcing approach to predict: (a) preterm or (b) early preterm birth from 9 publicly available vaginal microbiome studies representing 3,578 samples from 1,268 pregnant individuals, aggregated from raw sequences via an open-source tool, MaLiAmPi. We validated the crowdsourced models on novel datasets representing 331 samples from 148 pregnant individuals.
From 318 DREAM challenge participants, we received 148 and 121 submissions for our two separate prediction sub-challenges, with top-ranking submissions achieving bootstrapped AUROC scores of 0.69 and 0.87, respectively. Alpha diversity, VALENCIA community state types, and composition (via phylotype relative abundance) were important features in the top-performing models, most of which were tree-based methods. This work serves as the foundation for subsequent efforts to translate predictive tests into clinical practice, and to better understand and prevent preterm birth.
Competing Interest Statement: Antonio Parraga-Leo and Patricia Diaz-Gimeno are receiving honoraria from the IVI Foundation.
Funding Statement: This study was supported by the March of Dimes (JLG, TTO, AR, AST, VC, CWYH, RJW, KF, GA, IK, JB, AN, JG, ZW, PN, AK, IB, EK, SJ, SN, YSLL, PRB, DAM, SVL, JA, DKS, NA, JCC, MS) and R35GM138353 (NA), 1R01HL139844 (NA), 3P30AG066515 (NA), 1R61NS114926 (NA), 1R01AG058417 (NA), R01HD105256 (NA, MS), P01HD106414 (NA), the Burroughs Wellcome Fund (NA), the Alfred E.
Mann Foundation (NA), and the Robertson Foundation (NA); the Spanish Ministry of Science, Innovation and Universities through FPU program FPU18/0177; EST22/00170 (ALP); and the Instituto de Salud Carlos III (Spanish Ministry of Science and Innovation) through the Miguel Servet program CP20/00118, co-funded by the European Union (PGD).
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Collection, generation, and analysis of vaginal microbiome data were approved by the National Heart, Lung, and Blood Institute (NHLBI) Clinical Data Science Institutional Review Board (CDS-IRB) in study number 2021-040, and reliance was granted to the NHLBI CDS-IRB by the University of California, San Francisco Institutional Review Board in study number 21-35274.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.
Sequence data and associated metadata for Study SDY465 were downloaded from ImmPort57 via the March of Dimes Preterm Birth database38. Sequence data and associated metadata for BioProjects PRJNA242473, PRJNA294119, PRJNA393472, and PRJNA430482 were downloaded from the NCBI Sequence Read Archive55. Additional associated metadata for PRJNA430482 were requested through and obtained from the RAMS Registry (https://ramsregistry.vcu.edu). Sequence data and associated metadata for Projects PRJEB11895, PRJEB12577, PRJEB21325, and PRJEB30642 were downloaded from the Sequence Read Archive of the European Nucleotide Archive56, with associated metadata for PRJEB11895 and PRJEB12577 downloaded from Additional Files 4 and 6 of the paper by Kindinger et al.60. Additional associated metadata for Projects PRJEB11895, PRJEB12577, PRJEB21325, and PRJEB30642 were requested from the senior author. Sequence data and associated metadata for accession number phs001739.v1.p1 were downloaded from the database of Genotypes and Phenotypes (dbGaP)37. The training dataset representing 7 of the 9 aggregated studies and the validation dataset for our Challenge are available under Study ID SDY2187 from the MOD Preterm Birth Research Database (https://pretermbirthdb.org/mod/studydata). Two of the nine training datasets (PRJNA430482 and phs001739.v1.p1) are available exclusively via dbGaP after following the application procedures there.