Abstract
Background Preterm birth is defined by the onset of labor before 37 weeks of gestation; it can lead to premature delivery and threatens newborns’ health. The Puerto Rico PROTECT cohort is a well-characterized prospective birth cohort designed to investigate environmental and social contributors to preterm birth in Puerto Rico, where preterm birth rates have been elevated in recent decades. To elucidate possible relationships between metabolites and preterm birth in this cohort, we conducted a nested case-control study with untargeted metabolomic characterization of maternal plasma from 31 women with preterm birth and 69 full-term labor controls, collected at 24-28 gestational weeks.
Results A total of 333 metabolites were identified and annotated with liquid chromatography/mass spectrometry. Subsequent weighted gene correlation network analysis showed that the fatty acid and carene-enriched module has a significant positive association (p-value = 8e-04) with preterm birth. After controlling for potential clinical confounders, a total of 38 metabolites demonstrated significant changes uniquely associated with preterm birth, 17 of which were preterm biomarkers. Among seven machine-learning classifiers, random forest achieved highly accurate and specific prediction (AUC = 0.92) of preterm birth in testing data, demonstrating the strong potential of these metabolites as biomarkers for preterm birth. The 17 preterm biomarkers are involved in cell signaling, lipid metabolism, and lipid peroxidation. Further causality analysis infers that suberic acid upregulates several fatty acids to promote preterm birth.
Conclusions Altogether, this study demonstrates the involvement of lipids, particularly fatty acids, in the pathogenesis of preterm birth.
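The AUC reported in the Results can be read as the probability that a randomly chosen preterm case receives a higher classifier score than a randomly chosen full-term control. The following is a minimal sketch of that computation, not the authors' pipeline: the function name `roc_auc` and the labels and scores are hypothetical values chosen for illustration and are not study data.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic.

    Returns the fraction of (case, control) pairs in which the case
    has the higher score; tied pairs count as one half.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical classifier scores (illustrative only, not study data):
# 1 = preterm case, 0 = full-term control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.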
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by the Superfund Research Program of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (grant P42ES017198). Additional support was provided by NIEHS grants P50ES026049, R01ES032203, and P30ES017885 and by the Environmental influences on Child Health Outcomes (ECHO) program grant UH3OD023251. LXG is supported by grant K01ES025434 awarded by NIEHS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), grants R01 LM012373 and LM012907 awarded by NLM, and grant R01 HD084633 (LXG and SS) awarded by NICHD.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethics and Research Committees of the University of Puerto Rico (IRB # A8570110) and Northeastern University (IRB # 150629).
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The metabolomics data set has been uploaded to Metabolomics Workbench, a public repository for metabolomics data.
Abbreviations
- LDA: linear discriminant analysis
- RF: random forest
- LOG: elastic net
- GBM: gradient boosting
- SVM: support vector machine
- RPART: classification tree
- PC: phosphocholine
- PS: acylglycerophosphoserines
- PE: diacylglycerophosphoethanolamines
- PI: phosphatidylinositol
- PG: phosphatidylglycerol
- FA: fatty acid
- CAR: carene
- CYP4F/A: cytochrome P450 (CYP) 4 F/A
- AUC: area under the ROC curve
- WGCNA: weighted gene correlation network analysis
- SOV: source of variation