Abstract
Physicians, particularly intensivists, face information overload and decision fatigue, underscoring the need for automated diagnostic tools. Acute Respiratory Distress Syndrome (ARDS) affects over 10% of critical care patients and carries a mortality rate above 40%, yet it is recognized in only 30-70% of cases in clinical settings. We present a reproducible computational pipeline that automates ARDS adjudication in retrospective datasets of mechanically ventilated adults, implementing the Berlin Definition via natural language processing and classification algorithms. We used labeled chest imaging reports from two hospitals to train an XGBoost model to detect bilateral infiltrates, and a labeled subset of attending physician notes from one hospital to train a second XGBoost model to detect a pneumonia diagnosis. Both models achieve high discriminative performance on test sets: an area under the receiver operating characteristic curve (AUROC) of 0.88 for adjudicating bilateral infiltrates on chest imaging reports, and an AUROC of 0.87 for detecting pneumonia on attending physician notes. We integrated these models with rule-based components and validated the entire pipeline on a subset of healthcare encounters from a third hospital (MIMIC-III). The pipeline adjudicates ARDS with a sensitivity of 93.5%, far surpassing the 22.6% ARDS documentation rate we found for this cohort, and with a false positive rate of 17.4%. We conclude that our reproducible, automated pipeline holds promise for improving ARDS recognition and could aid clinical practice through real-time EHR integration.
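As an illustration only, the rule-based step that combines model outputs with Berlin Definition criteria might look like the following minimal Python sketch. It is not the authors' implementation. The thresholds for the PaO2/FiO2 ratio (300 mmHg or less) and PEEP (at least 5 cmH2O) come from the Berlin Definition, and the 48-hour window between hypoxemia and bilateral infiltrates is the one mentioned in the revision notes. The `Encounter` class, its field names, and the simplification that a flagged risk factor such as pneumonia is required are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Berlin Definition thresholds.
PF_THRESHOLD_MMHG = 300.0   # hypoxemia: PaO2/FiO2 <= 300 mmHg
MIN_PEEP_CMH2O = 5.0        # minimum PEEP for the hypoxemia criterion
# Window between hypoxemia and bilateral infiltrates (from the revision notes).
WINDOW = timedelta(hours=48)


@dataclass
class Encounter:
    """Hypothetical per-encounter record; all field names are assumptions."""
    pao2_mmhg: float
    fio2_fraction: float                 # 0.21 to 1.0, not a percentage
    peep_cmh2o: float
    hypoxemia_time: datetime
    infiltrate_time: Optional[datetime]  # first report flagged by the imaging model
    has_risk_factor: bool                # e.g. pneumonia flagged by the notes model


def pf_ratio(pao2_mmhg: float, fio2_fraction: float) -> float:
    """Return PaO2/FiO2 in mmHg, rejecting implausible FiO2 values."""
    if not 0.21 <= fio2_fraction <= 1.0:
        raise ValueError("FiO2 must be a fraction between 0.21 and 1.0")
    return pao2_mmhg / fio2_fraction


def meets_sketch_criteria(enc: Encounter) -> bool:
    """Simplified check: hypoxemia, infiltrates within the window, risk factor."""
    hypoxemic = (
        pf_ratio(enc.pao2_mmhg, enc.fio2_fraction) <= PF_THRESHOLD_MMHG
        and enc.peep_cmh2o >= MIN_PEEP_CMH2O
    )
    if not hypoxemic or enc.infiltrate_time is None:
        return False
    within_window = abs(enc.infiltrate_time - enc.hypoxemia_time) <= WINDOW
    return within_window and enc.has_risk_factor
```

In this sketch, an encounter without a flagged risk factor is simply rejected. The Berlin Definition instead calls for an objective assessment to exclude cardiac failure or fluid overload in that situation, which this example does not attempt to model.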
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Feihong Xu was supported in part by the National Institutes of Health Training Grant (T32GM008449) through Northwestern University's Biotechnology Training Program. Curtis H. Weiss was supported by the National Heart Lung and Blood Institute (R01HL140362 and K23HL118139). Luís A. Nunes Amaral was supported by the National Heart Lung and Blood Institute (R01HL140362). Luís A. Nunes Amaral and Feihong Xu are supported by the National Institute of Allergy and Infectious Diseases (U19AI135964).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Institutional Review Board of Northwestern University gave ethical approval for this work (STU00208049). Institutional Review Board of Endeavor Health gave ethical approval for this work (EH17-325).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
1. Clarify that we used the ARDS documentation rate instead of the ARDS recognition rate.
2. Add tables showing how the pipeline performs with optimal probability cutoffs.
3. Clarify our stance on the false positive vs. false negative tradeoff, and that this is a choice for every health system to make.
4. Switch from the term "whole training set" to "development set" to refer to the data used for developing the model for bilateral infiltrates.
5. Clarify how we calculated the PF ratio for Hospital A (2013) and MIMIC (2001-12).
6. Explain our rationale for a 48-h window between hypoxemia and bilateral infiltrates.
7. Add the caveat that we do not include patients on High Flow oxygenation because the datasets predate the 2024 Berlin Definition update.
8. Clarify the role of Table 1.
9. Sharpen our contribution to the literature.
10. Address the major limitation of not having chest image data available for the study, and of having only one radiologist report per imaging study (unavailability of a better gold standard).
11. Clarify that we had chest computed tomography and X-ray image reports available, not just chest X-rays, and that we did not include lung ultrasound images or reports.
12. Demonstrate that the Bilateral Infiltrates model performs well at the patient level (not just at the report level).
13. Demonstrate that the pipeline flags early ARDS (median time post-intubation: 3.9 hours).
14. Demonstrate that the pipeline differentiates a patient population with clinical traits consistent with ARDS.
15. Explore reasons for our physician raters disagreeing in 8% of their labels.
16. Change the layout of the confusion matrices.
Data Availability
MIMIC-III is available on PhysioNet (https://doi.org/10.13026/C2XW26). Data from Hospital A (2013), Hospital A (2016), and Hospital B (2017-18) are IRB-protected. Upon publication, we will release only the de-identified data from the Hospital A (2013) cohort needed to reproduce Figure 8, at the ARCH repository hosted by Northwestern University (https://arch.library.northwestern.edu).





