Abstract
BACKGROUND Lower respiratory tract infections (LRTIs) are a leading cause of mortality worldwide and can be difficult to diagnose in critically ill patients, as non-infectious causes of respiratory failure can present with similar clinical features.
METHODS We developed an LRTI diagnostic method combining the pulmonary transcriptomic biomarker FABP4 with electronic medical record (EMR) text assessment using the large language model Generative Pre-trained Transformer 4 (GPT-4). We evaluated this approach in a prospective cohort of critically ill adults with acute respiratory failure from whom tracheal aspirate FABP4 expression was measured by RNA sequencing. Patients with LRTI or non-infectious conditions were identified using retrospective, multi-physician clinical adjudication. We then confirmed our findings by applying this method to an independent validation cohort of 115 adults with acute respiratory failure.
RESULTS In the derivation cohort, a combined classifier incorporating FABP4 expression and GPT-4–assisted EMR analysis achieved an AUC of 0.93 (±0.08) and an accuracy of 84%, outperforming FABP4 expression alone (AUC 0.84 ± 0.11) and GPT-4–based analysis alone (AUC 0.83 ± 0.07). By comparison, the primary medical team’s admission diagnosis had an accuracy of 72%. In the validation cohort, the combined classifier yielded an AUC of 0.98 (±0.04) and an accuracy of 96%.
CONCLUSIONS Integrating a host transcriptional biomarker with EMR text analysis using a large language model may offer a promising new approach to improving the diagnosis of LRTIs in critically ill adults.
Description We present the novel use of a host transcriptional biomarker combined with artificial intelligence analysis of electronic medical record data to diagnose lower respiratory tract infections in a derivation cohort of critically ill adults, followed by validation of this approach in a second, fully independent cohort. This approach demonstrated high diagnostic accuracy compared to a gold standard of post-hoc multi-physician adjudication.
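As a rough illustration of how a two-input classifier of this kind can be scored by AUC, the sketch below combines a biomarker value with a text-derived score and compares the result against each input alone. It is not the authors' implementation: the abstract does not state how the two inputs are combined, so the logistic regression here is an assumption, as are the function names (auc, fit_logistic, predict). All numbers are synthetic toy values, with lower biomarker values assigned to the LRTI class purely for illustration, and the evaluation is in-sample rather than cross-validated.

```python
import math

def auc(labels, scores):
    """Area under the ROC curve via the rank-sum identity (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient-descent logistic regression (assumed combiner)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(X, w, b):
    """Predicted probability of the positive class for each row of X."""
    return [1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, xi)))))
            for xi in X]

# Synthetic toy data (NOT study data): 1 = LRTI, 0 = non-infectious.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
fabp4 = [-1.2, -0.4, 0.3, -0.9, 0.8, 1.1, 0.0, 0.6]  # hypothetical normalized expression
gpt4 = [1, 1, 1, 0, 0, 0, 0, 1]                       # hypothetical text-based call

X = [[f, g] for f, g in zip(fabp4, gpt4)]
w, b = fit_logistic(X, labels)
combined = predict(X, w, b)

print("biomarker alone AUC:", auc(labels, [-f for f in fabp4]))  # 0.9375
print("text score alone AUC:", auc(labels, gpt4))                # 0.75
print("combined AUC:", auc(labels, combined))
```

In this toy set each input misranks a different patient, so a weighted combination can order cases that neither input orders alone. Out-of-sample estimates such as those reported above would require held-out or cross-validated evaluation, which this sketch omits.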
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
5R01HL155418 (CRL); Chan Zuckerberg Biohub San Francisco (CRL); R35HL140026 (CSC)
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The cohort was approved by the University of California Institutional Review Board (protocol #10-02701) and informed consent was obtained from patients or surrogate decision makers.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Addition of a validation cohort to confirm results.
Data Sharing Statement
The gene count data, code, and required source data are available at https://github.com/infectiousdisease-langelier-lab/LRTI_FABP4_GPT4_classifier.





