Abstract
Background and Aims Patient-reported outcomes (PROs) are vital in assessing disease activity and treatment outcomes in inflammatory bowel disease (IBD). However, manual extraction of these PROs from the free text of clinical notes is burdensome. We aimed to improve the curation of free-text data in the electronic health record, making it more readily available for research and quality improvement. To that end, we compared traditional natural language processing (tNLP) and large language models (LLMs) in extracting three IBD PROs (abdominal pain, diarrhea, fecal blood) from clinical notes across two institutions.
Methods Clinic notes were annotated for each PRO using preset protocols. Models were developed and internally tested at the University of California San Francisco (UCSF), and then externally validated at Stanford University. We compared tNLP and LLM-based models on accuracy, sensitivity, specificity, positive and negative predictive value. Additionally, we conducted fairness and error assessments.
Results Inter-rater reliability between annotators was >90%. On the UCSF test set (n=50), the top-performing tNLP models achieved accuracies of 92% (abdominal pain), 82% (diarrhea), and 80% (fecal blood), comparable to GPT-4, which was 96%, 88%, and 90% accurate, respectively. On external validation at Stanford (n=250), tNLP models failed to generalize (61-62% accuracy), while GPT-4 maintained accuracies >90%. PaLM-2 and GPT-4 showed similar performance. No biases were detected based on demographics or diagnosis.
Conclusions LLMs are accurate and generalizable methods for extracting PROs. They maintain excellent accuracy across institutions, despite heterogeneity in note templates and authors. Widespread adoption of such tools has the potential to enhance IBD research and patient care.
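The five performance measures named in the Methods (accuracy, sensitivity, specificity, positive and negative predictive value) all derive from the same 2x2 confusion matrix. The following is a minimal illustrative sketch, not the study's code: the function name `binary_metrics` and the toy labels are assumptions made for illustration, with 1 meaning the PRO is annotated as present and 0 as absent.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute confusion-matrix metrics for binary labels (1 = present, 0 = absent).

    Hypothetical helper for illustration; not taken from the study's repository.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(num: int, den: int) -> float:
        # Return NaN rather than raising when a denominator is empty.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # recall among annotated positives
        "specificity": ratio(tn, tn + fp),  # recall among annotated negatives
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Toy example with made-up labels (not study data).
annotated = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(annotated, predicted)
print(metrics)
```

In this toy example there are 3 true positives, 5 true negatives, 1 false positive, and 1 false negative, so accuracy is 0.8 while sensitivity and PPV are both 0.75. Reporting all five measures matters because accuracy alone can mask poor performance on the less common class.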
Competing Interest Statement
VAR receives research support from Alnylam, Takeda, Merck, Genentech, Blueprint Medicines, Stryker, Mitsubishi Tanabe, and Janssen. He is also a shareholder of ZebraMD. MJR has served on an advisory board for Pfizer. THB reports consulting fees from Grai-Matter, Paul Hartmann AG, and Verantos, Inc. outside the submitted work, and she is a board member of Athelo Health. This study was funded in part by Microsoft, which is an investor in OpenAI, the developer of the GPT-4 model. There are no conflicts of interest for any of the other authors.
Funding Statement
Research reported in this publication was supported by the National Library of Medicine of the National Institutes of Health under Award Number K99LM014099, the National Center for Advancing Translational Sciences, National Institutes of Health, through UCSF-CTSI Grant Number UL1 TR001872, National Institutes of Health T32 DK007762, as well as the UCLA Clinical and Translational Science Institute through grant number UL1TR001881. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH. This research project has benefitted from the Microsoft Accelerate Foundation Models Research (AFMR) grant program through which leading foundation models hosted by Microsoft Azure along with access to Azure credits were provided to conduct the research.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Human Research Protection Program Institutional Review Board at UCSF (IRB#18-24588) and Stanford University (IRB #47644) approved this study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Study Materials: Protocols, code, and models are available on GitHub (Supplemental Methods).
Data Availability
All data produced in the present study are available upon reasonable request to the authors.