ABSTRACT
Background Subject screening is a key aspect of all clinical trials; however, it has traditionally been a labor-intensive and error-prone task that demands significant time and resources. With the advent of large language models (LLMs) and related technologies, a paradigm shift in natural language processing capabilities offers a promising avenue for increasing both the quality and efficiency of screening efforts. This study aimed to test the ability of Generative Pretrained Transformer Version 4 (GPT-4), enabled by a Retrieval-Augmented Generation (RAG) process, to accurately identify and report on inclusion and exclusion criteria for a clinical trial.
Methods The Co-Operative Program for Implementation of Optimal Therapy in Heart Failure (COPILOT-HF) trial aims to recruit patients with symptomatic heart failure. As part of the screening process, a list of potentially eligible patients is created through an electronic health record (EHR) query. Currently, structured data in the EHR can only be used to determine 5 out of 6 inclusion and 5 out of 17 exclusion criteria. Trained, but non-licensed, study staff complete manual chart review to determine patient eligibility and record their assessment of the inclusion and exclusion criteria. We obtained the structured assessments completed by the study staff and clinical notes from the past two years and developed a clinical note-based question-answering workflow powered by a RAG architecture and GPT-4, which we named RECTIFIER (RAG-Enabled Clinical Trial Infrastructure for Inclusion Exclusion Review). We used notes from 100 patients as a development dataset, 282 patients as a validation dataset, and 1894 patients as a test set. An expert clinician completed a blinded review of patients’ charts to answer the eligibility questions and determine the “gold standard” answers. We calculated the sensitivity, specificity, accuracy, and Matthews correlation coefficient (MCC) for each question and screening method. We also performed bootstrapping to calculate the confidence intervals for each statistic.
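The statistics named in the Methods can be illustrated with a minimal sketch. It computes sensitivity, specificity, accuracy, and MCC for one yes/no eligibility question and estimates a confidence interval by bootstrapping. This is not the study's code: the function names, the toy labels, the percentile method, and the number of resamples are all assumptions made for illustration, since the abstract does not state how the bootstrap was configured.

```python
import math
import random


def confusion_counts(gold, pred):
    """Count true/false positives and negatives for binary labels (1 = criterion met)."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    return tp, tn, fp, fn


def screening_metrics(gold, pred):
    """Return sensitivity, specificity, accuracy, and MCC against the gold standard."""
    tp, tn, fp, fn = confusion_counts(gold, pred)

    def ratio(num, den):
        # An undefined ratio (zero denominator) is reported as NaN.
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def bootstrap_ci(gold, pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method) for one metric."""
    rng = random.Random(seed)
    n = len(gold)
    values = []
    for _ in range(n_resamples):
        # Resample patients with replacement, keeping gold/pred pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        value = screening_metrics(
            [gold[i] for i in idx], [pred[i] for i in idx]
        )[metric]
        if not math.isnan(value):
            values.append(value)
    values.sort()
    lower = values[int((alpha / 2) * (len(values) - 1))]
    upper = values[int((1 - alpha / 2) * (len(values) - 1))]
    return lower, upper


# Toy labels (hypothetical, not study data): 1 = criterion met, 0 = not met.
gold = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = screening_metrics(gold, pred)
accuracy_ci = bootstrap_ci(gold, pred, "accuracy")
```

Resampling whole patients, rather than individual answers, keeps each gold-standard label paired with its prediction, which is why the indices are drawn once per resample and applied to both lists.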
Results Both RECTIFIER and study staff answers closely aligned with the expert clinician answers across criteria, with accuracy ranging between 97.9% and 100% (MCC between 0.837 and 1) for RECTIFIER and between 91.7% and 100% (MCC between 0.644 and 1) for study staff. RECTIFIER performed better than study staff in determining the inclusion criterion of “symptomatic heart failure”, with an accuracy of 97.9% vs 91.7% and an MCC of 0.924 vs 0.721, respectively. Overall, the sensitivity and specificity for determining eligibility were 92.3% (CI) and 93.9% (CI) for RECTIFIER and 90.1% (CI) and 83.6% (CI) for study staff, respectively.
Conclusion GPT-4-based solutions have the potential to improve efficiency and reduce costs in clinical trial screening. When incorporating new tools such as RECTIFIER, it is important to consider the potential hazards of automating the screening process and to set up appropriate mitigation strategies, such as final clinician review before patient engagement.
Competing Interest Statement
For this study, complimentary access to Azure OpenAI GPT-4V was provided by Microsoft. Microsoft had no access to the data used and had no involvement in the analysis, interpretation of data, or writing of our study. Samuel J Aronson, Alexander J Blood, Charlotte J Mailly, Michael F Oates, Benjamin M Scirica, Jiyeon Shin, Michela R Tucci, Ozan Unlu, Matthew Varugheese, and Fei Wang report Research Grants and related funding via Brigham and Women's Hospital: Better Therapeutics, Boehringer Ingelheim, Eli Lilly, Milestone Pharmaceuticals and NovoNordisk. Samuel J Aronson reports consulting to Nest Genomics. Samuel Aronson, Charlotte J Mailly, Michael F Oates, Michela Tucci and Fei Wang also report unrelated NIH and PCORI support. Alexander J. Blood reports consulting income from Walgreens Health, Color Health, Novo Nordisk, Medscape, and Arsenal Capital Partners, and equity holdings in Knownwell health. Benjamin M Scirica reports consulting fees from Abbvie (DSMB), AstraZeneca (DSMB), Boehringer Ingelheim (DSMB), Better Therapeutics, Elsevier Practice Update Cardiology, Esperion, Hanmi (DSMB), Lexeo (DSMB), and NovoNordisk; and equity in Health [at] Scale. Ozan Unlu receives funding from the National Heart Lung and Blood Institute under award number T32HL007604.
Funding Statement
For this study, complimentary access to Azure OpenAI GPT-4V was provided by Microsoft. This work was conducted with support from Harvard Catalyst, The Harvard Clinical and Translational Science Center (National Center for Advancing Translational Sciences, National Institutes of Health Awards UL1 TR002541 and R01HL151643), and financial contributions from Harvard University and its affiliated academic healthcare centers. Dr. Unlu receives funding from the National Heart Lung and Blood Institute under award number T32HL007604.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board of Mass General Brigham gave ethical approval for this work, which was performed as part of the COPILOT-HF Study.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors. Any data shared will be in strict compliance with the Health Insurance Portability and Accountability Act (HIPAA) and applicable data use agreements to ensure the protection of privacy and confidentiality of any potentially identifiable information.