ABSTRACT
In a randomized, pre-post intervention study, we evaluated the influence of a large language model (LLM) generative AI system on the accuracy of physician decision-making and on bias in healthcare. Fifty US-licensed physicians reviewed a video clinical vignette featuring actors representing different demographics (a White male or a Black female) with chest pain. Participants answered clinical questions about triage, risk, and treatment based on these vignettes, then were asked to reconsider their answers after receiving advice generated by ChatGPT+ (GPT4). The primary outcome was the accuracy of clinical decisions, judged against pre-established evidence-based guidelines. Physicians were willing to change their initial clinical impressions given AI assistance, and this led to a significant improvement in clinical decision-making accuracy in a chest pain evaluation scenario without introducing or exacerbating existing race or gender biases. In a survey, the majority of physician participants indicated that they expect LLM tools to play a significant role in clinical decision-making.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Bryan Bunning, MS, Funding: National Library of Medicine (2T15LM007033); Elaine Khoong, MD, MS, Funding: National Heart, Lung, and Blood Institute of the NIH under Award Number K23HL157750; Robert Gallo, MD, Funding: Dr. Gallo is supported by a VA Advanced Fellowship in Medical Informatics. The views expressed are those of the authors and not necessarily those of the Department of Veterans Affairs or those of the United States government; Arnold Milstein, MD, Funding: Pooled philanthropic gifts to Stanford University. Research funding from Stanford Healthcare and Stanford Children's Health; Damon Centola, Funding: DC gratefully acknowledges support from a Robert Wood Johnson Pioneer Grant; Jonathan H. Chen, Funding: NIH/National Institute of Allergy and Infectious Diseases (1R01AI17812101), NIH/National Institute on Drug Abuse Clinical Trials Network (UG1DA015815 - CTN-0136), Gordon and Betty Moore Foundation (Grant #12409), Stanford Artificial Intelligence in Medicine and Imaging - Human-Centered Artificial Intelligence (AIMI-HAI) Partnership Grant, Doris Duke Charitable Foundation - COVID-19 Fund to Retain Clinical Scientists (20211260), Google, Inc. research collaboration (Co-I) to leverage EHR data to predict a range of clinical outcomes, American Heart Association - Strategically Focused Research Network - Diversity in Clinical Trials.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Institutional Review Board (IRB) of Stanford University gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.