PT - JOURNAL ARTICLE
AU - Liu, Siru
AU - McCoy, Allison B.
AU - Wright, Aileen P.
AU - Carew, Babatunde
AU - Genkins, Julian Z.
AU - Huang, Sean S.
AU - Peterson, Josh F.
AU - Steitz, Bryan
AU - Wright, Adam
TI - Leveraging Large Language Models for Generating Responses to Patient Messages
AID - 10.1101/2023.07.14.23292669
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.07.14.23292669
4099 - http://medrxiv.org/content/early/2023/07/16/2023.07.14.23292669.short
4100 - http://medrxiv.org/content/early/2023/07/16/2023.07.14.23292669.full
AB - Objective: This study aimed to develop and assess the performance of fine-tuned large language models for generating responses to patient messages sent via an electronic health record patient portal. Methods: Utilizing a dataset of messages and responses extracted from the patient portal at a large academic medical center, we developed a model (CLAIR-Short) based on a pre-trained large language model (LLaMA-65B). In addition, we used the OpenAI API to update physician responses from an open-source dataset into a format with informative paragraphs that offered patient education while emphasizing empathy and professionalism. Combining this dataset with the patient portal data, we further fine-tuned our model (CLAIR-Long). To evaluate the fine-tuned models, we used ten representative patient portal questions in primary care to generate responses. We asked primary care physicians to review the responses generated by our models and ChatGPT and to rate them for empathy, responsiveness, accuracy, and usefulness. Results: The dataset consisted of a total of 499,794 pairs of patient messages and corresponding responses from the patient portal, with 5,000 patient messages and ChatGPT-updated responses from an online platform. Four primary care physicians participated in the survey. CLAIR-Short exhibited the ability to generate concise responses similar to providers’ responses.
CLAIR-Long responses provided more patient educational content than CLAIR-Short responses and were rated similarly to ChatGPT’s responses, receiving positive evaluations for responsiveness, empathy, and accuracy and a neutral rating for usefulness. Conclusion: Leveraging large language models to generate responses to patient messages demonstrates significant potential in facilitating communication between patients and primary care providers. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported by NIH grants K99LM014097-01, R01AG062499-01, and R01LM013995-01. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The IRB of Vanderbilt University Medical Center waived ethical approval for this work. I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable. Yes.