ABSTRACT
Background The application of Large Language Models (LLMs) to create conversational agents (CAs) that can aid health professionals in their daily practice is increasingly popular, mainly due to their ability to understand and communicate in natural language. Conversational agents can manage enormous amounts of information, comprehend and reason about clinical questions, extract information from reliable sources, and produce accurate answers to queries. This presents an opportunity for better access to up-to-date and trustworthy clinical information in response to medical queries.
Objective We present the design and initial evaluation of Vitruvius, an agent specialized in answering queries in healthcare knowledge and evidence-based medical research.
Methodology The system comprises five LLMs, each instructed with a precise task, which together allow it to automatically determine the best search strategy for providing an evidence-based answer. We assessed the system’s comprehension, reasoning, and retrieval capabilities using the public clinical question-answer dataset MedQA-USMLE. The system was improved based on these assessments, and three versions were developed.
Results We report the performance of the three versions of Vitruvius on a subset of 288 question-answer (QA) pairs (accuracy: V1 86%, V2 90%, V3 93%) and on the complete dataset of 1273 QA pairs (accuracy: V2 85%, V3 90.3%). We also evaluate intra- and inter-class variability and agreement. The final version of Vitruvius (V3) obtained a Cohen’s kappa of 87% and a state-of-the-art (SoTA) accuracy of 90.26%, surpassing the current SoTA results for other LLMs on the same dataset.
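As an illustration of the two metrics reported above, the following minimal Python sketch computes accuracy against a gold answer key and Cohen's kappa between two sets of answers. It is not the authors' evaluation code: the function names, the toy answer lists, and the choice to compute kappa between two runs of the system are all assumptions made for the example.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Fraction of questions where the predicted option matches the gold option."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both sequences agree.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each sequence's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum(
        counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        # Both sequences use a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: multiple-choice options for four questions.
gold_key = ["A", "B", "C", "D"]
run_1 = ["A", "B", "C", "A"]  # assumed answers from a first run
run_2 = ["A", "B", "C", "D"]  # assumed answers from a second run

acc_run_1 = accuracy(run_1, gold_key)
kappa_runs = cohen_kappa(run_1, run_2)
```

Kappa is conventionally reported on a scale from -1 to 1, so a value stated as 87% corresponds to 0.87 on that scale.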
Conclusions Vitruvius demonstrates excellent performance in medical QA compared to standard database responses and other popular LLMs. Future investigations will focus on testing the model in a real-world clinical environment. While Vitruvius can enhance productivity and aid healthcare professionals, it should not be used by individuals who are not qualified to reason with medical data, so that critical decision-making remains in the hands of trained professionals.
Competing Interest Statement
All authors work at Arkangel AI.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All data produced in the present study are available upon reasonable request to the authors.