Abstract
Introduction Over the past two years, the use of Large Language Models (LLMs) in clinical medicine has expanded significantly, particularly in cardiology, where they are applied to ECG interpretation, data analysis, and risk prediction. This study evaluates the performance of five advanced LLMs—Google Bard, GPT-3.5 Turbo, GPT-4.0, GPT-4o, and GPT-o1-mini—in responding to cardiology-specific questions of varying complexity.
Methods A comparative analysis was conducted using four test sets of increasing difficulty, encompassing a range of cardiovascular topics, from prevention strategies to acute management and diverse pathologies. The models’ responses were assessed for accuracy, understanding of medical terminology, clinical relevance, and adherence to guidelines by a panel of experienced cardiologists.
Results All models demonstrated a foundational understanding of medical terminology but varied in clinical application and accuracy. GPT-4.0 exhibited superior performance, with accuracy rates of 92% (Set A), 88% (Set B), 80% (Set C), and 84% (Set D). GPT-4o and GPT-o1-mini closely followed, surpassing GPT-3.5 Turbo, which scored 83%, 64%, 67%, and 57%, and Google Bard, which achieved 79%, 60%, 50%, and 55%, respectively. Statistical analyses confirmed significant differences in performance across the models, particularly in the more complex test sets. While all models demonstrated potential for clinical application, their inability to reference ongoing clinical trials and some inconsistencies in guideline adherence highlight areas for improvement.
Conclusion LLMs demonstrate considerable potential in interpreting and applying clinical guidelines to vignette-based cardiology queries, with GPT-4.0 leading in accuracy and guideline alignment. These tools offer promising avenues for augmenting clinical decision-making but should be used as complementary aids under professional supervision.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
Figure 1 revised and updated with GPT o-series.
Data Availability
All data produced in the present study are available upon reasonable request to the authors