RT Journal Article SR Electronic T1 The Pulse of Artificial Intelligence in Cardiology: A Comprehensive Evaluation of State-of-the-Art Large Language Models for Potential Use in Clinical Cardiology JF medRxiv FD Cold Spring Harbor Laboratory Press SP 2023.08.08.23293689 DO 10.1101/2023.08.08.23293689 A1 Novak, Andrej A1 Zeljković, Ivan A1 Rode, Fran A1 Lisičić, Ante A1 Nola, Iskra A. A1 Pavlović, Nikola A1 Manola, Šime YR 2024 UL http://medrxiv.org/content/early/2024/12/07/2023.08.08.23293689.abstract AB Introduction Over the past two years, the use of Large Language Models (LLMs) in clinical medicine has expanded significantly, particularly in cardiology, where they are applied to ECG interpretation, data analysis, and risk prediction. This study evaluates the performance of five advanced LLMs—Google Bard, GPT-3.5 Turbo, GPT-4.0, GPT-4o, and GPT-o1-mini—in responding to cardiology-specific questions of varying complexity.Methods A comparative analysis was conducted using four test sets of increasing difficulty, encompassing a range of cardiovascular topics, from prevention strategies to acute management and diverse pathologies. The models’ responses were assessed for accuracy, understanding of medical terminology, clinical relevance, and adherence to guidelines by a panel of experienced cardiologists.Results All models demonstrated a foundational understanding of medical terminology but varied in clinical application and accuracy. GPT-4.0 exhibited superior performance, with accuracy rates of 92% (Set A), 88% (Set B), 80% (Set C), and 84% (Set D). GPT-4o and GPT-o1-mini closely followed, surpassing GPT-3.5 Turbo, which scored 83%, 64%, 67%, and 57%, and Google Bard, which achieved 79%, 60%, 50%, and 55%, respectively. Statistical analyses confirmed significant differences in performance across the models, particularly in the more complex test sets. While all models demonstrated potential for clinical application, their inability to reference ongoing clinical trials and some inconsistencies in guideline adherence highlight areas for improvement.Conclusion LLMs demonstrate considerable potential in interpreting and applying clinical guidelines to vignette-based cardiology queries, with GPT-4.0 leading in accuracy and guideline alignment. These tools offer promising avenues for augmenting clinical decision-making but should be used as complementary aids under professional supervision.Competing Interest StatementThe authors have declared no competing interest.Funding StatementThis study did not receive any funding.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesI confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.YesAll data produced in the present study are available upon reasonable request to the authors