medRxiv

An interdisciplinary, randomized, single-blind evaluation of state-of-the-art large language models for their implications and risks in medical diagnosis and management

Peikai Chen, Jifu Cai, Jiaying Zhou, Shaoxi Chen, Chenguang Xu, Lihua Yuan, Xiaoying Dai, Xiaowei Chen, Yanzhe Wei, Xia Li, Shaofeng Gong, Xiaolong Liang, Jiancheng Yang, Jun Jin, Kanglin Dai, Yuzhen Cui, Guan-Ming Kuang, Jianshen Xie, Libing Luo, Haibing Xiao, Shijie Yin, Jun Yang, Yulan Yan, Jianliang Chen, Yihua Chen, Qianshen Zhang, Qingshan Zhou, Lina Zhao, Min Wu, Xin Tang, Lei Rong, Zanxin Wang, Weifu Qiu, Yanli Wang, Liwen Cui, Xiangyang Li, Yong Hu, Huiren Tao, Nan Wu, Pearl Pai, Minxin Wei, Michael Kai-tsun To, Kenneth M.C. Cheung
doi: https://doi.org/10.1101/2025.06.20.25326623
Peikai Chen
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
3AIBD Lab, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
  • For correspondence: puekai{at}gmail.com
Jifu Cai
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
4Dept. Neurology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Jiaying Zhou
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
5Dept. Pediatrics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Shaoxi Chen
6Dept. Accidents and Emergency, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Chenguang Xu
7Neonatal ICU (NICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Lihua Yuan
8Dept. Pediatric Surgery, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xiaoying Dai
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
9Dept. Prenatal Diagnosis (Clinical Genetics), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xiaowei Chen
10Dept. Nephrology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yanzhe Wei
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xia Li
11Dept. Respiratory Medicine, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Shaofeng Gong
12Intensive Care Unit (ICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xiaolong Liang
13Dept. Cardiac Surgery, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Jiancheng Yang
14Dept. Cardiology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Jun Jin
12Intensive Care Unit (ICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Kanglin Dai
8Dept. Pediatric Surgery, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yuzhen Cui
4Dept. Neurology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Guan-Ming Kuang
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Jianshen Xie
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
9Dept. Prenatal Diagnosis (Clinical Genetics), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Libing Luo
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
9Dept. Prenatal Diagnosis (Clinical Genetics), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Haibing Xiao
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
4Dept. Neurology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Shijie Yin
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
Jun Yang
5Dept. Pediatrics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yulan Yan
11Dept. Respiratory Medicine, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Jianliang Chen
5Dept. Pediatrics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yihua Chen
7Neonatal ICU (NICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Qianshen Zhang
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
7Neonatal ICU (NICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Qingshan Zhou
12Intensive Care Unit (ICU), The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Lina Zhao
14Dept. Cardiology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Min Wu
14Dept. Cardiology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xin Tang
15Dept. Pediatric Orthopedics, Children’s Hospital, Zhejiang University School of Medicine, Hangzhou 310052, China
Lei Rong
11Dept. Respiratory Medicine, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Zanxin Wang
13Dept. Cardiac Surgery, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Weifu Qiu
6Dept. Accidents and Emergency, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yanli Wang
6Dept. Accidents and Emergency, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Liwen Cui
10Dept. Nephrology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Xiangyang Li
10Dept. Nephrology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Yong Hu
3AIBD Lab, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Huiren Tao
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Nan Wu
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
16Dept. Orthopedics, Peking Union Medical College Hospital (PUMCH), Chinese Academy of Medical Sciences, Beijing 100730, China
Pearl Pai
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
10Dept. Nephrology, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Minxin Wei
13Dept. Cardiac Surgery, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
Michael Kai-tsun To
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
  • For correspondence: puekai{at}gmail.com
Kenneth M.C. Cheung
1Dept. Orthopedics, The University of Hong Kong - Shenzhen Hospital (HKU-SZH), Shenzhen 518053, China
2Shenzhen Clinical Research Center for Rare Diseases, Shenzhen 518053, China
  • For correspondence: puekai{at}gmail.com

Abstract

Background State-of-the-art (SOTA) large language models (LLMs) are poised to revolutionize clinical medicine by transforming diagnostic, therapeutic, and interdisciplinary reasoning. Despite their promising capabilities, rigorous benchmarking of these models is essential to address concerns about their clinical proficiency and safety, particularly in high-risk environments.

Methods This study implemented a multi-disciplinary, randomized, single-blind evaluation framework involving 27 experienced specialty clinicians with an average of 25.9 years of practice. The assessment covered 685 simulated and real clinical cases across 13 subspecialties, including both common and rare conditions. Evaluators rated LLM responses on medical strength (0–10 scale, where >9.5 signified leading expert proficiency) and hallucination severity (0–5 scale for fabricated or misleading medical elements). Seven SOTA LLMs were tested, including top-ranked models from the ARENA leaderboard, with statistical analyses applied to adjust for confounders such as response length.
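
The abstract does not say how scores were adjusted for response length. Purely as an illustration, and not as the authors' method, one simple approach is to regress scores on response length across all ratings and compare models on their mean residuals. The function name, data layout, and toy numbers below are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def length_adjusted_scores(ratings):
    """Return the mean score residual per model after removing a linear
    effect of response length (ordinary least squares on pooled ratings).

    ratings: list of (model_name, response_length, score) tuples.
    This is an illustrative adjustment, not the study's actual analysis.
    """
    lengths = [length for _, length, _ in ratings]
    scores = [score for _, _, score in ratings]
    x_bar, y_bar = mean(lengths), mean(scores)

    # Closed-form OLS slope for a single predictor.
    sxx = sum((x - x_bar) ** 2 for x in lengths)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(lengths, scores))
    slope = sxy / sxx if sxx else 0.0

    # Residual = observed score minus the score predicted from length alone.
    residuals = defaultdict(list)
    for model, length, score in ratings:
        predicted = y_bar + slope * (length - x_bar)
        residuals[model].append(score - predicted)
    return {model: mean(vals) for model, vals in residuals.items()}


# Toy data (hypothetical): longer answers score higher for both models.
ratings = [
    ("model_a", 100, 5.0),
    ("model_a", 300, 7.0),
    ("model_b", 100, 6.0),
    ("model_b", 300, 8.0),
]
adjusted = length_adjusted_scores(ratings)
```

In this toy data both models gain the same amount per extra unit of length, so after adjustment the remaining gap between them reflects only the constant difference in their scores at a given length.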

Findings The evaluation revealed clinical plausibility in general-purpose LLMs, with Gemini 2.0 Flash leading raw scores and DeepSeek R1 excelling in adjusted analyses. Top models demonstrated proficiency comparable to that of a physician with 6 years of post-qualification experience (score ∼6.0), yet significant risks were noted. Instances of incompetence (scores ≤4) were detected across specialties, along with 40 hallucination instances involving fabricated conditions, medications, and classification errors. These findings underscore the importance of implementing stringent safeguards to mitigate potential adverse outcomes in clinical applications.

Interpretation While SOTA LLMs show substantial promise in enhancing clinical reasoning and decision-making, their unguarded application in medicine could present serious risks, such as misinformation and diagnostic errors. Human expert oversight remains crucial, particularly given reported incompetence and hallucination risks. Larger, multi-center studies are warranted to evaluate their real-world performance and track their evolution before broader clinical adoption.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by the Sanming Project of Medicine in Shenzhen (SZSM202311022), Shenzhen Clinical Research Center for Rare Diseases (LCYSSQ20220823091402005), Shenzhen Key Medical Discipline Construction Fund (SZXK2020084 and SZXK077), and the Shenzhen Science and Technology Major Project of China (KJZD20240903102759061).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval for this study was obtained from The University of Hong Kong - Shenzhen Hospital Ethics Committee with ID Ethics-2025-078.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • The authors declare that they have no conflicts of interest.

Data Availability

The code for processing the LLM responses, running the web system, and analyzing the resulting data was deposited in an open repository (https://github.com/HKUSZH/LLMMed). The JSON files for the model responses, evaluation scores, and sample questions were also uploaded to the same repository. Complete question sets are available upon request from the corresponding authors.

https://github.com/HKUSZH/LLMMed

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted June 24, 2025.

Subject Area

  • Health Informatics