
Performance of DeepSeek, Qwen 2.5 MAX, and ChatGPT Assisting in Diagnosis of Corneal Eye Diseases, Glaucoma, and Neuro-Ophthalmology Diseases Based on Clinical Case Reports

Zain S. Hussain, Mohammad Delsoz, Muhammad Elahi, Brian Jerkins, Elliot Kanner, Claire Wright, Wuqaas M. Munir, Mohammad Soleimani, Ali Djalilian, Priscilla A. Lao, Joseph W. Fong, Malik Y. Kahook, Siamak Yousefi
doi: https://doi.org/10.1101/2025.03.14.25323836
Zain S. Hussain, MD (1)
Mohammad Delsoz, MD (2)
Muhammad Elahi, BS (3)
Brian Jerkins, MD (3)
Elliot Kanner, MD (3)
Claire Wright, MD (3)
Wuqaas M. Munir, MD (4)
Mohammad Soleimani, MD (5, 6)
Ali Djalilian, MD (7)
Priscilla A. Lao, MD (2)
Joseph W. Fong, MD (2)
Malik Y. Kahook, MD (8)
Siamak Yousefi, PhD (2, 9)

1 University of Arkansas for Medical Sciences, Little Rock, AR, USA
2 Hamilton Eye Institute, Department of Ophthalmology, University of Tennessee Health Science Center, Memphis, TN, USA
3 Quillen College of Medicine, East Tennessee State University, Johnson City, TN, USA
4 Department of Ophthalmology and Visual Sciences, University of Maryland School of Medicine, Baltimore, MD, USA
5 Department of Ophthalmology, University of North Carolina, Chapel Hill, NC, USA
6 Eye Research Center, Farabi Eye Hospital, Tehran University of Medical Sciences, Tehran, Iran
7 Department of Ophthalmology and Visual Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
8 Department of Ophthalmology, University of Colorado School of Medicine, Aurora, CO, USA
9 Department of Genetics, Genomics, and Informatics, University of Tennessee Health Science Center, Memphis, TN, USA

For correspondence: Siamak.Yousefi@uthsc.edu

Abstract

Background This study evaluates the diagnostic performance of several AI models, including DeepSeek, in diagnosing corneal diseases, glaucoma, and neuro-ophthalmologic disorders.

Methods We retrospectively selected 53 case reports from the Department of Ophthalmology and Visual Sciences at the University of Iowa, comprising 20 corneal disease cases, 11 glaucoma cases, and 22 neuro-ophthalmology cases. The case descriptions were input into DeepSeek, ChatGPT-4.0, ChatGPT-01, and Qwen 2.5 Max. The models' responses were compared with diagnoses rendered by human experts (corneal specialists, glaucoma attendings, and neuro-ophthalmologists). We determined diagnostic accuracy and interobserver agreement, the latter defined as the percentage-point difference between each AI model's accuracy and the average human expert accuracy.
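As a minimal sketch of the comparison metric described in the Methods, assuming it is a plain subtraction of accuracies expressed in percentage points: the function name `gap_vs_experts` is hypothetical, and the example values are the DeepSeek, ChatGPT-01, and Qwen accuracies and the human expert averages (93.3%, 51.5%, 90.9%) reported elsewhere in this abstract.

```python
def gap_vs_experts(model_accuracy: float, expert_accuracy: float) -> float:
    """Return the model-minus-expert accuracy gap in percentage points.

    Both inputs are accuracies in percent (0-100). A negative result means
    the model was less accurate than the human expert average.
    """
    return round(model_accuracy - expert_accuracy, 1)


# Values taken from the abstract (percent accuracy).
corneal_gap_deepseek = gap_vs_experts(90.0, 93.3)   # DeepSeek, corneal diseases
glaucoma_gap_deepseek = gap_vs_experts(54.5, 51.5)  # DeepSeek, glaucoma
neuro_gap_chatgpt01 = gap_vs_experts(95.5, 90.9)    # ChatGPT-01, neuro-ophthalmology
corneal_gap_qwen = gap_vs_experts(55.0, 93.3)       # Qwen, corneal diseases
```

Under this reading, the gaps reproduce the figures quoted in the Results (for example, -3.3 for DeepSeek in corneal diseases), which suggests the reported differences are percentage points rather than relative percentages.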

Results DeepSeek achieved an overall diagnostic accuracy of 79.2%, with specialty-specific accuracies of 90.0% in corneal diseases, 54.5% in glaucoma, and 81.8% in neuro-ophthalmology. ChatGPT-01 outperformed the other models with an overall accuracy of 84.9% (85.0% in corneal diseases, 63.6% in glaucoma, and 95.5% in neuro-ophthalmology), while Qwen had the lowest overall accuracy at 64.2% (55.0% in corneal diseases, 54.5% in glaucoma, and 77.3% in neuro-ophthalmology). Interobserver agreement analysis showed that in corneal diseases, DeepSeek differed from the human expert average by -3.3% (90.0% vs 93.3%), ChatGPT-01 by -8.3%, and Qwen by -38.3%. In glaucoma, DeepSeek outperformed the human expert average by +3.0% (54.5% vs 51.5%), ChatGPT-4.0 and ChatGPT-01 each exceeded it by +12.1%, and Qwen was +3.0% above it. In neuro-ophthalmology, DeepSeek and ChatGPT-4.0 were 9.1% lower than the human average, ChatGPT-01 exceeded it by +4.6%, and Qwen was 13.6% lower.
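The overall accuracies can be cross-checked against the specialty-specific figures. The sketch below is a consistency check, not the authors' analysis code: it assumes each overall accuracy is the pooled fraction of correct diagnoses across all 53 cases, and it recovers the number of correct cases per specialty by rounding (an assumption, since the abstract reports only percentages). The names `CASES`, `REPORTED`, and `overall_accuracy` are hypothetical.

```python
# Case counts per specialty, as stated in the Methods.
CASES = {"cornea": 20, "glaucoma": 11, "neuro": 22}

# Specialty-specific accuracies (percent) as reported in the Results.
REPORTED = {
    "DeepSeek": {"cornea": 90.0, "glaucoma": 54.5, "neuro": 81.8},
    "ChatGPT-01": {"cornea": 85.0, "glaucoma": 63.6, "neuro": 95.5},
    "Qwen": {"cornea": 55.0, "glaucoma": 54.5, "neuro": 77.3},
}


def overall_accuracy(per_specialty: dict) -> float:
    """Pool correct cases across specialties and return percent accuracy."""
    # Assumption: correct-case counts are integers, recovered by rounding.
    correct = sum(
        round(accuracy * CASES[specialty] / 100)
        for specialty, accuracy in per_specialty.items()
    )
    return round(100 * correct / sum(CASES.values()), 1)


overall = {model: overall_accuracy(acc) for model, acc in REPORTED.items()}
```

With these inputs the pooled values come out to 79.2, 84.9, and 64.2 for DeepSeek, ChatGPT-01, and Qwen, matching the reported overall accuracies. Because the subspecialties have unequal case counts, a simple unweighted mean of the three percentages would give different numbers.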

Conclusions ChatGPT-01 demonstrated the highest overall diagnostic accuracy, especially in neuro-ophthalmology, while DeepSeek and ChatGPT-4.0 showed comparable performance. Qwen underperformed relative to the other models, especially in corneal diseases. Although these AI models exhibit promising diagnostic capabilities, they currently lag behind human experts in certain areas, underscoring the need to integrate them collaboratively with clinical judgment.

Plain Language Summary This study evaluated how well several artificial intelligence (AI) models diagnose eye diseases compared to human experts. We tested four AI systems across three types of eye conditions: diseases of the cornea, glaucoma, and neuro-ophthalmologic disorders. Overall, one AI model, ChatGPT-01, performed the best, correctly diagnosing about 85% of cases, and it excelled in neuro-ophthalmology by correctly diagnosing 95.5% of cases. Two other models, DeepSeek and ChatGPT-4.0, each achieved an overall accuracy of around 79%, while the Qwen model performed worse, with an overall accuracy of about 64%. Human experts achieved very high accuracy in corneal diseases (93.3%) and neuro-ophthalmology (90.9%) but lower accuracy in glaucoma (51.5%), and the AI models showed mixed results by comparison. In glaucoma, for instance, the AI models slightly outperformed human experts, while in corneal diseases, all AI models were less accurate than the experts. These findings indicate that while AI shows promise as a supportive tool in diagnosing eye conditions, it still needs further improvement. Combining AI with human clinical judgment appears to be the best approach for accurate eye disease diagnosis.

Key summary points

  • Why carry out this study? With the rising burden of eye diseases and the inherent diagnostic challenges for complex conditions like glaucoma and neuro-ophthalmologic disorders, there is an unmet need for innovative diagnostic tools to support clinical decision-making.

  • What did the study ask? This study evaluated the diagnostic performance of four AI models across three ophthalmologic subspecialties, testing the hypothesis that advanced language models can achieve accuracy levels comparable to human experts.

  • What was learned from the study? Our results showed that ChatGPT-01 achieved the highest overall accuracy (84.9%), excelling in neuro-ophthalmology with 95.5% accuracy, while DeepSeek and ChatGPT-4.0 each achieved 79.2%, and Qwen reached 64.2%.

  • What specific outcomes were observed? In glaucoma, AI model accuracies ranged from 54.5% to 63.6%, with some models slightly surpassing the human expert average of 51.5%, underscoring the diagnostic difficulty of this condition.

  • What has been learned and future implications? These findings highlight the potential of AI as a valuable adjunct to clinical judgment in ophthalmology, although further research and the integration of multimodal data are essential to optimize these tools for routine clinical practice.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • * signifies first co-authorship and equivalent work performed

Data Availability

All data produced in the present work are contained in the manuscript.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted March 17, 2025.