Assessing DxGPT: Diagnosing Rare Diseases with Various Large Language Models

Juanjo do Olmo, Javier Logroño, Carlos Mascías, Marcelo Martínez, Julián Isla
doi: https://doi.org/10.1101/2024.05.08.24307062
Juanjo do Olmo
1Foundation29
  • For correspondence: juanjodoolmo{at}foundation29.org
Javier Logroño
1Foundation29
Carlos Mascías
1Foundation29
Marcelo Martínez
1Foundation29
Julián Isla
1Foundation29

Abstract

Diagnosing rare diseases is a significant challenge in healthcare, with patients often experiencing long delays and misdiagnoses. The large number of rare diseases, and the difficulty doctors face in being familiar with all of them, contribute to this problem. Artificial intelligence, particularly large language models (LLMs), has shown promise in improving the diagnostic process by leveraging the extensive knowledge these models encode to help doctors navigate the complexities of diagnosing rare diseases.

Foundation 29 presents a comprehensive evaluation of DxGPT, a web-based platform designed to assist healthcare professionals in the diagnostic process for rare diseases. The platform currently utilizes GPT-4, but this study also compares its performance with other large language models, including Claude 3, Gemini 1.5 Pro, Llama, Mistral, Mixtral, and Cohere Command R+. It is crucial to emphasize that DxGPT is not a medical device but rather a decision support tool that aims to aid in clinical reasoning.

This study extends beyond initial synthetic patient cases, incorporating real-world data from the RAMEDIS and Peking Union Medical College Hospital (PUMCH) datasets. The analysis used two main metrics: Strict Accuracy (P1), how often the first diagnostic suggestion matched the actual diagnosis, and Top-5 Accuracy (P1 + P5), how often the correct diagnosis appeared among the top five suggestions. The results paint a complex picture of diagnostic accuracy, with performance varying significantly across models and datasets:

  • On the synthetic dataset, closed models like GPT-4, Claude, and Gemini exhibited relatively high accuracy. Open models like Llama 3 and Mixtral performed reasonably well, though lagging behind.

  • On the RAMEDIS rare disease cases, the Claude 3 Opus model achieved 55% Strict Accuracy and 70% Top-5 Accuracy, outperforming other closed models. Open models like Llama 3 and Mixtral showed moderate accuracy.

  • The PUMCH dataset proved challenging for all models, with the highest Strict Accuracy at 59.46% (GPT-4 Turbo 1106) and Top-5 Accuracy at 64.86%.
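
The two metrics described above can be illustrated with a minimal Python sketch. It assumes each case is reduced to the 1-based position of the correct diagnosis in the model's suggestion list (or None when it is absent), and it reads P5 as "correct diagnosis found in positions 2 to 5", an interpretation inferred from the "P1 + P5" notation rather than stated in the abstract. The function name and the example ranks are hypothetical and are not taken from the study's code or data.

```python
from typing import Dict, Optional, Sequence


def diagnostic_accuracy(ranks: Sequence[Optional[int]]) -> Dict[str, float]:
    """Compute Strict (P1) and Top-5 (P1 + P5) accuracy from per-case ranks.

    Each entry is the 1-based position of the correct diagnosis among the
    model's suggestions, or None if the correct diagnosis was not suggested.
    """
    if not ranks:
        raise ValueError("ranks must contain at least one case")
    total = len(ranks)
    # P1: the first suggestion is the correct diagnosis.
    p1 = sum(1 for r in ranks if r == 1)
    # P5 (assumed meaning): correct diagnosis in positions 2 through 5.
    p5 = sum(1 for r in ranks if r is not None and 2 <= r <= 5)
    return {
        "strict_accuracy": p1 / total,
        "top5_accuracy": (p1 + p5) / total,
    }


# Hypothetical ranks for five cases (illustrative only, not study data).
example_ranks = [1, 3, None, 1, 5]
scores = diagnostic_accuracy(example_ranks)
print(f"Strict: {scores['strict_accuracy']:.0%}, Top-5: {scores['top5_accuracy']:.0%}")
```

Under this reading, Top-5 Accuracy can never be lower than Strict Accuracy for the same model and dataset, which is consistent with the figures reported above.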

These findings demonstrate the potential of DxGPT and LLMs in improving diagnostic methods for rare diseases. However, they also emphasize the need for further validation, particularly in real-world clinical settings, and comparison with human expert diagnoses. Successful integration of AI into medical diagnostics will require collaboration between researchers, clinicians, and regulatory bodies to ensure safety, efficacy, and ethical deployment.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Foundation 29 received a grant from Takeda to develop pilot #1 for the Global Rare Disease Commission https://www.globalrarediseasecommission.com/Report/. Pilot #1 explores how to use artificial intelligence for RD diagnosis. Produced work is publicly available at www.dx29.ai. Foundation 29 received a grant from GW Pharma to develop www.dx29.ai. This is an open-source, free-of-charge tool for physicians to accelerate time to diagnosis for patients with rare diseases. Produced work is publicly available at www.dx29.ai. Foundation 29 received a grant from UCB Pharma to develop https://dxgpt.app/. This is an open-source tool based on the GPT-4 Azure OpenAI model and is a free-of-charge tool for physicians and patients to accelerate time to diagnosis for patients with rare diseases. Produced work is publicly available at https://dxgpt.app/ and https://github.com/foundation29org/dxgpt_testing. Foundation 29 received a grant from Italfarmaco to develop https://dxgpt.app/. This is an open-source tool based on the GPT-4 Azure OpenAI model and is a free-of-charge tool for physicians and patients to accelerate time to diagnosis for patients with rare diseases. Produced work is publicly available at https://dxgpt.app/ and https://github.com/foundation29org/dxgpt_testing. These grants are not related to any of these pharmaceutical companies' products. This study was funded by all four of these grants as part of DxGPT development.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

https://huggingface.co/datasets/chenxz/RareBench

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • juanjodoolmo{at}foundation29.org

Data Availability

All data produced are available online at: https://huggingface.co/datasets/chenxz/RareBench and https://github.com/foundation29org/dxgpt_testing


Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted May 09, 2024.

Subject Area

  • Health Informatics