Evaluating General-Purpose LLMs for Patient-Facing Use: Dermatology-Centered Systematic Review and Meta-Analysis

Irene S. Gabashvili, PhD
Aurametrix, USA
For correspondence: irene{at}aurametrix.com
doi: https://doi.org/10.1101/2025.08.11.25333149

Abstract

Background General-purpose large language models (LLMs) have rapidly evolved from experimental tools into widely adopted components of healthcare. Their proliferation – accelerated by the “ChatGPT effect” – has sparked intense interest across patient-facing specialties. Among these, dermatology provides a high-visibility use case through which to assess LLM capabilities, evaluation practices, and adoption trends.

Objective To systematically review and meta-analyze quantitative evaluations of general-purpose LLMs in dermatology, while extracting broader insights applicable to patient-centered use of AI across medical fields.

Methods We conducted a multi-phase systematic review and meta-analysis, incorporating studies published through August 1, 2025. A total of 88 studies met inclusion criteria, covering over 100 dermatology-related tasks and yielding more than 2,500 normalized performance scores across metrics such as accuracy, sensitivity, readability, and clinical safety. This review also re-evaluates previously tested benchmarks to assess reproducibility and model improvement over time. Statistical analyses focused on heterogeneity (Cochran’s Q, I²), evaluator effects, and evolving methodological practices.
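
As an illustration of the heterogeneity statistics named above, the following minimal sketch computes Cochran's Q and I² from per-study effect estimates using the standard inverse-variance formulation. The function name `cochran_q_i2` and the three example scores are hypothetical and are not taken from the review; the review's own pooling and normalization procedure may differ.

```python
from typing import Sequence, Tuple


def cochran_q_i2(effects: Sequence[float],
                 variances: Sequence[float]) -> Tuple[float, float]:
    """Return (Q, I2 in percent) for per-study effects and their variances."""
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Inverse-variance (fixed-effect) weights.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # I2: share of total variation attributed to between-study
    # differences rather than chance, floored at zero.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


# Made-up normalized accuracy scores from three hypothetical studies.
q, i2 = cochran_q_i2([0.60, 0.75, 0.90], [0.001, 0.001, 0.001])
print(round(q, 2), round(i2, 1))  # 45.0 95.6
```

With only three studies and small variances, even moderate spread in the scores yields a very high I², which is one reason I² is usually read alongside the number of studies and the absolute size of the differences.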

Results LLM performance varied by architecture, prompt design, and task complexity. No single model demonstrated universal superiority, though retrieval-augmented and hybrid systems consistently outperformed others on complex reasoning tasks. Smaller models sometimes outperformed flagship models, and “thinking” modes occasionally over-reasoned. Dermatology-specific models excelled in narrow contexts but lacked generalizability. Evaluation practices matured over time – shifting from static benchmarks to multi-rubric frameworks and simulations – yet high heterogeneity persisted (I² ≈ 90%) due to differences in study design and evaluator type.
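
To put an I² of roughly 90% in context, the standard definition of I² in terms of Cochran's Q can be rearranged as below. This is a worked illustration under that textbook definition, not a calculation reported in the review; the symbol k (the number of pooled estimates) is an assumption introduced here.

```latex
I^2 = \max\!\left(0,\ \frac{Q - (k-1)}{Q}\right) \times 100\%
\qquad\Longrightarrow\qquad
I^2 \approx 90\% \iff Q \approx 10\,(k-1)
```

Because Q has an expected value of about k − 1 when studies differ only by chance, an I² near 90% corresponds to an observed Q roughly ten times larger than homogeneity alone would predict.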

Sentiment toward LLMs evolved from early skepticism (2022) to over-optimism (2023) and then to a more critical and diverse perspective by 2025. Preliminary ChatGPT-5 data, though limited to a small set of challenging conditions, suggest lower hallucination rates and better recognition of dermatological presentations on darker skin.

Conclusions LLMs are entering clinical workflows rapidly, yet static evaluation methods often fail to keep pace. Our findings underscore the need for dynamic, modular, and evaluator-aware frameworks that reflect real-world complexity, patient interaction, and personalization. As traditional benchmarks lose relevance in the face of rapidly evolving model architectures, future evaluation strategies must embrace living reviews, human-in-the-loop simulations, and transparent meta-evaluation. Although dermatology serves as the focal domain, the challenges and recommendations articulated here are broadly applicable to all patient-facing fields in medicine.

Limitations High heterogeneity, frequent model deprecation, and inconsistent study designs limit generalizability. While preliminary evidence from ChatGPT-5 shows improved performance for rare diseases and underrepresented skin tones, comprehensive, multi-model validation remains lacking. AI reliance on indexed literature continues to restrict the incorporation of patient-led research and independent evidence.

Protocol Registration PROSPERO registration no. CRD42023417336

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://www.crd.york.ac.uk/PROSPERO/view/CRD42023417336

Funding Statement

This study did not receive any funding.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted August 11, 2025.

Subject Area

  • Health Informatics