Abstract
With an estimated 3 billion people globally lacking access to dermatological care, technological solutions leveraging artificial intelligence (AI) have been proposed to improve access1. Diagnostic AI algorithms, however, require high-quality datasets to allow development and testing, particularly those that enable evaluation of both unimodal and multimodal approaches. Currently, the majority of dermatology AI algorithms are built and tested on proprietary, siloed data, often from a single site and with only a single image type (i.e., clinical or dermoscopic). To address this, we developed and released the Melanoma Research Alliance Multimodal Image Dataset for AI-based Skin Cancer (MIDAS) dataset, the largest publicly available, prospectively-recruited, paired dermoscopic- and clinical image-based dataset of biopsy-proven and dermatopathology-labeled skin lesions. We explored model performance on real-world cases using four previously published state-of-the-art (SOTA) models and compared model-to-clinician diagnostic performance. We also assessed algorithm performance using clinical photography taken at different distances from the lesion to assess its influence across diagnostic categories.
We prospectively enrolled 796 patients through an IRB-approved protocol with informed consent representing 1290 unique lesions and 3830 total images (including dermoscopic and clinical images taken at 15-cm and 30-cm distance). Images represented the diagnostic diversity of lesions seen in general dermatology, with malignant, benign, and inflammatory lesions that included melanocytic nevi (22%; n=234), invasive cutaneous melanomas (4%; n=46), and melanoma in situ (4%; n=47). When evaluating SOTA models using the MIDAS dataset, we observed performance reduction across all models compared to their previously published performance metrics, indicating challenges to generalizability of current SOTA algorithms. As a comparative baseline, the dermatologists performing biopsies were 79% accurate with their top-1 diagnosis at differentiating a malignant from benign lesion. For malignant lesions, algorithms performed better on images acquired at 15-cm compared to 30-cm distance while dermoscopic images yielded higher sensitivity compared to clinical images.
Improving our understanding of the strengths and weaknesses of AI diagnostic algorithms is critical as these tools advance towards widespread clinical deployment. While many algorithms may report high performance metrics, caution should be taken due to the potential for overfitting to localized datasets. MIDAS’s robust, multimodal, and diverse dataset allows researchers to evaluate algorithms on our real-world images and better assess their generalizability.
Competing Interest Statement
AC was an investigator for Skin Analytics. JK is consultant to Hims, Enspectra Health, research collaborator with Google Research, and on advisory board for Skin Analytics. PT received honoraria from Silverchair, unrestricted grants for education projects from Lilly, and honoraria for lectures from AbbVie, Lilly, FotoFinder and Novartis. SSH is the founder, chief executive officer, and chief technical officer of IDerma, Inc. RD has served as an advisor to MDAlgorithms and Revea and received consulting fees from Pfizer, L'Oreal, Frazier Healthcare Partners, and DWA, and research funding from Union Chimique Belge (UCB), and was an investigator for Skin Analytics. RN is a consultant to Enspectra Health and Sanctum, LLC and was an investigator for Skin Analytics.
Funding Statement
This publication is based on research supported by the Melanoma Research Alliance (MRA)-L'Oreal Dermatological Beauty Brands Team Science Award, along with philanthropic funding from the David Mair and Vanessa Vu-Mair Artificial Intelligence in Skin Cancer Fund and the Tal & Cinthia Simon Melanoma Research Fund at Stanford Medicine. We also received a Stanford Human-Centered Artificial Intelligence Google Cloud Credits Grant to support compute efforts.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Institutional Review Board of Stanford University under IRB#36050 gave ethical approval for this work. Institutional Review Board of Cleveland Clinic Foundation under IRB#20-666 gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Data produced are available online at https://stanfordaimi.azurewebsites.net/datasets/f4c2020f-801a-42dd-a477-a1a8357ef2a5





