PT  - JOURNAL ARTICLE
AU  - Adriel Saporta
AU  - Xiaotong Gui
AU  - Ashwin Agrawal
AU  - Anuj Pareek
AU  - Steven QH Truong
AU  - Chanh DT Nguyen
AU  - Van-Doan Ngo
AU  - Jayne Seekins
AU  - Francis G. Blankenberg
AU  - Andrew Y. Ng
AU  - Matthew P. Lungren
AU  - Pranav Rajpurkar
TI  - Deep learning saliency maps do not accurately highlight diagnostically relevant regions for medical image interpretation
AID  - 10.1101/2021.02.28.21252634
DP  - 2021 Jan 01
TA  - medRxiv
PG  - 2021.02.28.21252634
4099  - http://medrxiv.org/content/early/2021/03/02/2021.02.28.21252634.short
4100  - http://medrxiv.org/content/early/2021/03/02/2021.02.28.21252634.full
AB  - Deep learning has enabled automated medical image interpretation at a level often surpassing that of practicing medical experts. However, many clinical practices have cited a lack of model interpretability as a reason to delay the use of “black-box” deep neural networks in clinical workflows. Saliency maps, which “explain” a model’s decision by producing heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. In this work, we demonstrate that the most commonly used saliency map generating method, Grad-CAM, results in low localization performance for 10 pathologies on chest X-rays. We examined under what clinical conditions saliency maps might be more dangerous to use compared with human experts, and found that Grad-CAM performed worse for pathologies that had multiple instances, were smaller in size, and had shapes that were more complex. Moreover, we showed that model confidence was positively correlated with Grad-CAM localization performance, suggesting that saliency maps were safer for clinicians to use as a decision aid when the model had made a positive prediction with high confidence.
Our work demonstrates that several important limitations of interpretability techniques for medical imaging must be addressed before use in clinical workflows. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: N/A. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The project did not involve human subjects research. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. CheXpert data is available at https://stanfordmlgroup.github.io/competitions/chexpert/. The validation set and corresponding benchmark radiologist annotations will be available online for the purpose of extending the study.