PT - JOURNAL ARTICLE
AU - Adriel Saporta
AU - Xiaotong Gui
AU - Ashwin Agrawal
AU - Anuj Pareek
AU - Steven QH Truong
AU - Chanh DT Nguyen
AU - Van-Doan Ngo
AU - Jayne Seekins
AU - Francis G. Blankenberg
AU - Andrew Y. Ng
AU - Matthew P. Lungren
AU - Pranav Rajpurkar
TI - Deep learning saliency maps do not accurately highlight diagnostically relevant regions for medical image interpretation
AID - 10.1101/2021.02.28.21252634
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.02.28.21252634
4099 - http://medrxiv.org/content/early/2021/03/02/2021.02.28.21252634.short
4100 - http://medrxiv.org/content/early/2021/03/02/2021.02.28.21252634.full
AB - Deep learning has enabled automated medical image interpretation at a level often surpassing that of practicing medical experts. However, many clinical practices have cited a lack of model interpretability as a reason to delay the use of “black-box” deep neural networks in clinical workflows. Saliency maps, which “explain” a model’s decision by producing heat maps that highlight the areas of the medical image that influence model prediction, are often presented to clinicians as an aid in diagnostic decision-making. In this work, we demonstrate that the most commonly used saliency map generating method, Grad-CAM, results in low localization performance for 10 pathologies on chest X-rays. We examined under what clinical conditions saliency maps might be more dangerous to use compared to human experts, and found that Grad-CAM performed worse for pathologies that had multiple instances, were smaller in size, and had shapes that were more complex. Moreover, we showed that model confidence was positively correlated with Grad-CAM localization performance, suggesting that saliency maps were safer for clinicians to use as a decision aid when the model had made a positive prediction with high confidence.
Our work demonstrates that several important limitations of interpretability techniques for medical imaging must be addressed before use in clinical workflows.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: N/A
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The project did not involve human subjects research.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
CheXpert data is available at https://stanfordmlgroup.github.io/competitions/chexpert/. The validation set and corresponding benchmark radiologist annotations will be available online for the purpose of extending the study.