Abstract
As the use of AI grows in clinical medicine, so does the need for better explainable AI (XAI) methods. Model-based XAI methods like GradCAM evaluate the feature maps generated by CNNs to create visual interpretations (such as heatmaps) that can be evaluated qualitatively. We propose a simple method that takes the most important (highest weighted) of these feature maps and compares it with the most important clinical feature present on the image, yielding a quantitative way to evaluate model performance. We created four Residual Neural Networks (ResNets) to identify clinically significant prostate cancer, using two datasets (1. segmented prostate image and 2. full cross sectional pelvis image (CSI)) and two model training types (1. transfer learning and 2. from-scratch), and evaluated the models on each dataset. Accuracy and AUC were also measured on a final test set, the full CSI dataset with the prostate tissue removed, to confirm the results. Accuracy, AUC, and co-localization of prostate lesion centroids with the most important feature map generated for each model were tabulated and compared to co-localization of prostate lesion centroids with a GradCAM heatmap. Prostate lesion centroids co-localized with the most important feature map of any model generated through transfer learning ≥97% of the time. Prostate lesion centroids co-localized with the most important feature map 86% to 96% of the time on the segmented dataset, but this dropped to 10% when the segmented model was tested on the full CSI dataset and to 21% when the model was trained and tested on the full CSI dataset. Lesion centroids co-localized with the GradCAM heatmap 98% to 100% of the time in all cases except the model trained on the segmented dataset and tested on the full CSI dataset (73%). Models trained on the full CSI dataset performed well (79% to 89%) when tested on the dataset with prostate tissue removed, but models trained on the segmented dataset did not (50% to 51%).
These results suggest that the models trained on the full CSI dataset use features outside of the prostate to reach their classification, and that the most important feature map reflected this behavior better than the GradCAM heatmap did. The co-localization of the medical region of abnormality with the most important feature map could be a useful quantitative metric for future model explainability.
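The co-localization metric described above can be illustrated with a minimal sketch. The abstract does not specify how the highest-weighted feature map is selected or what counts as co-localization, so everything below is an assumption made for illustration: the function names are hypothetical, the feature map is resized to image resolution by nearest-neighbour sampling, and a centroid is counted as co-localized when the min-max normalized activation at its pixel is at least a chosen threshold (0.5 here).

```python
import numpy as np

def most_important_map(feature_maps, weights):
    """Pick the feature map with the largest weight.

    feature_maps: array of shape (C, h, w); weights: array of shape (C,).
    How the weights are obtained is an assumption left to the caller.
    """
    return feature_maps[int(np.argmax(weights))]

def upsample_nearest(fmap, out_shape):
    """Nearest-neighbour resize of a 2D map to out_shape = (H, W)."""
    h, w = fmap.shape
    rows = np.arange(out_shape[0]) * h // out_shape[0]
    cols = np.arange(out_shape[1]) * w // out_shape[1]
    return fmap[np.ix_(rows, cols)]

def colocalizes(fmap, centroid, image_shape, threshold=0.5):
    """True if the lesion centroid (row, col) falls on a pixel whose
    min-max normalized activation is >= threshold (assumed criterion)."""
    up = upsample_nearest(fmap, image_shape).astype(float)
    span = up.max() - up.min()
    if span == 0:
        return False  # a flat map highlights nothing
    norm = (up - up.min()) / span
    r, c = centroid
    return bool(norm[r, c] >= threshold)

def colocalization_rate(cases, threshold=0.5):
    """Fraction of cases whose centroid co-localizes with the top map.

    cases: iterable of (feature_maps, weights, centroid, image_shape).
    """
    hits = [
        colocalizes(most_important_map(f, w), c, s, threshold)
        for f, w, c, s in cases
    ]
    return float(np.mean(hits))
```

Applied over a test set, `colocalization_rate` gives a single percentage per model of the kind reported in the abstract. The threshold and the resizing scheme both affect that number, so they would need to be fixed and reported alongside it.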
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
The Cancer Imaging Archive (TCIA) ProstateX collection: https://www.cancerimagingarchive.net/collection/prostatex/
Abbreviations
- ML: Machine Learning
- ResNet: Residual Neural Network
- DL: Deep Learning
- pCA: Prostate Cancer
- CS: Clinically Significant Prostate Cancer
- NCS: Not Clinically Significant Prostate Cancer
- CSI: Cross Sectional Image