Adversarial Attack Vulnerability of Deep Learning Models for Oncologic Images

Background: Deep learning (DL) models have shown the ability to automate the classification of medical images used for cancer detection. Unfortunately, recent studies have found that DL models are vulnerable to adversarial attacks, which manipulate models into making incorrect predictions with high confidence. There is a need for a better understanding of how adversarial attacks affect the predictive ability of DL models in the medical image domain. Methods: We studied the adversarial attack susceptibility of DL models for three common imaging tasks within oncology. We investigated how projected gradient descent (PGD) adversarial training could be employed to increase model robustness against fast gradient sign method (FGSM), PGD, and basic iterative method (BIM) attacks. Finally, we studied the utility of adversarial sensitivity as a metric to improve model performance. Results: Our experiments showed that medical DL models were highly sensitive to adversarial attacks, as visually imperceptible degrees of perturbation (ε < 0.004) were sufficient to deceive the model the majority of the time. DL models for medical images were more vulnerable to adversarial attacks than DL models for non-medical images. Adversarial training increased model performance on adversarial samples for all classification tasks. We were able to increase model accuracy on clean images for all datasets by excluding the images most vulnerable to adversarial perturbation. Conclusion: Our results indicated that while medical DL systems are extremely susceptible to adversarial attacks, adversarial training shows promise as an effective defense against attacks. The adversarial susceptibility of individual images can be used to increase model performance by identifying the images most at risk of misclassification. Our findings provide a useful basis for designing more robust and accurate medical DL models as well as techniques to defend models from adversarial attack.


Introduction:
Deep learning (DL) models have demonstrated increasing utility within the field of oncology and particular promise in analyzing growing amounts of oncologic imaging [1,2]. DL models have been validated across a variety of diagnostic imaging modalities including MRI, CT, and X-ray images with classification accuracy often rivaling trained clinicians [3][4][5][6][7][8]. As widespread clinical implementation of DL models becomes a more realistic possibility, the safety of such models in healthcare is becoming a topic of increasing importance [9][10][11].
One concerning limitation of DL models that may hinder safe implementation is their susceptibility to adversarial attacks. Adversarial attacks on DL models occur when data are manipulated with small perturbations specifically designed to degrade DL model performance [12][13][14][15]. The vulnerability of DL models to adversarial attacks stems from the fact that DL models are often locally unstable, generating significantly different outputs for inputs that vary only slightly [16,17]. Although adversarial images are often indistinguishable from unmodified images to the human eye, they have been shown to significantly reduce the accuracy of DL models [18][19][20].
Previous work concerning adversarial attacks on DL models has largely focused on non-medical images, and the vulnerability of oncologic images to such attacks is relatively unknown [20,21].
Although techniques to defend against adversarial attacks have been proposed for non-medical images, their generalizability to medical images is unclear. The recent rise in cyberattacks on hospital systems sheds light on the weaknesses of healthcare systems and suggests that adversarial attacks on clinically implemented DL models have genuine potential for harm [22][23][24][25][26]. There is a pressing need to understand both the vulnerability of medical DL models to adversarial attacks and the relative effectiveness of current adversarial defense techniques in the medical setting.
medRxiv preprint doi: https://doi.org/10.1101/2021.01.17.21249704; this version posted January 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

In this study, we explored adversarial attacks on DL models trained on common imaging modalities in clinical oncology. Specifically, we investigated the relative susceptibility of different oncologic imaging modalities to adversarial attacks and the effectiveness of adversarial defense techniques in mitigating the loss of DL model performance under adversarial attack. Lastly, we explored the utility of adversarial attack susceptibility as a metric to improve the overall performance of DL models used on oncologic images.

Datasets
We based our analysis on three medical imaging modalities commonly used in oncology: computed tomography (CT), mammography, and magnetic resonance imaging (MRI).
CT images were acquired from the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) collection [27], which contains thoracic CT scans from 1018 cases collected from 15 clinical sites across the US. The dataset includes XML annotations from 4 experienced radiologists outlining the nodules, which were rated from 1 to 5 (1 being least likely to be malignant and 5 being most likely). Some nodules had associated pathology reports that classified them as cancer or non-cancer; for nodules without pathology information, those with a rating of 4 or 5 were assumed to be cancer. We used 1702 lung CT images from the LIDC-IDRI dataset, divided into a training set of 1140 images and a test set of 562 images.
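The labeling rule above can be written as a small function. This is a minimal sketch under stated assumptions: the name `label_nodule` is hypothetical, it takes a single consensus malignancy rating per nodule (the text does not say how the four radiologists' ratings were combined), and it treats nodules without pathology and with ratings of 1 to 3 as non-cancer, which the text implies but does not state explicitly.

```python
def label_nodule(pathology, rating):
    """Return 1 (cancer) or 0 (non-cancer) for one LIDC-IDRI nodule.

    pathology: "cancer", "non-cancer", or None when no pathology report exists.
    rating: consensus malignancy rating from 1 (least likely) to 5 (most likely).
            How the four radiologists' ratings are combined into one value is an
            assumption left to the caller.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    # Pathology, when available, takes precedence over the radiologist rating.
    if pathology == "cancer":
        return 1
    if pathology == "non-cancer":
        return 0
    if pathology is not None:
        raise ValueError("unknown pathology value: %r" % (pathology,))
    # No pathology: ratings of 4 or 5 are assumed cancer; 1-3 assumed non-cancer.
    return 1 if rating >= 4 else 0
```

Giving pathology precedence over the rating mirrors the order in which the text describes the two label sources.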
Mammograms were acquired from the Curated Breast Imaging Subset of the Digital Database for Screening Mammography (CBIS-DDSM) [28], which contains mammograms from 1522 cases (753 calcification cases and 891 mass cases, both benign and malignant) from 4 sites across the US. The dataset, a standardized and updated subset of the original DDSM, contains algorithmically obtained segmentations of regions of interest (ROI) and metadata for each case. We obtained 1696 scanned film mammograms from the CBIS-DDSM and split them into a training set (1318 images) and a test set (378 images).
MRI images were obtained from a dataset of 831 patients with 3,596 brain metastases treated with primary stereotactic radiosurgery (SRS) at our institution between 2000 and 2018 [29].
Patients with prior resection or prior radiation treatment were excluded, along with metastases <5 mm. We sampled 2000 slices of normal brain tissue and 2000 slices containing tumor. The final dataset of 4000 samples contains two classes ("tumor" and "non-tumor") and was split into 2680 training images and 1320 test images.
For all medical datasets, the primary outcome of interest was the presence of cancer in each image. Additionally, we used two common non-medical imaging datasets, MNIST [30] and CIFAR-10 [31], as controls to compare adversarial susceptibility between DL models for medical and non-medical images. The MNIST dataset is composed of 70,000 28×28-pixel grayscale images of handwritten single digits (60,000 training and 10,000 test images), divided into 10 classes representing the digits 0-9. The CIFAR-10 dataset consists of 60,000 32×32 color images in 10 classes (50,000 training and 10,000 test images). For each dataset, the classes were balanced. All images were center-cropped and resized, and pixels were normalized to have unit variance.
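A minimal pure-Python sketch of the three preprocessing steps (center crop, resize, unit-variance scaling) follows. It is illustrative only: the crop size, output size, nearest-neighbour interpolation, and per-image (rather than dataset-wide) statistics are assumptions because the text does not specify them, and all function names are hypothetical.

```python
import math


def center_crop(image, size):
    """Return the central size x size window of a 2-D image (list of rows)."""
    h, w = len(image), len(image[0])
    if size > h or size > w:
        raise ValueError("crop size exceeds image size")
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize; the interpolation method is an assumption."""
    h, w = len(image), len(image[0])
    return [[image[r * h // out_h][c * w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def scale_to_unit_variance(image):
    """Divide every pixel by the image's standard deviation (per-image statistic)."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    if std == 0:
        return [list(row) for row in image]  # constant image: leave unchanged
    return [[p / std for p in row] for row in image]


def preprocess(image, crop, out_h, out_w):
    """Center-crop, resize, then scale pixels to unit variance."""
    return scale_to_unit_variance(resize_nearest(center_crop(image, crop), out_h, out_w))
```

Scaling by the standard deviation alone leaves the mean unchanged; whether the original pipeline also subtracted the mean is not stated.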

Models
For all classification tasks, we used an ImageNet-pretrained VGG16 [32] as the base network, followed by a dropout layer (rate 0.5), a flatten layer, a 4096-neuron dense layer, another dropout layer (rate 0.5), a 1024-neuron dense layer, and a K-neuron dense classification layer, where K is the number of classes. Details regarding the model architecture and training hyperparameters are provided in the Supplement. We also used simple data augmentations, including random rotations and horizontal/vertical flips.
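Because dropout and flatten layers have no trainable weights, the size of the classification head follows directly from the layer widths listed above. The sketch below counts those parameters; the 7 × 7 × 512 VGG16 feature map corresponds to a 224 × 224 input, which is an assumption because the input resolution is not stated here, and the helper names are hypothetical.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_params(feature_shape=(7, 7, 512), num_classes=2):
    """Trainable parameters in the head: Flatten -> 4096 -> 1024 -> K.

    Dropout and Flatten contribute no parameters, so only the three dense
    layers are counted. feature_shape is the VGG16 output for the assumed
    input size.
    """
    flat = 1
    for d in feature_shape:
        flat *= d
    return (dense_params(flat, 4096)
            + dense_params(4096, 1024)
            + dense_params(1024, num_classes))
```

With K = 2 the head holds 106,961,922 parameters, almost all of them in the first 4096-unit layer. That is roughly seven times the 14.7 million weights of the VGG16 convolutional base.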

Adversarial Attack Vulnerability
We used three first-order adversarial attack methods, the Fast Gradient Sign Method (FGSM) [15], Projected Gradient Descent (PGD) [20], and the Basic Iterative Method (BIM) [12], to generate adversarial images for the medical and non-medical image datasets. Each attack aims to maximize the model's classification error while keeping the adversarial sample close to the original sample: all attacks considered are bounded by a predefined maximum perturbation ε with respect to the L∞ norm. Full descriptions of the adversarial attacks are detailed in the Supplement. Example images from each dataset, along with their adversarial counterparts, are shown in Figure 1.
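To make the three attacks concrete, the sketch below implements them for a toy logistic-regression classifier in pure Python, where the gradient of the cross-entropy loss with respect to the input has the closed form (p - y)·w. FGSM takes a single step of size ε along the sign of that gradient; BIM repeats smaller steps of size α and clips back into the ε-ball after each one; PGD does the same from a random starting point inside the ball. The toy classifier, α, the number of steps, and the [0, 1] pixel range are illustrative assumptions, and this is not the ART code used in the study.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def prob(w, b, x):
    """Toy binary classifier: p(y = 1 | x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(w, b, x, y):
    """Binary cross-entropy loss of the toy classifier on one example."""
    p = min(max(prob(w, b, x), 1e-12), 1 - 1e-12)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input: (p - y) * w."""
    p = prob(w, b, x)
    return [(p - y) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def project(x_adv, x, eps, lo=0.0, hi=1.0):
    """Clip into the L-infinity ball of radius eps around x, then into [lo, hi]."""
    return [min(max(min(max(a, xi - eps), xi + eps), lo), hi)
            for a, xi in zip(x_adv, x)]


def fgsm(w, b, x, y, eps):
    """One step of size eps along the sign of the input gradient."""
    g = grad_x(w, b, x, y)
    return project([xi + eps * sign(gi) for xi, gi in zip(x, g)], x, eps)


def bim(w, b, x, y, eps, alpha, steps):
    """Repeated signed-gradient steps of size alpha, projected after each step."""
    x_adv = list(x)
    for _ in range(steps):
        g = grad_x(w, b, x_adv, y)
        x_adv = project([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv


def pgd(w, b, x, y, eps, alpha, steps, seed=0):
    """BIM started from a uniformly random point inside the eps-ball."""
    rng = random.Random(seed)
    x_adv = project([xi + rng.uniform(-eps, eps) for xi in x], x, eps)
    for _ in range(steps):
        g = grad_x(w, b, x_adv, y)
        x_adv = project([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv
```

For a linear model, all three attacks end at the same corner of the ε-ball once the iterative methods take enough steps. On a deep network the loss surface is curved, and the iterative attacks can find stronger perturbations than a single FGSM step.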
We investigated whether adversarial attacks on DL models for medical images were more effective than those on DL models for non-medical images. We measured attack difficulty as the smallest maximum perturbation ε required for the majority of attacks to succeed.
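The attack-difficulty measure can be sketched as an ε sweep: attack every image at each ε on a grid and report the smallest ε at which more than half of the attacks succeed. In this sketch, "success" means the prediction changes from the clean prediction, the attack is FGSM against a toy linear classifier without pixel clipping, and the grid, threshold argument, and function names are assumptions; the study's exact success criterion and ε grid are not given here.

```python
def predict(w, b, x):
    """Toy linear classifier: class 1 if w . x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def fgsm_against_prediction(w, b, x, eps):
    """FGSM step that pushes the logit away from the clean prediction.

    For a linear logistic model, the sign of the input gradient of the loss,
    taken with the predicted label, is -sign(w_i) for class 1 and +sign(w_i)
    for class 0.
    """
    direction = -1 if predict(w, b, x) == 1 else 1
    return [xi + eps * direction * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]


def success_rate(w, b, images, eps):
    """Fraction of images whose prediction changes under the attack."""
    flipped = sum(predict(w, b, fgsm_against_prediction(w, b, x, eps)) != predict(w, b, x)
                  for x in images)
    return flipped / len(images)


def smallest_successful_eps(w, b, images, eps_grid, threshold=0.5):
    """Smallest eps on the grid at which more than `threshold` of attacks succeed."""
    for eps in sorted(eps_grid):
        if success_rate(w, b, images, eps) > threshold:
            return eps
    return None  # no eps on the grid reached a majority
```

Comparing this value across datasets, with the same attack settings for every dataset, gives the medical versus non-medical comparison described in the text.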

Effectiveness of Adversarial Training
One successful defense mechanism against adversarial attacks is adversarial training, which aims to improve model robustness by integrating adversarial samples into the training stage [20,21]. In particular, PGD adversarial training has been shown to achieve state-of-the-art robustness against L∞ attacks on the MNIST, CIFAR-10, and ImageNet datasets [20,33,34]. When Athalye et al. circumvented several published adversarial defenses using novel attack techniques, PGD adversarial training was one of the few defenses that remained effective [35].
While adversarial training is a promising approach to defending models against attacks, its effectiveness varies by classification task. The success of adversarial training is correlated with the distance from the test images to the manifold of training images [36]: when the test images are far from the training data, adversarial training becomes less effective at increasing model robustness. The selection of hyperparameters also influences the success of adversarial training [36]. Additionally, adversarial training methods for DL models have been designed primarily for non-medical image datasets with large labeled training sets, whereas medical datasets usually have small labeled training sets [37]. Because of these limitations, it is important to study the usefulness of adversarial training on a case-by-case basis.
Here, we proposed using multi-step PGD adversarial training to increase the robustness of our DL models against adversarial attacks. Under this K-PGD adversarial training algorithm, where K is the number of PGD iterations, the inner loop constructs adversarial examples with K-step PGD (PGD-K) and the outer loop updates the model using minibatch SGD on the generated samples [20,38]. The hyperparameters for adversarial training are detailed in the Supplement. We measured the effectiveness of adversarial training by comparing model accuracy on adversarial samples of varying maximum perturbation size before and after adversarial training.
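PGD adversarial training solves the min-max problem min over θ of E[max over ‖δ‖∞ ≤ ε of L(θ, x + δ, y)]: the inner loop approximates the maximization with K signed-gradient steps, and the outer loop takes an SGD step on the resulting adversarial minibatch. The pure-Python sketch below applies that loop to a toy logistic-regression model; the model, the data, and every hyperparameter (ε, α, K, learning rate, epochs, batch size) are hypothetical stand-ins for the settings given in the Supplement.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def pgd_k(w, b, x, y, eps, alpha, k, rng):
    """Inner maximization: K signed-gradient steps from a random start in the eps-ball."""
    x_adv = [min(max(xi + rng.uniform(-eps, eps), 0.0), 1.0) for xi in x]
    for _ in range(k):
        g = sigmoid(logit(w, b, x_adv)) - y  # dLoss/dlogit; dLoss/dx_i = g * w_i
        x_adv = [a + alpha * ((g * wi > 0) - (g * wi < 0)) for a, wi in zip(x_adv, w)]
        # Project back into the eps-ball around x and the [0, 1] pixel range.
        x_adv = [min(max(min(max(a, xi - eps), xi + eps), 0.0), 1.0)
                 for a, xi in zip(x_adv, x)]
    return x_adv


def robust_loss(w, b, data, eps, alpha, k, seed=0):
    """Mean cross-entropy on PGD-K adversarial versions of the data."""
    rng = random.Random(seed)
    total = 0.0
    for x, y in data:
        p = sigmoid(logit(w, b, pgd_k(w, b, x, y, eps, alpha, k, rng)))
        p = min(max(p, 1e-12), 1 - 1e-12)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(data)


def adversarial_train(data, eps=0.05, alpha=0.02, k=10, lr=0.5, epochs=100, batch=4, seed=0):
    """Outer minimization: minibatch SGD on freshly generated PGD-K examples."""
    rng = random.Random(seed)
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        order = data[:]
        rng.shuffle(order)
        for start in range(0, len(order), batch):
            mb = order[start:start + batch]
            gw, gb = [0.0] * len(w), 0.0
            for x, y in mb:
                x_adv = pgd_k(w, b, x, y, eps, alpha, k, rng)  # attack the current model
                g = sigmoid(logit(w, b, x_adv)) - y
                gw = [gwi + g * xa for gwi, xa in zip(gw, x_adv)]
                gb += g
            w = [wi - lr * gwi / len(mb) for wi, gwi in zip(w, gw)]
            b -= lr * gb / len(mb)
    return w, b
```

The adversarial examples are regenerated against the current weights at every update, which is what distinguishes this scheme from training once on a fixed set of precomputed adversarial images.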

Image Level Adversarial Vulnerability and Model Performance
We examined the effect of individual images' adversarial vulnerability (i.e., the level of adversarial perturbation necessary for the model's prediction on the adversarial image to differ from its prediction on the clean image) on model performance. We hypothesized that the images most sensitive to adversarial attack are more likely to be misclassified by the model even in the absence of adversarial perturbation.
By excluding the images most sensitive to adversarial perturbation, we aimed to improve model performance on the remaining dataset under normal conditions (no attack). For each image, we calculated the minimum perturbation size necessary for the model's prediction on the adversarial image to differ from its prediction on the clean image. We identified the 20% of images most vulnerable to adversarial attack and excluded them from the test set. We then tested the performance of the original model on the reduced test set.
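The exclusion procedure can be sketched as follows: estimate each image's minimum flipping perturbation by bisection, drop the 20% with the smallest values, and re-score the unchanged model on the remaining images. The toy linear classifier, the use of FGSM as the attack, the bisection bounds, and the function names are assumptions for illustration. Bisection is valid here only because, for this linear model, a larger ε never undoes a flip.

```python
def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Toy linear classifier: class 1 if the logit is positive, else class 0."""
    return 1 if logit(w, b, x) > 0 else 0


def flips(w, b, x, eps):
    """True if an FGSM step of size eps changes the clean prediction."""
    direction = -1 if predict(w, b, x) == 1 else 1
    x_adv = [xi + eps * direction * ((wi > 0) - (wi < 0)) for xi, wi in zip(x, w)]
    return predict(w, b, x_adv) != predict(w, b, x)


def min_flip_eps(w, b, x, hi=1.0, iters=40):
    """Bisection for the smallest eps that flips the prediction."""
    if not flips(w, b, x, hi):
        return float("inf")  # not flippable within the search range
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if flips(w, b, x, mid):
            hi = mid
        else:
            lo = mid
    return hi


def accuracy_after_exclusion(w, b, images, labels, fraction=0.2):
    """Drop the `fraction` most vulnerable images, then score the unchanged model."""
    eps = [min_flip_eps(w, b, x) for x in images]
    n_drop = int(fraction * len(images))
    ranked = sorted(range(len(images)), key=lambda i: eps[i])
    dropped = set(ranked[:n_drop])
    kept = [i for i in range(len(images)) if i not in dropped]
    acc = sum(predict(w, b, images[i]) == labels[i] for i in kept) / len(kept)
    return acc, dropped
```

For a linear model with weights w, an FGSM step of size ε shifts the logit by ε‖w‖₁, so the minimum flipping ε is |logit|/‖w‖₁. Images closest to the decision boundary are therefore removed first, and in this toy setting they are also the ones most likely to be misclassified.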
The proposed networks were implemented in Python 2.7 using the TensorFlow 1.15.3 framework [39]. Adversarial attack and training experiments were conducted with the Adversarial Robustness Toolbox (ART) version 1.4.1, a Python library for machine learning security (https://github.com/Trusted-AI/adversarial-robustness-toolbox) [40]. The code to reproduce the analyses and results is available online at https://github.com/Aneja-Lab-Yale/Aneja-Lab-Public-Adversarial-Imaging.

Adversarial Attack Sensitivity
Both medical and non-medical DL models were highly susceptible to adversarial attacks, [...] Medical DL models appeared significantly more vulnerable to adversarial attacks than non-medical DL models. When we examined the minimum perturbation size necessary for the majority of attacks to be effective, a smaller ε was required for attacks to succeed on DL models for medical images than on DL models for non-medical images. For the medical datasets, strong attacks (PGD and BIM) succeeded most of the time with tiny, visually imperceptible perturbations (ε < 0.004), while the non-medical datasets required much larger perturbations for attacks to be highly effective. Furthermore, the medical datasets themselves varied in their susceptibility to adversarial attacks. For example, for a PGD attack with perturbation size of [...] (Figure 2).

Effectiveness of Adversarial Training
By reducing the rate of misclassification on adversarial samples, adversarial training led to increased robustness of DL models against adversarial attacks. After adversarial training was applied to our original VGG16 model, classification accuracy on adversarial examples increased for all datasets (Figure 3). Adversarial training caused model accuracy to increase by up to 48% for CT, 29% for mammogram, and 53% for MRI (Table 1).


Discussion:
The role of diagnostic imaging is increasing throughout clinical oncology. DL models are an emerging tool that has shown promise in helping analyze diagnostic images and represents a cost-effective way to supplement human decision-making [41,42]. However, vulnerability to adversarial attacks remains a potential barrier to fulfilling the promise of DL models in oncology. It is increasingly important to understand how adversarial images can be crafted to deceive DL models in the medical domain and whether defenses against adversarial attacks proposed in the non-medical domain represent a viable solution. In this study, we found that DL models for medical images are more vulnerable to adversarial attacks than DL models for non-medical images. Specifically, attacks on medical images require smaller perturbation sizes to succeed in the majority of cases than attacks on non-medical images.
Furthermore, we found that adversarial training methods commonly used on non-medical imaging datasets have limited effectiveness against adversarial attacks on medical images commonly found in oncology. Finally, we showed that identifying the images most susceptible to adversarial attacks may be helpful in improving the overall performance of DL models on medical images.
This study corroborates previous literature showing that medical DL models are highly susceptible to adversarial attacks. Several recent works have found that state-of-the-art DL architectures such as Inception and UNet perform poorly on medical imaging analysis tasks when placed under adversarial attack [9,14,43-45]. Our findings demonstrate that DL models for medical images (CT, mammography, MRI) are extremely vulnerable to adversarial attacks, with small perturbations (ε < 0.004) associated with sharp drops in model performance (>50%).
Adversarial perturbations on such a scale were confirmed to be imperceptible to the human eye, despite being sufficient to fool the DL model in most cases. Our work extends the findings of previous studies by showing that DL models for medical images across various imaging modalities all exhibit vulnerability to adversarial attack and that sensitivity to adversarial attack differs between imaging modalities. Furthermore, while most studies used only one fixed perturbation size for adversarial attack, we varied perturbation size across a broad range in order to examine the relationship between model performance and attack strength.
In addition, our results reinforce previous work which showed that DL models for medical images are more vulnerable to adversarial attack than DL models for non-medical images. Ma et al. showed that for adversarial attacks to generally succeed on medical images (fundoscopy, dermoscopy, and chest X-ray), the perturbation size necessary was much smaller than what was reported for non-medical images in another study by Shafahi et al. [14,38]. We compared the adversarial sensitivity of DL models for medical images to that of DL models for non-medical images by using two non-medical image datasets (MNIST, CIFAR-10) as a control. By applying the same attack settings to all datasets, we determined that DL models for medical images were significantly more susceptible to adversarial attacks than DL models for non-medical images.
One reason for this behavior could be that medical images are highly standardized, so small adversarial perturbations dramatically distort their distribution in latent feature space [37,46]. Another factor could be the overparameterization of DL models for medical image analysis, as sharp loss landscapes around medical images lead to higher adversarial vulnerability [14].
Our findings also support previous studies that found adversarial training to be an effective defense against adversarial attacks on DL image classification models [20]. In recent work, adversarial training has been shown to improve DL model robustness for multiple medical imaging modalities such as lung CT [47] and retinal OCT [37]. Our work proposed applying a PGD adversarial training approach to DL models for lung CTs, mammograms, and brain MRIs, [...] While PGD adversarial training was effective in defending against weaker attacks, it was less successful against stronger attacks with larger perturbation sizes.
Our results differed from previous work that found very limited success with applying adversarial training to DL models for medical imaging. For instance, Hirano et al. found that adversarial training did not lead to any increase in model robustness for classifying dermoscopy, OCT, and chest X-ray images [48]. We found significant increases in model performance (up to 50% increases in model accuracy) on adversarial samples due to adversarial training. This difference in the effectiveness of adversarial training may be attributable to the different adversarial training protocols used. At the same time, the results of adversarial training on our medical DL models were far from satisfactory. Because medical images have fundamentally different properties than non-medical images, adversarial defenses well-suited for non-medical images may not generalize to medical images [14,37]. Still, our study is novel in its application of Madry et al.'s PGD adversarial training algorithm to DL models for medical image classification.
There are a number of limitations to our study that may bias our findings. First, we only used 2-class medical imaging classification tasks. Thus, our findings might not generalize to multi-class or regression problems using medical images. Given that many medical diagnosis problems involve two or a small number of classes, our findings are likely still applicable to a large portion of medical imaging classification tasks. Also of note is that we focused on first-order adversarial attacks rather than higher-order attacks, which have been shown to be more resistant to PGD adversarial training [49]. While the most commonly used adversarial attacks are first-order attacks, there is still a need for additional research on how to defend DL models for medical images against higher-order attacks. A final limitation is that we used traditional supervised adversarial training to improve model robustness, while other methods such as semi-supervised and unsupervised adversarial training exist [37,50,51]. While we demonstrated that supervised adversarial training is an effective method to improve model performance on adversarial examples, an interesting direction for future work would be to compare the utility of supervised adversarial training with that of semi-supervised or unsupervised adversarial training on DL models for medical images.

Conclusion
In this work, we explored the issue of adversarial attacks on DL models used for medical image analysis in clinical oncology. We first examined the sensitivity of DL models to adversarial attacks across increasing perturbation magnitudes for different medical and non-medical imaging datasets, demonstrating that DL models for medical images are more susceptible to attacks than DL models for non-medical images. We then showed that PGD adversarial training effectively improved model robustness against adversarial attack. Finally, we showed that the adversarial sensitivity of individual images could be used as a metric to improve model performance. By shedding light on the behavior of adversarial attacks on medical DL systems in oncology, the findings from this paper can help facilitate the development of more secure medical imaging DL models and techniques to defend against adversarial attacks.