Abstract
Diabetic foot ulcers develop in up to 1 in 3 patients with diabetes. While ulcers are costly to manage and often necessitate an amputation, they are preventable if intervention is initiated early. However, under the current standard of care, it is difficult to know which patients are at highest risk of developing an ulcer. Recently, thermal monitoring has been shown to catch the development of complications around 35 days in advance of onset. We seek to use thermal scans of the feet of patients with diabetes to automatically detect and classify a patient’s risk of foot ulcer development so that intervention may be initiated. We began by comparing the performance of three backbone architectures (DFTNet, ResNet50, and Swin Transformer) trained on visual spectrum images for the monofilament task. We moved forward with the highest-accuracy model, which used ResNet50 as its backbone (DFTNet acc. 68.18%, ResNet50 acc. 81.81%, Swin Transformer acc. 72.72%), and trained it on thermal images for the risk prediction task, achieving 96.4% accuracy. To increase the interpretability of the model, we then trained this same architecture to predict two standard-of-care risk scores: high vs low-risk monofilament score (81.8% accuracy) and high vs low-risk biothesiometer score (77.4% accuracy). We then sought to improve performance by facilitating the model’s learning. By annotating bounding boxes around the feet, we trained our own YOLOv4 detector to automatically detect feet in our images (mAP of 99.7% and IoU of 86.%). Using these bounding box predictions as input to the model improved performance on our two classification tasks: MF 84.1%, BT 83.9%.
We then sought to further improve the accuracy of these classification tasks with two experiments incorporating visual spectrum images of the feet: 1) training the models only on visual images (Risk: 97.6%, MF: 86.3%, BT: 80.6%), and 2) combining visual images with the thermal images through either early (E) or late (L) fusion in the architecture (Risk, E: 99.4%, L: 98.8%; MF, E: 86.4%, L: 90.9%; BT, E: 83.9%, L: 83.9%). Our results show promise for thermal and visible spectrum images to give doctors insight into which patients need intervention in order to prevent ulceration and ultimately save the patient’s limb.
1 Introduction
Diabetic foot ulcers (DFU), or open sores on the bottom of the feet, develop in 1 in 3 patients with diabetes [1]. If left untreated, infection will soon threaten a patient’s life and may require amputation [2]. These ulcers precede up to 80% of all lower-limb amputations [3], one of which occurs every 20 seconds worldwide. Intervention at the earliest warning sign of a developing ulcer is key to preventing it from forming, but it is difficult to know which patients are at highest risk of developing an ulcer [4]. In addition, barriers to accessing treatment in remote or low-resource settings extend the time until treatment, making it more likely that patients will require an amputation. It is estimated that between 50% and 90% of people living with diabetes in rural areas of India are undiagnosed [1].
The first indication of risk is the development of neuropathy. Standard neuropathy tests include the monofilament score (0-10 points, >6 is healthy) or biothesiometer score (0-50V, <25V is healthy). However, even with a positive neuropathy score, it may be months or years until an ulcer forms. In addition, current neuropathy tests are time consuming and labor intensive, require training and AC power, and are out of reach for low-resource healthcare settings.
Toward a more powerful test for ulcer risk assessment, prior work has shown that pre-ulceration areas become warm and inflamed around 35 days before the ulcer actually appears [5,8,14]. This warning sign is invisible to the naked eye but visible with thermal imaging (Figure 1), and could give doctors enough lead time to intervene and prevent the ulcer from forming. Our team member Kayla moved to India on a Fulbright Fellowship and collected thermal images of patients at all different stages of ulcer development. She then trained a deep learning model to separate high-risk patients (those with a previous history of ulceration) from low-risk patients (those without diabetic neuropathy) based on thermal scans alone; the model reached 83% accuracy on the balanced dataset.
1.1 Overall problem plan
Improve upon 83% baseline accuracy
- Facilitate learning in our models by using segmentation or detection of the feet to then be used as input to our ulcer risk classification model. We hypothesize that this will help the model learn faster by only being presented with the regions of interest (the pixels pertaining to the feet).
- Experiment with other pre-trained models and approaches (ResNet50, Transformer models, DFTNet) and with increased compute power.
- Understand if there is benefit to augmenting the input with a visible spectrum image. We will experiment with 1) early fusion: concatenating the images as input and 2) late-stage fusion: concatenating outputs of parallel models in the last layer of the model.
Increase the model’s explainability
- We aim to understand whether the thermal images can also accurately predict the patient’s neuropathy score (the current standard of care for diagnosing neuropathy). Accompanying the model’s risk prediction with a predicted neuropathy score would make it more clinically interpretable.
- Predict a patient’s monofilament score: the test uses a cat-like whisker to tap 10 designated areas on the foot to detect loss of sensation. We labeled a patient’s neuropathy status as positive if the patient felt 6 or fewer of the 10 spots.
- Predict a patient’s biothesiometer score: the instrument identifies vibration sensory loss, another component of peripheral sensory neuropathy, and is operated from 0-50V of vibration. We labeled a patient’s neuropathy status as positive if the patient could not feel the vibration before it reached 25V.
1.2 A summary of our results
By 1) experimenting with different models, 2) using detection of the feet as a preprocessing step, and 3) augmenting the model input with visual images, we achieved 99.4% accuracy on our test set in predicting a patient’s risk of ulcer development. In addition, our attempts to improve the explainability of the model by predicting a patient’s neuropathy status (the current standard of care) achieved 90.9% accuracy in predicting monofilament score and 83.9% accuracy in predicting biothesiometer score.
2 Related Work
Thermal Prediction classification
Traditional methods such as Support Vector Machines (SVM), AdaBoost, Random Forest, and k-nearest neighbors (k-NN) are very common classifiers for image classification. In [20], Ribeiro applied SVM to redundancy-reduced diabetes data and achieved a high accuracy of 98.47%. In recent years, deep learning based methods have shown more promising performance in image classification. In [21], six deep CNN models were used for thermogram image classification: MobilenetV2[22], Resnet18[23], Resnet50[23], DenseNet201[24], InceptionV3[25] and VGG19[26]. Among these models, MobilenetV2 outperformed the others on two-foot thermogram image classification. In [7], Cruz-Vega et al. proposed DFTNet for the five-level classification of diabetic foot thermograms, although their results were not tied to a clinical classification of foot ulcer risk.
More recently, Transformers have garnered more attention in the image classification field. In [27], the Vision Transformer (ViT), based on self-attention, was applied to several image recognition benchmarks (ImageNet, CIFAR-100, VTAB, etc.) and attained strong results even compared with state-of-the-art CNNs. Swin Transformer, based on ViT [8], is a hierarchical Transformer that computes representations with shifted windows and has achieved excellent performance on several image classification, object detection, and semantic segmentation tasks.
The idea of using deep learning to classify thermograms of diabetic feet was first published by Goyal et al [19], who built a CNN architecture they call DFUNet (Diabetic Foot Ulcer Net) to classify thermal scans of patients with diabetes into one of 5 classes. Namely, they separate control patient thermal scans from 4 categories of thermal variation. These thermal categories simply bin from a histogram stratifying the variation seen in a patient’s thermal scan (0 = least variance in temperature, 5 = highest variance). However, they give no discussion of the clinical relevance of these 5 classes (healthy vs diabetic vs neuropathic vs ulceration). DFUNet performance was evaluated by the model’s accuracy at classifying each pair of classes from each other (1 vs 5, 1 vs 4, 1 vs 3 etc). Because our model’s output will identify the clinical risk category of the patient, it will have more clinical relevance and be poised to suggest whether a patient should receive intervention.
Detection
Object detection, which locates and categorizes objects of interest in an image, is a challenging and core problem in computer vision. Deep learning object detectors are mainly divided into two categories: two-stage and one-stage algorithms. Two-stage detectors are proposal-based: they first generate region proposals and then classify these proposals. In R-CNN[30], the region proposals are produced by Selective Search. Building on it, Fast R-CNN[31] adopted the ROI pooling layer to convert feature maps of different sizes to a fixed size, enabling end-to-end detector training on shared convolutional features with compelling accuracy and speed[32]. In Faster R-CNN, a Region Proposal Network generates the proposal regions. One-stage detectors are region-free methods: category probabilities and position coordinates of objects are generated in a single stage. SSD[33] combines multiscale feature maps of different resolutions so that it can detect targets of different sizes. YOLOv3[34] adopts residual blocks and also uses multiscale feature maps to produce the bounding box output. YOLOv4[35] achieves high accuracy in real-time target detection while balancing accuracy and speed. More recently, anchor-free object detectors have been developed as well. FCOS[36] is an anchor-free and proposal-free one-stage object detector that uses a center-ness score to reduce the number of low-quality bounding boxes far from the targets.
3 Data
Our team member Kayla spent a year living in India as a Fulbright scholar studying how healthcare workers identify and care for patients at high-risk for ulceration. She worked to identify and quantify the early thermal warning sign on patients’ feet in hopes that it could improve time to intervention for patients with diabetes. She captured thermal scans from March through May of 2019. The Ethics committee/IRB of CMC Vellore Hospital gave ethical approval for this work (IRB Min. No. 11644; dated 28.11.2018).
Images (thermal and visible)
The dataset includes 241 patients who were consented for the study through either the endocrinology department of CMC Vellore (Tamil Nadu, India) or the outpatient clinic at CMC Vellore - Chittoor campus (Andhra Pradesh, India). The study population includes 5 different clinical categories, referred to below as Groups 1 through 5 in the order listed:
No diabetes - low ulcer risk
Diabetes, no neuropathy - low ulcer risk
Diabetes and neuropathy - unknown ulcer risk
Diabetes, ulcerated patients - high risk for new ulceration
Diabetes, neuropathy patients with previously healed ulcer - high risk for re-ulceration
For each patient, we recorded their diabetes status, neuropathy score (monofilament 0-10, biothesiometer 0-50V), and whether an ulcer was currently present. In addition, we captured a phone camera picture of the plantar region of their feet as well as a thermal scan using a FLIR One Pro thermal camera [44]. These images were 640 × 480 pixels (examples of low and high-risk patients are shown in Figure 2).
The dataset includes 112 scans of low-risk feet, 68 images of neuropathy feet, and 55 scans of ulcerated feet. For risk classification, groups 1 and 2 are labeled low risk whereas groups 4 and 5 are labeled high risk. Group 3 is eliminated from this classification task because their risk is unknown.
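The grouping rule above can be written down as a small lookup. The following is a minimal sketch, assuming the clinical categories are numbered 1 to 5 in the order they are listed; the function and constant names are hypothetical and not taken from the study code.

```python
from typing import List, Optional, Tuple

# Group numbering is assumed to follow the order of the category list above.
LOW_RISK_GROUPS = {1, 2}    # no diabetes; diabetes without neuropathy
HIGH_RISK_GROUPS = {4, 5}   # active ulcer; healed ulcer with neuropathy
UNKNOWN_RISK_GROUPS = {3}   # diabetes with neuropathy, excluded from this task


def risk_label(group: int) -> Optional[int]:
    """Return 0 for low risk, 1 for high risk, or None if the group is excluded."""
    if group in LOW_RISK_GROUPS:
        return 0
    if group in HIGH_RISK_GROUPS:
        return 1
    if group in UNKNOWN_RISK_GROUPS:
        return None
    raise ValueError(f"unknown clinical group: {group}")


def build_risk_dataset(groups: List[int]) -> List[Tuple[int, int]]:
    """Keep (patient_index, label) pairs only for patients with a defined label."""
    labelled = []
    for index, group in enumerate(groups):
        label = risk_label(group)
        if label is not None:
            labelled.append((index, label))
    return labelled
```

Returning None for Group 3, rather than raising an error, keeps those patients available for later inference while leaving them out of training.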
Clinical data
Relevant to our experiments here, we also have access to the patients’ neuropathy test scores: both their monofilament and biothesiometer readings. Monofilament scores are on a scale from 0-10, with 6 or fewer being high risk for ulceration. Biothesiometer scores are on a scale from 0-50, with greater than 25 being high risk for ulceration. Therefore, we’ve assigned 2 additional labels corresponding to the monofilament and biothesiometer scores.
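As an illustration of how these two binary labels could be derived from the raw scores, here is a minimal sketch. The function names are hypothetical. Treating a reading of exactly 25 V as low risk is an assumption that follows the "greater than 25" wording above.

```python
def monofilament_label(score: int) -> int:
    """Return 1 (high risk) if 6 or fewer of the 10 sites were felt, else 0."""
    if not 0 <= score <= 10:
        raise ValueError("monofilament score must be between 0 and 10")
    return 1 if score <= 6 else 0


def biothesiometer_label(volts: float) -> int:
    """Return 1 (high risk) if the vibration threshold is above 25 V, else 0."""
    if not 0 <= volts <= 50:
        raise ValueError("biothesiometer reading must be between 0 and 50 V")
    # Assumption: exactly 25 V counts as low risk ("greater than 25" is high risk).
    return 1 if volts > 25 else 0
```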
4 Methods
The goal of our project is to predict diabetic foot ulceration risk based on the thermal and visible spectrum images. To achieve the goal, we aim to provide a method consisting of ROI (region of interest) extraction and classification. The high-level process flow of our end-to-end model is shown in Fig 3.
4.1 Detection
Since the background in thermal images may contribute nothing to classifying the risk of a new ulcer, we extract a bounding box for each foot in the hope that inputting only the ROI of the feet will yield better performance. After thoroughly reviewing the data, we found that some images have limited contrast between the feet and the background, which would compromise the performance of an unsupervised segmentation algorithm. For example, in some visible spectrum images there are shadows on the feet, and in some thermal images the temperature of parts of the feet is similar to that of the background. Therefore, instead of creating a segmentation algorithm, we train a YOLOv4[35] detection network to extract bounding boxes of the feet in our images. We then experiment with using these bounding boxes as input to our classification models.
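To make the idea of passing only the ROI to the classifier concrete, the sketch below crops an image to a predicted bounding box. It is an illustration only. The box format (x_min, y_min, x_max, y_max) in pixels with exclusive upper bounds is an assumption, the image is represented as a plain list of pixel rows, and the function name is hypothetical.

```python
def crop_to_box(image, box):
    """Crop a row-major image (list of rows) to a pixel bounding box.

    box is (x_min, y_min, x_max, y_max); the max coordinates are exclusive.
    Coordinates are clamped to the image so a slightly oversized box still works.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box

    # Clamp the box to the image bounds.
    x_min = max(0, min(x_min, width))
    x_max = max(0, min(x_max, width))
    y_min = max(0, min(y_min, height))
    y_max = max(0, min(y_max, height))

    if x_max <= x_min or y_max <= y_min:
        raise ValueError("bounding box does not overlap the image")

    return [row[x_min:x_max] for row in image[y_min:y_max]]
```

In practice each cropped foot would then be resized to the classifier's input resolution before being fed to the model.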
4.2 Classification
For classification, we propose a Convolutional Neural Network based on ResNet-50[40] pre-trained on ImageNet. The input is first fed to the backbone to obtain the feature map. We then use an Average Pooling layer to reduce the extracted feature map from 2048 × 7 × 7 to 2048 × 1. After this, two cascaded fully connected layers are used for classification, together with Batch Normalization and ELU[41] to enhance the robustness of the model. The structure of the proposed model is shown in Fig 4A.
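The building blocks named above (global average pooling, ELU, and batch normalization) can be illustrated with a small dependency-free sketch. This is not the model implementation: the fully connected layers and their sizes are not specified in the text and are omitted, the batch normalization shown has no learned scale or shift, and all function names are hypothetical.

```python
import math


def global_average_pool(feature_map):
    """Collapse a C x H x W nested list into a length-C vector of channel means."""
    pooled = []
    for channel in feature_map:
        values = [value for row in channel for value in row]
        pooled.append(sum(values) / len(values))
    return pooled


def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def batch_norm(batch, eps=1e-5):
    """Normalise each feature to zero mean and unit variance across a batch."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(vec[d] for vec in batch) / n for d in range(dims)]
    variances = [sum((vec[d] - means[d]) ** 2 for vec in batch) / n for d in range(dims)]
    return [
        [(vec[d] - means[d]) / math.sqrt(variances[d] + eps) for d in range(dims)]
        for vec in batch
    ]
```

Applied to a 2048 × 7 × 7 feature map, global_average_pool returns a 2048-element vector, matching the reduction described above.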
Since we have both thermal images and visual spectrum images for each patient, we also developed multi-modal models in our experiments, implemented using two different strategies: early fusion and late fusion. For early fusion, the thermal and visual spectrum images are concatenated at the input, which doubles the number of input channels. For late fusion, the two images are first fed into the shared-weight backbone separately, and the outputs are then concatenated along the channel dimension. These concatenated features then become the input of the Average Pooling layer, followed by the two cascaded fully connected layers to get the final prediction. These two multi-modal models are shown in Fig 4B.
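The difference between the two strategies is easiest to see by tracking tensor shapes. The sketch below does only that bookkeeping, under stated assumptions: 3-channel images, a backbone with an overall stride of 32 and 2048 output channels (as in ResNet-50), and a 224 × 224 input, which is the size that yields the 7 × 7 map quoted earlier. How the first convolution is adapted to 6 input channels for early fusion is not described in the text and is not modelled here. All function names are hypothetical.

```python
def backbone_output_shape(input_shape, out_channels=2048, stride=32):
    """Shape (C, H, W) of the feature map produced by a stride-32 backbone."""
    _, height, width = input_shape
    return (out_channels, height // stride, width // stride)


def early_fusion_shapes(thermal_shape, visual_shape):
    """Concatenate the images along channels, then run a single backbone pass."""
    if thermal_shape[1:] != visual_shape[1:]:
        raise ValueError("images must share the same spatial size")
    fused_input = (thermal_shape[0] + visual_shape[0],) + tuple(thermal_shape[1:])
    features = backbone_output_shape(fused_input)
    pooled_length = features[0]  # global average pooling keeps one value per channel
    return fused_input, features, pooled_length


def late_fusion_shapes(thermal_shape, visual_shape):
    """Run the shared backbone on each image, then concatenate the feature maps."""
    thermal_features = backbone_output_shape(thermal_shape)
    visual_features = backbone_output_shape(visual_shape)
    if thermal_features[1:] != visual_features[1:]:
        raise ValueError("feature maps must share the same spatial size")
    fused = (thermal_features[0] + visual_features[0],) + thermal_features[1:]
    pooled_length = fused[0]
    return fused, pooled_length
```

Under these assumptions, early fusion feeds a 6-channel input to one backbone and pools to 2048 features, while late fusion pools a 4096-channel concatenation, so the first fully connected layer sees twice as many inputs.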
We also followed the TRIPOD checklist, as suggested by the EQUATOR network, when writing our report; the checklist is attached as supplemental material [43].
5 Results
5.1 Detection
In total, we have 212 visual spectrum images (train: 169, val: 43), which we used to train the YOLOv4 model to detect the left and right feet. During training, we used the Adam optimizer and a pretrained YOLOv4 model. First, we froze the backbone and trained the remaining layers for 20 epochs (initial lr = 0.001, decay: λ = 0.94). Then we unfroze the backbone parameters and trained the whole network for 100 epochs (lr = 0.0001, decay: λ = 0.94). All inputs are resized to 416 × 416. The output bounding boxes of YOLOv4 are shown in Figure 5. The detection network achieves very high mAP and good IoU, as shown in Figure 5 and Figure 6.
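For readers unfamiliar with the IoU metric reported here, the following sketch computes intersection over union for two axis-aligned boxes. The (x_min, y_min, x_max, y_max) box format and the function name are assumptions made for illustration; this is the standard definition rather than the evaluation code used in the study.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0
```

A predicted foot box that matches the annotation exactly scores 1.0, and a box with no overlap scores 0.0.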
5.2 Classification
Selection of backbone for classification
To begin with, we chose three candidate backbone models: DFTNet, ResNet50 and Swin Transformer. Among them, we select the best-performing model as our backbone.
We ground our approach in DFTNet, which was developed to identify visible warning signs of ulceration. Although that classification is of limited clinical help since the ulcer is already visible, DFTNet is a good baseline because the pre-ulceration areas or “hot-spots” that show up in thermal images (Figure 1) look geometrically very similar to ulcers that have broken through the skin and are located in the same areas as already-formed ulcers. ResNet50 not only has an accessible pretrained model, but also shows very promising results on other classification tasks. Transformers have become popular for image classification, and Swin Transformer shows good classification performance and also has accessible pre-trained models that could be used in our task.
To pick the best model, we evaluated each candidate’s baseline performance on predicting monofilament scores from visual spectrum images. Since ResNet50 outperformed the other two models, we chose it as our backbone (results displayed in Figure 7). We believe the weaker performance of the Swin Transformer may be due to the small size of the dataset.
Prediction tasks
In addition to our main task of predicting ulcer risk from the thermal images, we also conducted experiments on predicting monofilament and biothesiometer scores, which are used to evaluate a patient’s neuropathy and are related to ulcer risk, to enhance the interpretability of the model. We believe that showing the predicted risk label alongside the predicted monofilament and biothesiometer labels could help clinicians better understand the model’s risk prediction. In all three training tasks, we used a pretrained ResNet50 trained with the Adam optimizer (initial lr = 0.0001, decay: λ = 150/(150 + epoch)). We also resized all inputs to 256 × 256 and adopted focal loss [42] to balance positive and negative samples.
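The two training ingredients mentioned above can be sketched as follows. Both functions are illustrations under assumptions rather than the training code: the decay term is interpreted as a multiplicative factor applied to the initial learning rate at each epoch, and the focal loss parameters gamma = 2 and alpha = 0.25 are the commonly used defaults from the focal loss paper, since the values used in this study are not stated. Function names are hypothetical.

```python
import math


def learning_rate(epoch, base_lr=1e-4):
    """Learning rate at a given epoch, assuming lr = base_lr * 150 / (150 + epoch)."""
    return base_lr * 150.0 / (150.0 + epoch)


def binary_focal_loss(prob_positive, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    prob_positive is the predicted probability of the positive class and
    target is 0 or 1. gamma and alpha are assumed defaults, not study values.
    """
    p = min(max(prob_positive, eps), 1.0 - eps)  # avoid log(0)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The (1 - p_t)^gamma factor shrinks the loss of confidently correct samples, so training focuses on harder and rarer cases, which is why the loss is useful when positive and negative samples are imbalanced.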
Risk Prediction
Our classes consist of patients at low risk for ulceration (Groups 1 and 2) and high-risk patients (Groups 4 and 5). We leave out Group 3 (the neuropathy group) since their risk for ulceration is ambiguous and not well understood. Once the model is trained to differentiate between the well-defined low-risk and high-risk groups, we can use it to infer the risk of the neuropathy patients (Group 3). The hypothesis is that neuropathy patients whose thermal scans closely resemble those of patients who already have an ulcer would be flagged as high-risk. Patients classified as low-risk have thermal scans similar to those of patients without neuropathy and without diabetes, two groups known to be at low risk for ulceration.
For the risk prediction task, we had 167 patients from Groups 1, 2, 4, 5 in our datasets. The split of the dataset with labels is shown in Figure 8.
To understand what value the visual spectrum images may add to our classification task, we designed an ablation study to isolate the ability of either image type, or an early or late combination of the two, to predict ulcer risk. For single modality (SM), we conducted experiments on thermal (T) and visual spectrum (VS) images separately. For multi-modality (MM), we trained both early fusion and late fusion models. In each experiment, we used 5-fold cross-validation. The average results are shown in Figure 9.
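The 5-fold protocol can be sketched as below. This is a minimal illustration, assuming a simple shuffled split by patient index with a fixed seed; whether the folds in the study were stratified by class is not stated, and the function names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Return k (train_indices, test_indices) pairs covering every sample once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]

    splits = []
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        splits.append((train, test))
    return splits


def mean_accuracy(fold_accuracies):
    """Average the per-fold accuracies into the single reported figure."""
    return sum(fold_accuracies) / len(fold_accuracies)
```

With the 167 patients in the risk task, this yields two test folds of 34 patients and three of 33, and each patient is held out exactly once.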
From these results, both thermal and visual spectrum show good performance. Combining the two did improve the results of the SM model, with early fusion performing better than the late fusion model.
Monofilament
For the monofilament prediction task, we used the same architecture chosen for ulcer risk classification. In this task, 200 patients in total had monofilament scores recorded. The split of the dataset with labels is shown in Figure 10.
For the ablation study, we conducted several experiments with the SM and MM models to determine whether applying bounding boxes of the feet to the input(s) improves performance. The results are shown in Figure 11.
From Figure 11, we can see that applying bounding boxes improved performance, supporting our hypothesis that the background doesn’t help with learning. In addition, visual spectrum images outperformed thermal images. However, neither the early nor the late fusion MM model improved results.
Biothesiometer
We used the same architecture chosen for ulcer risk classification. This task included the 139 patients who had a recorded biothesiometer reading. The split of the dataset with labels is shown in Figure 12. Following the same pattern as the monofilament ablation experiments, we carried out SM and MM model experiments. We also compared whether applying bounding boxes would improve performance. From the results (Figure 13), we see that thermal images with bounding boxes showed the best performance and that neither MM model improved on it.
6 Conclusion
Across the experiments we performed, our best accuracy is 99.43% when classifying high-risk patients from low-risk patients, obtained with the early fusion MM model. Using this model, we were able to infer the risk scores of the neuropathy patients (Figure 14). Interestingly, a few patients did in fact score high, while the majority were classified as low-risk. The limitation here is that since we are unable to follow up with these patients, we do not know their outcomes and cannot validate our predictions. In future work, we would like to conduct a longitudinal study in which patients are tracked over time, collecting time series data that would allow the task to become a time-until-ulceration prediction.
Despite our attempt to increase the interpretability of our model by predicting monofilament and biothesiometer test scores, giving transparency regarding the patient’s predicted neuropathy score, the early and late fusion multi-modality models did not improve performance on these tasks as they did on the risk classification task. We think this might be because of our chosen fusion mechanism. Since early and late fusion are just simple concatenation operations, we might need better algorithms to take full advantage of both modalities. For example, we could add attention mechanisms in a future study.
One other major lesson from this project is that visible spectrum images performed just as well as, if not better than, the thermal images on all our classification tasks. If we could validate this in a larger study to show that our model is robust and generalizable, this would be revolutionary. Risk stratification would no longer be limited to those who have access to a thermal camera - theoretically, a patient could snap a picture from home to better understand their risk, which could prompt them to seek care while the ulcer can still be prevented. To this end, we believe this technology has the potential to enable preventative care for patients with diabetes and ultimately prevent millions of amputations around the world.
Data Availability
All data produced in the present study may be made available upon reasonable request to the authors.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Kayla Huemer received a US Fulbright-Nehru Research Grant to complete the study at CMC Vellore Hospital in Tamil Nadu, India.
Author Declarations
The authors confirm that all relevant ethical guidelines have been followed and that any necessary IRB and/or ethics committee approvals have been obtained. The study was approved by the CMC Vellore IRB ethics committee (IRB Min. No. 11644; dated 28.11.2018). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. The authors have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files.