Abstract
Owing to the high availability of large-scale annotated image datasets, knowledge transfer from pre-trained models has shown outstanding performance in medical image classification. However, building a robust image classification model for datasets with data irregularity or imbalanced classes can be very challenging, especially in the medical imaging domain. In this paper, we propose a novel deep convolutional neural network, which we call the Self Supervised Super Sample Decomposition for Transfer learning (4S-DT) model. 4S-DT encourages coarse-to-fine transfer learning from large-scale image recognition tasks to a specific chest X-ray image classification task using a generic self-supervised sample decomposition approach. Our main contribution is a novel self-supervised learning mechanism guided by a super sample decomposition of unlabelled chest X-ray images. 4S-DT improves the robustness of knowledge transformation via a downstream learning strategy with a class-decomposition layer that simplifies the local structure of the data. 4S-DT can deal with any irregularities in the image dataset by investigating its class boundaries using a downstream class-decomposition mechanism. We used 50,000 unlabelled chest X-ray images to achieve our coarse-to-fine transfer learning, with an application to COVID-19 detection as an exemplar. 4S-DT achieved a high accuracy of 99.8% (95% CI: 99.44%, 99.98%) in the detection of COVID-19 cases on a large dataset and an accuracy of 97.54% (95% CI: 96.22%, 98.91%) on an extended test set enriched by augmented images of a small dataset, out of which all real COVID-19 cases were detected; this was the highest accuracy obtained when compared to other methods.
I. Introduction
Diagnosis of COVID-19 is associated with the symptoms of pneumonia and chest X-ray tests [1]. Chest X-ray is the essential imaging technique that plays an important role in the diagnosis of COVID-19 disease. Fig. 1 shows examples of a) a normal chest X-ray, a positive one with COVID-19, a positive image with the severe acute respiratory syndrome (SARS), and b) some examples of other unlabelled chest X-ray images used in this work.
Several statistical machine learning methods have previously been used for the automatic classification of digitised lung images [2], [3]. For instance, in [4], a small set of three statistical features was calculated from lung texture to distinguish between malignant and benign lung nodules using a Support Vector Machine (SVM) classifier. A statistical co-occurrence matrix method was used with a Backpropagation Network [5] to classify samples as normal or cancerous. When enough annotated image data is available, deep learning approaches [6]-[8] usually outperform statistical machine learning approaches. The Convolutional Neural Network (CNN) is one of the most commonly used deep learning approaches, with superior achievements in the medical imaging domain [9]. The primary success of CNNs is due to their capability to learn local features automatically from domain-specific images, unlike statistical machine learning methods. One of the popular strategies for training a CNN model is to transfer learned knowledge from a pre-trained network that fulfilled one generic task into a new specific task [10]. Transfer learning is fast and easy to apply without the need for a huge annotated dataset for training; therefore, many scientists tend to adopt this strategy, especially with medical imaging. Transfer learning can be accomplished with three main scenarios [11]: a) “shallow tuning”, which adapts only the classification layer to cope with the new task, and freezes the weights of the remaining layers without updating; b) “deep tuning”, which aims to retrain all the weights of the adopted pre-trained network from end to end; and c) “fine-tuning”, which aims to gradually train layers by tuning the learning parameters until a significant performance boost is achieved. Knowledge transfer via the fine-tuning scenario has demonstrated outstanding performance in chest X-ray and computed tomography image classification [12], [13].
The emergence of COVID-19 as a pandemic disease dictated the need for faster detection methods to contain the spread of the virus. As mentioned above, chest X-ray imaging is a promising solution, particularly when combined with an effective machine learning model. In addition to data irregularities that can be dealt with through class decomposition, scarcity of data, especially in the early months of the pandemic, made it hard to adopt chest X-ray images as a means of detection. On the other hand, self-supervised learning has recently gained popularity as a way to address the expensive labelling of data acquired at an unprecedented rate. In self-supervised learning, unlabelled data is used for feature learning by assigning each example a pseudo label. In the case of convolutional neural networks (CNNs) applied to image data, each image is assigned a pseudo label, and the CNN is trained to extract visual features of the data. The training of a CNN with pseudo-labelled images as input is called pretext task learning, while the subsequent training of the resulting CNN using labelled data is called downstream task training. Inherently, such a pipeline allows for effective utilisation of large unlabelled data sets. The success of pretext task learning relies on the pseudo labelling method. In [14], four categories of methods were identified. Context-based image feature learning by means of context similarity has proved a particularly effective pseudo labelling mechanism. DeepCluster [15] is the state-of-the-art method under this category. DeepCluster is a super sample decomposition method that generates pseudo labels through the clustering of CNN features. Sample decomposition is the process of applying clustering on the whole training set as a step for improving supervised learning performance [16]. When the clustering is performed on a larger data sample, we refer to this process as a super sample decomposition.
However, we argue that the coupling of the pretext task and the pseudo labelling can limit the effectiveness of the pretext task in the self-supervised learning process. In our proposed super sample decomposition, the pretext task training uses cluster assignments as pseudo labels, where the clustering process is decoupled from the pretext training. We propose the clustering of encoded images through an auto-encoder neural network, allowing the flexibility of utilising different features and clustering methods, as appropriate. We argue that this can be most effective in medical image classification, as evidenced by the experimentally validated use of class decomposition for transfer learning in a method coined DeTraC [17].
In this paper, we propose a novel deep convolutional neural network, which we term the Self Supervised Super Sample Decomposition for Transfer learning (4S-DT) model, for the detection of COVID-19 cases¹. 4S-DT has been designed to encourage a coarse-to-fine transfer learning based on a self-supervised sample decomposition approach. 4S-DT can deal with any irregularities in the data distribution and the limited availability of training samples in some classes. The contributions of this paper can be summarised as follows. We provide
a novel mechanism for self-supervised sample decomposition using a large set of unlabelled chest X-ray images for a pretext training task;
a generic coarse-to-fine transfer learning strategy to gradually improve the robustness of knowledge transformation from large-scale image recognition tasks to a specific chest X-ray image classification task;
a downstream class-decomposition layer in the down-stream training phase to cope with any irregularities in the data distribution and simplify its local structure; and
a thorough experimental study on COVID-19 detection, pushing the boundaries of state-of-the-art techniques in terms of accuracy, and robustness of the proposed model.
The paper is organised as follows. In Section II, we review the state-of-the-art methods for COVID-19 detection. Section III discusses the main components of our proposed 4S-DT model. Section IV describes our experiments on several chest X-ray images collected from different hospitals. In Section V, we discuss our findings and conclude the work.
II. Previous work on COVID-19 detection from chest X-ray
In February 2020, the World Health Organisation (WHO) declared that a new virus called COVID-19 had started to spread aggressively in several countries [18]. Diagnosis of COVID-19 is typically associated with pneumonia-like symptoms, which can be revealed by both genetic and imaging tests. Fast detection of the virus directly contributes to managing and controlling its spread. Imaging tests, especially chest X-ray, can provide fast detection of COVID-19 cases. Medical image diagnostic systems have been comprehensively explored through an enormous number of approaches ranging from statistical machine learning to deep learning. The convolutional neural network is one of the most effective approaches in the diagnosis of lung diseases, including COVID-19, directly from chest X-ray images. Several recent reviews have highlighted significant contributions to the detection of COVID-19 [19]-[21]. For instance, in [22], a modified version of the ResNet-50 pre-trained CNN model was used to classify CT images into three classes: healthy, COVID-19 and bacterial pneumonia. In [23], a CNN model called COVID-Net, based on transfer learning, was used to classify chest X-ray images into four classes: normal, bacterial infection, non-COVID, and COVID-19 viral infection. In [24], a weakly-supervised approach was proposed using 3D chest CT volumes for COVID-19 detection and lesion localisation, relying on ground truth masks obtained by an unsupervised lung segmentation method and a 3D ResNet pre-trained model. In [25], a dataset of chest X-ray images from patients with pneumonia, confirmed COVID-19 disease, and normal incidents was used to evaluate the performance of state-of-the-art CNN models based on transfer learning. The study suggested that transfer learning can provide important biomarkers for the detection of COVID-19 cases.
It has been experimentally demonstrated that transfer learning can provide a robust solution to cope with the limited availability of training samples from confirmed COVID-19 cases [26].
In [27], self-supervised learning using context distortion is applied for classification, segmentation, and localisation in different medical imaging problems. When used in classification, the method was applied for scan plane detection in fetal 2D ultrasound images, showing classification improvement in some settings. However, we argue that such a method is more effective in image segmentation and localisation, because context distortion generates localised features rather than the global image features that can be more effective for classification tasks.
Having reviewed the related work, it is evident that despite the great success of deep learning in the detection of COVID-19 cases from chest X-ray images, data scarcity and irregularities have not been explored. It is common in medical imaging in particular that datasets exhibit different types of irregularities (e.g. overlapping classes with imbalance problems) that affect the resulting accuracy of deep learning models. With the unfolding of COVID-19, chest X-ray images are rather scarce. Thus, this work focuses on coping with data irregularities through class decomposition, and data scarcity through super sample decomposition, as detailed in the following section.
III. 4S-DT model
This section describes in detail our proposed deep convolutional neural network, the Self Supervised Super Sample Decomposition for Transfer learning (4S-DT) model, for detecting COVID-19 cases from chest X-ray images. Starting with an overview of the architecture and moving through the different components of the model, the section discusses the workflow and formalises the method. The 4S-DT model consists of three training phases, see Fig. 2. In the first phase, we train an autoencoder model to extract deep local features from each sample in a very large set of unlabelled generic chest X-ray images. We then adapt a sample decomposition mechanism to create pseudo labels for the generic chest X-ray images. In the second phase, we use the pseudo labels to achieve a coarse transfer learning using an ImageNet pre-trained CNN model for the classification of pseudo-labelled chest X-ray images (as a pretext training task), resulting in chest X-ray-related convolutional features. In the last phase, we use the trained convolutional features to achieve downstream training. The downstream training task is more task-specific, adapting a fine transfer learning from chest X-ray recognition to COVID-19 detection. In this stage, we also adapt a class-decomposition layer to simplify the local structure of the image data distribution, where a sophisticated gradient descent optimisation method is used. Finally, we apply a class-composition to refine the final classification of the images.
A. Super sample decomposition
Given a set of unlabelled images X = {x1, x2,…, xn′}, our super sample decomposition component aims to find and use pseudo labels during the pretext training task of 4S-DT. To this end, an autoencoder (AE) is first used to extract deep features associated with each image. For each input image $x$, the representation vector $h_d$ and the reconstructed image $\hat{x}$ can be defined as

$$h_d = f\left(W^{(1)} x + b^{(1)}\right), \qquad \hat{x} = f\left(W^{(2)} h_d + b^{(2)}\right),$$

where $W^{(1)}$ and $W^{(2)}$ are the weight matrices, $b^{(1)}$ and $b^{(2)}$ are the bias vectors, and $f$ is the activation function. The reconstruction error between $\hat{x}$ and $x$ is defined as

$$E(\hat{x}, x) = \lVert \hat{x} - x \rVert^{2}.$$

The overall cost function of the n′ unlabelled images, $E_{AE}(W, b)$, can be defined as

$$E_{AE}(W, b) = \frac{1}{n'} \sum_{i=1}^{n'} E(\hat{x}_i, x_i) + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{j=1}^{s_l} \sum_{i=1}^{s_{l+1}} \left(W_{ij}^{(l)}\right)^{2},$$

where the first term denotes the reconstruction error of the whole dataset, and the second term is the regularisation weight penalty term, which aims to prevent over-fitting by restraining the magnitude of the weights. $\lambda$ is the weight decay parameter, $n_l$ is the number of layers in the network, $s_l$ denotes the number of neurons in layer $l$, and $W_{ij}^{(l)}$ is the connecting weight between neuron $i$ in layer $l + 1$ and neuron $j$ in layer $l$.
Once the training of the AE has been accomplished, Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is used to cluster the image data distribution X into a number of clusters C based on the extracted features hd. DBSCAN is an unsupervised, density-based clustering algorithm that defines a cluster as the largest set of points connected by density.
Let the image dataset X be mapped into a low-dimensional feature space denoted by $H \in \mathbb{R}^{n' \times d}$, where H = (h1, h2,…, hn′). An image xi (represented by hi) is density-connected to an image xj (represented by hj) with respect to Eps (i.e. the neighbourhood radius) and MinPts (i.e. the minimum number of objects within the neighbourhood radius of a core object) if there exists a core object xk such that both xi and xj are directly density-reachable from xk with respect to Eps and MinPts. An image xi is directly density-reachable from an image xj if xi is within the Eps-neighbourhood of xj, denoted NEps(xj), and xj is a core object, where the Eps-neighbourhood can be defined as

$$N_{Eps}(x_j) = \{\, x_i \in X \mid dist(x_i, x_j) \le Eps \,\}.$$
DBSCAN results in C clusters, where each cluster is constructed by maximising the density reachability relationship among images of the same cluster. The C cluster labels will be assigned to the n′ unlabelled images and will be presented as pseudo labels for the pretext training task and hence the downstream training task. The pseudo-labelled image dataset can then be defined as X′ = {(xi,yc)|c ∊ C}.
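The pseudo-labelling step described above can be illustrated with a minimal sketch. It is not the authors' implementation (the experiments in this paper were run in MATLAB); it is a plain-Python DBSCAN over already-extracted feature vectors, and the function names `region_query` and `dbscan`, the toy features, and the choice to mark noise points with -1 are all assumptions made for illustration.

```python
import math

def region_query(features, idx, eps):
    """Indices of all points within distance eps of point idx (itself included)."""
    return [j for j, h in enumerate(features)
            if math.dist(features[idx], h) <= eps]

def dbscan(features, eps, min_pts):
    """Return one cluster id per feature vector; -1 marks noise (an assumption)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(features)
    cluster = -1
    for i in range(len(features)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(features, i, eps)
        if len(neighbours) < min_pts:      # not a core object
            labels[i] = NOISE
            continue
        cluster += 1                       # start a new cluster from core object i
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:         # border point reached from a core object
                labels[j] = cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(features, j, eps)
            if len(j_neighbours) >= min_pts:   # j is itself a core object: expand
                queue.extend(j_neighbours)
    return labels

# Toy stand-in for encoded images h_1..h_n'; real features come from the autoencoder.
toy_features = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
pseudo_labels = dbscan(toy_features, eps=1.5, min_pts=3)
pseudo_labelled = list(zip(toy_features, pseudo_labels))  # pairs (x_i, y_c)
```

How noise points are treated before pretext training is not stated in the text, so any handling of the -1 label (dropping those images, or assigning them to the nearest cluster) would be a further design choice.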
B. Pretext training
With the high availability of large-scale annotated image datasets, the chance for the different classes to be well-represented is high. Therefore, the learned in-between class-boundaries are most likely to generalise well to new samples. On the other hand, with the limited availability of annotated medical image data, especially when some classes are under-represented compared to others in terms of size and representation, the generalisation error might increase. This is because there might be a miscalibration between the minority and majority classes. Large-scale annotated image datasets (such as ImageNet) provide effective solutions to such a challenge via transfer learning, where tens of millions of parameters (of CNN architectures) are required to be trained.
A shallow-tuning mode was used during the adaptation and training of an ImageNet pre-trained CNN model using the collected chest X-ray image dataset. We used the off-the-shelf CNN features of pre-trained models on ImageNet (where the training is accomplished only on the final classification layer) to construct the image feature space.
Mini-batch stochastic gradient descent (mSGD) was used to minimise the categorical cross-entropy loss function, $E_{coarse}(\cdot)$:

$$E_{coarse}\left(y_c, z'(x_j, W')\right) = -\sum_{c=1}^{C} y_c \ln z'(x_j, W'),$$

where xj is the set of self-labelled images in the training, yc is their associated self labels, and z′(xj, W′) is the predicted output from a softmax function, where W′ is the converged weight matrix associated with the ImageNet pre-trained model (i.e. we used W′ of the ImageNet pre-trained CNN model for weight initialisation to achieve a coarse transfer learning).
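As a small numerical illustration of the loss minimised in the pretext task, the sketch below computes a softmax followed by the categorical cross-entropy, averaged over one mini-batch. It assumes integer pseudo labels (so the one-hot sum reduces to a single term) and raw network outputs ("logits") as input; the function names are hypothetical and nothing here reflects the authors' MATLAB code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one vector of raw network outputs."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(batch_logits, batch_labels):
    """Mean over a mini-batch of -ln z'_c, where c is the integer pseudo label."""
    losses = []
    for logits, label in zip(batch_logits, batch_labels):
        probs = softmax(logits)
        losses.append(-math.log(probs[label]))
    return sum(losses) / len(losses)

# Four pseudo classes (the number of clusters retained in the experiments).
uninformed = categorical_cross_entropy([[0.0, 0.0, 0.0, 0.0]], [2])   # equals ln 4
confident = categorical_cross_entropy([[0.0, 0.0, 8.0, 0.0]], [2])    # close to 0
```

A network that has learned nothing about the pseudo labels sits at ln(C) for C classes, which gives a convenient reference point when monitoring pretext training.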
C. Downstream training
A fine-tuning mode was used during the adaptation of the ResNet model using feature maps from the coarse transfer learning stage. However, due to the high dimensionality associated with the images, we applied PCA to project the high-dimensional feature space into a lower-dimensional one, where highly correlated features were ignored. This step is important for the downstream class-decomposition process in the downstream training phase to produce more homogeneous classes, reduce the memory requirements, and improve the efficiency of the framework.
Now assume that our feature space (PCA’s output) is represented by a 2-D matrix (denoted as dataset A), and L is a class category. A and L can be rewritten as

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}, \qquad L = \{l_1, l_2, \ldots, l_{c'}\},$$

where n is the number of images, m is the number of features, and c′ is the number of classes. For downstream class-decomposition, we used k-means clustering [28] to further divide each class into homogeneous sub-classes (or clusters), where each pattern in the original class L is assigned to a class label associated with the nearest centroid based on the squared Euclidean distance (SED):

$$SED = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\lVert a_i^{(j)} - c_j \right\rVert^{2},$$

where centroids are denoted as cj. Once the clustering is accomplished, each class in L will further be divided into k subclasses, resulting in a new dataset (denoted as dataset B). Accordingly, the relationship between dataset A and B can be mathematically described as:

$$(A \mid L) \mapsto B = (A \mid C'),$$

where the number of instances in A is equal to that in B, while C′ encodes the new labels of the subclasses (e.g. C′ = {l11, l12,…, l1k, l21, l22,…, l2k,… lc′k}).
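The class-decomposition step can be sketched as follows: k-means is run separately inside each original class, and every image receives a sub-class label made of its class and its cluster index. This is an illustrative plain-Python version, not the authors' code; the helper names, the toy 2-D features, and the deterministic initialisation (first k points of each class as starting centroids) are assumptions.

```python
import math

def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm; initial centroids are the first k points (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(max_iters):
        # Nearest centroid by Euclidean distance (same argmin as the squared distance).
        new_assign = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:           # assignments stable: converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                    # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def decompose_classes(features, labels, k=2):
    """Replace each class label l_c by a sub-class label (l_c, cluster index)."""
    new_labels = [None] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        sub = kmeans([features[i] for i in idx], k)
        for i, s in zip(idx, sub):
            new_labels[i] = (cls, s)
    return new_labels

# Toy stand-in for PCA-reduced features of dataset A with two original classes.
A = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 0), (21, 0), (30, 0), (31, 0)]
L = ["covid"] * 4 + ["normal"] * 4
C_prime = decompose_classes(A, L, k=2)     # labels of dataset B
```

With c′ original classes and k clusters per class, the decomposed dataset has c′k labels, which is why three classes and k = 2 give six sub-classes in the experiments.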
For transfer learning, we used the ResNet [29] model, which showed excellent performance with only 18 layers. Here we freeze the weights of the low-level layers and update the weights of the high-level layers. With the limited availability of training data, stochastic gradient descent (SGD) can cause the objective/loss function to fluctuate heavily, and hence overfitting can occur. To improve convergence and overcome overfitting, mini-batch stochastic gradient descent (mSGD) was used to minimise the objective function, $E_{fine}(\cdot)$, with categorical cross-entropy loss

$$E_{fine}\left(gl_i, z(o_j, W)\right) = -\sum_{i=1}^{c'k} gl_i \ln z(o_j, W),$$

where oj is the set of input labelled images in the training, gli is the ground truth labels, z(oj, W) is the predicted output from a softmax function, and W is the converged weight matrix associated with the coarse transfer learning model.
D. Performance evaluation
In the downstream class-decomposition layer of 4S-DT, we divide each class within the image dataset into several subclasses, where each subclass is treated as a new independent class. In the composition phase, those sub-classes are assembled back to produce the final prediction based on the original image dataset. For performance evaluation, we adopted Accuracy (ACC), Specificity (SP) and Sensitivity (SN) metrics for the multi-class confusion matrix, where the input image can be classified into one of c′ non-overlapping classes. As a consequence, the confusion matrix would be a ($N_{c'} \times N_{c'}$) matrix and the metrics are defined as:

$$ACC = \frac{1}{c'} \sum_{i=1}^{c'} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}, \qquad SN = \frac{1}{c'} \sum_{i=1}^{c'} \frac{TP_i}{TP_i + FN_i}, \qquad SP = \frac{1}{c'} \sum_{i=1}^{c'} \frac{TN_i}{TN_i + FP_i},$$

where c′ is the original number of classes in the dataset, TP is the true positive in the case of a COVID-19 case and TN is the true negative in the case of normal or other disease, while FP and FN are the incorrect model predictions for COVID-19 and other cases. Also, the TP, TN, FP and FN for a specific class i are defined as:

$$TP_i = x_{ii}, \qquad TN_i = \sum_{\substack{j=1 \\ j \ne i}}^{c'} \sum_{\substack{k=1 \\ k \ne i}}^{c'} x_{jk}, \qquad FP_i = \sum_{\substack{j=1 \\ j \ne i}}^{c'} x_{ji}, \qquad FN_i = \sum_{\substack{j=1 \\ j \ne i}}^{c'} x_{ij},$$

where $x_{ii}$ is an element in the diagonal of the matrix. Having discussed and formalised the 4S-DT model in detail, we validate it experimentally in the following section, establishing the effectiveness of self-supervised super sample decomposition in detecting COVID-19 from chest X-ray images.
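To make the composition and evaluation steps concrete, the following sketch maps predicted sub-class labels back to their original classes and then computes ACC, SN and SP from the resulting confusion matrix. It rests on assumptions that should be checked against the authors' code: sub-class labels are represented as (class, cluster) pairs, rows of the confusion matrix are true classes and columns are predictions, the three metrics are averaged over classes, and every class has at least one test sample. All names and the toy predictions are hypothetical.

```python
def compose(sub_label):
    """Class composition: map a sub-class label (class, cluster) to its class."""
    return sub_label[0]

def confusion_matrix(true, pred, classes):
    """m[i][j] counts samples of true class i predicted as class j."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true, pred):
        m[index[t]][index[p]] += 1
    return m

def per_class_counts(m, i):
    """TP, TN, FP, FN for class i, treating all other classes as negative."""
    n = len(m)
    tp = m[i][i]
    fn = sum(m[i][j] for j in range(n) if j != i)
    fp = sum(m[j][i] for j in range(n) if j != i)
    tn = sum(m[j][k] for j in range(n) for k in range(n) if j != i and k != i)
    return tp, tn, fp, fn

def macro_metrics(m):
    """Class-averaged accuracy, sensitivity and specificity."""
    n = len(m)
    acc = sn = sp = 0.0
    for i in range(n):
        tp, tn, fp, fn = per_class_counts(m, i)
        acc += (tp + tn) / (tp + tn + fp + fn)
        sn += tp / (tp + fn)
        sp += tn / (tn + fp)
    return acc / n, sn / n, sp / n

classes = ["covid", "normal", "sars"]
true = ["covid", "covid", "covid", "normal", "normal", "sars"]
pred_sub = [("covid", 0), ("covid", 1), ("normal", 0),
            ("normal", 1), ("normal", 0), ("sars", 1)]
pred = [compose(s) for s in pred_sub]
acc, sn, sp = macro_metrics(confusion_matrix(true, pred, classes))
```

Note that a prediction landing in the wrong sub-class of the right class still counts as correct after composition, which is the point of assembling the sub-classes back.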
IV. Experimental Results
This section presents the datasets used in training and evaluating our 4S-DT model, and discusses the experimental results.
A. Datasets
In this work, we used three datasets of labelled and unlabelled chest X-ray images, defined respectively as:
Unlabelled chest X-ray dataset, a large set of chest X-ray images used as an unlabelled dataset: A set of 50,000 unlabelled chest X-ray images collected from three different datasets: 1) 336 cases with a manifestation of tuberculosis, and 326 normal cases from [30], [31]; 2) 5,863 chest X-Ray images with 2 categories: pneumonia and normal from [32]; and 3) a set of 43,475 chest X-ray images randomly selected from a total of 112,120 chest X-ray images, including 14 diseases, available from [33].
COVID-19 dataset-A, an imbalanced set of labelled chest X-ray images with COVID-19 cases: 80 normal cases from [34], [35], and the chest X-ray dataset from [36], which contains 105 and 11 cases of COVID-19 and SARS, respectively. We divided the dataset into two groups: 70% for training and 30% for testing. Due to the limited availability of training images, we applied different data augmentation techniques (such as flipping up/down and right/left, translation, and rotation using five different random angles) to generate more samples, see Table I.
COVID-19 dataset-B, a public chest X-ray dataset that is already divided into two sets (training and testing), where each set consists of three classes (i.e. COVID-19, Normal, and pneumonia), see Table II. The dataset is available for download at: (https://www.kaggle.com/prashant268/chest-xray-covid19-pneumonia).
Note that the chest X-ray images of dataset-A and dataset-B are progressively updated, and the distributions of images in these datasets (e.g. Tables I and II) can be considered as a snapshot at the time of submitting this paper. Therefore, any attempt to compare the performance of methods on such datasets at different points in time would be misleading. Moreover, the performance of the methods reported in this paper is expected to improve in the future with the growing availability of labelled images.
All the experiments in our work have been carried out in MATLAB 2019a on a PC with the following configuration: 3.70 GHz Intel(R) Core(TM) i3-6100 Duo, an NVIDIA Quadro P5000 GPU (donated by NVIDIA Corporation), and 8.00 GB
B. Self supervised training of 4S-DT
We trained our autoencoder with 80 neurons in the first hidden layer and 50 neurons in the second hidden layer for the reconstruction of input unlabelled images, see Fig. 3. The trained autoencoder is then used to extract a set of deep features from the unlabelled chest X-ray images. The extracted features were fed into the DBSCAN clustering algorithm for constructing the clusters (and hence the pseudo-labels). Since DBSCAN is sensitive to the neighbourhood radius, we employed a k-nearest-neighbour (k-NN) [37] search to determine the optimal Eps value. As demonstrated in Fig. 4, the optimal value for Eps was 1.861. The MinPts parameter has been derived from the number of features (d) such that MinPts ≥ d + 1. Consequently, we tested different values for the MinPts parameter, such as 51, 54, and 56, resulting in 13, 6, and 4 clusters, respectively. For the coarse transfer learning, we used the ResNet18 pre-trained CNN model. The classification performance on the pseudo-labelled samples associated with the 13, 6, and 4 clusters was 48.1%, 53.26%, and 64.37%, respectively. Therefore, we fix the number of clusters (and hence the number of pseudo labels) to be 4 in all experiments in this work.
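The parameter selection described above can be sketched in two small helpers: the sorted k-distance curve that a k-NN search produces (the kind of curve from which a neighbourhood radius is usually read off), and the lower bound on MinPts implied by the feature dimension. This is only an illustrative sketch: how the optimal Eps was read from the curve in Fig. 4 is not specified in the text, so no knee-detection rule is implemented, and the function names and toy data are assumptions.

```python
import math

def k_distance_curve(features, k):
    """Sorted distance from each point to its k-th nearest neighbour."""
    curve = []
    for i, h in enumerate(features):
        dists = sorted(math.dist(h, other)
                       for j, other in enumerate(features) if j != i)
        curve.append(dists[k - 1])
    return sorted(curve)

def min_pts_lower_bound(num_features):
    """Smallest MinPts allowed by the rule MinPts >= d + 1."""
    return num_features + 1

# Toy 1-D features; a sharp rise at the end of the curve suggests a candidate Eps.
curve = k_distance_curve([(0,), (1,), (2,), (10,)], k=2)

# With the 50-dimensional autoencoder code used here, MinPts starts at 51.
smallest_min_pts = min_pts_lower_bound(50)
```

The lower bound of 51 matches the smallest MinPts value tried in the experiments; the larger values (54 and 56) trade more noise points for fewer, denser clusters.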
1) Downstream class-decomposition of 4S-DT: We used the AlexNet [38] pre-trained network in a shallow learning mode to extract discriminative features of the labelled dataset. We set the learning rate to 0.0001 for all layers except the last fully connected layer (0.01); the mini-batch size was 128 with a minimum of 256 epochs, the weight decay was set to 0.001 to prevent overfitting during training, and the momentum was 0.9. At this stage, 4096 attributes were obtained; therefore, we used PCA to reduce the dimension of the feature space. For the class decomposition step, we used k-means clustering [28], where k was set to 2, and hence each class in L was further divided into two subclasses, resulting in a new dataset with six classes. The adoption of k-means for class decomposition with k = 2 is based on the results achieved by the DeTraC model in [17].
C. Classification performance on COVID-19 dataset-A
We first validate the performance of 4S-DT with ResNet18 (as the backbone network) on the 58 test images (i.e. testing set 1), where the augmented training set is used for training, see Table I. Our ResNet architecture consists of residual blocks, and each block has two 3 × 3 Conv layers, where each layer is followed by batch normalisation and a ReLU activation function. Table III illustrates the adopted architecture used in our experiment.
During the training of the backbone network, the learning rate for all the CNN layers was fixed to 0.0001, except for the last fully connected layer (0.01), to accelerate the learning. The mini-batch size was 256 with a minimum of 200 epochs, the weight decay was set to 0.0001 to prevent overfitting during training, and the momentum value was 0.95. The learning rate drop schedule was set to 0.95 every 5 epochs. The results are summarised in Table V. Moreover, we also compare the performance of the proposed model without the self supervised sample decomposition component (i.e. w/o 4S-D, or DeTraC-ResNet18 [17]) and without both 4S-D and class-decomposition (w/o 4S-D+CD, or the ResNet18 [29] pre-trained network) on the 58 test images. 4S-DT has achieved 100% accuracy in the detection of COVID-19 cases with 100% (95% confidence interval (CI): 96.4%, 98.7%) for sensitivity and specificity (95% CI: 94.5%, 100%), see Fig. 5. As illustrated by Fig. 5 and Table IV, 4S-DT shows superiority and a significant contribution in improving the transfer learning process with both the self supervised sample decomposition and downstream class-decomposition components. Also, we applied 4S-DT based on the ResNet18 pre-trained network on the original classes of the COVID-19 dataset with imbalanced classes, after eliminating the samples from the training set. As we see in Fig. 5, 4S-DT has achieved 96.43% accuracy (95% CI: 92.5%, 98.6%) in the detection of COVID-19 cases with a sensitivity of 97.1% (95% CI: 92.24%, 97.76%) and a specificity of 95.60% (95% CI: 93.41%, 96.5%).
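One plausible reading of the learning rate schedule above is a piecewise-constant decay in which the rate is multiplied by 0.95 after every 5 epochs. The sketch below encodes that reading only; the interpretation, the zero-based epoch index, and the parameter names `drop_factor` and `drop_period` are assumptions rather than details confirmed by the text.

```python
def learning_rate(epoch, base_lr=1e-4, drop_factor=0.95, drop_period=5):
    """Piecewise-constant decay: multiply by drop_factor every drop_period epochs.

    epoch is zero-based (an assumption), so epochs 0-4 use base_lr unchanged.
    """
    return base_lr * drop_factor ** (epoch // drop_period)

# Rates for the convolutional layers (base 0.0001) and, under the same
# assumed schedule, for the last fully connected layer (base 0.01).
conv_rates = [learning_rate(e) for e in (0, 4, 5, 10, 199)]
fc_rate_after_first_drop = learning_rate(5, base_lr=1e-2)
```

Under this reading, a 200-epoch run applies 39 drops, so the final rate is roughly 13.5% of the initial one.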
To allow for further investigation and make testing of COVID-19 detection more challenging, we applied the same data augmentation techniques (used for the training samples) to the small testing set to increase the number of testing samples. The new test sample distribution, which we call testing set 2, contains 283 COVID-19 images, 30 SARS images, and 216 normal images, see Table I. Consequently, we used testing set 2 for testing and the augmented training set for training (see Table I), unless otherwise mentioned, for the performance evaluation of all methods in the experiments described below. We validated the performance of a) the full version of 4S-DT with the 4S-D component and b) 4S-DT without 4S-D. For a fair comparison, we used the same backbone network (i.e. ResNet18) with the downstream class-decomposition component, where both versions have been trained in shallow- and fine-tuning modes. As illustrated by Table V, the 4S-D component shows significant improvement in shallow- and fine-tuning modes in all cases. More importantly, our full version model with 4S-D demonstrates better performance, in each case, with fewer epochs, confirming its efficiency and robustness at the same time.
Moreover, we compared the classification performance of 4S-DT with other models used for COVID-19 detection, including GoogleNet [39] and DeTraC. 4S-DT has achieved a high accuracy of 97.54% (95% CI: 96.22%, 98.91%) with a specificity of 97.15% (95% CI: 94.23%, 98.85%) and a sensitivity of 97.88% (95% CI: 95.46%, 99.22%) on the 529 test chest X-ray images of testing set 2, see Table VI. As shown by Table VI, 4S-DT has demonstrated superiority in performance, confirming its effectiveness in improving the classification accuracy of transfer learning models. Finally, Fig. 6 shows the receiver operating characteristic (ROC) curve between the true positive rate and the false positive rate obtained by 4S-DT, with an area under the curve (AUC) value of 99.58% (95% CI: 99.01%, 99.95%), confirming its robust behaviour during the training process.
D. Classification performance on COVID-19 dataset-B
To evaluate the performance of 4S-DT on COVID-19 dataset-B, we applied different ImageNet pre-trained CNN networks such as VGG19 [41], ResNet [42], GoogleNet [39], and MobileNetV2 [43]. Parameter settings for each pre-trained model during the training process are reported in Table VII. Transfer learning has been accomplished via the deep tuning scenario (with 15 epochs and SGD as the optimiser). The classification performance on COVID-19 cases is reported in Table VIII. Fig. 7 illustrates the confusion matrix obtained by each pre-trained network for each class in the dataset. As demonstrated by Table VIII, 4S-DT has achieved a high accuracy of 99.8% (95% CI: 99.44%, 99.98%), with a sensitivity of 99.3% (95% CI: 93.91%, 99.79%) and a specificity of 100% (95% CI: 99.69%, 100%) in the detection of COVID-19 cases.
V. Discussion and conclusion
The diagnosis of COVID-19 is associated with pneumonia-like symptoms that can be revealed by genetic and imaging tests. The chest X-ray imaging test provides promising fast detection of COVID-19 cases and consequently can contribute to controlling the spread of the virus. In medical image classification, paramount progress has been made using ImageNet pre-trained convolutional neural networks (CNNs), exploiting the high availability of large-scale annotated image datasets. Such approaches have been comprehensively explored through several transfer learning strategies, including fine-tuning and deep-tuning mechanisms. They usually require an enormous number of balanced annotated images distributed over several classes/diseases (which is impractical in the medical imaging domain). In medical image analysis, data irregularities remain a challenging problem, especially with the limited availability of confirmed samples for some diseases such as COVID-19 and SARS, which usually results in miscalibration between the different classes in the dataset. Consequently, COVID-19 detection from chest X-ray images presents a challenging problem due to the irregularities and the limited availability of annotated cases.
Here, we propose a new CNN model, which we call the Self Supervised Super Sample Decomposition for Transfer learning (4S-DT) model. 4S-DT has been designed to cope with such challenging problems by adapting a self-supervised sample decomposition approach to generate pseudo-labels for the classification of unlabelled chest X-ray images as a pretext learning task. 4S-DT also has the ability to deal with data irregularities through the class-decomposition adapted in its downstream learning component. 4S-DT has demonstrated its effectiveness and efficiency in coping with the detection of COVID-19 cases in a dataset with irregularities in its distribution. In this work, we used 50,000 unlabelled chest X-ray images for the development of our self-supervised sample decomposition approach to perform transfer learning with an application to COVID-19 detection. We achieved an accuracy of 97.54% with a specificity of 97.15% and a sensitivity of 97.88% on 529 test chest X-ray images (of COVID-19 dataset-A), i.e. testing set 2, with 283 COVID-19 samples. We also achieved a high accuracy of 99.8% in the detection of COVID-19 cases of COVID-19 dataset-B.
With the continuous collection of data, we aim in the future to extend the development and validation of 4S-DT with multi-modality datasets, including clinical records. As a future development, we also aim to add an explainability component to increase the trustworthiness and usability of 4S-DT. Finally, one can use model pruning and quantisation to improve the efficiency of 4S-DT, allowing deployment on handheld devices.
Data Availability
The developed code in a test mode is available at https://github.com/asmaa4may/4S-DT. Datasets: In this work, we used two datasets of labelled and unlabelled chest X-ray images, defined respectively as: 1) Unlabelled chest X-ray dataset, a large set of chest X-ray images used as an unlabelled dataset: a set of 50,000 unlabelled chest X-ray images collected from three different datasets: i) 336 cases with a manifestation of tuberculosis, and 326 normal cases from [30], [31]; ii) 5,863 chest X-ray images with 2 categories: pneumonia and normal from [32]; and iii) a set of 43,475 chest X-ray images randomly selected from a total of 112,120 chest X-ray images, including 14 diseases, available from [33]. 2) COVID-19 dataset, an imbalanced set of labelled chest X-ray images with COVID-19 cases: 80 normal cases from [34], [35], and the chest X-ray dataset from [36], which contains 105 and 11 cases of COVID-19 and SARS, respectively.
Footnotes
¹ The developed code is available at https://github.com/asmaa4may/4S-DT.