Detection of COVID-19 in smartphone-based breathing recordings using CNN-BiLSTM: a pre-screening deep learning tool

This study sought to investigate the feasibility of using smartphone-based breathing sounds within a deep learning framework to discriminate between COVID-19 subjects, including asymptomatic ones, and healthy subjects. A total of 480 breathing sounds (240 shallow and 240 deep) were obtained from a publicly available database named Coswara. These sounds were recorded by 120 COVID-19 and 120 healthy subjects via a smartphone microphone through a website application. The proposed deep learning framework relies on hand-crafted features extracted from the original recordings and from the mel-frequency cepstral coefficients (MFCC), as well as deep-activated features learned by a combination of a convolutional neural network and bi-directional long short-term memory units (CNN-BiLSTM). Analysis of the normal distribution of the combined MFCC values showed that COVID-19 subjects tended to have a distribution skewed more towards the right side of the zero mean (shallow: 0.59±1.74, deep: 0.65±4.35). In addition, the proposed deep learning approach had an overall discrimination accuracy of 94.58% and 92.08% using shallow and deep recordings, respectively. Furthermore, it detected COVID-19 subjects with a maximum sensitivity of 94.21%, specificity of 94.96%, and area under the receiver operating characteristic (AUROC) curve of 0.90. Among the 120 COVID-19 participants, the 18 asymptomatic subjects were detected with 100.00% accuracy using shallow recordings and 88.89% using deep recordings. This study paves the way towards utilizing smartphone-based breathing sounds for COVID-19 detection. The observations are promising enough to suggest deep learning and smartphone-based breathing sounds as an effective pre-screening tool for COVID-19 alongside the current reverse-transcription polymerase chain reaction (RT-PCR) assay. It can be considered an early, rapid, easily distributed, time-efficient, and almost no-cost diagnosis technique that complies with social distancing restrictions during the COVID-19 pandemic.

with the ability of the virus to develop more genomic variants and spread more readily among people. India, which is one of the world's biggest suppliers of vaccines, is now severely suffering from the pandemic after the explosion of cases due to a new variant of COVID-19. It has reached more than 17.5 million confirmed cases, setting it behind the US as the second worst hit country [2,3].

COVID-19 patients usually range from being asymptomatic to developing pneumonia and, in severe cases, death. In most reported cases, the virus remains in incubation for a period of 1 to 14 days before the symptoms of an infection start arising [4]. Patients carrying COVID-19 have exhibited common signs and symptoms including cough, shortness of breath, fever, fatigue, and other acute respiratory distress syndromes (ARDS) [5,6]. Most infected people suffer from mild to moderate viral symptoms and eventually recover. On the other hand, patients who develop severe symptoms such as severe pneumonia are mostly people over 60 years of age with conditions such as diabetes, cardiovascular diseases (CVD), hypertension, and cancer [4,5]. In most cases, the early diagnosis of COVID-19 helps in preventing its spread and its development to severe infection stages. This is usually done by following steps of early patient isolation and contact tracing. Furthermore, timely medication and efficient treatment reduce symptoms and lower the mortality rate of this pandemic [7].

The current gold standard in diagnosing COVID-19 is the reverse-transcription polymerase chain reaction (RT-PCR) assay [8,9]. It is the most commonly used technique worldwide to confirm the existence of this viral infection.
Additionally, examinations of the ribonucleic acid (RNA) in patients carrying the virus provide further information about the infection; however, they require a longer time for diagnosis and are not considered as accurate as other diagnostic techniques [10]. The integration of computed tomography (CT) screening is another effective diagnostic tool (sensitivity ≥ 90%) that often provides supplemental information about the severity and progression of COVID-19 in the lungs [11,12]. CT imaging is not recommended for patients at the early stages of the infection, i.e., those who are asymptomatic or show mild symptoms. It provides useful details about the lungs in patients with moderate to severe stages due to the disturbance in the pulmonary tissues and their corresponding functions [13]. However, CT imaging may not be available in all public healthcare services, especially in countries that are swamped with the pandemic, due to its costs and additional maintenance requirements. Therefore, biological signals, such as coughing and breathing sounds, could be another promising tool to indicate the existence of the viral infection [14]. In addition, because respiratory signals are simple to record, lung sounds could carry useful information about the viral infection and thus could set an early alert to the patient before moving on with further medication procedures. Moreover, the newly emerging algorithms in artificial intelligence (AI) could be key to enhancing the sensitivity of detection for positive cases due to their ability to generalize over a wide set of data [15].

Many studies have investigated the information carried by respiratory sounds in patients who tested positive for COVID-19 [16][17][18]. Furthermore, it has been found that vocal patterns extracted from COVID-19 patients' speech recordings carry indicative biomarkers for the existence of the viral infection [19].
In addition, a telemedicine approach was also explored to observe evidence of the sequential changes in respiratory sounds as a result of COVID-19 infection [20]. Most recently, AI was utilized in one study to recognize COVID-19 in cough signals [21] and in another to evaluate the severity of patients' illness, sleep quality, fatigue, and anxiety through speech recordings [22].

September 18, 2021 2/20

Fig. 1. A graphical abstract of the complete procedure followed in this study. The input data includes breathing sounds collected from an open-access database for respiratory sounds (Coswara [23]) recorded via a smartphone microphone. The data includes a total of 240 participants, of whom 120 were suffering from COVID-19, while the remaining 120 were healthy (control group). A deep learning framework was then utilized based on hand-crafted features extracted by feature engineering techniques, as well as deep-activated features extracted by a combination of convolutional and recurrent neural networks. The performance was then evaluated and further discussed with regard to the use of artificial intelligence (AI) as a successful pre-screening tool for COVID-19.

Despite the high levels of performance achieved in the aforementioned AI-based studies, further investigations on the capability of respiratory sounds in carrying useful information about COVID-19 are still required, especially when embedded within the framework of sophisticated AI-based algorithms. Furthermore, due to the explosion in the number of confirmed positive COVID-19 cases all over the world, it is essential to provide a system capable of recognizing the disease in signals recorded through portable devices, such as computers or smartphones, instead of regular clinic-based electronic stethoscopes.

Motivated by the aforementioned, a complete deep learning approach is proposed in this paper for the detection of COVID-19 using only breathing sounds recorded through the microphone of a smartphone device (Fig. 1). The proposed approach serves as a rapid, no-cost, and easily distributed pre-screening tool for COVID-19, especially for countries that are in a complete lockdown due to the wide spread of the pandemic. Although the current gold standard, RT-PCR, provides high success rates in detecting the viral infection, it has various limitations including the high expenses involved with equipment and chemical agents, the requirement of expert nurses and doctors for diagnosis, the violation of social distancing, and the long testing time required to obtain results (2-3 days). Thus, the development of a deep learning model overcomes most of these limitations and allows for a better revival in the healthcare and economic sectors in several countries.

This version posted September 22, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Furthermore, the novelty of this work lies in utilizing smartphone-based breathing recordings within this deep learning model, which, when compared to conventional respiratory auscultation devices, i.e., electronic stethoscopes, are preferable due to their higher accessibility to a wider population. This plays an important role in obtaining medical information about COVID-19 patients in a timely manner while at the same time keeping people isolated from one another. Additionally, this study covers patients who are mostly from India, which is severely suffering from a new genomic variant (first reported in December 2020) of COVID-19 capable of escaping the immune system and most of the available vaccines [2,24]. Thus, it gives an insight into the ability of AI algorithms to detect this viral infection in patients carrying this new variant, including asymptomatic ones. Lastly, the study presented herein investigates signal characteristics contained within shallow and deep breathing sounds of COVID-19 and healthy subjects through deep-activated attributes (neural network activations) of the original signals as well as wide attributes (hand-crafted features) of the signals and their corresponding mel-frequency cepstrum (MFC). The utilization of one-dimensional (1D) signals within a successful deep learning framework allows for a simple, yet effective, AI design that does not impose heavy memory requirements.
This serves as a suitable solution for further development of telemedicine and smartphone applications for COVID-19 (or other pandemics) that can provide real-time results and communications between patients and clinicians in an efficient and timely manner. Therefore, as a pre-screening tool for COVID-19, this allows for better and faster isolation and contact tracing than currently available techniques.

The dataset used in this study was obtained from Coswara [23], a project that aims to provide an open-access database of respiratory sounds of healthy and unhealthy individuals, including those suffering from COVID-19. The project is a worldwide respiratory data collection effort that was first initiated on August 7, 2020. Since then, it has collected data from more than 1,600 participants (Male: 1185, Female: 415) from all over the world (mostly Indian population). The database was approved by the human ethics committee of the Indian Institute of Science (IISc), Bangalore, India, and conforms to the ethical principles outlined in the Declaration of Helsinki. No personally identifiable information about participants was collected and the participants' data was fully anonymized during storage in the database.

The database includes breath, cough, and voice sounds acquired via crowdsourcing using an interactive website application that was built for smartphone devices [25]. The average interaction time with the application was 5-7 minutes. All sounds were recorded using the microphone of a smartphone and sampled at a frequency of 48 kHz. The participants had the freedom to select any device for recording their respiratory sounds, which reduces device-specific bias in the data. The audio samples (stored in .WAV format) for all participants were manually curated through a web interface that allows multiple annotators to go through each audio file and verify the quality as well as the correctness of labeling. All participants were requested to keep a 10 cm distance between the face and the device before starting the recording.

So far, the database had a COVID-19 participants' count of 120, which is almost a 1:10 ratio to healthy (control) participants. In this study, all COVID-19 participants' data was used, and the same number of samples from the control participants' data was randomly selected to ensure a balanced dataset. Therefore, the dataset used in this study had a total of 240 subjects (COVID-19: 120, Control: 120). The demographic and clinical information of the selected subjects is provided in Table 1. Furthermore, only breathing sounds of two types, namely shallow and deep, were obtained from every subject and used for further analysis (examples from the shallow breathing dataset are shown in Fig. 2). To ensure the inclusion of maximum information from each breathing recording as well as to cover at least 2-4 breathing cycles (inhale and exhale), a total of 16 seconds were considered, as the normal breathing pattern in adults ranges between 12 and 18 breaths per minute [26]. All recordings shorter than 16 seconds were padded with zeros. Furthermore, the final signals were resampled to a sampling frequency of 4 kHz.

Hand-crafted features refer to signal attributes that are extracted manually through various algorithms and functions in a process called feature engineering. The advantage of following such a process is that it can extract internal and hidden information within input data, i.e., sounds, and represent it as single or multiple values. Thus, additional knowledge about the input data can be obtained and used for further analysis and evaluation. Hand-crafted features were initially extracted from the original breathing recordings; they were then also extracted from the MFCC transformation of the signals. The features included in this study are:

Kurtosis and Skewness: In statistics, kurtosis quantifies the degree of extremity within the tails of a distribution relative to the tails of a normal distribution. The more outlier-prone the distribution, the higher the kurtosis value, and vice versa. A kurtosis of 3 corresponds to that of a normal distribution. On the other hand, skewness is a measure of the asymmetry of the data about its mean. If the skewness is negative, then the data are more spread towards the left side of the mean, while a positive skewness indicates data spreading towards the right side of the mean [27]. A skewness of zero indicates a symmetric distribution, such as the normal distribution. Kurtosis (k) and skewness (s) can be calculated as,

$$k = \frac{E\left[(X-\mu)^{4}\right]}{\sigma^{4}} \tag{1}$$

$$s = \frac{E\left[(X-\mu)^{3}\right]}{\sigma^{3}} \tag{2}$$

where X denotes the input values, µ and σ are the mean and standard deviation of the input, respectively, and E is the expectation operator.
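
As a minimal sketch of these two moment-based definitions, the following pure-Python functions compute population kurtosis and skewness. The function names and the toy inputs are illustrative assumptions and are not taken from the study.

```python
import math


def central_moment(values, order):
    """Population central moment E[(X - mu)^order]."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** order for x in values) / len(values)


def skewness(values):
    """Third standardized moment: E[(X - mu)^3] / sigma^3."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 3) / sigma ** 3


def kurtosis(values):
    """Fourth standardized moment: E[(X - mu)^4] / sigma^4."""
    sigma = math.sqrt(central_moment(values, 2))
    return central_moment(values, 4) / sigma ** 4


# Toy inputs (hypothetical): a symmetric sequence and one with a long right tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_tailed = [1.0, 1.0, 1.0, 2.0, 10.0]

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric sequence has zero skewness, while the right-tailed one has positive skewness, which is the direction of asymmetry discussed for the COVID-19 group.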

Sample entropy: In physiological signals, the sample entropy (SampEn) provides a measure of the complexity contained within time sequences. It can be calculated through the negative natural logarithm of the probability that segments of length m match their consecutive segments under a tolerance value (r) [28] as follows,

$$\mathrm{SampEn}(m, r) = -\ln P(m, r) \tag{3}$$

where P(m, r) is the probability that segment A, the first segment in the time sequence, matches segment A+1, the consecutive segment, within the tolerance r.
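
The sketch below follows the commonly used Richman-Moorman formulation of sample entropy, in which the probability is estimated as the ratio of template matches of length m+1 to matches of length m. The defaults `m=2` and `r=0.2` (an absolute tolerance) are assumptions for illustration; the text does not state the values used in the study.

```python
import math
import random


def sample_entropy(signal, m=2, r=0.2):
    """SampEn = -ln(A / B), where B counts template pairs of length m that
    match within tolerance r and A counts pairs of length m + 1."""
    n = len(signal)

    def count_matches(length):
        # Use the same number of templates (n - m) for both lengths.
        templates = [signal[i:i + length] for i in range(n - m)]
        count = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                # Chebyshev distance between the two templates.
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # no matches: entropy is undefined/unbounded
    return -math.log(a / b)


# Toy inputs (hypothetical): a perfectly regular sequence and uniform noise.
periodic = [0.0, 1.0] * 50
rng = random.Random(0)
noisy = [rng.random() for _ in range(100)]

print(sample_entropy(periodic), sample_entropy(noisy))
```

A perfectly periodic sequence yields an entropy of zero, whereas the noisy sequence yields a larger value, in line with the interpretation of SampEn as a complexity measure.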

Spectral entropy: To measure time series irregularity, spectral entropy (SE) provides a frequency domain entropy measure as a sum of the normalized signal spectral power [29]. Based on Shannon's entropy, the SE can be calculated as,

$$SE = -\sum_{n=1}^{N} P(n)\,\log P(n) \tag{4}$$

where N is the total number of frequency points and P(n) is the probability distribution of the power spectrum.
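
A minimal sketch of this computation is given below. It uses a naive discrete Fourier transform so that it stays dependency-free; the base-2 logarithm and the normalization by the maximum entropy are assumptions made for illustration, since the text does not specify them.

```python
import cmath
import math


def power_spectrum(signal):
    """One-sided power spectrum from a naive O(N^2) DFT."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_entropy(signal, normalize=True):
    """Shannon entropy of the normalized power spectrum P(n)."""
    power = power_spectrum(signal)
    total = sum(power)  # assumes a non-zero signal
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    if normalize:
        entropy /= math.log2(len(power))  # scale to the range [0, 1]
    return entropy


# Toy inputs (hypothetical): a pure tone and a unit impulse (flat spectrum).
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63

print(spectral_entropy(tone), spectral_entropy(impulse))
```

The tone concentrates its power in a single frequency bin and has an entropy close to zero, whereas the impulse spreads power evenly over all bins and reaches the maximum normalized entropy.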

Fractal dimension: Higuchi and Katz [30,31] provided two methods to statistically measure the complexity in a time series. More specifically, fractal dimension measures provide an index for characterizing how much a time series is self-similar over some region of space. The Higuchi (HFD) and Katz (KFD) fractal dimensions can be calculated as,

$$HFD = \frac{\log L(k)}{\log(1/r)} \tag{5}$$

$$KFD = \frac{\log N}{\log N + \log\left(d / L(k)\right)} \tag{6}$$

where L(k) is the length of the fractal curve, r is the selected time interval, N is the length of the signal, and d is the maximum distance between an initial point and the other points.
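
The following sketch implements common textbook variants of both measures. It assumes amplitude-only distances between samples, uses the number of steps (signal length minus one) in the Katz formula, and estimates the Higuchi dimension as the least-squares slope of ln L(k) against ln(1/k) up to a hypothetical `k_max=8`. These choices are assumptions and may differ from the settings used in the study.

```python
import math
import random


def katz_fd(signal):
    """Katz fractal dimension using amplitude differences only."""
    steps = [abs(b - a) for a, b in zip(signal, signal[1:])]
    total_length = sum(steps)                               # curve length L
    max_dist = max(abs(x - signal[0]) for x in signal[1:])  # distance d
    n = len(steps)
    return math.log10(n) / (math.log10(n) + math.log10(max_dist / total_length))


def higuchi_fd(signal, k_max=8):
    """Higuchi fractal dimension: slope of ln L(k) versus ln(1/k)."""
    n = len(signal)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        curve_lengths = []
        for m in range(k):
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(signal[m + i * k] - signal[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalize for the number of points used, then divide by k.
            curve_lengths.append(dist * (n - 1) / (n_steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(curve_lengths) / len(curve_lengths)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Toy inputs (hypothetical): a straight ramp and uniform noise.
ramp = [float(i) for i in range(200)]
rng = random.Random(1)
noise = [rng.random() for _ in range(500)]

print(katz_fd(ramp), higuchi_fd(ramp))
print(katz_fd(noise), higuchi_fd(noise))
```

A straight ramp has a dimension of 1 under both measures, while an irregular signal such as noise yields larger values.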

medRxiv preprint doi: https://doi.org/10.1101/2021.09.18.21263775

Zero-crossing rate: To measure the number of times a signal has passed through the zero point, a zero-crossing rate (ZCR) measure is provided. In other words, ZCR refers to the rate of sign changes in the signal's data points. It can be calculated as follows,

$$ZCR = \frac{1}{T-1}\sum_{t=1}^{T-1}\left|x_{t} - x_{t-1}\right| \tag{7}$$

where T is the number of time steps and x_t = 1 if the signal has a positive value at time step t and 0 otherwise.

Mel-frequency cepstral coefficients (MFCC): To better represent speech and voice signals, MFCC provides a set of coefficients of the discrete cosine transformed (DCT) logarithm of a signal's spectrum (mel-frequency cepstrum (MFC)). It is considered an overall representation of the information contained within signals regarding the changes in their different spectrum bands [32,33]. Briefly, to obtain the coefficients, the signal goes through several steps, namely windowing the signal, applying the discrete Fourier transform (DFT), calculating the log energy of the magnitude, transforming the frequencies to the Mel-scale, and applying inverse DCT.

has the ability to acquire the temporal (time changes) information carried through time sequences [34,35]. Such optimized features can be considered a complete representation of the input data generated iteratively through an automated learning process. To achieve this, we used an advanced neural network based on a combination of a convolutional neural network and bi-directional long short-term memory (CNN-BiLSTM).

Neural network architecture: The structure of the network starts with 1D convolutional layers. In deep learning, convolutions refer to multiple dot products applied to 1D signals on pre-defined segments. By applying consecutive convolutions, the network extracts deep attributes (activations) to form an overall feature map for the input data [35]. A single convolution on an input $x_i^{0} = [x_1, x_2, \ldots, x_n]$, where n is the total number of points, is usually calculated as,

$$c_i^{l,j} = h\left(b_j + \sum_{m=1}^{M} w_m^{j}\, x_{i+m-1}^{0,j}\right) \tag{8}$$

where l is the layer index, h is the activation function, b is the bias of the j-th feature map, M is the kernel size, and $w_m^{j}$ is the weight of the j-th feature map and m-th filter index.

In this work, three convolutional layers were used to form the first stage of the deep neural network. The kernel sizes of the layers are [9,1], [5,1], and [3,1], respectively. Furthermore, the number of filters increases as the network becomes deeper, that is, 16, 32, and 64, respectively. Each convolutional layer was followed by a max-pooling layer to reduce the dimensionality as well as the complexity of the model. The max-pooling kernel size decreases as the network gets deeper, with [8,1], [4,1], and [2,1] kernels for the three max-pooling layers, respectively. It is worth noting that each max-pooling layer was followed by a batch normalization (BN) layer to normalize all filters as well as by a rectified linear unit (ReLU) layer to set all values less than zero in the feature map to zero. The complete structure is illustrated in Fig. 3.
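
To make the effect of the three convolution and max-pooling stages concrete, the sketch below implements a single-filter 1D convolution and a max-pooling step in plain Python, then traces the feature-map size for a 16-second recording sampled at 4 kHz (64,000 samples). Zero "same" padding, a convolution stride of 1, and non-overlapping pooling are assumptions, as the text does not state the padding or stride settings; batch normalization is omitted.

```python
def relu(value):
    """Rectified linear unit: negative values are set to zero."""
    return max(0.0, value)


def conv1d_same(signal, weights, bias=0.0):
    """Single-filter 1D convolution with zero 'same' padding and ReLU.
    Assumes an odd kernel length."""
    half = len(weights) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [relu(bias + sum(w * padded[i + m] for m, w in enumerate(weights)))
            for i in range(len(signal))]


def max_pool1d(signal, size):
    """Non-overlapping max-pooling (stride equal to the pool size)."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


# (convolution kernel size, number of filters, max-pooling size) per stage.
LAYERS = [(9, 16, 8), (5, 32, 4), (3, 64, 2)]


def trace_shapes(n_samples):
    """Feature-map shape (time steps, filters) after each stage."""
    shapes = []
    length = n_samples
    for _kernel, filters, pool in LAYERS:
        length = length // pool  # 'same' convolution keeps the length
        shapes.append((length, filters))
    return shapes


n_samples = 16 * 4000  # 16 seconds at 4 kHz
print(trace_shapes(n_samples))

# Small worked example on a toy signal (hypothetical values).
toy = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0, -3.0]
features = max_pool1d(conv1d_same(toy, [0.25, 0.5, 0.25]), 2)
print(features)
```

Under these assumptions, the time axis shrinks from 64,000 to 1,000 steps across the three stages while the number of feature maps grows to 64, which is what keeps the subsequent recurrent stage tractable.
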
The network continues with additional extraction of temporal features through bi-directional LSTM units. In recurrent neural networks, LSTM units allow for the detection of long- and short-term dependencies between time sequence data points. Thus, they overcome the issues of exploding and vanishing gradients in chain-like structures during training [34,36]. An LSTM block includes a collection of gates, namely input (i), output (o), and forget (f) gates. These gates handle the flow of data as well as the processing of the input and output activations within the network's memory. The information of the main cell ($C_t$) at any instance (t) within the block can be calculated as,

$$C_t = f_t\, C_{t-1} + i_t\, c_t \tag{9}$$

where $c_t$ is the input to the main cell and $C_{t-1}$ includes the information at the previous time instance.

In addition, the network performs hidden-unit ($h_t$) activations on the output and main cell input using a sigmoid function ($\sigma$) as follows,

$$h_t = o_t\, \sigma(c_t) \tag{10}$$

Furthermore, a bi-directional functionality (BiLSTM) allows the network to process data in both the forward and backward directions as follows,

$$y_t = W_{\overrightarrow{h}y}\, \overrightarrow{h}_t^{N} + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t^{N} + b_y \tag{11}$$

where $\overrightarrow{h}^{N}$ and $\overleftarrow{h}^{N}$ are the outputs of the hidden layers in the forward and backward directions, respectively, for all N levels of the stack, W denotes the corresponding output weight matrices, and $b_y$ is a bias vector.
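
The gating and bi-directional processing described here can be illustrated with a single-unit LSTM in plain Python. This is a sketch of the standard LSTM formulation with hypothetical random scalar weights, not the trained network from the study. The hidden-state activation defaults to the commonly used tanh; the sigmoid mentioned in the text can be passed through the `output_activation` parameter.

```python
import math
import random


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def make_params(rng):
    """Hypothetical scalar weights: (input weight, recurrent weight, bias)
    for the input (i), forget (f), output (o) gates and the cell input (c)."""
    return {name: (rng.uniform(-1, 1), rng.uniform(-1, 1), 0.0)
            for name in ("i", "f", "o", "c")}


def gate(params, name, x, h, activation):
    w_x, w_h, b = params[name]
    return activation(w_x * x + w_h * h + b)


def lstm_pass(sequence, params, output_activation=math.tanh):
    """Run a single-unit LSTM and return the hidden state at each step."""
    h, cell = 0.0, 0.0
    hidden = []
    for x in sequence:
        i_t = gate(params, "i", x, h, sigmoid)
        f_t = gate(params, "f", x, h, sigmoid)
        o_t = gate(params, "o", x, h, sigmoid)
        c_t = gate(params, "c", x, h, math.tanh)  # input to the main cell
        cell = f_t * cell + i_t * c_t             # main cell update
        h = o_t * output_activation(cell)
        hidden.append(h)
    return hidden


def bilstm(sequence, forward_params, backward_params):
    """Concatenate forward and backward hidden states per time step."""
    forward = lstm_pass(sequence, forward_params)
    backward = lstm_pass(sequence[::-1], backward_params)[::-1]
    return [[f, b] for f, b in zip(forward, backward)]


rng = random.Random(0)
sequence = [0.1, -0.3, 0.5, 0.2]  # toy input (hypothetical)
outputs = bilstm(sequence, make_params(rng), make_params(rng))
print(outputs)

HIDDEN_UNITS = 256
OUTPUT_SIZE = 2 * HIDDEN_UNITS  # forward + backward directions
print(OUTPUT_SIZE)
```

Each time step yields one forward and one backward activation per hidden unit, which is why a layer with 256 hidden units produces 512 values.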

In this work, the BiLSTM functionality was selected with a total of 256 hidden units. Thus, the resulting output is a 512-element vector (both directions) of the extracted hidden units for every input.

BiLSTM activations: To be able to utilize the parameters that the BiLSTM units have learned, the activations that correspond to each hidden unit were extracted from the network for each input signal. Recurrent neural network activations of a pre-trained network are vectors that carry the final learned attributes about different time steps within the input [37]. In this work, these activations were the final signal attributes extracted from each input signal. Such attributes are referred to as deep-activated features in this work (Fig. 3). Furthermore, they were concatenated with the hand-crafted features alongside age and sex information and used for the final predictions by the network.

Prior to deep learning model training, several data preparation and network fine-tuning steps were followed, including data augmentation, best features selection, deciding the training and testing scheme, and network parameters configuration.

Data augmentation: Due to the small sample size available, it is critical for deep learning applications to include augmented data. Instead of training the model on the existing dataset only, data augmentation allows for the generation of new modified copies of the original samples. These new copies have characteristics similar to the original data; however, they are slightly adjusted as if they were coming from a new source (subject). Such a procedure is essential to expose the deep learning model to more variations in the training data, thus making it more robust and less biased when attempting to generalize the parameters on new data [38]. Furthermore, it was essential to prevent the model from over-fitting, where the model learns the input data exactly with very minimal generalization capabilities for unseen data [39].

In this study, 3,000 samples per class were generated using two 1D data augmentation techniques as follows:

• Volume control: Adjusts the strength of signals in decibels (dB) for the generated data [40] with a probability of 0.8 and a gain ranging between -5 and 5 dB.

• Time shift: Modifies time steps of the signals to illustrate shifting in time for the generated data [41] with a shifting range of [-0.005, 0.005] seconds.
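
The two augmentation steps above can be sketched as follows. The 4 kHz sampling rate comes from the text; the uniform sampling of gain and shift, the zero-filling of shifted samples, and the function names are assumptions made for illustration.

```python
import random

FS = 4000  # sampling frequency in Hz after resampling


def volume_control(signal, rng, p=0.8, min_gain_db=-5.0, max_gain_db=5.0):
    """With probability p, scale the signal by a random gain in dB."""
    if rng.random() >= p:
        return list(signal)
    gain_db = rng.uniform(min_gain_db, max_gain_db)
    factor = 10 ** (gain_db / 20)  # dB to linear amplitude
    return [x * factor for x in signal]


def time_shift(signal, rng, max_shift_s=0.005, fs=FS):
    """Shift the signal by up to +/- max_shift_s seconds, zero-filling gaps."""
    max_shift = int(round(max_shift_s * fs))  # 20 samples at 4 kHz
    shift = rng.randint(-max_shift, max_shift)
    if shift > 0:
        return [0.0] * shift + list(signal[:-shift])
    if shift < 0:
        return list(signal[-shift:]) + [0.0] * (-shift)
    return list(signal)


def augment(signal, rng):
    """Apply volume control followed by time shift to create a new copy."""
    return time_shift(volume_control(signal, rng), rng)


rng = random.Random(0)
original = [1.0] * 100  # toy signal (hypothetical)
augmented = augment(original, rng)
print(len(augmented), min(augmented), max(augmented))
```

A gain of ±5 dB corresponds to an amplitude factor between roughly 0.56 and 1.78, and a ±0.005 s shift corresponds to at most 20 samples at 4 kHz, so each augmented copy stays close to the original recording.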

Best features selection: To ensure the inclusion of the most important hand-crafted features within the trained model, a statistical univariate chi-square test (χ²-test) was applied. In this test, a feature is considered important if the observed statistical analysis using this feature matches the expected one, i.e., the label [42]. Furthermore, an important feature is one that is significant in discriminating between the two categories with a p-value < 0.05. The lower the p-value, the more the feature depends on the category label. The importance score can then be calculated as,

$$\text{score} = -\log(p) \tag{12}$$

In this work, hand-crafted features extracted from the original breathing signals and from the MFCC, alongside the age and sex information, were selected for this test. The best 20 features were included in the final best features vector within the final fully-connected layer (along with the deep-activated features) for predictions.
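
As a rough illustration of the score in Eq. (12), the sketch below runs a chi-square test of independence on a 2×2 table obtained by splitting a feature at its median. The median split is an assumption made to keep the example small; the binning used in the study is not stated. For one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math
import statistics


def chi2_importance(feature, labels):
    """Return -log(p) of a chi-square independence test between a
    median-binarized feature and a binary label."""
    threshold = statistics.median(feature)
    table = [[0, 0], [0, 0]]  # rows: feature low/high, columns: label 0/1
    for value, label in zip(feature, labels):
        table[int(value > threshold)][int(label)] += 1
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of dependence
    chi2 = n * (a * d - b * c) ** 2 / denom
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square survival, 1 dof
    return float("inf") if p == 0.0 else -math.log(p)


# Toy data (hypothetical): 10 control (0) and 10 COVID-19 (1) labels.
labels = [0] * 10 + [1] * 10
informative = [float(i) for i in range(20)]    # separates the classes
uninformative = [float(i % 2) for i in range(20)]

print(chi2_importance(informative, labels))
print(chi2_importance(uninformative, labels))
```

With a natural logarithm, the significance threshold p < 0.05 corresponds to a score above approximately 3.0, so ranking features by score and keeping the top 20 favors those most dependent on the label.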

Training configuration: To ensure the inclusion of all available data, a leave-one-out training and testing scheme was followed. In this scheme, a total of 240 iterations (the number of input samples) were applied, where in each iteration the i-th subject was used as the testing subject and the remaining subjects were used for training the model. This scheme was essential to provide a prediction for each subject in the dataset.
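
The leave-one-out scheme amounts to the following index bookkeeping, shown here as a minimal sketch with an illustrative generator name.

```python
def leave_one_out(n_subjects):
    """Yield (training indices, test index) for each of n_subjects iterations."""
    for test_index in range(n_subjects):
        train_indices = [j for j in range(n_subjects) if j != test_index]
        yield train_indices, test_index


splits = list(leave_one_out(240))
print(len(splits), len(splits[0][0]))
```

Each of the 240 iterations trains on 239 subjects and tests on the held-out one, so every subject receives exactly one prediction.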

Furthermore, the network was optimized using the adaptive moment estimation (ADAM) solver [43] with a learning rate of 0.001. The L2-regularization was set to $10^{-6}$ and the mini-batch size to 32.

Performance evaluation

The performance of the proposed deep learning model in discriminating COVID-19 from healthy subjects was evaluated using traditional evaluation metrics including accuracy, sensitivity, specificity, precision, and F1-score. These metrics can be calculated as,

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{14}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{15}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{16}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{17}$$

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives in the confusion matrix.

Additionally, the area under the receiver operating characteristic (AUROC) curves was analysed for each category to show the true positive rate (TPR) versus the false positive rate (FPR).
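
The confusion-matrix metrics named above can be computed with a few lines of Python. The sketch uses the standard definitions and, as an example input, the shallow-breathing counts reported in the results (113 COVID-19 and 114 healthy subjects correctly classified, 7 COVID-19 subjects missed, 6 healthy subjects wrongly flagged), with COVID-19 taken as the positive class as an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from the four confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Shallow breathing dataset, COVID-19 assumed to be the positive class.
shallow = classification_metrics(tp=113, tn=114, fp=6, fn=7)
for name, value in shallow.items():
    print(f"{name}: {100 * value:.2f}%")
```

The accuracy evaluates to 227/240, i.e. about 94.58%. The remaining metrics depend on which class is treated as positive, so they may differ slightly from the reported values.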

Examples of the 13 MFCC extracted from the original shallow breathing signals are illustrated in Fig. 4 for COVID-19 and healthy subjects. Furthermore, the figure shows the MFCC values (after summing all coefficients) distributed as a normal distribution. From the figure, the normal distribution of COVID-19 subjects was slightly skewed to the right side of the mean, while the normal distribution of the healthy subjects was more towards the zero mean, indicating that it better represents a normal distribution. Tables 2 and 3 show the combined MFCC values, kurtosis, and skewness among all COVID-19 and healthy subjects (mean±std) for the shallow and deep breathing datasets, respectively. In both datasets, the kurtosis and skewness values for COVID-19 subjects were slightly higher than those for healthy subjects. Furthermore, the average combined MFCC values for COVID-19 were lower than those for the healthy subjects. More specifically, in the shallow breathing dataset, a kurtosis and skewness of 4.65±15.97 and 0.59±1.74 were observed for COVID-19 subjects relative to 4.47±20.66 and 0.19±1.75 for healthy subjects. On the other hand, using the deep breathing dataset, COVID-19 subjects had a skewness of 0.65±4.35.

Table 2. Normal distribution analysis (mean±std) of the combined mel-frequency cepstral coefficients (MFCCs) using the shallow breathing dataset.

The overall performance of the proposed deep learning model is shown in Fig. 5. From the figure, the model correctly predicted 113 COVID-19 and 114 healthy subjects, each out of 120 subjects, using the shallow breathing dataset (Fig. 5(a)). In addition, only 7 COVID-19 subjects were misclassified as healthy, whereas only 6 healthy subjects were wrongly classified as carrying COVID-19. The number of correct predictions was slightly lower using the deep breathing dataset, with 109 and 112 for COVID-19 and healthy subjects, respectively. In addition, wrong predictions were also slightly higher, with 11 COVID-19 and 8 healthy subjects. Therefore, the confusion matrices show proportions of 94.20% and 90.80% for COVID-19 subjects using the shallow and deep datasets, respectively. On the other hand, healthy subjects had proportions of 95.00% and 93.30% for the two datasets, respectively.

The evaluation metrics (Fig. 5(b)) calculated from these confusion matrices returned an accuracy of 94.58% and 92.08% for the shallow and deep datasets, respectively. Furthermore, the model had sensitivity and specificity measures of 94.21%/94.96% for the shallow dataset and 93.16%/91.06% for the deep dataset. The precision was the highest measure obtained for the shallow dataset (95.00%), whereas it was the lowest measure for the deep dataset (90.83%). Lastly, the F1-score measures were 94.61% and 91.98% for the two datasets, respectively.

To analyze the AUROC, Fig. 5(c) shows the ROC curves of predictions using both the shallow and deep datasets. The shallow breathing dataset had an overall AUROC of 0.90 in predicting COVID-19 and healthy subjects, whereas the deep breathing dataset had an AUROC of 0.86, which is a slightly lower performance in the prediction process.

Additionally, the model had high accuracy measures in predicting asymptomatic COVID-19 subjects (Fig. 6). Using the shallow breathing dataset, the model had a 100.00% accuracy by predicting all subjects correctly. On the other hand, using the deep breathing dataset, the model achieved an accuracy of 88.89% by missing two asymptomatic subjects. It is worth noting that a few subjects had scores (probabilities) close to 0.5 using both datasets; however, the model correctly discriminated them from healthy subjects.

This study demonstrated the importance of using deep learning for the detection of COVID-19 subjects, especially those who are asymptomatic. Furthermore, it elaborated on the significance of biological signals, such as breathing sounds, in acquiring useful information about the viral infection. Unlike conventional lung auscultation techniques, which rely on electronic stethoscopes to record breathing sounds, the study proposed herein utilized breathing sounds recorded via a smartphone microphone. The observations found in this study (highest accuracy: 94.58%) strongly suggest deep learning as a pre-screening tool for COVID-19 as well as an early detection technique prior to the gold standard RT-PCR assay.

Smartphone-based breathing recordings

Although current lung auscultation techniques provide high accuracy measures in detecting respiratory diseases [44][45][46], they require subjects to be present at hospitals for equipment setup and testing preparation prior to data acquisition. Furthermore, they require the availability of an experienced person, i.e., a clinician or nurse, to take data from patients and store it in a database. Therefore, utilizing a smartphone device to acquire such data allows for a faster data acquisition process from subjects or patients while providing comparable and acceptable diagnostic performance. In addition, smartphone-based lung auscultation ensures better social distancing during lockdowns due to pandemics such as COVID-19; thus, it allows for rapid and time-efficient detection of diseases despite strong restrictions.

By visually inspecting COVID-19 and healthy subjects' breathing recordings (Fig. 2), an abnormal nature was usually observed in COVID-19 subjects, while healthy subjects had a more regular pattern during breathing. This could be related to the hidden characteristics of COVID-19 contained within the lungs and exhibited during inhalation and exhalation [47][48][49]. Additionally, the MFCC transformation of these recordings (Fig. 4(a-c)) returned similar observations. By quantitatively evaluating these coefficients when combined, COVID-19 subjects had a unique distribution (positively skewed) that can be easily distinguished from that of healthy subjects. This indicates the importance of further extracting the internal attributes carried not only by the recordings themselves, but also by the additional MFCC transformation of such recordings. Additionally, the asymptomatic subjects had a distribution of values that was close in shape to the distribution of healthy subjects (Fig. 4(a)); however, it was skewed towards the right side of the zero mean. This may be considered a strong attribute when analyzing COVID-19 patients who do not exhibit any symptoms, and thus for discriminating them from healthy subjects.
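The distribution comparison described above can be sketched as a summary of the pooled MFCC values by mean, standard deviation, and skewness. This is a minimal sketch under stated assumptions: the MFCC values are assumed to have already been computed and flattened into a list by an audio library, the function name `distribution_summary` is hypothetical, and the sample values are made up to show a right-skewed case. The study's exact procedure may differ.

```python
import statistics


def distribution_summary(values):
    """Return (mean, population std, skewness) of a flat list of coefficients."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("skewness is undefined for constant data")
    # Fisher-Pearson skewness: third central moment divided by std cubed.
    skew = sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)
    return mean, std, skew


# Hypothetical pooled MFCC values with a long right tail.
pooled = [0.0, 0.0, 0.0, 10.0]
mean, std, skew = distribution_summary(pooled)
print(f"mean={mean:.2f} std={std:.2f} skewness={skew:.2f}")
```

A positive skewness indicates a longer tail to the right of the mean, which is the property the text attributes to the COVID-19 group.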

Diagnosis of COVID-19 using deep learning

It is essential to benefit from the recent advances in AI and computerized algorithms, especially during these hard times of worldwide COVID-19 spread. Deep learning not only provides high levels of performance, it also reduces the dependency on experts, i.e., clinicians and nurses, who are now struggling to handle the pandemic due to the huge and rapidly increasing number of infected patients [50][51][52]. Recently, the detection of COVID-19 using deep learning has reached high levels of accuracy through two-dimensional (2D) lung CT images [53][54][55]. Despite such performance in discriminating and detecting COVID-19 subjects, CT imaging is considered high in cost and requires extra time to acquire testing data and results. Furthermore, it uses an excessive amount of ionizing radiation (X-rays) that is usually harmful to the human body, especially for severely affected lungs. Therefore, the integration of biological sounds, as in breathing recordings, within a deep learning framework overcomes the aforementioned limitations, while at the same time providing acceptable levels of performance.

The proposed deep learning framework had high levels of accuracy (94.58%) in discriminating between COVID-19 and healthy subjects. The structure of the framework was built to ensure a simple architecture, while at the same time providing advanced feature extraction and learning mechanisms. The combination of hand-crafted features and deep-activated features allowed for maximized performance capabilities within the model, as it learns through hidden and internal attributes as well as deep structural and temporal characteristics of recordings. The high sensitivity and specificity measures (94.21% and 94.96%, respectively) obtained in this study demonstrate the efficiency of deep learning in distinguishing COVID-19 subjects (AUROC: 0.90). Additionally, these results support deep learning research on the use of respiratory signals for COVID-19 diagnostics [21,56]. Alongside the high performance levels, it was interesting to observe a 100.00% accuracy in predicting asymptomatic COVID-19 subjects. This could enhance the detection of this viral infection at a very early stage and thus prevent it from developing into mild and moderate conditions or spreading to other people.

Furthermore, these high performance levels were achieved through 1D signals instead of 2D images, which allowed the model to be simple and not memory exhausting. In addition, due to its simplicity and effective performance, it can be easily embedded within smartphone applications and internet-of-things tools to allow real-time and direct connectivity between the subject and family for care or healthcare authorities for services.
The utilization of smartphone-based breathing recordings within a deep learning framework may have the potential to provide a non-invasive, zero-cost, rapid pre-screening tool for COVID-19 in low-infected as well as severely-infected countries. Furthermore, it may be useful for countries that are unable to provide the RT-PCR test to everyone due to healthcare, economic, and political difficulties. Moreover, instead of performing RT-PCR tests on a daily or weekly basis, the proposed framework allows for easier, cost-effective, and faster large-scale detection, especially for countries/areas that incur high expenses on such tests due to logistical complications. Alongside the rapid nature of this approach, many healthcare services could be revived significantly by decreasing the demand on clinicians or nurses. In addition, due to its ability to successfully detect asymptomatic subjects, it can decrease the need for extra equipment and costs associated with further medication after the development of the viral infection in patients.

Clinically, it is better to have a faster connection between COVID-19 subjects and medical practitioners or health authorities to ensure continuous monitoring of such cases and at the same time maintain successful contact tracing and social distancing. By embedding such an approach within smartphone applications or cloud-based networks, monitoring subjects, including those who are healthy or suspected to be carrying the virus, does not require their presence at clinics or testing points. Instead, it can be performed in real time through direct connectivity with a medical practitioner. In addition, it can be done entirely by the subjects themselves to self-test their condition prior to taking further steps towards the RT-PCR assay. Therefore, such an approach could give an early alert to people, especially those who interacted with COVID-19 subjects or are asymptomatic, to seek further diagnosis. Considering such a mechanism for detecting COVID-19 could provide a better-organized approach that results in less demand for clinics and medical tests and thus helps the healthcare and economic sectors recover in various countries worldwide.

This study suggests smartphone-based breathing sounds as a promising indicator for COVID-19 cases. It further recommends the utilization of deep learning as a pre-screening tool for such cases prior to the gold standard RT-PCR tests. The overall performance found in this study (accuracy 94.58%) in discriminating between COVID-19 and healthy subjects shows the potential of such an approach. This study paves the way towards implementing deep learning in COVID-19 diagnostics by suggesting it as a rapid, time-efficient, and no-cost technique that does not violate social distancing restrictions during pandemics such as COVID-19.