Retrospective Assessment of Deep Neural Networks for Skin Tumor Diagnosis

BACKGROUND: The aim of this study was to validate the performance of algorithm (http://rcnn.modelderm.com) for the diagnosis of benign and malignant skin tumors. METHODS: With external validation dataset (43 disorders; 40,435 clinical images from 10,450 patients; January 1, 2008 - March 31, 2019), we compared the prediction of algorithm with the clinical diagnosis of 65 attending physicians at the time of biopsy request. RESULTS: For binary-task classification of determining malignancy, the AUC of the algorithm was 0.863(95% CI 0.852-0.875) with unprocessed clinical photographs. The sensitivity/specificity/positive predictive value/negative predictive value of the algorithm at the predefined high-sensitivity and high-specificity threshold were 79.1%(76.8-81.3)/77.0%(76.1-77.8)/31.3%(30.3-32.3)/96.5%(96.2-96.9) and 62.6%(59.9-65.2)/90.0%(89.4-90.6)/45.3%(43.5-47.2)/94.8%(94.4-95.1), respectively. The sensitivity/specificity calculated by the clinical diagnosis of attending physicians were 88.1%/83.8% (Top-3) and 70.2%/95.6% (Top-1), which were superior to the algorithm. For multi-task classification, the mean Top-1,2,3 accuracies of the algorithm were 42.6{+/-}20.7%, 56.1{+/-}22.8%, 61.9{+/-}22.9%, and those of clinical diagnosis were 65.4{+/-}17.7%, 73.9{+/-}16.6%, 74.7{+/-}16.6%, respectively. CONCLUSIONS: The performance of the algorithm was lower than that of dermatologists in real practice. Not only image information, but also clinical information should be integrated for more accurate prediction.


Introduction
Deep learning algorithm has shown excellent results in analyzing fundoscopy images in ophthalmology 1 , chest X-ray 2 and CT images in radiology. 3 In dermatology, there have been remarkable advances that showed results comparable to dermatologists' diagnostic performance using both clinical photography 4-9 and dermoscopy images. 4,7,[10][11][12][13] However, it is unclear that these excellent results can be extrapolated in clinical setting as like product in other field 14 . On perspective of data relevancy, DeepMind's AlphaGo 15 was trained with Go data which had 100% data relevancy, and surpassed human champion. In speech recognition, gestures may be helpful in understanding conversation completely, but audio data shows data relevancy close to 100%. However, in medical field, clinical data used in training has relatively low data relevancy. For example, even if algorithm will be trained with countless brain MRIs, we cannot clearly answer whether the algorithm can truly diagnose Parkinson's disease with MRI alone. Although image of skin contains important relevant information, there is no study of how much important in skin cancer diagnosis. This study was designed to compare the performance of algorithm with that of attending physician in real practice for the diagnosis of benign and malignant tumors.
In addition, deep learning algorithm can produce reliable result only for the preselected diseases because algorithm may show epistemic uncertainty for untrained problems 16 . Because most of previous studies had a limitation that small number of disorders were validated 17 , we investigated the generalizability of our algorithm with almost all sorts of skin tumors.

Methods
With approvals of institutional review boards (Severance: #2019-0571 and Asan: #2017-0087), we gathered clinical photographs from Severance Hospital, Department of Dermatology, collected from January 1, 2008 to March 31, 2019. Attending physicians had recorded the clinical diagnosis on the biopsy request form considering patient's history and physical examinations. All skin lesions of patients over 19 years of age who were pathologically diagnosed among 43 primary skin tumors were included (Figure 1). We included cases with . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101//doi.org/10. /2019 4 / 28 single lesion because of the possible mismatch between the lesion and the diagnosis for patients with multiple lesions. We excluded the images of intraocular, postoperative, laser surgery, and other chief complaints, but we did not exclude normal images taken for record taking even if it did not include the lesion of interest. There were 66 cases with inadequate quality and 769 cases with which it was inherently impossible to detect lesions ( Figure   1). As a result, a total of 40,331 clinical images from 10,426 patients were included in this study (Table 1).

THE ALGORITHM
To validate the generalizability, the algorithm created in the previous study was tested again without modification in this study. The algorithm was trained not only with hospital archives but also with the archives generated with assistance of region-based convolutional neural network (RCNN) to reduce false positives. As a result, the algorithm was trained with 1,106,886 image crops. Algorithm has two parts: (A) lesion detection part and (B) disease classifier part. The lesion detection part was trained with faster RCNN (backbone=VGG-16) and CNN . The disease-classifier part was trained with CNNs (SENet and SE-ResNeXt-50).
The web-DEMO of the algorithm has been available in public (http://rcnn.modelderm.com) to facilitate scientific communications. We used the malignancy output which was predefined as: malignancy output = (basal cell carcinoma output + squamous cell carcinoma output + intraepithelial carcinoma output + keratoacanthoma output + malignant melanoma output) + 0.2 × (actinic keratosis output + ulcer output).

PREDEFINED HIGH-SENSITIVITY THRESHOLD AND HIGH-SPECIFICITY THRESHOLD
We used two predefined thresholds (high-sensitivity threshold which was formerly named as T 90 in the previous study and high-specificity threshold which was named as T 80 . These cut-off thresholds were defined as 90% or 80% sensitivity points with the dataset from Asan Medical Center, Department of Dermatology. In the dataset, all patients suspected of 10 major tumorous disorders (the same tumors as those of the Edinburgh dataset) from January 1, 2018 to June 30, 2018 were serially included. After pathologic confirmation, the dataset comprised of malignant tumors (81 patients), benign tumors (251 patients), and other various benign disorders (54 patients).

UNPROCESSED IMAGE ANALYSIS VERSUS CROPPED IMAGE ANALYSIS
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

/ 28
There are two different ways in using the algorithm for lesion diagnosis. The user or patient may localize the lesion of interest (cropped image analysis). If not, the algorithm needs to analyze unprocessed photograph without information on the lesion of interest (unprocessed image analysis). In unprocessed image analysis, our algorithm detected the lesion of interest in the photograph, and the highest malignancy output was used as the final output score among the malignancy outputs of suspected lesions. In cropped image analysis, the average malignancy output of the multiple lesions of interest was used as the final output score.

STATISTICAL ANALYSIS
We evaluated the performance of algorithm for the determination of malignancy. For the binary classification test, we draw ROC curve using the final output scores(the highest malignancy output in unprocessed image analysis and the average malignancy output in cropped image analysis ) and calculated area under the curve (pROC package version, 1.15.3; R version 3.4.4). The confidence interval was computed with 2,000 stratified bootstrap replicates.

BINARY CLASSIFICATION
With the Dataset A, the algorithm analyzed all photographs of each patient for determining malignancy. The AUC value for determining malignancy was 0.863(95% CI 0.852-0.875).
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The performances of two analysis methods (unprocessed image analysis versus cropped image analysis) were compared with 39,721 images from 10,315 patients ( Figure 3). The AUC of 0.881(0.870-0.891) in cropped image analysis was slightly better than the AUC of 0.870(0.858-0.881) in unprocessed image analysis (DeLong's test; P=0.0022).

MULTICLASS CLASSIFICATION WITH 32 SKIN TUMORS OF THE SEVERANCE DATASET
For multi-class classification test, we used AUC and Top-accuracy as evaluation metrics. We calculated both Top accuracy of total patients and Top accuracy of each classes because the number of patients varies in each classes.
The calculation of Top accuracy of each classes was performed with 32 disorders (Severance Dataset B; 39,721 images from 10,315 patients). We excluded 6 classes (angiofibroma, Café au lait macule, juvenile xanthogranuloma, milia, nevus spilus, and sebaceous hyperplasia) of which the number of patient was less than 10, and we also excluded 5 classes (Spiz nevus, dermatofibrosarcoma protuberans, angiosarcoma, Kaposi's sarcoma, and Merkel cell carcinoma) which were not trained to the algorithm. We calculated mean accuracy by averaging the accuracies of 32 disorders as follows: macro-averaged mean Top-(n) Accuracy = (Top-(n) Accuracy of actinic keratosis + Top-(n) Accuracy of angiokeratoma + … + Top-(n) Accuracy of xanthelasma) / 32. The macro-averaged mean Top-1 and 3 accuracies of clinical diagnosis were 65.4±17.7% and 74.7±16.6%, and those of the algorithm were 42.6±20.7% and 61.9±22.9%, respectively. In order to reflect the difference in the number of cases for each disease, the micro-averaged accuracy was calculated as follows : micro-averaged mean Top-(n) Accuracy = (Top-(n) matched cases in total) / 10,315. The micro-averaged mean Top-1 and 3 accuracies of clinical diagnosis were 68.2% and 78.7% and those of the algorithm were 49.2% and 71.2%, respectively.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019

MULTICLASS CLASSIFICATION WITH 10 SKIN TUMORS OF THE EDINBURGH DATASET
We performed an additional external validation with the Edinburgh dataset. The Edinburgh dataset consists mainly of Caucasians and is commercially available dataset with 10 benign and malignant skin tumor (Table 5; https://licensing.edinburgh-innovations.ed.ac.uk/i/software/dermofit-image-library.html). In previous studies 4,5 , the Edinburgh dataset was used as a validation dataset.
For the multi-class classification, the macro-averaged mean Top-1 and 3 accuracies were 53.0% and 77.6%, respectively and the mean AUC of each classes was 0.939±0.030 ( Figure 5 and Table 5).

Discussion
Deep learning algorithm has recently shown remarkable performance in dermatology. With clinical photographs 4-8 and dermoscopic images 4,7,10-13 , algorithm showed comparable or superior performance to dermatologists with regards to the problem of malignant melanoma vs nevus, and carcinoma vs benign keratotic tumors. Direct comparison between the results of various studies is difficult because inclusion and exclusion criteria of validation dataset are unclear and the validation datasets were private 18 . To date, there have been only few studies that compared the algorithm's performance with experts using external datasets. 19 It was impossible to test their algorithm in most studies and it was unknown whether the algorithm could reproduce similar performance with other datasets. In fact, several studies report that currently available smartphone applications did not show the dermatologist-level performance as shown in academic papers. 20,21 When investigating the performance of deep learning algorithm with skin diseases, there are various factors that result in poor reproducibility or generalizability in real practice.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019 8 / 28 1) Limited number of training classes: the number of disorders in validation dataset is usually much smaller than that in real practice 17 . Algorithm had shown good performance with a narrow range problem such as "melanoma vs nevus". However, as shown in Table 4, the mean Top-1 accuracy of clinical diagnosis was only 65.4%, which suggests that it is hard to choose the initial problem correctly. For example, the algorithm trained only with melanoma and nevus may be misused for the diagnosis of other disorder such as seborrheic keratosis, which result in epistemic uncertainty. 16 2) Innocent biases of internal validation: in internal validation, prediction may be drawn from unexpected false features instead of true features of disease. The composition of image and the presence of skin markings were likely to affect the prediction of algorithm. 22,23 From the perspective that algorithm may be trained with unexpected structured noise, the current convolutional neural network draws a statistical prediction from massive image data, rather than real intelligence.
3) Preselection of specialist: Preselection bias exists if the image is acquired in real time procedure, such as endoscopy, ultrasonography, and clinical photography. In dermatology, both training and test datasets are created by dermatologist and dermatologist determines which lesions need to be included and which composition is adequate. Because algorithm shows aleatory uncertainty, we may not get the same performance without the preselection of dermatologist. 4) Unclear inclusion and exclusion criteria: selection bias affects the result if the cases with negative results were excluded more frequently for any reason. All patients in the archives should be selected based on specific exclusion criteria to prevent selection bias. 4) Specific optimization: hyperparameters of algorithm may be specifically optimized to the validation dataset at the test time, and lack of generalizability for other dataset. 5) Data leakage: data leakage occurs when training and test dataset were not split completely, or when the data set framework is designed incorrectly. In case of data leakage, model uses inessential additional information for classification. 24 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint 9 / 28 7) Non-representativeness of training dataset 25 : hospital achieve include extraordinary case which requires biopsy. For example, there are many cases of small melanoma which does not have characteristic bizarre morphology. If algorithm was trained through these extraordinary cases, it may show many false positives. In addition, hospital achieve does not usually have enough number of images of common disorders, which reduces the diagnostic accuracy for common disorders. 8) Data relevancy: although algorithm analyze image better than dermatologist, dermatologist in real practice shows much better diagnostic accuracy because dermatologist considers all patient's information for diagnosis.
To minimize biases, this study was designed with the following considerations. 1) To prevent uncertainty for untrained classes, we included almost all kinds of skin tumors (43 disorders). 2) To avoid selection bias, all biopsied cases in one hospital were included with exclusion criteria. 3) In order to prevent innocent bias from data leakage, we tested with two external validation datasets, the Severance dataset and Edinburgh dataset. 4) To show generalizability, we verified previously created algorithm with new validation dataset built later. 5) The diagnostic accuracy of algorithm was compared with that of attending physician who considered all information of patient for diagnosis. 6) Algorithm was tested with unprocessed images without preselection of dermatologist.
To date, most comparisons between dermatologist and algorithm have been performed using single cropped image of lesions. Deep learning algorithm has shown comparable performance with dermatologists in the same settings. 4,5,7,8,[11][12][13] Meta-analysis of medical studies also showed that, under the same conditions, the performance of algorithms was comparable to that of healthcare specialists. 19 In this study, we investigated the diagnostic accuracy of the dermatologist in real practice with biopsied cases.
Skin cancer diagnosis is a difficult task for experienced dermatologists even when given all clinical information.
The sensitivity which calculated from three differential impressions of the dermatologists was 88.1%, meaning that 11.9% of malignant tumors were ignored by the dermatologists even after thorough examination, but the presence of malignancy was notified later after receiving the biopsy report. The malignancy prediction derived from the first clinical impression was 70.2%, implying that there was 29.8% misdiagnosis in the initial impression. In the multiclass classification which requires correct diagnosis, the mean Top-1 and 3 accuracy of the clinical diagnosis were lower as 65.4±17.7% and 74.7±16.6% as shown in Table 4. Deep learning algorithm . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint 10 / 28 shows uncertainty for untrained task 16 , therefore, if the algorithm is not an unified classifier but a classifier trained with limited classes, the precondition may be wrong 26 .
The performance of attending physician significantly improves after considering various information of patient.
In the actual medical practice, physician consider not only the visual information of the lesion but also various information such as previous medical history, referral note, demographics, family history, and physical examination, to improve the diagnostic accuracy. In radiology or pathology, the actual reader does not diagnose with image alone, but they also checks medical history through chart review and reflects clinicopathologic correlation. At present, there has been several reports 27-29 to reflect patient's metadata with multimodal approach.
The results obtained by combining images and patient clinical information showed a general improvement of around 7% in accuracy. 27 Algorithms may play an important role in the screening of suspected patients if mass diagnostics task will be done only with image information. Our algorithm showed a comparable performance in full automatic mode regardless of preselection of the lesion (AUC with unprocessed images = 0.870 vs AUC with cropped image = 0.881). In previous study 9 with unprocessed facial images, our algorithm showed an AUC of 0.896 with the internal validation (386 patients; 81 with malignant tumor and 305 with benign tumor) and the performance of algorithm was comparable with that of 13 dermatologists in terms of F1 score and Youden index. We reproduced a similar AUC of 0.863 with large scale external validation dataset.
Algorithm can work endlessly with minimal cost. In case of cancer screening, clinical photographs may be available alone without other information. In this study, the sensitivity at the high-sensitivity threshold was 79.1%, which was located between the sensitivities (70.2% and 84.9%) derived from the Top-2 and Top-1 clinical diagnoses. In the previous study, approximately 50% images of malignant cases could be misinterpreted to benign by general public 9 . We expect algorithm-based cancer screening helps appropriate referral to dermatology in the future.

LIMITATION
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019

/ 28
Although we obtained a mean AUC of 0.939±0.030 (Table 5) with the Edinburgh dataset (1,300 images of 10 benign and malignant tumors) which mainly consists of Caucasians, we need to test our algorithm with unprocessed clinical photographs of various race/ethnicity. With the identical image of dysplastic nevus, we should warn a high chance of melanoma for Caucasian, but may recommend close-observation for Asian because melanoma is a rare disorder in Asian. We also need to test with Asian in other regions because disease prevalence may vary from region to region even if the races are the same.
In this study, most of training and validation images were taken with adequate quality. The performance of algorithm could be affected by image quality more than that of human. 22,30 Therefore, the performance of the algorithm on the photographs taken by the public must be verified in further study.

CONCLUSION
In this study, the attending physician in real practice made more accurate diagnosis because not only image information but also clinical information was considered to make a diagnosis. Without preselection of lesion by dermatologist, our algorithm showed the AUC of 0.863, which implies unprecedented potential as a mass screening tool. More clinical information should be integrated for more accurate prediction of the algorithm in the future.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019 12 / 28 ACKNOWLEDGMENTS Thanks to PhD, Park Jeoongsung in Qualcomm for helping to stabilize the web-DEMO.

AUTHOR CONTRIBUTIONS
Han and Moon designed experiments.
Han, Moon, Lee, Na, Kim MS, Park GH, and Kim SH prepared validation datasets.
Han, Moon, and Chang performed experiments.
Han, Moon, Park, Kim MS, Na, and Kim KW interpreted results.
Han, Moon, and Kim KW wrote the manuscript and prepared the figures.
Chang and Lee supervised the study.
Han created and managed the web-DEMO.
All authors approved the final version of the manuscript.

DATA AVAILABILITY
The images used to test the neural networks described in the manuscript are subject to privacy regulations and cannot be made available in totality. The test subset may be available upon a reasonable request and an approval of IRB of the originating university hospital.

FUNDING
None . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

FIGURE LEGENDS
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint images contained lesion of interests, while the remaining 5,049 images were photographs without lesion, but these photographs were taken for the purpose of observation or comparison.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Figures 2. ROC curve of Algorithm Compared with Diagnostic Performance of the Clinical Diagnosis
Algorithm analyzed the Severance Dataset A (40,331 unprocessed images from 10,426 patients; 1,222 malignancy and 9,204 benigns).

Black curve -algorithm (unprocessed-images analysis)
Diamond -algorithm at the predefined high-sensitivity threshold Round -algorithm at the predefined high-specificity threshold Red, orange, and yellow circles -malignancy determination derived from Top-1, Top-2, and Top-3 clinical diagnosis of 65 attending physicians . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019

/ 28 Figures 3. Performance Comparison Between Unprocessed Image Analysis and Cropped Images Analysis
Black curve -algorithm (unprocessed-image analysis) Gray curve -algorithm (cropped-image analysis) Severance Dataset B (39,721 images from 10,315 patients) was used for the comparison. As shown in Figure 1, the images of small classes and untrained classes were not used for this test. In cropped-image analysis, algorithm analyzed cropped lesion of interest which was pointed out by user. In unprocessed-image analysis, algorithm detected location of lesion and analyzed the suspected lesions.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint 17 / 28 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint 19 / 28 All the images of the Edinburgh dataset were cropped images of lesion of interest.
We draw the ROC curve in a one-versus-rest manner.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101//doi.org/10. /2019  . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https: //doi.org/10.1101//doi.org/10. /2019 23 / 28 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019
The images analyzed were cropped around lesion of interest.
We calculated the AUC value of ROC curve in a one-versus-rest manner.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
All the images of the Edinburgh dataset were cropped images of lesion of interest.
We calculated the AUC value of ROC curve in a one-versus-rest manner.
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019 26 / 28 REFERENCES   1  .  G  u  l  s  h  a  n  V  ,  P  e  n  g  L  ,  C  o  r  a  m  M  ,  e  t  a  l  .  D  e  v  e  l  o  p  m  e  n  t  a  n  d  v  a  l  i  d  a  t  i  o  n  o  f  a  d  e  e  p  l  e  a  r  n  i  n  g  a  l  g  o  r  i  t  h p  e  r  f  o  r  m  a  n  c  e  m  e  d  i  c  i  n  e  :  t  h  e  c  o  n  v  e  r  g  e  n  c  e  o  f  h  u  m  a  n  a  n  d  a  r  t  i  f  i  c  i  a  l  i  n  t  e  l  l  i  g  e  n  c  e  .   N  a  t  u  r  e  m  e  d  i  c  i  n  e  2  0  1  9  ;  2  5  :  4  4  .   1  5  .  S  i  l  v  e  r  D  ,  H  u  a  n  g  A  ,  M  a  d  d  i  s  o  n  C  J  ,  e  t  a  l  .  M  a  s  t  e  r  i  n  g  t  h  e  g  a  m  e  o  f  G  o  w  i  t  h  d  e  e  p  n  e  u  r  a  l  n  e  t  w  o  r  k  s   a  n  d  t  r  e  e  s  e  a  r  c  h  .  N  a  t  u  r  e  2  0  1  6  ;  5  2  9 : 4 8 4 . . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10.1101/2019.12.12.19014647 doi: medRxiv preprint 28 / 28   3  0  .  D  o  d  g  e  S  ,  K  a  r  a  m  L  .  A  s  t  u  d  y  a  n  d  c  o  m  p  a  r  i  s  o  n  o  f  h  u  m  a  n  a  n  d  d  e  e  p  l  e  a  r  n  i  n  g  r  e  c  o  g  n  i  t  i  o  n   p  e  r  f  o  r  m  a  n  c  e  u  n  d  e  r  v  i  s  u  a  l  d  i  s  t  o  r  t  i  o  n  s  .  2  0  1  7  2  6  t  h  i  n  t  e  r  n  a  t  i  o  n  a  l  c  o  n  f  e  r  e  n  c  e  o  n  c  o  m  p  u  t  e  r  c  o  m  m  u  n  i  c  a  t  i  o  n   a  n  d  n  e  t  w  o  r  k  s  (  I  C  C  C  N  )  ;  2  0  1  7  :  I  E  E  E  . p . 1 -7 .
. CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019 . CC-BY 4.0 International license It is made available under a author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
is the (which was not peer-reviewed) The copyright holder for this preprint . https://doi.org/10. 1101/2019