Applying Deep Learning to Specific Learning Disorder Screening

Early detection is key for treating those diagnosed with specific learning disorder, which includes problems with spelling, grammar, punctuation, clarity and organization of written expression. Intervening early can prevent potential negative consequences from this disorder. Deep convolutional neural networks (CNNs) perform better than human beings in many visual tasks such as making a medical diagnosis from visual data. The purpose of this study was to evaluate the ability of a deep CNN to detect students with a diagnosis of specific learning disorder from their handwriting. The MobileNetV2 deep CNN architecture was used with transfer learning. The model was trained using a data set of 497 images of handwriting samples from students with a diagnosis of specific learning disorder, as well as those without this diagnosis. On the validation set, detection of specific learning disorder yielded a mean area under the receiver operating characteristic curve of 0.89. This is a novel attempt to detect students with the diagnosis of specific learning disorder using deep learning. A system such as the one built for this study may provide fast initial screening of students who may meet the criteria for a diagnosis of specific learning disorder.


Introduction
Specific learning disorder is a neurodevelopmental disorder that can be detected only after formal education starts (American Psychiatric Association, 2013). About 10 percent of school-age children are diagnosed as having this disorder (Fortes et al., 2016; Gorker et al., 2017). Specific learning disorder can manifest in several different academic areas including reading, writing and mathematics (American Psychiatric Association, 2013). When the impairment is in reading, symptoms may include difficulty with word accuracy, reading fluency, and reading comprehension. When the impairment is in written expression, symptoms may include difficulty with spelling, grammar, punctuation and organization. Mathematical impairments may include difficulty with memorization of mathematical facts, fluent calculation and mathematical reasoning (American Psychiatric Association, 2013). The aforementioned symptoms are further classified by severity as mild, moderate or severe (American Psychiatric Association, 2013).
A diagnosis of specific learning disorder is complex and made through a combination of observation, interviews, family history, and school reports (American Psychiatric Association, 2013; McDonough et al., 2017).
Early detection is vital for children with specific learning disorder. If this diagnosis is undetected, detrimental consequences including high levels of psychological distress, depression, suicidality, and poorer overall mental health may ensue (American Psychiatric Association, 2013). On the other hand, early detection and intervention can significantly mitigate the negative impact of specific learning disorder on mental health (American Psychiatric Association, 2013). Early diagnosis helps in preventing the frustration and decrease in wellbeing caused by an undiagnosed specific learning disorder (Lombardi et al., 2019).

Deep Learning and Diagnosis
Deep learning algorithms are more accurate than human beings in many visual tasks such as strategic board games, human and chimpanzee facial recognition, plant disease identification, and object recognition (Esteva et al., 2017; Ferentinos et al., 2018; Schofield et al., 2019). In addition, deep learning algorithms perform better than humans in medical diagnosis based on visual data such as skin cancer classification, breast cancer screening, and pneumonia detection (Esteva et al., 2017; McKinney et al., 2020; Rajpurkar et al., 2017).
Advances in computation, very large datasets and emerging new techniques enable deep learning algorithms to recognize very complex patterns in data that are beyond human perception (Esteva et al., 2017).
The medical diagnostic world is fundamentally affected by this progress as we witness more and more successful deep learning applications that help with the medical diagnostic process (Esteva et al., 2017; Kermany et al., 2018; McKinney et al., 2020; Rajpurkar et al., 2017). Deep learning applications for mental disorder screening have been based mainly on data from neuroimaging (Galatzer-Levy, Karstoft, Statnikov, & Shalev, 2014; Vieira, Pinaya, & Mechelli, 2017). A range of psychiatric and neurological disorders, such as post-traumatic stress disorder, depression, and schizophrenia, can be screened from neuroimaging data using deep learning (Vieira et al., 2017). In addition, neurodevelopmental disorders such as attention deficit hyperactivity disorder and autism spectrum disorder can be screened from neuroimaging data with deep learning (Heinsfeld, Franco, Craddock, Buchweitz, & Meneguzzi, 2018; Vieira et al., 2017).
Only a few published studies (Gurovich et al., 2019; Mor & Dardeck, 2018; Rad et al., 2018; Shukla, Gupta, Saini, Singh, & Balasubramanian, 2017) have used deep learning without neuroimaging to flag possible mental disorders. This scarcity impedes the implementation of deep learning in the diagnostic screening process of mental disorders, because the high cost of neuroimaging means it is rarely used in psychology (Galatzer-Levy et al., 2014). Mor and Dardeck (2018), for example, applied deep learning to ecological factors to identify people at risk for PTSD.
The purpose of the current study was to evaluate the ability of deep learning to distinguish between those who have a specific learning disorder and those who do not, from their handwriting. Outfitted with deep learning, mobile devices can assist with the rapid screening of students with specific learning disorder based on their handwriting. This, in turn, may contribute to early detection and intervention after a careful follow-up evaluation.

Sample and Outcome Measure
The target population for this study included high school students between 15 and 18 years old from Hadash High School, Bat-Yam, Israel. Handwriting samples were collected from 152 students who volunteered to participate in this study. No remuneration was promised or given. Students volunteered to provide their old notebooks. About 500 pages of handwriting were scanned and saved as images. Two completely sealed and locked boxes were placed in one classroom, for a few hours after the school day, for 2 consecutive days.
One box was intended for notebooks of students who had been previously diagnosed as having specific learning disorder, while the other was intended for notebooks of students without specific learning disorder. The students had been diagnosed previously, independently of this study. Notebook collection was voluntary and completely anonymous.
The outcome measure of this study is a dichotomized variable of no diagnosis of specific learning disorder versus diagnosis of specific learning disorder.

Modeling Approach
Deep convolutional neural networks (CNNs) are the state-of-the-art technology in visual tasks (Esteva et al., 2017). MobileNetV2 is a deep CNN which achieves cutting-edge results in visual tasks (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018). The great benefit of MobileNet models is that they were designed to be deployed on mobile devices, allowing rapid inference from a photo taken on a mobile device (Howard et al., 2017; Sandler et al., 2018). MobileNet models were pre-trained on the ImageNet dataset, which contains more than 14 million images; the classification task used for pre-training covers 1000 object categories (Howard et al., 2017; Sandler et al., 2018).
MobileNet models specialize and excel in several visual tasks including object detection, face attributes, fine-grain classification, and landmark recognition (Howard et al., 2017), as demonstrated in Figure 1.
Transfer learning is a technique in which a model developed for one task is reused as the starting point for a model on a second task. This technique involves removing the last layer of the pre-trained deep neural network, adding new layers suitable for the new task, and training with a new dataset (Esteva et al., 2017; Khan et al., 2019). Transfer learning allows researchers to build on pre-trained, state-of-the-art deep neural networks (Khan, Islam, Jan, Din, & Rodrigues, 2019).
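The new layers added in transfer learning can be pictured as a small fully connected "head" that sits on top of the pre-trained network. The following is a minimal pure-Python sketch of such a head, using the layer sizes reported for this study's model (800, 400, and 200 ReLU neurons followed by one sigmoid neuron). It is an illustration only, not the study's code: the 1280-value input assumes MobileNetV2 features after global average pooling, the random weights are placeholders for weights that would be learned, and all function names are hypothetical.

```python
import math
import random

# Layer widths from the study's description: three ReLU hidden layers.
HIDDEN_UNITS = [800, 400, 200]
# Assumption: the pre-trained base is reduced to a 1280-value feature vector
# (the width of MobileNetV2's final feature map after global average pooling).
FEATURE_DIM = 1280


def head_parameter_count(feature_dim=FEATURE_DIM, hidden_units=HIDDEN_UNITS):
    """Count weights and biases in the dense head, including the output neuron."""
    sizes = [feature_dim] + list(hidden_units) + [1]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def init_head(feature_dim=FEATURE_DIM, hidden_units=HIDDEN_UNITS, seed=0):
    """Create small random placeholder weights; a real head is learned from data."""
    rng = random.Random(seed)
    sizes = [feature_dim] + list(hidden_units) + [1]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.05, 0.05) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def head_forward(features, layers):
    """ReLU on every hidden layer, sigmoid on the single output neuron."""
    activations = features
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activations = [sigmoid(pre[0])] if is_output else [max(0.0, z) for z in pre]
    return activations[0]


if __name__ == "__main__":
    print("parameters in the head:", head_parameter_count())
    layers = init_head()
    # Placeholder feature vector standing in for the base network's output.
    print("output in (0, 1):", head_forward([0.5] * FEATURE_DIM, layers))
```

Under the 1280-feature assumption, the head alone has roughly 1.4 million parameters. In practice the same structure would normally be expressed with a deep learning framework, which also handles training.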
In this study, the pre-trained MobileNetV2 (Sandler et al., 2018) architecture was used with transfer learning. MobileNetV2 is a suitable architecture for transfer learning in visual tasks such as the one in this study. The final softmax layer of the MobileNetV2 architecture, which was designed to classify the 1000 classes of the ImageNet dataset, was removed, and 3 hidden layers of ReLU neurons were added: layer 1 of 800 neurons, layer 2 of 400 neurons, and layer 3 of 200 neurons. Finally, an output layer of a single sigmoid neuron was added to classify the 2 classes in this study: no diagnosis of specific learning disorder versus diagnosis of specific learning disorder.
Table 1
Area under the curve (AUC) is the area between the receiver operating characteristic (ROC) curve and the x-axis. The ROC curve is defined by plotting the true positive rate against the false positive rate at different thresholds (Majnik & Bosnić, 2013). The AUC is an unbiased metric of performance and can be compared with the AUC of other systems (Karstoft, Statnikov, Andersen, Madsen, & Galatzer-Levy, 2015). Precision is defined as true positives divided by the sum of true positives and false positives (Goutte & Gaussier, 2005). Recall is defined as true positives divided by the sum of true positives and false negatives (Goutte & Gaussier, 2005). The F-score is a balanced metric, defined as the harmonic mean of precision and recall (Hand & Christen, 2018). Accuracy is defined as the number of correct predictions of the model divided by the total number of predictions (Sim et al., 2019).
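The performance metrics defined above can be made concrete with a short pure-Python sketch. This is not the study's evaluation code. The F-score is taken here as the harmonic mean of precision and recall (the equally weighted case), the AUC is computed through its equivalent rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, with ties counted as one half), and the example labels and scores are invented for illustration.

```python
def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary labels and predictions coded 0/1."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    """True positives over all predicted positives (0.0 if none were predicted)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """True positives over all actual positives (0.0 if there are none)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0


def accuracy(tp, fp, fn, tn):
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Invented example: 1 = diagnosis, 0 = no diagnosis.
    labels = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
    predictions = [1 if s >= 0.5 else 0 for s in scores]  # 0.5 threshold is an assumption
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    print(precision(tp, fp), recall(tp, fn), accuracy(tp, fp, fn, tn))
    print(roc_auc(labels, scores))
```

As a consistency check, `f_score(0.94, 0.89)` rounds to 0.91, which agrees with the precision, recall, and F-score reported in the Main Analyses section.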

Descriptive Statistics
All the students who provided their notebooks were high school students from Hadash High School, Bat-Yam, Israel. They were all between 15 and 18 years old. Consistent with the prevalence of specific learning disorder reported in the literature (American Psychiatric Association, 2013), 17 of the 152 students who participated (11%) had the diagnosis of specific learning disorder.

Main Analyses
The model was trained for 25 epochs. Accuracy peaked after 21 epochs and started to decline from epoch 22, as expected because of overfitting (Cha et al., 2019). Figure 2 shows the working system. The model yielding the best accuracy was saved for further analysis of performance metrics. The model yielded AUC = 0.89, precision = 0.94, recall = 0.89, F-score = 0.91, and accuracy = 0.92. Figure 3 presents the changes in accuracy during training.
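Keeping the model from the best epoch amounts to a simple selection over the per-epoch accuracy values. The sketch below illustrates that logic in pure Python. It assumes the accuracy being tracked is validation accuracy, and the accuracy curve is hypothetical: it was made up to peak at epoch 21 and is not the study's training log.

```python
def select_best_epoch(validation_accuracy):
    """Return (epoch, accuracy) of the highest accuracy; epochs are 1-indexed.

    On ties, the earliest epoch is kept because only strict improvements
    replace the current best.
    """
    best_epoch, best_accuracy = 0, float("-inf")
    for epoch, value in enumerate(validation_accuracy, start=1):
        if value > best_accuracy:
            best_epoch, best_accuracy = epoch, value
    return best_epoch, best_accuracy


# Hypothetical curve: steady improvement for 21 epochs, then a decline
# of the kind that overfitting produces.
hypothetical_curve = [0.60 + 0.016 * i for i in range(21)] + [0.91, 0.90, 0.90, 0.89]

if __name__ == "__main__":
    epoch, value = select_best_epoch(hypothetical_curve)
    print(f"best epoch: {epoch}, accuracy: {value:.2f}")
```

In a training loop, the model weights would be saved whenever the strict-improvement branch is taken, so that the saved model is the one from the best epoch rather than the last one.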

Discussion
This study evaluated the ability of deep learning algorithms to screen students with specific learning disorder by using their handwriting. To our knowledge, this was the first study that applied deep learning to screening for specific learning disorder from handwriting samples, which are easily collected and allow fast inference and detection. Values of performance metrics in other studies using deep learning to detect mental disorders from neuroimaging data were between 0.65 and 0.95 (Vieira et al., 2017). The reported accuracy of the model designed to identify facial phenotypes of genetic disorders using deep learning was 0.91 (Gurovich et al., 2019). The AUC and F-score of the model designed to identify people at risk for PTSD using ecological factors and deep learning were 0.91 and 0.83, respectively (Mor & Dardeck, 2018).
The finding that deep learning applied to handwriting samples provides efficient initial screening of students for specific learning disorder is promising. About 10% of school-age children have specific learning disorder (American Psychiatric Association, 2013; Fortes et al., 2016). Screening for specific learning disorder using handwriting and deep learning can make the complex task of specific learning disorder diagnosis faster and simpler.
It is important to mention that we are not suggesting that such a model would replace the essential diagnostic process in which mental health professionals consider a combination of information from observations, interviews, family history, and school reports (American Psychiatric Association, 2013). We are suggesting, however, that a model such as the one designed for this study can provide fast initial screening of students for specific learning disorder. This could, therefore, significantly contribute to early detection and intervention.

Applicability of This Study
About 6 billion smartphone subscriptions were projected to exist by the end of 2020 (Esteva et al., 2017). Smartphone applications that can help with the initial screening of medical or mental disorders would provide low-cost universal access to essential diagnostic care (Esteva et al., 2017). The deep learning model built in this study is based on MobileNet, which was designed for smartphones (Howard et al., 2017). MobileNet provides fast and accurate performance when deployed on mobile devices (Howard et al., 2017). Outfitted with a CNN, mobile devices can give educators, reading specialists, and other relevant professionals a means to achieve fast initial screening of specific learning disorder. Screening students for specific learning disorder with this system requires no more than taking a photo of handwriting on a smartphone, sending it to the model, and receiving the model's answer. The system designed in this study may be viewed at https://colab.research.google.com/drive/1SUByhCjS29pR_njEwFKD7v3YFZ9C_i9H
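The screening workflow described above (photo in, model answer out) can be outlined in a few lines. The sketch below is a hypothetical outline and not the deployed system: `model` stands for any callable that returns the sigmoid output for a preprocessed image, the output is assumed to be the probability of the "diagnosis" class, the 0.5 decision threshold is an assumption, and the scaling of 8-bit pixel values to the range [-1, 1] follows the input convention commonly used with MobileNetV2.

```python
def preprocess(pixels):
    """Scale 8-bit pixel values (0 to 255) to the range [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def screen(pixels, model, threshold=0.5):
    """Run one handwriting image through the model and report a screening flag.

    `model` is a hypothetical callable mapping preprocessed pixels to a
    probability in (0, 1). A positive flag means "refer for a full
    evaluation"; it is not a diagnosis.
    """
    probability = model(preprocess(pixels))
    return {
        "probability": probability,
        "refer_for_evaluation": probability >= threshold,
    }


if __name__ == "__main__":
    # Stub standing in for the trained network; it always returns 0.8.
    stub_model = lambda inputs: 0.8
    print(screen([0, 128, 255], stub_model))
```

Lowering the threshold would flag more students for follow-up at the cost of more false positives, which may be a reasonable trade-off for an initial screen that is always followed by a professional evaluation.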

Limitations and Recommendation for Future Work
This study was conducted on students of Hadash High School, Bat-Yam, Israel. The results of this initial study cannot be generalized beyond this specific Hebrew-speaking population. It would be important and interesting to assess the handwriting of students in multiple languages to see how the algorithm holds up across different alphabets and writing systems. To increase the generalizability of our model, the main recommendation for future work is to collect handwriting samples from many different populations in many different languages, thereby significantly increasing the size of the handwriting data set. The size of the training data set is the most important factor for enhancing the generalizability of deep learning models (Perez & Wang, 2017).

Summary, Conclusion and Future Directions
This study demonstrated the feasibility of screening students with specific learning disorder from handwriting using a deep learning algorithm. The model designed in this study can be easily deployed on smartphones, enabling fast initial screening of students with specific learning disorder simply by taking a photo of their handwriting. Early intervention is essential for children with specific learning disorder, and a system such as the one developed in this study may significantly contribute to early detection and subsequent intervention. The system in this study is far from a universal, optimal solution because the training data set was limited. It is hoped, however, that the study's findings will serve as an inspiration for the future development of a universal solution for early screening and detection of specific learning disorder, which would ideally include many different populations from across the world.