Abstract
The pre-adolescent growth period is the best time for the skeletal Class-III malocclusion treatment. Diagnosis and treatment during this period continue to be a complex orthodontic problem. Class-III malocclusion is complicated to treat with braces frequently requiring surgical intervention after a pubertal growth spurt. In addition, delayed recognition of the problem will yield significant functional, aesthetic, and psychological concerns. This study presents the first fully automated machine learning method to accurately diagnose Class-III malocclusion applied across mobile images, to the best of our knowledge. For this purpose, we comparatively evaluated three machine learning approaches: a deep learning algorithm, a machine learning algorithm, and a rule-based algorithm. We collected a novel profile image data set for this analysis along with their formal diagnosis from 435 orthodontics patients. The most successful method among the three was the machine learning method, with an accuracy of %76.
1. Introduction
Skeleton Class-III malocclusion is characterized by the sagittal developmental retardation and/or rearward positioning of the maxilla or the extreme development and/or forward positioning of the mandible [1]. Class-III malocclusion causing a concave profile causes aesthetic anxiety in the child and the parents [2]. The correction of the harmony between the mandible and maxilla can be provided by orthodontic/orthopedic appliances to be performed in early terms on the patients affecting the active sutures unclosed. Although the ideal age of treatment with orthopedic devices is preferably eight at the latest, it has also been reported that they can be effective up to 12 years of age [3]. It has been shown that early Class-III orthopedic treatment reduces the need for orthognathic surgery [4]. The American Association of Orthodontists (AAO) recommends that all children be checked by an orthodontist no later than age seven. However, the most common age group during which patients seek orthodontic treatment is 12 years and older [5]. Moreover, many children do not have access to an orthodontic pre-examination at seven and before. Advances in consumer electronics and portable communications systems, particularly mobile phones, have led to faster and less expensive approaches to developing Point of Care Diagnosis[6]. Number of mobile subscribers globally according to GSMA Intelligence 2021 data reached approximately 5.22 billion [7]. Triggered by the Covid-19 pandemic period, the tele-health industry and more specifically the mobile health industry has become a 100 billion dollar market by increasing five times since 2016 as of the end of 2021 [8]. There are several dental mobile applications in the current healthcare mobile application market [9]. Current applications are mainly developed for patient education about general dentistry [10], braces, Invisalign (Align Technology, San Jose, Calif), and oral health [11]. The commercial software Phimentum[12] claims using deep-learning for automatic landmark point detection on cephalometric images. Unlike Phimentum, our program has the advantage of not requiring cephalometric image input and/or an orthodontist’s supervised selection of landmark points can compute the probability of Class-III diagnosis without any other apparatus other than mobile phones.
Our previous study [13], implemented an unsupervised diagnostic method based on the angles computed for Turkish adult patients [14]. Our previous study’s disadvantage was that it was only applicable to Turkish adult patients. Our present study aimed to compare the accuracies of three alternative unsupervised machine learning approaches for malocclusion diagnosis. Best performing model of three will be adaptable to be trained on any given image set of any race or age rather than a fixed threshold determined for Turkish patients. For this purpose, we implemented and presented the accuracies of three methods; deeplearning algorithm, a machine learning algorithm, and a rule-based algorithm for an in-house image data set we collected from orthodontics patients.
2. Collection of Data
The data set created within the project’s scope consists of profile photos of patients visiting Orthodontics at Bezmialem Vakıf University Faculty of Dentistry. The Institutional Review Board approval (IRB# 54022451-050.05.04) was obtained from Bezmialem Vakıf University to use profile image data of patients who applied to Bezmialem Vakıf University in our project. The orthodontist with 20 years of orthodontics clinical experience diagnosed the patient images and classified them into three classes, Class-I, II, and III using Dolphin Imaging software (Version 11.95). The profile image taken for each patient is saved with an anonymized name in JPG/PNG format. Our anonymized dataset consists of 435 profile images and with their formal diagnosis from the orthodontics clinic.
3. Materials and Methods
Three alternative methods, namely the Rule-based, Deep Learning, and Machine Learning methods were implemented (Fig-1).
Flow-chart of the implemented pipeline
In this three methods we used mainly python programming language and it’s libraries scikit-learn for Machine Learning approach, OpenCV for image processing and pandas, seaborn etc. libraries for visualization and data analysis. In addition to that we used Tensorflow-Keras for deep learning approach. We used script base complier to run our python codes and other required python libraries.
3.1. Deep Learning Approach
For Deep Learning Method, firstly, the images were pre-processed, and then the deep learning model was created.
The images were first converted to gray-scale using the Python Open-CV library. Each image was scaled to 255×255 to standardize the images. Gaussian filtering was applied to scaled images using a 5×5 kernel matrix using OpenCV. Gaussian filtering (Gaussian Blur) was applied to reduce the noise and detail. Then, the median values were computed for each picture. With these median values, the contours of the profile images were extracted using OpenCV “Canny Edge detection” on each picture. Using dilation (Spreading and Expansion) and erosion operations pixels were added to the silhouette to make them more prominent. We applied dilation to expand the borders; thanks to this expansion, we enlarged the pixel groups and reduced the spaces between the pixels.
Next, the images were normalized using the min-max normalization formula (Fig-4), as all the images have a matrix with pixel values (0,255). As a result of this process, we converted our pictures into binary (0,1) form. Afterwards, we converted our matrix into a one-dimensional list by applying Flatten in order to use the 2D picture matrices we obtained in deep learning. As a result each picture’s silhouette, (Fig-3) was acquired in a single dimension.
Image processing pipeline.
Silhouette output of the deep-learning pre-processing.
Normalization formula.
The images were split into 80% training and 20% test set. 80% of the images were used for training the deep-learning model. Our deep learning model used a 4-layer structure, an input layer, two hidden layers, and an output layer. Since we were aiming a binary classification, we used “Relu” as the activation function in our input layer, and we used the sigmoid function in the intermediate layers. We used “Adam” as the optimizer and “cross entropy” as the optimizer loss function.
After creating the training model, we tested our model with our independent test data. As a result, a deep-learning model with 70% accuracy and 64% sensitivity was created for the disease classification. As a higher accuracy was required for a clinical setting, we experimented with additional unsupervised methods that do not require a pre-set angle coefficient for diagnosis. To achieve this goal, we implemented the rule based method as the next step.
3.2 Rule Based Approach
The same re-sizing and pre-processing steps were carried out for the critical value method on the images. Using the Python face-alignment library [15], 68 facial landmark points were selected., Po’, Sn’, A’, Ls, Li, B, Pg’, and Gn’ were determined among these landmarks. After determining the other points requested by the orthodontist (Fig-6), the area between the Po’ point at ear level and the area surrounding the Li (Lower Lip) part green area in Fig-7) and the remaining area between Ls - Sn’-Po’ (red zone in Fig-7) Python OpenCV After masking, the ratio of the regions were computed in pixels.
Deep learning model used.
Two regions compared.
Two regions compared.
For the area ratios of all patients, a total of 435 images, 162 Class-I, 163 Class-II, and 110 Class-III, we first plotted the descriptive statistical analysis and visualization.
Histogram graphs were plotted to show the distribution of each class; Class-I, Class-II, and Class-III (Fig-8). As seen in the histograms there is no left or right skewness (Left-Right Skewed Distribution) and the data show a normal distribution
Histogram of each class showing the normalized number of samples.
The boxplots of the three classes in Fig-9 imply that the area ratios of Class-III patients were distinctly different from the other classes. Outliers in the boxplot were filtered and checked for misdiagnosis.
Boxplots of Class-I, Class-II, and Class-III.
In Fig-10, scatter plot of the patients also shows that Class-III patients have distinct area ratio values. As seen in the density plot in Fig-11, the area values showed a normal distribution for all classes yet the mean of the Class-III area ratio was higher than the other two classes.
Scatter plot.
Density plot.
3.2.1. Evaluation of Class-III vs not
Classification of area ratios for our goal, we designed binary classification experiment of Class-III and not Class-III. In this setting, 110 patients were Class-III and 325 patients were not Class-III (Fig-12).
Distributions of Class-III and non-Class-III.
The area ratios of Class-III and non-Class-III followed a normal distribution (Figure 12-13).
Histograms of Class-III and non-Class-III.
3.2.2. Cleaning Up Outliers
Outliers were detected using the formula of Interquartile range (IQR) [Q1-1.5*IQR -Q3+1.5*IQR], we found 12 outliers and removed them from the dataset. We obtained a total of 423 patients. We repeated the analysis as in Fig-14.
Plots of Class-III and non-Class-III after cleaning outliers.
3.2.3 Experimental Setup
To check whether the distinct area ratio difference of Class-III is random or not, we have created an experimental setup (Fig-15) in the form of a critical value (threshold). In the experiment setup, we divided the Class-III and non-Class-III data into 5 parts with 5-fold separately, and the average of 4 parts of the data set was labeled training. The same process was repeated for the non-Class-III dataset. The resulting rule-based value was tested with the independent test dataset of Class-III and non-Class-III data. The process was repeated for the entire 5-fold. The classification was labeled not Class-III if the test data area ratio was greater than the critical value and Class-III if it was less than the critical value. As a result of classification
Experimental set up for classification.
3.2.4. Results
We shuffled our experimental dataset 100 times to avoid bias, dividing the dataset 5-fold each time. The average and standard deviation of these 100 training shuffles were computed below.
3.2.5. Results(without outliers)
The same calculation and method were also applied on the data without the outliers, which increased the performance by ∼%2.
After clearing the outliers we randomly shuffled our dataset each time, divided it into 5-Folds, repeated it 100 times, and calculated the average and standard deviation of these 100 training tests so that there would be no bias during training and testing.
According to the results of the area ratio rule based experiment, specific facial landmarks such as Po’, Sn’, A’, Ls, Li, B’, Pog’, Gn’, the area surrounding the Po’ point at the level of the Li (Lower Lip) and the remaining area between Li’ - Sn’ - Po’ (red zone in Fig-7) differs distinctly for skeletal Class-III malocclusion patients. The ratio method is an important distinguishing factor for Class-III classification; therefore, we decided to add it as a feature to our machine learning model.
3.3. Machine Learning Approach
The machine learning method was performed in two main steps.
3.3.1) Extraction of orthodontic features from images.
3.3.2) The data set was labeled and the machine learning method is applied.
3.3.1. Feature Extraction from Images
For calculation of H-Angle, we used the Python face-alignment library [15]. Firstly soft tissue nasion(N’), soft tissue chin(Pg’) and upper lib(Ls) points (see Table 2) were computed. A line was drawn between Pg’ and N’ and a line between Pg’ and Ls, and the angle between the lines passing through these points was calculated as in Fig-17.
Accuracy of the deep learning method.
Descriptions of facial landmarks.
Extraction of orthodontic features from images.
H-Angle of profile images.
To calculate the H angle in between, these points’ x and y coordinate values were extracted as a 2-dimensional plane. With these coordinates, the slopes of these two lines were calculated using the formula in Fig-18-b, and the angle H, which is the angle between the two lines, was calculated with the formula in Fig-18-a using these slopes.
(a) Angle between 2 points, (b) slope of a line.
The threshold value computed in section 3.2(rule based method) was incorporated as an additional feature.
For TVL line feature, 68 reference points on the face were computed as x and y values by drawing a parallel line to the Sn (Subnasale) point and perpendicular to the transverse plane.
The distances of each point to the TVL line were computed, and distance ratios were calculated as taking the TVL line as a reference. We aimed for a correct diagnosis from profile pictures taken by any mobile phone, but general variables of the images such as proximity, distance, location, and angle were not standard.. To achieve higher accuracy with the nonstandard photos, we decided that our machine learning model should handle the lengths of the points from the TVL as they were scalable. For this purpose, we scaled all the profile photos by dividing the distances between the landmark points by a reference length. To pick the best reference length to scale by, we experimented with all paired combinations of 68 points. As a result, when the TVL was divided by the length between G and Gn, the most accurate distance to TVL scaling was obtained. The distances between the “Gn’”, “Sn’”, “A’”, “UL’”, “LL’”, “B’” and “Pog’” points on the face and the TVL were computed.
3.3.2. Machine Learning Model
Firstly, we divided our data set into 75% training and 25% testing. Next, we selected the most appropriate model and parameter optimization on 75% of the training data. We calculated the correlations for the features, evaluated the features’ correlation against each other, and removed the “Pog’” data, as its correlation was greater than 0.80. No data was removed since we created our dataset by feature extraction from images.
Since the data set, we used in the machine learning method was not balanced, we created a balanced data set with random selection of the negative class and kept the entire positive class. We compared the results by repeating the experiment twice for our balanced (Table-6) and unbalanced data sets (Table-7). In this experiment, we used 5-fold cross-validation for model selection and repeated this experiment 100 times on 18 models (Fig-20). We selected the model with the best results after the cross-validation. After hyperparameter optimization, the best parameters were selected. By creating our models according to these parameters, we tested them on the test data and selected the model with the best results.
The result of the experiment (3.2.3) for the whole data set.
Summary of the mean accuracies of 100 repeats.
The result of the experiment (3.2.3) for the cleaned dataset.
Summary of the mean accuracies of 100 repeats.
Balanced dataset results
Unbalanced dataset results.
Hyperparameter optimization results for four models.
Balanced data test results for each model.
Unbalanced data test results for each model
True Vertical Line (TVL).
Experimental set-up for Machine Learning.
As the Random Forest, Logistic regression, Ridge Classifier, Extra Trees Classifier, Linear SVM performed best, and we optimized the parameters on these five models to select the most optimal parameters. In addition, we applied parameter optimization on a few more models with close cross validation performances.
3.3.3. Hyper parameter optimization
We made hyperparameter optimization over the selecting models.
3.3.4. Testing of Critical Models
Although the unbalanced dataset resulted in higher accuracy, F1 and precision were low. Therefore, balanced dataset’s accuracy, F1 and recall score more represent the actual production accuracy. The best result for the balanced machine learning model was Logistic Regression with 76% accuracy, 74% precision and 77% F1, 82% recall, and AUC of 76%.
4. Conclusion
This study presents the first fully automated machine learning method to accurately diagnose Class-III malocclusion applied across mobile images, to the best of our knowledge. For this purpose, we comparatively evaluated three machine learning approaches. We achieved 70% accuracy with the deep learning method, 74% accuracy with the area ratio method and 76% success with the machine learning method.
All three of our methods are flexible to be adapted to any dynamic training set of profile images of any ethnicity and age. Our next goal is to integrate our most accurate learning model among these three into a mobile application for parents and pediatricians seeking a second opinion on whether to reach out to an orthodontist at an early stage of developmental bone growth with a warning of Class-III malocclusion risk.
Data Availability
All data produced in the present study are available upon reasonable request to the authors.
5. Acknowledgment
We want to thank Gül Sude Demircan, who developed the previous prototype, and Tülay Sevinç, who assisted in collecting the patient images and the consent forms.
Appendix
Footnotes
bkilic{at}bezmialem.edu.tr
selahattinaksoy{at}mu.edu.tr
Cite Aksoy S,Kılıç B,Önal-Süzek T(2022). “Comparative Analysis of Deep Learning and Machine Learning Models for Early Prediction of Skeleton Class-III Malocclusion from Profile Photos”.