Abstract
The body mass index (BMI) provides essential medical information related to body weight for the treatment and prognosis prediction of different diseases. The main goal of the present study was to evaluate the performance of artificial neural network (ANN) and multiple linear regression (MLR) models in the prediction of BMI in children. Data from a total of 5,964 children aged 5 to 12 years were included in the study. Age, gender, neck circumference (NC), waist circumference (WC), hip circumference (HpC), and mid-upper arm circumference (MUAC) measurements were used to estimate the BMI of children. The ANN and MLR were used to predict the BMI, and the predictive performance of both methods was evaluated. A gender-wise comparison showed that the median values of all the anthropometric measurements (except BMI) were significantly higher in boys than in girls. For the overall sample, the BMI prediction model was − 0.147 × Age − 0.367 × Gender + 0.176 × NC + 0.041 × WC + 0.060 × HpC + 0.404 × MUAC. A higher R² value and lower RMSE, MAPE, and MAD indicated that the ANN is the better method for predicting BMI in children. Our results confirm that the BMI of children can be predicted by using the ANN and MLR methods. However, the ANN method has a higher predictive performance than MLR.
Introduction
Evaluation of nutritional status is necessary for understanding the health of an individual or population. Children are a crucial population in which to assess health status. This is especially important during infancy and adolescence, when the amounts of water and adipose tissue undergo considerable changes [1-3]. Body composition changes are directly reflected in anthropometric measures, and the body mass index (BMI) is the best-known criterion for determining nutritional status in both children and adults [4]. BMI is calculated by dividing weight in kilograms by the square of height in meters (i.e., kg/m²), and its cut-offs classify people as underweight, normal weight, overweight or obese [4]. Elevated BMI is strongly linked to the risk of developing cardiovascular disease, as well as diabetes, hypertension, dyslipidemia, and certain forms of cancer [5].
However, measuring height or weight can be difficult or even impossible when accurate portable height scales, weighing machines, and well-trained health workers are not available to collect the measurements. In such scenarios, BMI cannot be calculated directly. Therefore, BMI should preferably be estimated by a simple, affordable method that requires minimal equipment.
To date, various indirect methods have been developed to estimate BMI by measuring different body segments. Several studies [6-9] found that BMI correlates with neck circumference (NC), waist circumference (WC), hip circumference (HpC), and mid-upper arm circumference (MUAC), and Marshall et al. [10] found that measurements of WC, NC and MUAC can be used to estimate BMI accurately. Another study of Pakistani type 2 diabetes patients presented a BMI prediction model based on WC and HpC measurements using multiple linear regression (MLR) analysis [11].
Over the last few years, several machine learning (ML) methods have also been applied to predict BMI or obesity efficiently. For example, a recent study by Lee et al. [12] applied linear regression and different data mining algorithms (i.e., random forest and artificial neural network (ANN)) to predict a newborn’s BMI based on ultrasound measures and maternal delivery information. Another study used k-nearest neighbor (KNN), classification and regression tree (CART), support vector machine (SVM), and multi-layer perceptron (MLP) algorithms, among others, to predict BMI based on psychological variables [13]. The BMI of 1568 subjects aged 20-60 years was also predicted from voice signal information using three different data mining algorithms [14]. Sancar and Tabrizi [15] used an adaptive neuro fuzzy inference system (ANFIS) to calculate BMI based on metabolic risk factors. A review of the existing literature showed that no study has considered age, gender, and different anthropometric measures as input variables to predict BMI in children, and that the performance of two different prediction methods, i.e., ANN and MLR, has rarely been compared. This research gap motivated the present study. The main goal of the study was to evaluate the performance of artificial neural network (ANN) and multiple linear regression (MLR) models in the prediction of BMI in children. The performance of ANN and MLR was evaluated by estimating the coefficient of determination (R²), root mean square error (RMSE), mean absolute deviation (MAD) and mean absolute percentage error (MAPE).
Materials and methods
A cross-sectional dataset collected in the multi-ethnic anthropometric survey (MEAS) from March to June 2016 was used in the present study. The data are also publicly available on the Mendeley website at https://data.mendeley.com/datasets/sxgymx5xjm/1. A detailed description of the study design, sampling methodology, and inclusion/exclusion criteria of the subjects in this survey has been given elsewhere [6, 16-18]. Briefly, in the MEAS, a total of 10,782 children and adolescents (aged 2-19 years) were recruited, and the data of school-going children and adolescents (n = 9,929) aged 5 to 19 years were collected from 68 public and private schools, while data for subjects under the age of five were collected in public places such as markets, shopping malls and parks. This study only included 5,964 children aged 5 to 12 years. The anthropometric measurements, i.e., body weight, height, NC, WC, HpC and MUAC, were taken in a comfortable standing position under standard procedures. The complete measurement protocols have been discussed in previously published studies [6, 16-18]. BMI was calculated from the height and weight measurements [BMI = weight (kg) ÷ height (m)²]. The dependent variable in the study was the BMI of children, while the age and gender of the children and the measurements of NC, WC, HpC and MUAC were taken as independent variables.
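The BMI formula above can be illustrated with a short Python sketch. The function name and the example child are hypothetical and are not taken from the MEAS dataset.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by the square of height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical child (illustrative values only): 25 kg and 1.25 m tall.
print(round(bmi(25.0, 1.25), 2))  # 16.0
```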
The authors assert that the complete study was conducted in accordance with ethical standards. The project was approved by the Institutional Ethics Research Board of Bahauddin Zakariya University, Multan (registration number IRB# Stat-271/2017). Verbal informed consent was obtained from all participants and their parents. Verbal consent was witnessed and formally recorded. The researchers recorded the name of the participant or parent who gave verbal consent on a form created specifically for documenting it. The authors had access to information that could identify individual participants during or after data collection.
Statistical analysis
The entire statistical analysis was performed using SPSS ver. 23 (SPSS for Windows, Chicago, IL, USA). The normality of continuous variables (age, BMI, NC, WC, HpC and MUAC) was tested using the Kolmogorov-Smirnov test. A p-value < 0.05 was considered significant, and the results were expressed as median [interquartile range (IQR) = Q1-Q3]. The Mann-Whitney U test was used to compare the groups. Spearman’s rank correlation (rs) was used to investigate the correlation between BMI and the other anthropometric measurements. Since a correlation of less than 0.3 (i.e., r < 0.30) is generally described as weak [19], independent variables having a correlation of less than 0.3 with BMI were excluded from the analysis. The significance level was set at α = 5% for the whole analysis.
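As a rough illustration of the correlation screening described above, the following pure-Python sketch computes Spearman's rank correlation and keeps only predictors that reach the 0.30 threshold. The study itself used SPSS; the function names and the toy data are hypothetical, and applying the threshold to the absolute value of the correlation is an assumption.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rs: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def screen_predictors(predictors, target, threshold=0.30):
    """Keep predictors whose |rs| with the target is at least the threshold."""
    kept = {}
    for name, values in predictors.items():
        rs = spearman(values, target)
        if abs(rs) >= threshold:  # assumption: threshold applies to |rs|
            kept[name] = rs
    return kept


# Toy data, invented for illustration only.
bmi_values = [14.0, 15.5, 13.2, 17.1, 16.0]
candidates = {
    "MUAC": [15.0, 16.2, 14.1, 18.0, 17.5],
    "unrelated": [3, 1, 4, 2, 5],
}
print(screen_predictors(candidates, bmi_values))
```

In this toy example only the first predictor passes the screen, because the second has a rank correlation of −0.2 with the target.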
For BMI prediction in children, two different methods, the ANN and the MLR, were applied. An ANN is a computing system consisting of simple interconnected processing elements called neurons. The input signals (input data) pass through the network of neurons to generate the network response(s). Each neuron (except the input ones) receives signals from several neurons through weighted connections, sums them, and modifies the sum through a non-linear transfer function before passing the signal to other neurons [20]. The block diagram of the current study is presented in Fig 1.
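The neuron computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS implementation used in the study: the bias terms and the linear (identity) output unit are assumptions, and the hyperbolic tangent is chosen because the study reports using it as the activation function. All names and weights are hypothetical.

```python
import math


def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through tanh."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return math.tanh(weighted_sum)


def mlp_forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One hidden tanh layer followed by a single linear output unit (assumed)."""
    hidden = [
        neuron_output(inputs, weights, bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias


# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output.
prediction = mlp_forward(
    inputs=[0.2, -0.4],
    hidden_weights=[[0.5, 0.1], [-0.3, 0.8]],
    hidden_biases=[0.0, 0.1],
    output_weights=[1.2, -0.7],
    output_bias=0.05,
)
print(prediction)
```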
When modeling the neural network, a multi-layer perceptron with an input layer, a hidden layer and an output layer was used. Initially, the data on 5,964 children were divided into two parts, i.e., training data and testing data. The training dataset consisted of about 70% of the data (n = 4,181) and the rest of the data (n = 1,783) were used for testing. The input data were normalized before training the model. The network was trained for 5000 epochs with different numbers of neurons in the hidden layer. In each epoch, the training data were selected in random order to prevent the network from learning the particular order of the data. The commonly used back-propagation training algorithm “scaled conjugate gradient (SCG)” was used to train the models. The hyperbolic-tangent activation function was used in all cases.
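The partitioning and normalization steps can be sketched as follows. The study does not state which normalization was applied, so min-max scaling fitted on the training rows is an assumption here, as are the function names and the seed. A plain 70% cut will also not reproduce the study's exact 4,181/1,783 partition.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def fit_min_max(train_rows):
    """Per-column (min, max) bounds, learned from the training rows only."""
    return [(min(column), max(column)) for column in zip(*train_rows)]


def apply_min_max(rows, bounds):
    """Scale each column to [0, 1] relative to the training bounds."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]
        for row in rows
    ]


# Toy rows (two invented features per subject), for illustration only.
rows = [[float(i), float(2 * i)] for i in range(10)]
train, test = train_test_split(rows)
bounds = fit_min_max(train)
train_scaled = apply_min_max(train, bounds)
test_scaled = apply_min_max(test, bounds)
print(len(train), len(test))  # 7 3
```

Fitting the bounds on the training part only keeps information from the testing part out of the model inputs; test values may therefore fall slightly outside [0, 1].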
It is important to adjust the learning rate and momentum terms during the learning process of neural networks. Large weights may disrupt the learning behaviour of a neural network, so the learning rate is set at a small value to prevent the selection of large weights; however, a small learning rate slows down the learning process. Following Heydari et al. [21], the learning rate and momentum were set at 0.1 and 0.7, respectively. The input layer consisted of 6 neurons corresponding to the independent measures (age, gender, NC, WC, HpC and MUAC) used to predict BMI in children. In the hidden layer, different numbers of neurons were tried in order to select the optimal network architecture and to prevent overtraining.
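To show what the learning rate and momentum terms control, the sketch below applies a generic gradient-descent-with-momentum update to a toy one-parameter problem. It is not the scaled conjugate gradient algorithm named in the study and does not reproduce the SPSS training procedure; the function name and the toy objective are hypothetical, and only the values 0.1 and 0.7 come from the text.

```python
def momentum_step(weights, gradients, velocity, learning_rate=0.1, momentum=0.7):
    """One update: the new step blends the previous step with the current gradient."""
    new_velocity = [
        momentum * v - learning_rate * g for v, g in zip(velocity, gradients)
    ]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy objective f(w) = w^2 with gradient 2w, starting from w = 1.
weights, velocity = [1.0], [0.0]
for _ in range(100):
    gradients = [2.0 * w for w in weights]
    weights, velocity = momentum_step(weights, gradients, velocity)
print(f"{weights[0]:.2e}")
```

A larger learning rate takes bigger steps and can overshoot, while the momentum term carries part of the previous step forward and smooths the trajectory.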
The MLR analysis was also performed to predict BMI. A linear regression model that involves more than one predictor (regressor) variable is called an MLR model. In an MLR model, the relationship between the dependent variable (regressand) and two or more independent variables is expressed by a linear regression equation. An MLR equation with k regressors is given as
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_k X_{ik} + \varepsilon_i \qquad (1)$$
where
$Y_i$ = ith observation of the dependent variable,
$X_{ij}$ = ith observation of the jth regressor (j = 1, 2, …, k),
$\beta_0$ = Y-intercept (the constant term),
$\beta_j$ = regression coefficient corresponding to the jth regressor,
$\varepsilon_i$ = the error term, assumed to be normal with zero mean and constant variance.
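Referring to the MLR equation (1), the MLR equation in our study is
$$\mathrm{BMI}_i = \beta_0 + \beta_1\,\mathrm{Age}_i + \beta_2\,\mathrm{Gender}_i + \beta_3\,\mathrm{NC}_i + \beta_4\,\mathrm{WC}_i + \beta_5\,\mathrm{HpC}_i + \beta_6\,\mathrm{MUAC}_i + \varepsilon_i \qquad (2)$$
where the gender of a child is coded as 1 for a boy and 0 for a girl.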
Multicollinearity among the regressors was assessed using the variance inflation factor (VIF). The VIF of each variable was < 5, suggesting that multicollinearity was not a problem in the models. Different criteria, i.e., RMSE, MAPE, MAD and R², have been used in the literature to evaluate the prediction performance of models [12, 13, 15]. In this study, the prediction performance of the models was assessed using all of these criteria. The model with the highest R² and the lowest RMSE, MAPE and MAD was chosen as the final predictive model.
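The four evaluation criteria can be computed as in the sketch below. The study does not print the formulas, so the usual textbook definitions are assumed: MAD is taken as the mean absolute difference between observed and predicted values, and MAPE is expressed in percent. The function names and sample numbers are hypothetical.

```python
import math


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mad(actual, predicted):
    """Mean absolute deviation of the predictions from the observed values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (observed values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Invented observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0]
estimated = [12.0, 18.0, 33.0]
for name, metric in [("R2", r_squared), ("RMSE", rmse), ("MAD", mad), ("MAPE", mape)]:
    print(name, round(metric(observed, estimated), 3))
```

Under the selection rule stated above, the preferred model is the one with the larger R² and the smaller values of the other three measures on the same data.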
Results
The study included 5,964 children (boys = 2,865; 48.0% and girls = 3,099; 52.0%) with a median age of 9.0 years (IQR: 7.0-11.0). The descriptive statistics of the anthropometric characteristics and Spearman’s correlation coefficients between BMI and NC, WC, HpC, and MUAC are listed in Table 1.
A sex-based comparison revealed that the median values of NC, WC, HpC and MUAC were significantly higher in boys than in girls, while the median BMI did not differ significantly between the sexes. In the study sample, BMI showed significant positive correlations with MUAC (r = 0.63), HpC (r = 0.56), NC (r = 0.56) and WC (r = 0.51).
To predict the BMI values, the proposed MLR model (3), based on the explanatory variables NC, WC, HpC, MUAC, age and gender (boys = 1, girls = 0), was used (Table 2).
For instance, the MLR model predicts a BMI value of 13.08 for a boy with age (5), NC (21.59), WC (50.80), HpC (48.26) and MUAC (13.97). Similarly, a BMI value of 14.97 was predicted for a girl with age (12), NC (26.67), WC (50.80), HpC (57.15) and MUAC (16.76). Moreover, the ANN model predicts BMI values of 13.13 and 15.52 for the same boy and girl, respectively (see Table 3).
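The arithmetic behind these two MLR predictions can be checked with the coefficients printed in the abstract. The sketch below sums only those printed terms; no constant term appears in the printed equation, so none is used, and the dictionary and function names are hypothetical.

```python
# Coefficients as printed in the abstract (gender coded 1 = boy, 0 = girl).
COEFFICIENTS = {
    "age": -0.147,
    "gender": -0.367,
    "nc": 0.176,
    "wc": 0.041,
    "hpc": 0.060,
    "muac": 0.404,
}


def linear_terms(child):
    """Sum of coefficient * value over the six printed terms (no constant)."""
    return sum(COEFFICIENTS[name] * child[name] for name in COEFFICIENTS)


boy = {"age": 5, "gender": 1, "nc": 21.59, "wc": 50.80, "hpc": 48.26, "muac": 13.97}
girl = {"age": 12, "gender": 0, "nc": 26.67, "wc": 50.80, "hpc": 57.15, "muac": 16.76}

for label, child, reported in [("boy", boy, 13.08), ("girl", girl, 14.97)]:
    partial = linear_terms(child)
    print(label, round(partial, 2), round(reported - partial, 2))
```

The printed terms sum to about 13.32 for the boy and 15.21 for the girl, roughly 0.24 above the reported predictions in both cases. A plausible reading, stated here as an assumption, is that the fitted model in Table 2 includes a small negative constant term (together with coefficient rounding) that is not shown in the printed equation.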
The coefficient of determination (R²) from the MLR analysis showed that about 48.0% of the variation in BMI is explained by the predictor variables, including age and sex, while the R² from the ANN algorithm showed that a larger share (53.4%) of the variation in BMI is explained by the predictor variables. The RMSE, MAPE, and MAD values were also lower for the ANN than for the MLR model. Based on these findings, we conclude that ANN outperforms MLR in predicting BMI in children (Table 4).
Discussion
Obesity in children has now evolved into a severe public health issue, and its incidence has grown rapidly in recent years across the world [4]. BMI is used by researchers as an internationally accepted measure for defining overweight and obesity in both children and adults [22, 23]. Different studies in recent years have also used other anthropometric measurements, i.e., WC, MUAC and NC, for obesity screening in children [7-9], because these measurements had a good correlation with BMI (r = ∼0.60 to ∼0.85). Based on receiver operating characteristic (ROC) curve analysis, the diagnostic ability of WC, MUAC and NC to detect children with overweight and obesity was very high, i.e., area under the curve (AUC) values between 75.0% and 97.0% [7-9]. Therefore, these measurements can be used to predict BMI in children. To the authors’ knowledge, this is the first study that explores whether BMI in children can be predicted from anthropometric measurements by using MLR and ANN algorithms.
In this study, the neural network was designed using the information of 5,964 children living in different cities of Pakistan, with six input variables (age, gender, NC, WC, HpC and MUAC) and BMI as the output variable. A higher R² and lower RMSE and MAPE values indicated that ANN is a better method than MLR for predicting BMI in children. These findings are consistent with earlier reports on the topic [24, 25]. In a study of 321 adult individuals in Iran, the RMSE (0.94) and R² (0.890) obtained using ANN were better than those of MLR (RMSE = 1.31 and R² = 0.882) [24]. Another Iranian study estimated the BMI for 470 adult individuals and reported lower RMSE values than ours [25]. The large disparity between their R² and RMSE results and ours may be because they predicted the BMI of adult individuals based on different metabolic syndrome components (i.e., WC, systolic blood pressure (SBP), diastolic blood pressure (DBP), fasting glucose (FG), high-density lipoprotein (HDL) and triglycerides (TG)) and on different environmental and physical activity-related factors, whereas our study predicted the BMI of children aged 5-12 years based on age, gender and anthropometric information as input variables. Some studies have also employed the ANN method for obesity prediction; e.g., an Iranian study of 414 adults found that ANN, with an accuracy of 81.2%, is a more efficient method for obesity prediction than logistic regression (accuracy = 80.2%) [21]. Another study compared the performance of ANN and logistic regression for obesity prediction among 82 individuals and found that ANN performed better than logistic regression [26]. Using MLR analysis, we also found that NC, WC, HpC and MUAC are significant predictors of BMI. These results are consistent with an earlier study by Marshall et al. [10], which reported that measurements of WC, NC and MUAC can be used to accurately estimate BMI. Another study of 24,485 Pakistani type 2 diabetes patients aged 20 years and above also offered a BMI prediction model based on WC and HpC measurements [11].
The major strength of the study is that we predicted BMI in children based on NC, WC, HpC and MUAC, whose measurement is simple and quick and requires only a non-stretchable plastic tape. We recommend that future work include metabolic risk-related variables that affect BMI, such as SBP, DBP, FG, LDL, HDL and TG. Some studies have predicted BMI using voice signal information [14], psychological variables [13], and environmental and physical activity-related variables [25], and extending this study with these variables would be an important contribution. Lastly, the focus of this study was to predict BMI in children; BMI estimation in adults still needs to be studied and would be a good topic for future research. The study also has limitations. Since our study is cross-sectional, we cannot conclude a cause-and-effect relationship. Another limitation is that each anthropometric measurement was taken only once, so intra-observer variability could not be calculated. However, the measurements were always performed by the same researcher, so there was no inter-observer variability between measurements.
Conclusion
The findings of this study imply that both methods, MLR and ANN, can be used to predict BMI in children. For the ANN, a higher R² and lower RMSE, MAPE and MAD values demonstrate that this method is biologically acceptable and more effective for predicting BMI based on age, gender and four different anthropometric variables. Our methods and results can be used for obesity prediction in Pakistani children as an alternative tool in clinical practice and public health research. Further research to overcome the present study’s limitations is also required.
Data Availability
The dataset can be accessed through the following link: https://data.mendeley.com/datasets/sxgymx5xjm/1
Acknowledgments
The author(s) are thankful to Mr. Muhammad Qasim, who reviewed the statistical interpretation and made corrections where required.
The author(s) received no specific funding for this work.