Abstract
Background Noncommunicable diseases (NCDs) continue to pose a significant global health challenge, and hyperglycemia is a prominent indicator of potential diabetes. This study used machine learning algorithms to predict hyperglycemia in a cohort of asymptomatic individuals and to identify key predictors that support early risk identification.
Methods The dataset comprised clinical and demographic data obtained from 195 asymptomatic adults residing in a suburban community in Nigeria. We compared multiple machine learning algorithms to determine the most effective model for predicting hyperglycemia, and examined feature importance to identify correlates of high blood glucose levels within the cohort.
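The comparison and feature-ranking workflow described in the Methods can be sketched in Python with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' script: the data are synthetic (generated to match only the cohort size of 195), the three candidate models stand in for the larger set the study compared, `class_weight="balanced"` is used as a simple substitute for whatever rebalancing method the study applied, and the feature names are hypothetical placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (NOT the study data): 195 rows, 8 features.
X, y = make_classification(
    n_samples=195, n_features=8, n_informative=3,
    weights=[0.8, 0.2], random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# A small set of candidate classifiers; class weighting offsets the imbalance.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000, class_weight="balanced"),
    "decision_tree": DecisionTreeClassifier(class_weight="balanced", random_state=0),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified 5-fold cross-validation keeps the class ratio similar in each fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
mean_auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(mean_auc, key=mean_auc.get)

# Rank features by the random forest's impurity-based importances.
forest = candidates["random_forest"].fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
ranked_features = [feature_names[i] for i in order]

for name, auc in mean_auc.items():
    print(f"{name}: mean ROC-AUC = {auc:.3f}")
print("best model:", best_name)
print("features ranked by importance:", ranked_features)
```

Impurity-based importances are convenient but can favor features with many distinct values; permutation importance on held-out data is a common cross-check.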
Results Elevated blood pressure and prehypertension were recorded in 8 (4%) and 18 (9%) individuals, respectively. Forty-one (21%) individuals presented with hypertension (HTN), of whom 34/41 (82.9%) were female. However, cohort-based gender adjustment showed that 34/118 (28.81%) females and 7/77 (9.09%) males were hypertensive. Age-based analysis revealed an inverse relationship between normotension and age (r = -0.88; P < 0.05). Conversely, HTN increased with age (r = 0.53; P < 0.05), peaking between 50 and 59 years. Isolated systolic hypertension (ISH) and isolated diastolic hypertension (IDH) were recorded in 16/195 (8.21%) and 15/195 (7.69%) individuals, respectively; females accounted for most ISH cases, 11/16 (68.75%), while males accounted for most IDH cases, 11/15 (73.33%). Following class rebalancing, the random forest classifier gave the best performance of the 27 classifiers evaluated (accuracy = 0.894; receiver operating characteristic-area under the curve (ROC-AUC) = 0.893; F1 score = 0.894). The feature selection model identified uric acid and age as pivotal variables associated with hyperglycemia.
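The gender-adjusted figures divide the number of hypertensive individuals of each sex by that sex's cohort size, rather than by the total number of hypertensive cases. A minimal Python sketch of that arithmetic, using only the counts reported in the Results (the variable and function names are illustrative, not taken from the authors' script):

```python
# Counts as reported in the Results; names are illustrative only.
cohort = {"female": 118, "male": 77}
hypertensive = {"female": 34, "male": 7}


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage rounded to 2 decimals."""
    return round(100 * numerator / denominator, 2)


total_htn = sum(hypertensive.values())

# Share of all hypertensive cases contributed by each sex.
share_of_cases = {sex: percent(n, total_htn) for sex, n in hypertensive.items()}

# Prevalence within each sex: cases divided by that sex's cohort size.
prevalence_within_sex = {
    sex: percent(hypertensive[sex], cohort[sex]) for sex in cohort
}

# Overall prevalence across the whole cohort.
overall_prevalence = percent(total_htn, sum(cohort.values()))

print("share of cases:", share_of_cases)
print("prevalence within sex:", prevalence_within_sex)
print("overall prevalence:", overall_prevalence)
```

The two views answer different questions: the share of cases reflects the cohort's sex composition as well as risk, whereas the within-sex prevalence is the figure to compare between females and males.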
Conclusions The random forest classifier identified significant clinical correlates of hyperglycemia, offering insights for early detection of diabetes and informing the design and deployment of therapeutic interventions. However, a fuller understanding of each feature’s contribution to blood glucose levels would benefit from modeling additional relevant clinical features in larger datasets.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Kolapo Oyebola was supported by APTI-18-07 Grant by the African Academy of Sciences in partnership with Bill and Melinda Gates Foundation; and a Fogarty Emerging Global Leader Grant (NIH-K43TW011926) from the US National Institutes of Health. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethical approval was obtained from the Institutional Review Board of the Nigerian Institute of Medical Research (IRB/21/074).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
All relevant data are within the manuscript and its Supporting File. The Google Colab Python script used for data analysis and machine learning has been deposited on our GitHub page: https://github.com/oyebolakolapo/Machine-Learning-Prediction-of-Elevated-Blood-Glucose-in-a-Cohort-of-Apparently-Healthy-Adults