ABSTRACT
Background Pneumonia is the leading cause of death in children aged 1-59 months. Prediction models for child pneumonia mortality have been developed using regression methods but their performance is insufficient for clinical use.
Methods We used a variety of machine learning methods to develop a predictive model for mortality in children with clinical pneumonia enrolled in population-based surveillance in the Basse Health and Demographic Surveillance System in rural Gambia (n=11,012). Four machine learning algorithms (support vector machine, random forest, artificial neural network, and regularized logistic regression) were implemented, fitting all possible combinations of two or more of 16 selected features. Models were shortlisted based on their training set performance, the number of included features, and the reliability of feature measurement. The final model was selected from the shortlist for its clinical interpretability.
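To give a sense of the size of the search described above, the following minimal sketch enumerates every subset of two or more features drawn from 16. The feature names are hypothetical placeholders, not the study's variables, and the sketch only counts candidate subsets; it does not reproduce the authors' fitting code.

```python
from itertools import combinations
from math import comb

N_FEATURES = 16
# Hypothetical placeholder names; the actual 16 features are not listed in the abstract.
features = [f"feature_{i}" for i in range(1, N_FEATURES + 1)]


def feature_subsets(names, min_size=2):
    """Yield every combination of at least `min_size` features."""
    for k in range(min_size, len(names) + 1):
        yield from combinations(names, k)


# Number of subsets of size >= 2: all 2^16 subsets minus the empty set
# and the 16 single-feature subsets.
n_subsets = sum(comb(N_FEATURES, k) for k in range(2, N_FEATURES + 1))
assert n_subsets == 2**N_FEATURES - N_FEATURES - 1

print(n_subsets)  # 65519 candidate feature sets per algorithm
```

Under these assumptions, each of the four algorithms would be fitted to 65,519 feature subsets, which is why shortlisting on training-set performance and feature count is a practical necessity.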
Results When we applied the final model to the test set (55 deaths), the area under the Receiver Operating Characteristic Curve was 0.88 (95% confidence interval: 0.84, 0.91), sensitivity was 0.78 and specificity was 0.77.
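For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, and the area under the ROC curve are computed from binary outcomes and predicted risks. The data and the 0.5 threshold are made-up toy values for illustration only; they are not the study's data or the threshold used by the authors.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = died, 0 = survived)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case outranks a random
    negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 deaths and 4 survivors with illustrative predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off for this illustration only
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={auc(y_true, scores):.2f}")
```

Sensitivity and specificity depend on the chosen risk threshold, whereas the AUC summarizes discrimination across all thresholds.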
Conclusions Our evaluation of multiple machine learning methods combined with minimal and pragmatic feature selection led to a predictive model with very good performance. We plan further validation of our model in different populations.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The Pneumococcal Surveillance Project (including this work) was funded by GAVI, The Vaccine Alliance's Accelerated Development and Introduction Plan (PneumoADIP), the Bill & Melinda Gates Foundation (OPP1020327), and the UK Medical Research Council.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The Gambia Government / MRCG Joint Ethics Committee approved the collection of data used in this retrospective analysis.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Conflicts of interest: All authors declare they have no conflicts of interest.
Grants and/or financial support: The Pneumococcal Surveillance Project (including this work) was funded by GAVI, The Vaccine Alliance’s Accelerated Development and Introduction Plan (PneumoADIP), the Bill & Melinda Gates Foundation (OPP1020327), and the UK Medical Research Council.
Access to data and computing code: The data are available on request from Dr. Grant Mackenzie (gmackenzie{at}mrc.gm). The code to replicate the results can be found on GitHub: https://github.com/MRCG-djeffries/mortality-prediction.
Data Availability
The data are available on request from Dr. Grant Mackenzie. The code to replicate the results can be found on GitHub: https://github.com/MRCG-djeffries/mortality-prediction. The model can be downloaded from the supplemental digital content (and from GitHub: https://github.com/MRCG-djeffries/mortality-prediction) either as a standalone .rds file or as a Shiny app, which provides a user interface for individual and batch mortality predictions and for validating the model on new datasets.
Abbreviations
- AUC: area under the receiver operating characteristic curve
- RF: random forest
- SVM: support vector machine
- ANN: artificial neural network