TY  - JOUR
T1  - Machine Learning to Predict Neonatal Mortality Using Public Health Data from São Paulo - Brazil
JF  - medRxiv
DO  - 10.1101/2020.06.19.20112953
SP  - 2020.06.19.20112953
AU  - Carlos Eduardo Beluzo
AU  - Luciana Correia Alves
AU  - Everton Silva
AU  - Rodrigo Bresan
AU  - Natália Arruda
AU  - Tiago Carvalho
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/06/22/2020.06.19.20112953.abstract
N2  - Infant mortality is one of the most important socioeconomic and health quality indicators in the world. In Brazil, neonatal mortality accounts for 70% of infant mortality. Despite its importance, neonatal mortality shows signs of increasing, which raises concern about the need for efficient and effective methods to help reduce it. This paper proposes a new approach that classifies newborns who may be susceptible to neonatal mortality by applying supervised machine learning methods to public health features. The approach is evaluated on a sample of 15,858 records extracted from the SPNeoDeath dataset, which was created for this paper from the SINASC and SIM databases of São Paulo city (Brazil). As a result, an average AUC of 0.96 was achieved in classifying samples as susceptible to death or not using the SVM, XGBoost, Logistic Regression and Random Forests machine learning algorithms. Furthermore, the SHAP method was used to understand which features most influenced the algorithms' output. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This research was supported by the Bill & Melinda Gates Foundation (Process no: OPP1201970) and the Ministry of Health of Brazil, through the National Council for Scientific and Technological Development (CNPq) (Process no: 443774/2018-8).
It was also supported by NVIDIA, which donated a Titan Xp GPU used by the research team. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This paper uses publicly available data (SIM and SINASC) that has been de-identified and was deemed exempt from approval by a human research ethics committee. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The dataset will be made public after the paper has been published in a journal.
ER  - 