TY  - JOUR
T1  - Machine learning model estimating number of COVID-19 infection cases over coming 24 days in every province of South Korea (XGBoost and MultiOutputRegressor)
JF  - medRxiv
DO  - 10.1101/2020.05.10.20097527
SP  - 2020.05.10.20097527
AU  - Yoshiro Suzuki
AU  - Ayaka Suzuki
AU  - Shun Nakamura
AU  - Toshiko Ishikawa
AU  - Akira Kinoshita
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/05/14/2020.05.10.20097527.abstract
N2  - We built a machine learning (ML) model that takes as input the number of daily infection cases and other COVID-19-related information over the past 24 days in each of the 17 provinces of South Korea, and outputs the total increase in the number of infection cases in each province over the coming 24 days. The model combines XGBoost with MultiOutputRegressor. To evaluate it, we treat the estimate for each province as a binary classification: whether the total number of new infection cases over the coming 24 days is more than 100. The result is Sensitivity = 3/3 = 100%, Specificity = 11/14 = 78.6%, False Positive Rate = 3/14 = 21.4%, Accuracy = 14/17 = 82.4%. Sensitivity = 100% means that the model did not miss any of the three provinces where the number of COVID-19 infection cases increased by more than 100. For the provinces where the actual number of new COVID-19 infection cases was less than 100, the model classified 78.6% correctly (Specificity), which is relatively high. These results suggest that our ML model may be able to support the following four points:
(1) promoting behavior change among residents of high-risk areas, (2) supporting decisions on resuming economic activity in each province, (3) supporting the choice of infectious disease control measures in each province, and (4) searching for factors that are highly correlated with future increases in the number of COVID-19 infection cases. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: On behalf of all authors, the corresponding author states that there is no conflict of interest. Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript (Yes). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived (Yes). I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance) (Yes). I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable (Yes). Data: Dataset name: Data Science for COVID-19 in South Korea (DS4C). Dataset downloaded from: https://www.kaggle.com/kimjihoo/coronavirusdataset. Dataset reported by: Korea Centers for Disease Control and Prevention (KCDC) and 17 provinces in South Korea. License of the dataset: CC BY-NC-SA 4.0.
ER  - 
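The evaluation figures quoted in the abstract can be rechecked with a few lines of arithmetic. The sketch below is not the authors' code. It assumes the confusion-matrix counts implied by the reported fractions: 3 provinces above the 100-case threshold, all flagged (TP = 3, FN = 0), and 14 provinces below it, of which 11 were classified correctly (TN = 11, FP = 3). The function name `binary_metrics` is hypothetical. Note that the false positive rate shares its denominator with specificity, namely the 14 provinces below the threshold.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, false positive rate, and accuracy."""
    positives = tp + fn  # provinces truly above the threshold
    negatives = tn + fp  # provinces truly below the threshold
    return {
        "sensitivity": tp / positives,
        "specificity": tn / negatives,
        "false_positive_rate": fp / negatives,
        "accuracy": (tp + tn) / (positives + negatives),
    }


# Counts inferred from the fractions in the abstract (an assumption, see above).
metrics = binary_metrics(tp=3, fn=0, tn=11, fp=3)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Specificity and false positive rate sum to 1 by construction, which is a quick consistency check on any reported pair of values.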