PT - JOURNAL ARTICLE
AU - Chuanyu Hu
AU - Zhenqiu Liu
AU - Yanfeng Jiang
AU - Xin Zhang
AU - Oumin Shi
AU - Kelin Xu
AU - Chen Suo
AU - Qin Wang
AU - Yujing Song
AU - Kangkang Yu
AU - Xianhua Mao
AU - Xuefu Wu
AU - Mingshan Wu
AU - Tingting Shi
AU - Wei Jiang
AU - Lina Mu
AU - Damien C Tully
AU - Lei Xu
AU - Li Jin
AU - Shusheng Li
AU - Xuejin Tao
AU - Tiejun Zhang
AU - Xingdong Chen
TI - Early prediction of mortality risk among severe COVID-19 patients using machine learning
AID - 10.1101/2020.04.13.20064329
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.04.13.20064329
4099 - http://medrxiv.org/content/early/2020/04/19/2020.04.13.20064329.short
4100 - http://medrxiv.org/content/early/2020/04/19/2020.04.13.20064329.full
AB - Background: Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, has been spreading globally. The number of deaths has risen with the number of infected patients. We aimed to develop a clinical model for early prediction of the outcome of severe COVID-19 patients.
Methods: Epidemiological, clinical, and first laboratory findings after admission of 183 severe COVID-19 patients (115 survivors and 68 nonsurvivors) from the Sino-French New City Branch of Tongji Hospital were used to develop the predictive models. Five machine learning approaches (logistic regression, partial least squares regression, elastic net, random forest, and bagged flexible discriminant analysis) were used to select the features and predict the patients’ outcomes. The area under the receiver operating characteristic curve (AUROC) was used to compare the models’ performance. Sixty-four severe COVID-19 patients from the Optical Valley Branch of Tongji Hospital were used to externally validate the final predictive model.
Results: The baseline characteristics and laboratory tests were significantly different between the survivors and nonsurvivors.
Four variables (age, high-sensitivity C-reactive protein level, lymphocyte count, and d-dimer level) were selected by all five models. Given the similar performance among the models, the logistic regression model was selected as the final predictive model because of its simplicity and interpretability. The AUROCs for the derivation and external validation sets were 0.895 and 0.881, respectively. With a predicted probability of death of 50% as the cutoff, the sensitivity and specificity were 0.892 and 0.687 for the derivation set and 0.839 and 0.794 for the validation set, respectively. The individual risk score based on the four selected variables and the corresponding probability of death can serve as indexes to assess the mortality risk of COVID-19 patients. The predictive model is freely available at https://phenomics.fudan.edu.cn/risk_scores/.
Conclusions: Age, high-sensitivity C-reactive protein level, lymphocyte count, and d-dimer level of COVID-19 patients at admission are informative for the patients’ outcomes.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This work was supported by the National Natural Science Foundation of China (grant numbers: 91846302, 81772170); the National Key Research and Development Program of China (grant numbers: 2017YFC0907000, 2017YFC0907500, 2019YFC1315804); Key Basic Research Grants from the Science and Technology Commission of Shanghai Municipality (grant number: 16JC1400500); the Shanghai Municipal Science and Technology Major Project (grant number: 2017SHZDZX01); and the Natural Science Foundation of Hubei (grant number: 2019CFB657).
Author Declarations:
All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
Data availability: All data were collected from medical records in Tongji Hospital, Wuhan, China.
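
The abstract reports an AUROC and a sensitivity and specificity at a 50% probability-of-death cutoff for a logistic regression model. The following is a minimal Python sketch of how such metrics are typically computed; it is not the authors' code. It assumes the risk score is the model's linear predictor (log-odds), which the abstract does not state, and the names `probability_from_score`, `sensitivity_specificity`, and `auroc` are hypothetical. The scores and outcomes are made-up toy values, not study data, and no model coefficients are implied.

```python
import math


def probability_from_score(score):
    """Logistic link: map a log-odds risk score to a probability of death."""
    return 1.0 / (1.0 + math.exp(-score))


def sensitivity_specificity(probs, died, cutoff=0.5):
    """Classify as 'predicted death' when prob >= cutoff, then score the calls."""
    tp = sum(1 for p, d in zip(probs, died) if d and p >= cutoff)
    fn = sum(1 for p, d in zip(probs, died) if d and p < cutoff)
    tn = sum(1 for p, d in zip(probs, died) if not d and p < cutoff)
    fp = sum(1 for p, d in zip(probs, died) if not d and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(probs, died):
    """Fraction of (nonsurvivor, survivor) pairs ranked correctly; ties count half."""
    pos = [p for p, d in zip(probs, died) if d]
    neg = [p for p, d in zip(probs, died) if not d]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (NOT from the study).
toy_scores = [2.0, 0.5, -0.3, 1.2, -1.5, -0.8, 0.1, -2.2]
toy_died = [1, 1, 1, 0, 0, 0, 0, 0]  # 1 = nonsurvivor, 0 = survivor

toy_probs = [probability_from_score(s) for s in toy_scores]
sens, spec = sensitivity_specificity(toy_probs, toy_died, cutoff=0.5)
auc = auroc(toy_probs, toy_died)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

Under the log-odds assumption, a 50% probability cutoff corresponds to a risk score of zero, so moving the cutoff trades sensitivity against specificity while leaving the AUROC unchanged, since the AUROC depends only on how patients are ranked.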