PT - JOURNAL ARTICLE
AU - Jie Li
AU - Xin Li
AU - John Hutchinson
AU - Mohammad Asad
AU - Yadong Wang
AU - Edwin Wang
TI - An ensemble prediction model for COVID-19 mortality risk
AID - 10.1101/2022.01.10.22268985
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.01.10.22268985
4099 - http://medrxiv.org/content/early/2022/01/13/2022.01.10.22268985.short
4100 - http://medrxiv.org/content/early/2022/01/13/2022.01.10.22268985.full
AB - Background: It is critical to identify COVID-19 patients with a higher risk of death at an early stage so that they can receive better hospitalization or intensive care. However, thus far, no machine learning model has been shown to be successful in an independent cohort. We aim to develop a machine learning model that can accurately predict the death risk of COVID-19 patients at an early stage in independent cohorts. Methods: To identify key features, we used a cohort of 4711 patients with clinical features associated with patient physiological conditions and lab test data associated with inflammation, hepatorenal function, cardiovascular function and so on. We first developed a novel data preprocessing approach to clean up the clinical features and then developed an ensemble machine learning method to identify the key features. Results: We identified 14 key clinical features whose combination reached a good predictive performance, with an AUC of 0.907.
Most importantly, we successfully validated these key features in a large independent cohort containing 15,790 patients. Conclusions: Our study shows that the 14 key features are robust and useful in predicting the risk of death in patients with confirmed SARS-CoV-2 infection at an early stage, and are potentially useful in clinical settings to support clinical decisions.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Alberta Innovates for Health
Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: 1. The first data set is from https://figshare.com/s/79827c396af7df42b3d7. Details of the first data set can be found in the paper: Altschul DJ, Unda SR, Benton J, de la Garza Ramos R, Cezayirli P, Mehler M, et al. A novel severity score to predict inpatient mortality in COVID-19 patients. Scientific Reports. 2020;10(1):16726. 2. The second data set is from UK Biobank; we list two references in our manuscript: Sudlow C, Gallacher J, Allen N, Beral V, Burton P, Danesh J, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 2015;12(3):e1001779. Barbour V. UK Biobank: a project in search of a protocol? The Lancet. 2003;361(9370):1734-8.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All data produced in the present work are contained in the manuscript. Data sources: https://figshare.com/s/79827c396af7df42b3d7 and https://www.ukbiobank.ac.uk/
Abbreviations: OsSats: oxygen saturation; Temp: temperature; MAP: mean arterial pressure; Ddimer: D-dimer; Plts: platelets; INR: international normalized ratio; BUN: blood urea nitrogen; AST: aspartate aminotransferase; ALT: alanine aminotransferase; WBC: white blood cells; Lympho: lymphocytes; IL-6: interleukin-6; CrctProtein: C-reactive protein; KNN: k-nearest neighbor method; GBDT: Gradient Boosted Decision Tree; XGBoost: Extreme Gradient Boosting; RF: Random Forest; LR: Logistic Regression; SVM: Support Vector Machine; EM: Ensemble Model; ROC: Receiver Operating Characteristic; AUC: Area Under ROC Curve; TP: True Positive; FP: False Positive; TN: True Negative; FN: False Negative; CSS: COVID-19 severity scores