TY  - JOUR
T1  - Development of a Multivariable Model for COVID-19 Risk Stratification Based on Gradient Boosting Decision Trees
JF  - medRxiv
DO  - 10.1101/2020.12.23.20248783
SP  - 2020.12.23.20248783
AU  - Jahir M. Gutierrez
AU  - Maksims Volkovs
AU  - Tomi Poutanen
AU  - Tristan Watson
AU  - Laura Rosella
Y1  - 2020/01/01
UR  - http://medrxiv.org/content/early/2020/12/30/2020.12.23.20248783.abstract
N2  - Importance: Stratifying the adult population of Ontario, Canada, by risk of COVID-19 complications can support rapid pandemic response, resource allocation, and decision making. Objective: To develop and validate a multivariable model that predicts the risk of hospitalization due to severe COVID-19 from routinely collected health records of the entire adult population of Ontario, Canada. Design, Setting, and Participants: This cohort study included 36,323 adult patients (age ≥ 18 years) from the province of Ontario, Canada, who tested positive for SARS-CoV-2 nucleic acid by polymerase chain reaction between February 2 and October 5, 2020, and were followed up through November 5, 2020. Patients living in long-term care facilities were excluded from the analysis. Main Outcomes and Measures: The risk of hospitalization within 30 days of COVID-19 diagnosis was estimated with Gradient Boosting Decision Trees, and the importance of individual risk factors was examined with Shapley values. Results: The study cohort of 36,323 patients was majority female (18,895 [52.02%]), with a median (IQR) age of 45 (31-58) years. The cohort had a hospitalization rate of 7.11% (2,583 hospitalizations), with a median (IQR) time to hospitalization of 1 (0-5) days, and a mortality rate of 2.49% (906 deaths), with a median (IQR) time to death of 12 (6-27) days.
Compared with patients who were not hospitalized, those who were hospitalized were older (median age 64 vs 43 years; p < 0.001), more often male (56.25% vs 47.35%; p < 0.001), and had more comorbidities (median [IQR] 3 [2-6] vs 1 [0-3]; p < 0.001). Patients were randomly split into development (n = 29,058, 80%) and held-out validation (n = 7,265, 20%) cohorts. The final gradient boosting model was built with the XGBoost algorithm and achieved high discrimination, with a mean area under the receiver operating characteristic curve (AUC) of 0.852 across the five cross-validation folds of the development cohort and an AUC of 0.8475 in the held-out validation cohort, as well as excellent calibration (R² = 0.998, slope = 1.01, intercept = -0.01). In the validation cohort, the 10% of patients with the highest scores accounted for 47.41% of actual hospitalizations, and the top-scoring 30% accounted for 80.56%. Patients in the held-out validation cohort (n = 7,265) with a score of at least 0.5 (n = 2,149, 29.58%) had a hospitalization rate of 20.29% (positive predictive value, 20.29%), compared with 2.2% for those with a score below 0.5 (n = 5,116, 70.42%; negative predictive value, 97.8%). Aside from age, gender, and number of comorbidities, the features that contributed most to model predictions were a history of abnormal blood levels of creatinine, neutrophils, and leukocytes; geography; and chronic kidney disease. Conclusions: A risk stratification model was developed and validated using unique, de-identified, linked, routinely collected health administrative data available in Ontario, Canada. The final XGBoost model showed high discrimination and may be useful for stratifying patients at risk of serious COVID-19 outcomes. This model demonstrates that routinely collected health system data can be leveraged as a proxy for the potential risk of severe COVID-19 complications.
Specifically, past laboratory results and demographic factors provide a strong signal for identifying patients who are susceptible to complications. The model can support population risk stratification that informs the protection of patients most at risk for severe COVID-19 complications. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: LR is supported by a Tier 2 Canada Research Chair in Population Health Analytics. This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). The study sponsors did not participate in the design and conduct of the study; the collection, management, analysis, and interpretation of the data; the preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: ICES has obtained ethical approval (and repeats this review tri-annually) for its privacy and security policies, procedures, and practices. Each research project conducted at ICES is also subject to internal ethical review by the ICES Privacy and Compliance Office. Please find attached to this submission a letter with more information regarding the ethical review and approval process for this research; please do not hesitate to contact me with any questions. ICES is a prescribed entity under section 45 of Ontario's Personal Health Information Protection Act (PHIPA). Section 45 is the provision that enables the analysis and compilation of statistical information related to the management, evaluation, and monitoring of, allocation of resources to, and planning for the health system.
Section 45 authorizes health information custodians to disclose personal health information to a prescribed entity, such as ICES, without consent for such purposes. Projects conducted wholly under section 45, by definition, do not require review by a Research Ethics Board. As a prescribed entity, ICES must submit to tri-annual review and approval of its privacy and security policies, procedures, and practices by Ontario's Information and Privacy Commissioner. These include policies, practices, and procedures that require internal review and approval of every project by ICES' Privacy and Compliance Office. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. (Not applicable)
ER  - 
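The abstract reports two stratification metrics for the validation cohort: the share of actual hospitalizations captured by the highest-scoring 10% and 30% of patients, and the positive and negative predictive values at a score threshold of 0.5. The following is a minimal Python sketch of how such metrics can be computed from any model's predicted scores and observed outcomes. It is not the authors' code and does not reproduce their XGBoost model; the function names `capture_rate` and `ppv_npv_at_threshold`, the rounding of the top fraction, and the toy data are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def capture_rate(scores: Sequence[float], outcomes: Sequence[int], fraction: float) -> float:
    """Share of all observed events that fall among the top `fraction` of patients by score.

    Hypothetical helper: the top group size is round(fraction * n); ties in score
    keep their input order because Python's sort is stable.
    """
    if len(scores) != len(outcomes):
        raise ValueError("scores and outcomes must have the same length")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total_events = sum(outcomes)
    if total_events == 0:
        raise ValueError("no events to capture")
    # Rank patient indices from highest to lowest predicted risk.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = round(fraction * len(scores))
    captured = sum(outcomes[i] for i in ranked[:k])
    return captured / total_events


def ppv_npv_at_threshold(
    scores: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5
) -> Tuple[float, float]:
    """PPV among patients scored >= threshold; NPV among patients scored < threshold."""
    flagged: List[int] = [y for s, y in zip(scores, outcomes) if s >= threshold]
    unflagged: List[int] = [y for s, y in zip(scores, outcomes) if s < threshold]
    if not flagged or not unflagged:
        raise ValueError("threshold leaves one group empty")
    ppv = sum(flagged) / len(flagged)  # observed event rate in the flagged group
    npv = 1 - sum(unflagged) / len(unflagged)  # share of unflagged patients without the event
    return ppv, npv


if __name__ == "__main__":
    # Toy data for illustration only; these are NOT study results.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.15, 0.1, 0.05]
    toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
    print("top 10% capture:", capture_rate(toy_scores, toy_outcomes, 0.1))
    print("top 30% capture:", capture_rate(toy_scores, toy_outcomes, 0.3))
    print("PPV, NPV at 0.5:", ppv_npv_at_threshold(toy_scores, toy_outcomes))
```

Because NPV here is one minus the event rate among unflagged patients, a 2.2% hospitalization rate below the 0.5 threshold corresponds to the 97.8% negative predictive value reported in the abstract.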