RT Journal Article
SR Electronic
T1 Who is most at risk of dying if infected with SARS-CoV-2? A mortality risk factor analysis using machine learning of COVID-19 patients over time in a large Mexican population
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2023.01.17.23284684
DO 10.1101/2023.01.17.23284684
A1 Liao, Lauren D.
A1 Hubbard, Alan E.
A1 Gutiérrez, Juan Pablo
A1 Juárez-Flores, Arturo
A1 Kikkawa, Kendall
A1 Gupta, Ronit
A1 Yarmolich, Yana
A1 de Jesús Ascencio-Montiel, Iván
A1 Bertozzi, Stefano M.
YR 2023
UL http://medrxiv.org/content/early/2023/01/18/2023.01.17.23284684.abstract
AB Background: COVID-19 would kill fewer people if health programs could predict who is at higher risk of mortality, because resources could then be targeted to protect those people from infection. We predict mortality in a very large population in Mexico with machine learning using demographic variables and pre-existing conditions.
Methods: We conducted a population-based cohort study of over 1.4 million laboratory-confirmed COVID-19 patients using the Mexican social security database. The analysis covers data from March 2020 to November 2021 and is divided into three phases: (1) March to October 2020, (2) November 2020 to March 2021, and (3) April to November 2021. We predict mortality using an ensemble machine learning method, super learner, and independently estimate the adjusted mortality relative risk of each pre-existing condition using targeted maximum likelihood estimation.
Results: The super learner fit has high predictive performance (C-statistic: 0.907), with age the most predictive factor for mortality. After adjusting for demographic factors, renal disease, hypertension, diabetes, and obesity are the most impactful pre-existing conditions.
Phase analysis shows that the adjusted mortality risk decreased over time while relative risk increased for each pre-existing condition.
Conclusions: While age is the most important predictor of mortality, younger individuals with hypertension, diabetes, and obesity have a mortality risk comparable to that of individuals who are 20 years older without any of the three conditions. Our model can be continuously updated to identify the individuals who should most be protected against infection as the pandemic evolves.
What is already known on this topic: Studies for Mexico and other countries have suggested that pre-existing conditions such as renal disease, diabetes, hypertension, and obesity are strongly associated with COVID-19 mortality. While age and the presence of pre-existing conditions have been shown to predict mortality, other studies have typically used less powerful statistical approaches, have had smaller sample sizes, and have not been able to describe changes over time.
What this study adds: This study examines mortality risk in a very large population (> 60 M); it uses powerful ensemble machine learning methods that outperform regression analyses; and it demonstrates marked changes over time in the degree to which different risk factors predict mortality.
How this study might affect research, practice or policy: Because we show an important improvement in predictive performance over traditional regression analyses, and the ability to update estimates as the pandemic evolves, we argue that these methods should be much more widely used to inform national programming in Mexico and elsewhere. Programs that assume that predictive models don’t change over time as variants emerge and as pre-existing immunity evolves due to vaccination and prior infection will not accurately predict mortality risk.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: This research effort was funded by the C3.ai Digital Transformation Institute.
The C3.ai DTI was established by C3.ai, Microsoft, the University of California, Berkeley (UC Berkeley), the University of Illinois at Urbana-Champaign (UIUC), Carnegie Mellon University, University of Chicago, MIT, and Princeton University. It is being funded in cash and in kind by C3.ai, Microsoft Azure, and the Lawrence Berkeley National Laboratory. The funders had no role in access to data, design of the research, or analyses conducted. They have not seen or contributed to the manuscript in any way. In addition, LDL received funding from the National Science Foundation (DGE 2146752). AEH received funding from a global development grant (OPP1165144) from the Bill & Melinda Gates Foundation to the University of California, Berkeley, CA, USA.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This data-only study was approved on November 4th, 2020, by the Scientific Research National Committee (Social Security Mexican Institute) with R-2020-785-165. The University of California, Berkeley Institutional Review Board (IRB) determined that the project was exempt from IRB approval.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov.
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The study was conducted using confidential patient records subject to strict access controls and we are therefore unable to share the data that were used for this study.
AUC, area under the receiver operating characteristic curve; CI, confidence interval; COPD, chronic obstructive pulmonary disease; COVID-19, coronavirus disease of 2019; IMSS, Mexican Social Security Institute; PCR, polymerase chain reaction; RR, relative risk; SL, super learner; TMLE, targeted maximum likelihood estimation; XGBoost, extreme gradient boosting