TY  - JOUR
T1  - Uncovering clinical risk factors and prediction of severe COVID-19: A machine learning approach based on UK Biobank data
JF  - medRxiv
DO  - 10.1101/2020.09.18.20197319
SP  - 2020.09.18.20197319
AU  - Kenneth C.Y. Wong
AU  - Yong Xiang
AU  - Hon-Cheong So
Y1  - 2021/01/01
UR  - http://medrxiv.org/content/early/2021/03/22/2020.09.18.20197319.abstract
N2  - Background: COVID-19 is a major public health concern. Given the extent of the pandemic, it is urgent to identify risk factors associated with disease severity. Accurate prediction of those at risk of developing severe infections is also of high clinical importance. Methods: Based on UK Biobank (UKBB) data, we built machine learning (ML) models to predict the risk of developing severe or fatal infections and to evaluate the major risk factors involved. We first restricted the analysis to infected subjects (N=7846), then performed the analysis at a population level, treating those with no known infection as controls (N for controls=465,728). Hospitalization was used as a proxy for severity. A total of 97 clinical variables (collected prior to the COVID-19 outbreak) covering demographic variables, comorbidities, blood measurements (e.g. hematological/liver/renal function/metabolic parameters), anthropometric measures and other risk factors (e.g. smoking/drinking habits) were included as predictors. We also constructed a simplified (‘lite’) prediction model using 27 covariates that can be more easily obtained (demographic and comorbidity data). XGBoost (gradient boosted trees) was used for prediction, and predictive performance was assessed by cross-validation. Variable importance was quantified by Shapley values and accuracy gain. Shapley dependency and interaction plots were used to evaluate the pattern of relationship between risk factors and outcomes. Results: A total of 2386 severe and 477 fatal cases were identified.
For the analysis among infected individuals (N=7846), our prediction model achieved AUCs of 0.723 (95% CI: 0.711-0.736) and 0.814 (CI: 0.791-0.838) for severe and fatal infections respectively. The top five contributing factors for severity were age, number of drugs taken (cnt_tx), cystatin C (reflecting renal function), waist-hip ratio (WHR) and Townsend Deprivation Index (TDI). For prediction of mortality, the top features were age, testosterone, cnt_tx, waist circumference (WC) and red cell distribution width (RDW). In analyses involving the whole UKBB population, the corresponding AUCs for severity and fatality were 0.696 (CI: 0.684-0.708) and 0.802 (CI: 0.778-0.826) respectively. The same top five risk factors were identified for both outcomes, namely age, cnt_tx, WC, WHR and TDI. Apart from the above features, type 2 diabetes (T2DM), HbA1c and apolipoprotein A were ranked among the top 10 in at least two (out of four) analyses. Age, cystatin C, TDI and cnt_tx were among the top 10 across all four analyses. For the ‘lite’ models, predictive performance in terms of AUC was broadly similar, with estimated AUCs of 0.716, 0.818, 0.696 and 0.811 for the four analyses, respectively. The top-ranked variables were similar to those above, including, for example, age, cnt_tx, WC, male sex and T2DM. Conclusions: We identified a number of baseline clinical risk factors for severe/fatal infection using an ML approach. For example, age, central obesity, impaired renal function, multi-comorbidities and cardiometabolic abnormalities may predispose to poorer outcomes. The presented prediction models may be useful at a population level to help identify those susceptible to developing severe/fatal infections, hence facilitating targeted prevention strategies.
Further replications in independent cohorts are required to verify our findings. Competing Interest Statement: The authors have declared no competing interest. Funding Statement: This work was supported partially by the Lo Kwee Seong Biomedical Research Fund from The Chinese University of Hong Kong. Author Declarations: I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes. The details of the IRB/oversight body that provided approval or exemption for the research described are given below: The UK Biobank study has received ethical approval from the NHS National Research Ethics Service North West (16/NW/0274). All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes. The UK Biobank data is available to registered researchers.
ER  -