Identifying main and interaction effects of risk factors to predict intensive care admission in patients hospitalized with COVID-19: a retrospective cohort study in Hong Kong

Background: Coronavirus disease 2019 (COVID-19) has become a pandemic, placing significant burdens on healthcare systems. In this study, we tested the hypothesis that a machine learning approach incorporating hidden nonlinear interactions can improve prediction of intensive care unit (ICU) admission.
Methods: Consecutive patients with COVID-19 diagnosed by RT-PCR who were admitted to public hospitals in Hong Kong between 1st January and 24th May 2020 were included. The primary endpoint was ICU admission.
Results: This study included 1043 patients (median age 35 years (IQR: 32-37); 54% male). Nineteen patients were admitted to ICU (median hospital length of stay (LOS): 30 days; median ICU LOS: 16 days). ICU patients were more likely to be prescribed angiotensin-converting enzyme inhibitors/angiotensin receptor blockers, the antiretroviral lopinavir/ritonavir, the antiviral remdesivir, ribavirin, steroids, interferon-beta and hydroxychloroquine. Significant predictors of ICU admission were older age, male sex, prior coronary artery disease, respiratory diseases, diabetes, hypertension and chronic kidney disease, together with activated partial thromboplastin time (APTT), red cell count, white cell count, albumin and serum sodium. A tree-based machine learning model identified the most informative characteristics and hidden interactions that can predict ICU admission: low red cell count combined with 1) male sex, 2) older age, 3) low albumin, 4) low sodium or 5) prolonged APTT. Five-fold cross-validation showed superior performance of this model over baseline models including XGBoost, LightGBM, random forests and multivariate logistic regression.
Conclusions: A machine learning model including baseline risk factors and their hidden interactions can accurately predict ICU admission in COVID-19.



Introduction
Coronavirus disease 2019 (COVID-19), the third coronavirus epidemic of the past two decades after severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS), has become a pandemic, placing significant burdens on healthcare systems worldwide 1. The number of confirmed COVID-19 cases worldwide exceeded 7.4 million on June 11, 2020, with at least 416,000 deaths across 188 countries and territories 2. The pandemic remains unresolved, even though countries around the world have moved to lift quarantines, stay-at-home orders and other social restrictions. A particular challenge countries face during the COVID-19 pandemic is the surge in demand for intensive care unit (ICU) care 3, 4. Recent studies have reported a case fatality rate of 61.5% among critical cases, rising sharply with age and in patients with underlying comorbidities 5. Unmet ICU demand can therefore be expected to raise fatality rates further. Which clinical characteristics and biomarkers can support efficient ICU management of COVID-19 patients remains an open question 6. Identifying prognostic biomarkers that distinguish patients requiring immediate medical attention is therefore urgent, but challenging. The aim of this study was to identify significant risk factors, as well as hidden interaction effects, associated with ICU admission using an interpretable machine learning approach.
(%). The Mann-Whitney U test was used to compare continuous variables. The χ² test with Yates' correction was used for 2×2 contingency data, and Pearson's χ² test was used for variables with more than two categories. To identify significant risk factors for ICU admission in COVID-19 patients, univariate logistic regression was used to estimate odds ratios (ORs) and 95% confidence intervals (CIs), adjusting for age, sex and comorbidities. A two-sided p-value of less than 0.05 was considered statistically significant. Statistical analyses (including univariate logistic regression) were performed using RStudio software (Version: 1.1.456) and Python (Version: 3.6).
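The 2×2 comparison and the odds ratio for a single binary risk factor (for example, male sex versus ICU admission) can be illustrated with a minimal standard-library Python sketch. It uses made-up counts, the function names are hypothetical, and it omits the adjustment for age, sex and comorbidities described above. For one binary predictor, the cross-product odds ratio and its Wald interval coincide with those of an unadjusted univariate logistic regression.

```python
import math

def yates_chi2_2x2(a, b, c, d):
    """Yates-corrected chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, two-sided p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    # Continuity correction: shrink |ad - bc| by n/2, but never below zero
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / denom
    # For 1 df, P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def odds_ratio_wald_ci(a, b, c, d, z=1.959963984540054):
    """Unadjusted odds ratio with a Wald 95% CI (all cells must be > 0).

    a: exposed & ICU, b: exposed & no ICU, c: unexposed & ICU, d: unexposed & no ICU.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)

# Hypothetical counts for illustration only (not the study's data)
stat, p = yates_chi2_2x2(10, 90, 5, 95)
or_, lo, hi = odds_ratio_wald_ci(10, 90, 5, 95)
print(f"chi2={stat:.3f}, p={p:.3f}; OR={or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With only 19 ICU admissions, some cells can be very small or zero; the Wald interval is then unreliable or undefined, and an exact method would be the safer choice.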

Development of a tree-based interpretable machine learning model
After identifying significant predictors of ICU admission, we aimed to construct a practical decision-making model for ICU admission that considers both main and interaction effects among these significant univariable predictors. Here, interaction effects (mainly pairwise interactions) capture hidden nonlinear dependence between risk characteristics and can provide information on ICU outcome beyond that of individual predictors. Significant predictors identified by univariate logistic regression were entered into a state-of-the-art interpretable boosting model, the Explainable Boosting Machine (EBM) 11.
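EBM belongs to the family of generalized additive models with pairwise interactions (GA²M). As a sketch in our own notation (not the authors'), the predicted log-odds of ICU admission for a patient with predictors x = (x_1, …, x_p) decompose as:

```latex
\operatorname{logit} \Pr(\mathrm{ICU} = 1 \mid \mathbf{x})
  = \beta_0 + \sum_{i=1}^{p} f_i(x_i) + \sum_{(i,j) \in \mathcal{S}} f_{ij}(x_i, x_j)
```

Here β_0 is an intercept, each f_i is a main-effect shape function of one predictor, each f_ij is a pairwise interaction function and S is the set of selected pairs; the ICU probability is the inverse logit of the right-hand side. Because every term depends on at most two variables, each can be plotted directly as a curve or a heat map.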
EBM is an interpretable supervised learning model that combines modern machine learning techniques such as bagging, gradient boosting and automatic detection of main and interaction effects. It achieves accuracy comparable to state-of-the-art learning models (e.g., random forests 12 and XGBoost 13) while keeping memory usage light and prediction fast. EBM is constructed from multiple hierarchically organized simple classifiers consisting of sequences of binary decisions. Unlike black-box models, which are typically difficult to interpret, EBM produces lossless explanations of its predictions owing to the interpretability of its tree-based decision system, a property desirable for clinically actionable decision-making. Because this interpretability is intrinsic, the model's predictions can be explained directly from its structure. The contribution of each main and interaction effect to predicting ICU admission can be quantified by its accumulated use across the splitting processes of the decision trees; these contributions can then be sorted and visualized in descending order to identify the most important variables.
(medRxiv preprint, this version posted July 2, 2020; not certified by peer review. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.)
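To make the training idea concrete, the following is a deliberately simplified Python sketch of EBM-style cyclic boosting, not the InterpretML implementation. Assumptions: predictors are already discretized into integer bins; each per-feature tree is replaced by one damped Newton step per bin; bagging and interaction terms are omitted; and term importance is taken as the mean absolute logit contribution over the data (the text above instead describes accumulated use in tree splits). All names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_cyclic_booster(X_bins, y, n_bins, rounds=200, lr=0.1):
    """Simplified EBM-style training: boost one feature at a time, round-robin.

    X_bins: rows of pre-computed integer bin indices (one per feature).
    y: 0/1 labels. Returns (intercept, shapes), where shapes[k][b] is the
    logit contribution of bin b of feature k.
    """
    n, m = len(X_bins), len(X_bins[0])
    prevalence = sum(y) / n
    intercept = math.log(prevalence / (1 - prevalence))
    shapes = [[0.0] * n_bins for _ in range(m)]
    scores = [intercept] * n
    for _ in range(rounds):
        for k in range(m):  # cycle through the features in turn
            grad = [0.0] * n_bins
            hess = [0.0] * n_bins
            for row, label, s in zip(X_bins, y, scores):
                p = sigmoid(s)
                grad[row[k]] += label - p
                hess[row[k]] += p * (1 - p)
            # One damped Newton step per bin: a piecewise-constant stand-in for a shallow tree
            step = [lr * g / h if h > 0 else 0.0 for g, h in zip(grad, hess)]
            shapes[k] = [v + d for v, d in zip(shapes[k], step)]
            scores = [s + step[row[k]] for row, s in zip(X_bins, scores)]
    # Center each shape function over the data and move its mean into the intercept
    for k in range(m):
        mean_k = sum(shapes[k][row[k]] for row in X_bins) / n
        shapes[k] = [v - mean_k for v in shapes[k]]
        intercept += mean_k
    return intercept, shapes

def term_importance(X_bins, shapes):
    """Mean absolute logit contribution of each feature (assumed importance measure)."""
    n = len(X_bins)
    return [sum(abs(shape[row[k]]) for row in X_bins) / n for k, shape in enumerate(shapes)]

# Synthetic toy data: feature 0 drives risk, feature 1 is pure noise
rates = [1, 2, 5, 8]  # positives out of 10 per bin of feature 0
X, y = [], []
for b0 in range(4):
    for b1 in range(4):
        for r in range(10):
            X.append([b0, b1])
            y.append(1 if r < rates[b0] else 0)

intercept, shapes = fit_cyclic_booster(X, y, n_bins=4)
print(term_importance(X, shapes))
```

Boosting one feature at a time with a small learning rate keeps each learned function attributable to a single predictor, which is what allows the fitted model to be read term by term.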

Baseline characteristics
The flowchart of patient enrolment in this study is provided in Figure 1. A total of 1043 patients admitted to hospital between 1st January 2020 and 24th May 2020 were included in this study.
The case distributions with respect to the different districts of Hong Kong are shown in Figure 2.
The baseline demographics, comorbidities, medications and laboratory test findings are shown in …. In terms of medications prescribed during the inpatient stay for non-ICU patients, lopinavir/ritonavir (Kaletra) was the most commonly used drug (60.8%), followed by ribavirin (53.2%), …

Predictors of ICU admission
Univariate logistic regression was conducted to identify significant predictors of ICU admission (… p<0.0001), interferon beta …

Main and Hidden Interaction Effects
The EBM model was used to identify patients requiring ICU admission by uncovering main and hidden interaction effects across several data modalities, including demographics, comorbidities and laboratory results. Significant variables identified by univariate logistic regression were entered into the EBM model, which balances a minimal number of predictors against good predictive performance and thereby limits overfitting. The cohort was randomly split into training and validation sets in an 80:20 ratio. The importance rankings of significant predictors for ICU admission are shown in Figure 5. Red blood cells, APTT, sex, age and white blood cells are the five most informative parameters for predicting ICU admission, followed by hypertension, serum sodium, serum albumin, serum triglycerides and respiratory disease. The changing effects of significant predictors on ICU admission prediction are shown in Figure 6. The following combinations of patient characteristics predict a higher likelihood of ICU admission: 1) male sex with a lower red blood cell count, 2) older age with a lower red blood cell count, 3) a lower red blood cell count together with lower albumin or sodium, and 4) longer APTT together with a lower red blood cell count. Important hidden pairwise interaction effects are shown in Figure 7, where green or yellow zones with larger values indicate a higher probability of ICU admission. The interaction plots show that the following pairs predict a higher probability of ICU admission: 1) male sex and lower red blood cell count, 2) older age and lower red blood cell count, 3) lower albumin and lower red blood cell count, 4) lower sodium and lower red blood cell count, 5) older age and prolonged APTT, 6) lower red blood cell count and higher white blood cell count, 7) lower red blood cell count and prolonged APTT, and 8) older age and higher white blood cell count.
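Pairwise effects such as "male sex with low red cell count" are detected after the main effects have been fitted. The Python sketch below illustrates the idea: it ranks candidate feature pairs by the log-loss reduction that a cell-by-cell correction on the residuals of a main-effects model would achieve (the second-order gain G²/2H per cell). This is a simplified stand-in for EBM's interaction screening; the FAST procedure described by its developers instead searches over cut points that split each pair's grid into quadrants. The function name and synthetic data are hypothetical.

```python
import math
from itertools import combinations

def rank_pairs(X_bins, y, scores):
    """Rank feature pairs by how much a per-cell fit on the residuals would reduce log loss.

    scores: current logits from an already-fitted main-effects model.
    Uses the second-order (Newton) gain 0.5 * sum(G^2 / H) over the cells of
    each 2-D grid of bins.
    """
    probs = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    gains = {}
    for i, j in combinations(range(len(X_bins[0])), 2):
        grad, hess = {}, {}
        for row, label, p in zip(X_bins, y, probs):
            cell = (row[i], row[j])
            grad[cell] = grad.get(cell, 0.0) + (label - p)
            hess[cell] = hess.get(cell, 0.0) + p * (1 - p)
        gains[(i, j)] = 0.5 * sum(grad[c] ** 2 / hess[c] for c in grad if hess[c] > 0)
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)

# Synthetic data: risk is high only when features 0 and 1 "agree" (an XOR-like
# pattern with no main effects); feature 2 is noise.
X, y = [], []
for b0 in (0, 1):
    for b1 in (0, 1):
        for b2 in (0, 1):
            for r in range(10):
                X.append([b0, b1, b2])
                y.append(1 if r < (8 if b0 == b1 else 2) else 0)

# Main effects are zero here, so the fitted main-effects logit is the prevalence logit (0.0)
ranking = rank_pairs(X, y, scores=[0.0] * len(y))
print(ranking)
```

The pair carrying the joint pattern ranks first even though neither of its features has a main effect on its own, which is the sense in which such interactions are "hidden"; the noise feature contributes no gain.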
EBM can also provide predictions for individual cases. For example, the characteristics of a randomly selected patient (male, 69 years old) who was admitted to ICU are shown in Figure 8. He had prior cardiovascular disease, chronic kidney disease, hypertension, diabetes, and lung and respiratory diseases. EBM predicted that he needed ICU care with 72% probability, based on his prior cardiovascular disease, white blood cell count of 12.43 × 10^9/L, lactate dehydrogenase level of 390 U/L, APTT of 33.70 s, prior hypertension and diabetes, and other characteristics. His triglyceride level of 6.29, however, counted against this prediction. By contrast, Figure 9 shows a randomly selected patient (female, 54 years old) who did not require ICU admission; EBM correctly predicted that she did not need ICU admission. Such local explanations present individual ICU admission predictions, together with the patient characteristics driving them, in a user-friendly visual format suited to practical clinical use.
medRxiv preprint doi: https://doi.org/10.1101/2020.06.30.20143651
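Because EBM is additive on the log-odds scale, a patient's local explanation amounts to adding the intercept and each term's contribution for that patient, then applying the inverse logit. The Python sketch below assumes the per-term contributions have already been read off a fitted model; the function name, intercept and contribution values are invented for illustration and do not reproduce the 72% prediction in Figure 8.

```python
import math

def explain_case(intercept, contributions):
    """Combine one patient's per-term logit contributions into a prediction.

    contributions: {term name: logit contribution for this patient}.
    Returns (probability, terms ranked by |contribution|, supporting terms,
    opposing terms).
    """
    logit = intercept + sum(contributions.values())
    prob = 1.0 / (1.0 + math.exp(-logit))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    supporting = [name for name, v in ranked if v > 0]
    opposing = [name for name, v in ranked if v < 0]
    return prob, ranked, supporting, opposing

# Invented contribution values for illustration; they do not reproduce Figure 8
example = {
    "prior cardiovascular disease": 1.2,
    "white blood cells": 1.0,
    "APTT": 0.9,
    "lactate dehydrogenase": 0.8,
    "hypertension": 0.6,
    "diabetes": 0.5,
    "triglycerides": -0.3,
}
prob, ranked, supporting, opposing = explain_case(-4.0, example)
print(f"P(ICU) = {prob:.2f}; supporting: {supporting}; opposing: {opposing}")
```

Positive contributions push the prediction towards ICU admission and negative ones (triglycerides in this toy example) push it away, mirroring the supportive and non-supportive characteristics described above.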
The five-fold cross-validation performance of EBM was compared with that of baseline models, including XGBoost, LightGBM, random forests and multivariate logistic regression (Table 4). EBM outperformed all baseline models on precision, recall, F1 score and the area under the receiver operating characteristic (ROC) curve (AUC).
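The evaluation protocol can be sketched in standard-library Python. Two choices here are assumptions not stated in the text: folds are stratified by outcome (advisable with only 19 ICU admissions, so that every fold contains positives), and precision, recall and F1 use a 0.5 probability threshold. `fit` and `predict_proba` are hypothetical callables that would wrap any of the compared models; AUC is computed as the probability that a randomly chosen ICU patient is scored above a randomly chosen non-ICU patient, with ties counted as one half.

```python
import random

def stratified_folds(y, k=5, seed=0):
    """Split indices into k folds, dealing each class out round-robin after shuffling."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, label in enumerate(y) if label == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def precision_recall_f1(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(fit, predict_proba, X, y, k=5, threshold=0.5, seed=0):
    """Mean (precision, recall, F1, AUC) over k stratified folds."""
    folds = stratified_folds(y, k, seed)
    per_fold = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        probs = predict_proba(model, [X[i] for i in test_idx])
        y_test = [y[i] for i in test_idx]
        y_pred = [1 if p >= threshold else 0 for p in probs]
        per_fold.append((*precision_recall_f1(y_test, y_pred), roc_auc(y_test, probs)))
    return [sum(metric) / k for metric in zip(*per_fold)]

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With an event rate below 2%, the threshold strongly affects precision and recall, whereas AUC is threshold-free; reporting both kinds of metric gives a fuller picture.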

Discussion
The main findings of this territory-wide retrospective cohort study are twofold: (1) significant predictors of ICU admission were older age, male sex, prior coronary artery disease, respiratory diseases, diabetes, hypertension and chronic kidney disease, together with activated partial thromboplastin time (APTT), red cell count, white cell count, albumin and sodium; (2) a tree-based interpretable machine learning model identified the most informative characteristics and hidden interactions that predict ICU admission. These interacting factors were low red cell count combined with 1) male sex, 2) older age, 3) low albumin, 4) low sodium or 5) prolonged APTT (around 33 seconds).
Prior studies have reported that patients with pre-existing medical comorbidities have a poorer prognosis not only in COVID-19 but also in other infectious diseases such as … 15. In COVID-19, hypertension, diabetes, coronary heart disease, chronic kidney disease, cerebrovascular disease, hepatitis and chronic obstructive pulmonary disease (COPD) have been identified as predictors of disease severity and mortality in … 17. In this study, we confirm that these comorbidities are predictive of ICU utilization and provide a simple clinical approach to quantify the initial risk of ICU admission precisely and quickly. Various laboratory markers have also been shown to predict adverse outcomes. Our study found that prolonged APTT and raised D-dimer, reflecting coagulopathy, were predictive of ICU admission. Other significant predictors were neutrophil count (inflammation), red cell count (oxygen-carrying capacity), albumin (nutritional status), sodium (electrolyte homeostasis) and lactate dehydrogenase (tissue damage). Troponin was borderline significant, suggesting that myocardial damage may also be an important determinant of ICU use.
We further report the novel finding that interactions between low red cell count and basic demographics such as sex and age, or laboratory findings such as albumin, sodium and APTT, are also important determinants. Older patients with lower red cell counts, lower albumin, lower sodium and prolonged APTT are at high risk of ICU admission.
Red cells, albumin, sodium and APTT can be easily measured in any hospital. In crowded hospitals with limited medical resources, this simple model can help to quickly prioritize patients for ICU attention.
The optimum medication regimen for COVID-19 has yet to be determined. However, small-scale observational studies and trials have suggested the use of antivirals 18, antimalarials 19, interferons 20, anticoagulants 21 and antibodies 22, although not all have been shown to be beneficial in larger clinical trials 23. A better understanding of the pathophysiological mechanisms underlying COVID-19 will enable better treatment strategies to be devised 24. In our study, the antiviral lopinavir/ritonavir (Kaletra) was the most commonly prescribed drug, followed by ribavirin, interferon-beta, ACEIs/ARBs, steroids, hydroxychloroquine and the antiviral remdesivir. These medications were more frequently prescribed in patients requiring ICU care than in those who did not. This may reflect the greater severity of these cases, in which clinicians were more likely to prescribe a combination of drugs.

Conclusion
In summary, this study has identified important univariable and interaction effects informing intensive care admission in patients hospitalized with COVID-19. Significant univariable predictors of ICU admission include older age, male sex, prior coronary artery disease, respiratory diseases, diabetes, hypertension and chronic kidney disease, together with activated partial thromboplastin time, red cell count, white cell count, albumin and serum sodium. A tree-based interpretable machine learning model identified the most informative characteristics and hidden interactions (i.e., low red cell count combined with male sex, older age, low albumin, low sodium or prolonged APTT) for predicting ICU admission in COVID-19. The tree-based machine learning model outperformed several baselines and could enable early identification of patients needing ICU admission, more efficient use of healthcare resources and, potentially, reduced mortality among hospitalized patients with COVID-19.

References
18. Antinori S, Cossu MV, Ridolfo AL, Rech R, Bonazzetti C, Pagani G, Gubertini G, Coen M, Magni C, Castelli A, Borghi B, Colombo R, Giorgi R, Angeli E, Mileto D, …

Tables

Figure 6. Changing effects of significant predictors on ICU use identification.