Derivation of an electronic frailty index for short-term mortality in heart failure: a machine learning approach

Objective: Frailty may be found in heart failure patients, especially the elderly, and is associated with a poor prognosis. However, assessment of frailty status is time-consuming, and electronic frailty indices developed using health records have served as useful surrogates. We hypothesized that an electronic frailty index developed using machine learning can improve short-term mortality prediction in patients with heart failure. Methods: This was a retrospective observational study that included patients admitted to nine public hospitals in Hong Kong for heart failure between 2013 and 2017. Age, sex, variables in the modified frailty index, Deyo's Charlson comorbidity index (≥2), neutrophil-to-lymphocyte ratio (NLR) and prognostic nutritional index (PNI) were analyzed. Gradient boosting, which is a supervised sequential ensemble learning algorithm with weak prediction submodels (typically decision trees), was applied to predict mortality. Comparisons were made with decision tree and multivariate logistic regression. Results: A total of 8893 patients (median age: 81 years, Q1-Q3: 71-87) were included, of whom 9% had 30-day mortality and 17% had 90-day mortality. PNI, age and NLR were the most important variables predicting 30-day mortality (importance score: 37.4, 32.1, 20.5, respectively) and 90-day mortality (importance score: 35.3, 36.3, 14.6, respectively). Gradient boosting significantly outperformed decision tree and multivariate logistic regression (area under the curve: 0.90, 0.86 and 0.86 for 30-day mortality; 0.92, 0.89 and 0.86 for 90-day mortality). Conclusions: The electronic frailty index based on comorbidities, inflammation and nutrition information can readily predict mortality outcomes. Its predictive performance was significantly improved by gradient boosting techniques.


Introduction
Frailty refers to a reduced physiological reserve leading to an impairment in resilience from physical distress. Compared to highly functional community-dwelling elders, frail older adults are more likely to experience falls and disability, contributing to frequent hospitalization and premature death (1). Conventional evaluation of frailty relies on physical examination. However, this precludes its calculation using administrative data such as electronic health records. Recently, a claims-based frailty scoring system has been validated against Fried and colleagues' frailty phenotype using a claims database in the United States (2-4). These electronic frailty indices do not normally include measures of chronic inflammation or nutrition status, which are both closely related to frailty syndrome and are strong determinants of adverse outcomes such as mortality (5, 6).
Heart failure is a complex syndrome characterized by high prevalence in older patients and poor prognosis (7). Heart failure and frailty have an overlapping phenotype and their co-existence is common (8). Given these associations, there have been several studies exploring the intersections between heart failure and frailty (8). Importantly, frailty has been recognized as a major prognostic indicator of heart failure, in which patients with concurrent frailty and heart failure have increased risks of hospitalizations and mortality (9). Furthermore, inflammation and nutrition status are known independent predictors of heart failure outcomes (10, 11). Inflammation has a pivotal role in the pathogenesis of heart failure. It can trigger cardiac remodeling and dysfunction that further induce cardiomyocyte damage that underlies heart failure (12). Moreover, it has been proposed that co-morbidities, such as diabetes and obesity, can induce a systemic pro-inflammatory state that drives the myocardial structural and functional alterations in heart failure (13). Conversely, inflammation can also be a consequence of established heart failure via the mechanisms of increased wall stress on endothelial cells, cell death and oxidative stress (ROS) (14). In this regard, inflammation and heart failure are interconnected and mutually inducing. Elevated pro-inflammatory cytokines were found to be associated with worse clinical outcomes … diagnosis, procedure, prescription, laboratory test results, admission/discharge information and death information.
Patients were included if they were admitted to any of the nine local hospitals during the four-year period between July 2013 and July 2017 with a principal diagnosis of heart failure. The diagnosis of heart failure was defined as having a record with the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) codes 428.X.

Study variables
Variables that were previously included in the modified frailty index (4) were identified from the relevant ICD-9 codes. These include depression, Parkinson's disease, arthritis, paranoia, chronic skin ulcer, pneumonia, falls, skin and subcutaneous tissue infection, mycoses, gouty arthropathy and urinary tract infection. Laboratory test results for albumin level and neutrophil and lymphocyte counts were extracted to calculate inflammatory and nutritional indices. Neutrophil-to-lymphocyte ratio (NLR) was given by the ratio of peripheral neutrophil count/mm³ to peripheral lymphocyte count/mm³. Prognostic nutritional index (PNI) was calculated as 10 × serum albumin value (g/dl) + 0.005 × peripheral lymphocyte count/mm³. The NLR and PNI estimates nearest to the admission time of each patient's first heart failure-related hospitalization were used in the analysis. Baseline Deyo's Charlson comorbidity index incorporating 17 major medical conditions was also included as a single score (28). A comparison of the variables included in the modified frailty index, Deyo's Charlson comorbidity index and our electronic frailty index is shown in Supplementary Appendix Table S1.
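The two index definitions above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names and the example laboratory values are hypothetical, and the unit conversions assume that cell counts are reported in 10⁹ cells/L and albumin in g/L.

```python
def nlr(neutrophils_per_mm3: float, lymphocytes_per_mm3: float) -> float:
    """Neutrophil-to-lymphocyte ratio from absolute counts in cells/mm^3."""
    if lymphocytes_per_mm3 <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils_per_mm3 / lymphocytes_per_mm3


def pni(albumin_g_per_dl: float, lymphocytes_per_mm3: float) -> float:
    """Prognostic nutritional index: 10 x albumin (g/dl) + 0.005 x lymphocytes (/mm^3)."""
    return 10.0 * albumin_g_per_dl + 0.005 * lymphocytes_per_mm3


def e9_per_litre_to_per_mm3(count_e9_per_l: float) -> float:
    """Convert a count in 10^9 cells/L to cells/mm^3 (1 x 10^9/L equals 1000/mm^3)."""
    return count_e9_per_l * 1000.0


def g_per_litre_to_g_per_dl(albumin_g_per_l: float) -> float:
    """Convert albumin from g/L to g/dl."""
    return albumin_g_per_l / 10.0


# Hypothetical patient (values invented for illustration):
# albumin 40 g/L, neutrophils 6.0 x 10^9/L, lymphocytes 1.5 x 10^9/L.
neutrophils = e9_per_litre_to_per_mm3(6.0)
lymphocytes = e9_per_litre_to_per_mm3(1.5)
albumin = g_per_litre_to_g_per_dl(40.0)

patient_nlr = nlr(neutrophils, lymphocytes)
patient_pni = pni(albumin, lymphocytes)
```

Keeping the unit conversions as separate helpers makes it harder to mix g/L with g/dl, which would change the albumin term of the PNI by a factor of ten.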
medRxiv preprint, this version posted January 2, 2021 (not certified by peer review); https://doi.org/10.1101/2020.12.26.20248867. Made available under a CC-BY 4.0 International license.

Outcomes and statistical analysis
The primary outcomes were 30-day and 90-day mortality, measured from the date of each patient's first heart failure-related hospitalization. The 30-day mortality outcome is binary, equal to 1 for death within 30 days and 0 otherwise; the 90-day mortality outcome is defined in the same way.
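As a minimal sketch of how such binary outcomes could be derived from dates: the function name `died_within` and the example dates are hypothetical, and treating the window as inclusive of day 30 (or day 90) is an assumption that the text does not specify.

```python
from datetime import date
from typing import Optional


def died_within(admission: date, death: Optional[date], window_days: int) -> int:
    """Return 1 if death occurred within window_days of admission, else 0.

    Assumption: the window is inclusive, and a missing death date means the
    patient was alive at the end of follow-up.
    """
    if death is None:
        return 0
    elapsed = (death - admission).days
    return 1 if 0 <= elapsed <= window_days else 0


# Hypothetical patient who died 45 days after the index admission.
admission_date = date(2015, 3, 1)
death_date = date(2015, 4, 15)

mortality_30d = died_within(admission_date, death_date, 30)
mortality_90d = died_within(admission_date, death_date, 90)
```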
Continuous variables were presented as median (interquartile range [IQR]) and categorical variables were presented as count (%). The Mann-Whitney U test was used to compare continuous variables.
The χ² test with Yates' correction was used for 2×2 contingency data, and Pearson's χ² test was used for contingency data for variables with more than two categories. To identify significant risk factors associated with 30-day and 90-day mortality, univariate logistic regression was used to determine odds ratios (ORs) and 95% CIs. Significant variables from the univariate logistic regression (p<0.05) were further included in multivariate logistic regression to build the frailty model.
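For a single binary risk factor, both statistics can be computed directly from a 2×2 table. The sketch below is illustrative and is not the authors' R code. It uses the standard Yates-corrected χ² formula and the Wald (Woolf) interval for the odds ratio, which for one binary predictor coincides with the estimate from a univariate logistic regression. The cell counts are invented.

```python
import math


def yates_chi2(a: int, b: int, c: int, d: int) -> float:
    """Yates-corrected chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * corrected ** 2 / denominator


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Wald 95% CI; all four cells must be non-zero."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: rows = risk factor present / absent,
# columns = died / survived.
a, b, c, d = 30, 70, 50, 350

chi2 = yates_chi2(a, b, c, d)
odds_ratio, ci_lower, ci_upper = odds_ratio_ci(a, b, c, d)
```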
The concept of frailty is one of cumulative deficits, each of which in isolation may not exert a significant effect. To test this idea, we conducted an additional multivariate logistic regression analysis incorporating all risk variables, including those that were non-significant in univariate logistic regression. Finally, to demonstrate the utility of NLR and PNI, both variables were excluded in a sensitivity analysis to examine the effects on the evaluation metrics.
A two-sided P value of less than 0.05 was considered statistically significant. All statistical analyses were performed using RStudio software (Version: 1.1.456).

Machine learning model development
Gradient boosting is a machine learning boosting method that relies on the intuition that the best possible next model, when sequentially combined with previous weak models (e.g. decision trees) in a stage-wise fashion, minimizes the overall prediction error, which is assessed by performance metrics such as precision, recall and the area under the curve (AUC). Weak learning models are fitted through loss gradient minimization with a gradient descent optimization algorithm. This method has previously been used for mortality prediction in heart failure based on administrative claims with electronic health records (30). A variable importance ranking was generated to construct a machine learning-based risk score for mortality prediction. Partial dependence plots were provided as low-dimensional graphical renderings of marginal effects to assist in the interpretation of the relationships between the most important variables and the mortality outcome. Five-fold cross validation was performed to compare the performance, in terms of precision, recall and AUC, of the gradient boosting model with that of the decision tree model and the logistic regression model.
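The stage-wise idea can be illustrated with a small, self-contained Python sketch. It is not the gbm package and not the authors' model. It assumes a log-loss objective, depth-1 regression trees (stumps) fitted to the negative gradient (observed outcome minus predicted probability), and leaf values set to the mean residual rather than the refined leaf updates that production libraries use. All names and the toy data are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the one-feature split with least squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return best[1:]  # (feature index, threshold, left value, right value)


def stump_predict(stump, row) -> float:
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def fit_boosting(X, y, n_trees=50, learning_rate=0.1):
    """Stage-wise boosting on log-loss; assumes both classes are present in y."""
    prevalence = sum(y) / len(y)
    intercept = math.log(prevalence / (1.0 - prevalence))
    scores = [intercept] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of log-loss with respect to the score.
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return intercept, stumps, learning_rate


def predict_proba(model, row) -> float:
    intercept, stumps, learning_rate = model
    score = intercept + learning_rate * sum(stump_predict(s, row) for s in stumps)
    return sigmoid(score)


# Toy data, invented for illustration: columns are [age, PNI]; 1 = died.
X = [[65, 50], [70, 48], [75, 45], [82, 40], [85, 35], [90, 30], [88, 42], [60, 52]]
y = [0, 0, 0, 1, 1, 1, 1, 0]

model = fit_boosting(X, y)
risk_old_low_pni = predict_proba(model, [86, 36])
risk_young_high_pni = predict_proba(model, [62, 51])
```

Each new stump is fitted to what the current ensemble still gets wrong, and the learning rate shrinks its contribution, so many small corrections accumulate into the final prediction.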
The R packages, gbm (Version 2.1.5) and ggplot2 (Version 3.3.2), were used to generate the mortality prediction results.

Results
In our HF cohort (n = 8893), the median age was 81 (IQR 71-87) years and 45% (n = 4027) were male. The baseline characteristics, individual variables included in the modified frailty index, and inflammatory and nutritional indices of the patients who died within 90 days and of those who did not are shown in Table 1. The median cell count was 1.2 × 10⁹/L for lymphocytes and 5.4 × 10⁹/L for neutrophils, with a median neutrophil-to-lymphocyte ratio (NLR) of 4.4 (IQR 2.7-7.8). The median albumin level was 37.8 g/L, with a median prognostic nutritional index of 44.0 (IQR 39.8-48.5) (PNI, given by 10 × serum albumin value (g/dl) + 0.005 × peripheral lymphocyte count (per mm³)).
predictors of 30-day mortality (Table S2, left). For 90-day mortality, the same variables that predicted 30-day mortality, as well as Charlson score ≥2, were significant predictors (Table S2, right). Subsequently, the significant variables from the univariate analysis were included in multivariate logistic regression. The results of multivariable logistic regression for 30-day and 90-day mortality prediction with all variables are reported in Table S3 and Table S4, respectively. Age, pneumonia, UTI, PNI, and NLR remained significant predictors of both 30-day and 90-day mortality (P<0.05).

Gradient boosting learning results and frailty score
Five-fold cross validation experiments were conducted with gradient boosting learning. The key to gradient boosting is to set the target outcomes so as to minimize the overall error in relation to precision, recall and AUC. In this way, the gradient boosting model sequentially adds weak decision tree learning models to the ensemble, where subsequent models correct the prediction errors of prior models (Figure S1), from which we can see that the probability of 30-day and 90-day mortality increases drastically as age rises above 80 years. Specifically, predictions given by the sequential models that are close to the actual outcome should reduce the overall error, and the process continues until the total prediction error is minimized.
A total of 1400 and 1500 trees were assigned for 30-day and 90-day mortality prediction, respectively. The optimal tree number was identified using sensitivity analysis by plotting the out-of-bag (OOB) error rate against the number of trees within the forest (Figure S2).
OOB samples are those samples that are not included in the bootstrap samples. The original training data are randomly sampled with replacement, generating small subsets of data known as bootstrap samples. These bootstrap samples are then fed as training data to the forest model. The OOB approach was used for selecting the optimal tree number of the forest model, in which four-fifths of the data (as training) was used for constructing the predictive classifier while the remainder was used for evaluating the performance of the forest model. Tree depth was set at 1 for both 30-day and 90-day mortality prediction according to tree depth parameter tuning (Figure S3). The variable importance is reported in Table 2 and is used for building the frailty score.
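The relationship between bootstrap samples and OOB samples can be sketched as follows. This illustrates only the sampling step, under the assumption of one bootstrap draw of the same size as the training set; it does not reproduce the OOB estimator of any particular R package. The function name and seed are hypothetical.

```python
import random


def bootstrap_split(n_samples: int, seed: int = 0):
    """Draw one bootstrap sample of row indices and return it with the OOB indices."""
    rng = random.Random(seed)
    # Sampling with replacement: some rows repeat, others are never drawn.
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


in_bag, out_of_bag = bootstrap_split(1000, seed=42)
oob_fraction = len(out_of_bag) / 1000
```

For large samples the expected OOB fraction approaches 1/e (about 37%), so each tree can be evaluated on rows it never saw during fitting; plotting that error against the number of trees gives a curve of the kind described above.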
The partial dependence relationships of the variables with the highest importance values for mortality prediction were also identified using gradient boosting learning. The probabilities of 30-day and 90-day mortality both increase as patients become older, and they increase sharply when patients are older than 80 (Figure 1). For PNI, the likelihood of mortality decreases sharply as PNI increases from 0 to 20 and remains almost constant when PNI increases beyond 65 (30-day mortality) or 70 (90-day mortality) (Figure 2). There is a non-linear relationship between NLR and 30- and 90-day mortality (Figure 3). Patients with pneumonia have a high probability of mortality: 24% for 30-day mortality and 14% for 90-day mortality.
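A partial dependence curve of this kind is obtained by fixing one variable at each grid value for every patient, predicting, and averaging the predictions. The sketch below shows that computation in Python. The `toy_predict` function is a hypothetical stand-in for a fitted model, with invented coefficients, and the rows are invented data.

```python
import math


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over all rows with one feature forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)          # copy so the original data are untouched
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row) -> float:
    """Hypothetical risk model on [age, PNI]; coefficients are invented."""
    age, pni = row
    return 1.0 / (1.0 + math.exp(-(0.08 * (age - 80) - 0.05 * (pni - 45))))


# Invented rows: [age, PNI].
rows = [[65, 50], [72, 47], [81, 44], [86, 39], [90, 33]]
age_grid = [60, 70, 80, 90]

age_curve = partial_dependence(toy_predict, rows, feature_index=0, grid=age_grid)
```

Because the other variables keep their observed values, the curve shows the marginal effect of the chosen variable averaged over the cohort rather than its effect for any single patient.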
Comparative analyses of the gradient boosting learning model, decision tree model and multivariate logistic regression model for 30-day and 90-day mortality prediction, using five-fold cross validation, are reported in Table S5. Gradient boosting learning shows the best performance in the precision, recall and AUC evaluation metrics.
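The three evaluation metrics and a five-fold split can be sketched in plain Python as below. The helper names are hypothetical, the folds are assigned by interleaving row indices without shuffling or stratification (an assumption made here to keep the example deterministic), and the classification threshold of 0.5 is also an assumption.

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Return a list of (train_indices, test_indices) pairs for k interleaved folds."""
    folds = [list(range(start, n_samples, k)) for start in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def precision_recall(y_true, scores, threshold: float = 0.5):
    """Precision and recall after thresholding predicted probabilities."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented labels and predicted probabilities for four patients.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

splits = k_fold_indices(10, k=5)
precision, recall = precision_recall(y_true, scores)
area = auc(y_true, scores)
```

In a cross-validated comparison, each model would be fitted on the training indices of every split and scored on the corresponding test indices, with the metrics averaged over the five folds.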
The results from the sensitivity analysis excluding NLR and PNI are shown in Supplementary Appendix 2. The optimum tree numbers are shown in Figure S4. Without NLR and PNI, age became the most important variable for predicting both 30-day and 90-day mortality (Table S6; Figure S5), and some evaluation metrics were lower, but others were not affected.

Discussion
The main findings of this study are that 1) PNI, NLR and age in the modified electronic frailty index were the most predictive variables for the short-term mortality outcomes in heart failure patients, and 2) non-linear partial dependence relationships between these predictors and outcomes were observed.
We developed a modified electronic frailty model by incorporating the inflammatory and nutritional indices into the conventional frailty scoring system, based on the importance value of each variable generated from the gradient boosting learning model. Compared with multivariate logistic regression and decision tree, gradient boosting learning techniques improved the predictive performance of our frailty model. To enhance mortality prediction by capturing the non-linear patterns within patient characteristics, we developed an interpretable machine learning model based on the gradient boosting machine (31). Machine learning models can be fitted to data individually or combined in an ensemble, resulting in an efficient combination of simple individual learning models that together create a more powerful model (32).
In this study, significant risk factors for predicting 30-day and 90-day mortality were efficiently identified with the gradient boosting learning model. The obtained rankings of important variables for mortality prediction can be used as an electronic heart failure frailty scoring tool for clinical use.
The efficient identification of partial dependence for predictive variables provides a more refined estimation of the likelihood of mortality. For example, a patient's mortality probability can be estimated from characteristics such as a lower PNI, older age, a larger NLR (below 60 or so) and pneumonia. All of these variables were associated with impaired mobility. Of these factors, pneumonia is a common nosocomial condition that also confers a significantly higher risk of 30-day post-admission mortality (33). In addition, we conducted the analysis without PNI and NLR, and the results are provided in the Supplementary Material. In that analysis, the variable importance ranking identified age, pneumonia, skin ulcer, UTI, Parkinson's disease, gout and male sex as the important variables for predicting 30-day mortality, and age, pneumonia, UTI, skin ulcer, Parkinson's disease and Charlson score as those for predicting 90-day mortality.
Heart failure has been recognized as predominantly a syndrome that affects the geriatric population, with over 50% of incidence and 60% of heart failure-associated mortality occurring in the population over 75 years old (34). Age at diagnosis is also one of the most significant prognostic factors for subsequent survival (35). In our cohort, the median age was 81 years old and the risk of short-term mortality increased strikingly in those aged over 80. Age was ranked as the most important variable to predict 90-day mortality and the second most important variable for 30-day mortality. In 2011, it was reported that the one-year mortality rates increased sharply from 20% to over 30% in patients aged 75-84 years, and to over 40% in patients aged over 85 years (36). The high prevalence of important risk factors, such as hypertension and ischemic heart disease, leads to the increasing incidence of heart failure in older patients (37). Moreover, the survival outcomes of heart failure are closely related to the unfavorable age-associated changes in cardiovascular structure and function, which compromise cardiac reserve capacity (38). Therefore, it is not surprising to observe the strong prognostic value of age in our frailty model.
The frailty index was based on the concept that frailty is caused by the accumulation of health deficits (39). The frailty state itself is considered an individual variable that can predict mortality (40), even independently of age in different settings (41). The first electronic frailty index, developed by Segal et al., was based on the same concept, in which the candidate variables were selected based on their potential correlation with the frailty state rather than with mortality directly (4). Therefore, although the individual variables in the frailty index might not be strongly associated with the mortality outcome, the deficits cumulatively lead to an increased risk of mortality.
The specific value of frailty in heart failure cohorts has been examined by many studies. A recent meta-analysis has confirmed the association between the frailty state and worse clinical outcomes in patients with heart failure (42). Indeed, recent guidelines have recommended the assessment of frailty status in heart failure patients to aid risk stratification (43). The Identification of Senior At Risk (ISAR) scale is another frailty screening tool that can predict 30-day mortality in older patients with acute heart failure (44). Among the current literature, a few studies utilized frailty indices and reached similar conclusions to other studies in which frailty was assessed as a phenotype (45, 46), and there is no consensus on which method is more suitable in the cohort of heart failure patients (42). The variables included in the various frailty indices used for heart failure were also largely different. A study in the UK combined the frailty index and nutritional index and found an improved prognostic power compared to the conventional frailty index, suggesting that nutrition and frailty are correlated but remain independent prognostic factors (46). No previous study has attempted to incorporate inflammatory measures into frailty indices for heart failure prognosis despite the strong pathophysiological associations between these concepts (47).

Strengths and limitations
To the best of our knowledge, this is the first study incorporating both an inflammatory marker and a nutritional index into the conventional frailty index. The indices used in this study, NLR and PNI, can be easily calculated and incorporated into the decision-making process in the clinical setting. We utilized a large, homogeneously Chinese patient cohort from a real-world database, and the final frailty score was derived from a machine learning model, which was shown to have better performance than the baseline multivariate logistic regression for mortality prediction. There are some limitations to our study. Firstly, this is a multicenter study conducted in Hong Kong; external validation of our results using data from other databases in other countries is needed. Secondly, our study did not include information on the treatment prescribed during the acute phase and post-admission, which may affect the survival outcomes in patients. Nevertheless, frailty models that do not adjust for such information are still strong predictors of mortality (48).

Conclusions
In this study, we created an electronic frailty index that included comorbidity information and inflammatory and nutritional indices. This was then used for short-term mortality prediction in heart failure. Given that these variables can be determined or calculated automatically, their incorporation into clinical risk scores or prediction rules will help clinicians to perform risk stratification more readily. Further prospective studies are warranted to validate the present model, combining it with other more comprehensive and complex inflammatory, nutritional and frailty assessment tools, to confirm its predictive power for clinical use.

Conflicts of Interest
All authors declare no conflict of interest.

References
20. Chien SC, Lo CI, Lin CF, Sung KT, Tsai JP, Huang WH, Yun CH, Hung TC, Lin JL, Liu CY, Hou CJ, Tsai IH, Su CH, Yeh HI, Hung CL. Malnutrition in acute heart failure with preserved ejection fraction: clinical correlates and prognostic implicatio…
…ason JB, Peter Bartlett, Marcus Frean. Boosting Algorithms as Gradient Descent. Advances in neural information processing systems 2000:512-518.
30. Desai RJ, Wang SV, Vaduganathan M, Evers T, Schneeweiss S. Compariso…
41. Hewitt J, Carter B, McCarthy K, Pearce L, Law J, Wilson FV, Tay HS, McCormack C, Stechman MJ, Moug SJ, Myint PK. Frailty predicts mortality in all emergency surgical admissions regardless of age. An observational study. Age and Ageing.

Figure 3. Partial dependence of NLR for 30-day (left) and 90-day (right) mortality risk probability prediction.