Summary
Background COVID-19 pandemic has developed rapidly and the ability to stratify the most vulnerable patients is vital. However, routinely used severity scoring systems are often low on diagnosis, even in non-survivors. Therefore, clinical prediction models for mortality are urgently required.
Methods We developed and internally validated a multivariable logistic regression model to predict inpatient mortality in COVID-19 positive patients using data collected retrospectively from Tongji Hospital, Wuhan (299 patients). External validation was conducted using a retrospective cohort from Jinyintan Hospital, Wuhan (145 patients). Nine variables commonly measured in these acute settings were considered for model development, including age, biomarkers and comorbidities. Backwards stepwise selection and bootstrap resampling were used for model development and internal validation. We assessed discrimination via the C statistic, and calibration using calibration-in-the-large, calibration slopes and plots.
Findings The final model included age, lymphocyte count, lactate dehydrogenase and SpO2 as independent predictors of mortality. Discrimination of the model was excellent in both internal (c=0·89) and external (c=0·98) validation. Internal calibration was excellent (calibration slope=1). External validation showed some over-prediction of risk in low-risk individuals and under-prediction of risk in high-risk individuals prior to recalibration. Recalibration of the intercept and slope led to excellent performance of the model in independent data.
Interpretation COVID-19 is a new disease and behaves differently from common critical illnesses. This study provides a new prediction model to identify patients with lethal COVID-19. Its practical reliance on commonly available parameters should improve usage of limited healthcare resources and patient survival rate.
Funding This study was supported by following funding: Key Research and Development Plan of Jiangsu Province (BE2018743 and BE2019749), National Institute for Health Research (NIHR) (PDF-2018-11-ST2-006), British Heart Foundation (BHF) (PG/16/65/32313) and Liverpool University Hospitals NHS Foundation Trust in UK.
Evidence before this study Since the outbreak of COVID-19, there has been a pressing need for development of a prognostic tool that is easy for clinicians to use. Recently, a Lancet publication showed that in a cohort of 191 patients with COVID-19, age, SOFA score and D-dimer measurements were associated with mortality. No other publication involving prognostic factors or models has been identified to date.
Added value of this study In our cohorts of 444 patients from two hospitals, SOFA scores were low in the majority of patients on admission. The relevance of D-dimer could not be verified, as it is not included in routine laboratory tests. In this study, we have established a multivariable clinical prediction model using a development cohort of 299 patients from one hospital. After backwards selection, four variables, including age, lymphocyte count, lactate dehydrogenase and SpO2 remained in the model to predict mortality. This has been validated internally and externally with a cohort of 145 patients from a different hospital. Discrimination of the model was excellent in both internal (c=0·89) and external (c=0·98) validation. Calibration plots showed excellent agreement between predicted and observed probabilities of mortality after recalibration of the model to account for underlying differences in the risk profile of the datasets. This demonstrated that the model is able to make reliable predictions in patients from different hospitals. In addition, these variables agree with pathological mechanisms and the model is easy to use in all types of clinical settings.
Implication of all the available evidence After further external validation in different countries the model will enable better risk stratification and more targeted management of patients with COVID-19. With the nomogram, this model that is based on readily available parameters can help clinicians to stratify COVID-19 patients on diagnosis to use limited healthcare resources effectively and improve patient outcome.
Introduction
Since the outbreak of coronavirus disease 2019 (COVID-19) in December 2019 in China, there have been over 200,000 confirmed cases, with 10-20% developing severe COVID-19 and approximately 5% requiring intensive care1,2. Although the number of daily cases has significantly reduced in China because of intensive control procedures, the numbers in the rest of world are continuing to increase. More than 10,000 deaths have been reported but the estimated mortality rate (3%) is much lower than Severe Acute Respiratory Syndrome (SARS) in 2003 (9·6%, 774 died of 8096 infected) and Middle East Respiratory Syndrome (MERS) in 2012 (34·4%, 858 died of 2494 infected)3. Although the vast majority of cases of COVID-19 are not life-threatening, stratification of these patients becomes increasingly important as large populations are expected to be infected globally.
The deaths and morbidity from COVID-19 infection are primarily due to respiratory failure, although a few people have died of multiple organ failure (MOF) or chronic comorbidities4,5. Therefore, reduced oxygen saturation is the major indicator of disease severity. However, symptoms at onset are relatively mild and a substantial proportion of patients have no obvious symptoms prior to the development of respiratory failure4,5. Clinically, it is difficult to stratify patients who may develop lethal COVID-19 until respiratory failure develops. Since early identification and effective treatment can reduce mortality and morbidity as well as relieve resource shortages, there is an urgent need for effective prediction models to identify patients who are most likely to develop respiratory failure and poor outcomes.
A robust, highly discriminatory and validated clinical prognosis model is required for stratification of these patients. Currently, very few reports propose reliable prediction models, which have been constructed using Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidance with internal and external validation6. One report based on a cohort of 191 COVID patients in early stages of the pandemic showed that age, Sequential Organ Failure Assessment (SOFA) score and D-dimer are independent risk factors of mortality7. However, D-dimer is not a routinely available clinical measurement and SOFA can be relatively low in COVID-19 patients on admission, but in the aforementioned report some patients in late stage of COVID-19 were transferred to its study hospitals7. This might explain the difference in SOFA scores.
In this study, we aimed to employ routine, widely available clinical test data to predict mortality of COVID-19 hospitalised patients using multivariable regression analysis in a cohort of 299 cases. The model has been established and validated with a new cohort of 145 patients with COVID-19 from a different hospital. Through this study, we hope that the experience from the outbreak in China will assist the rest of the world in better stratification of COVID-19 patients to achieve better outcomes for patients.
Methods
Study design and patients
This was a retrospective observational study performed in officially designated treatment centers for confirmed patients with COVID-19 in Wuhan at the center of the outbreak in China. The protocol was approved by the local Institutional Ethics Committee (Approval Number: KY-2020-10.02). First cohort of 299 patients admitted in Tongji hospital within January and February 2020 were enrolled into this study for model development. Second cohort of 151 patients admitted to Jinyintan hospital who were included in a previous study were reused for model validation. All patients were diagnosed by positive tests of novel coronavirus nucleic acids (SARS-CoV-2), according to WHO interim guidance8. Only patients who were discharged from hospital or had died were included in this study. Six patients in the second cohort, who died very quickly after admission and did not received any laboratory testing, were excluded. Patients younger than 18 years of age were also excluded.
Data collection
Clinical data, including demographic information, chronic comorbidities, biomarkers of infection and other laboratory tests were collected from the local Online Medical System. The time from onset of symptoms to hospital admission was also recorded. SOFA was calculated within 24h of admission. For patients who did not have arterial blood gas analysis, peripheral capillary oxygen saturation (SpO2) to estimate the arterial oxygen partial pressure (PaO2) and the recorded fraction of inspired oxygen (FiO2) were used. The SOFA score for the respiratory system was calculated according to PaO2/FiO2 (P/F) ratio when patients received non-invasive or invasive mechanical ventilation. All patients were closely followed until they died or were discharged from the hospital. Hospital mortality and length of hospital stay were also recorded.
Statistical analysis
Data were fully anonymized before data cleaning and analyses. We followed the TRIPOD guidance for development, validation and reporting of multivariable prediction models6. Analyses were performed in R version 3.5.19. The development model was built using the data from Tongji Hospital. We performed logistic regression analysis with the outcome variable defined as mortality. Variables were selected a priori based on previous clinically-related studies, completeness across both sites, clinical knowledge and practicality of measurement in acute medical emergencies. Variables were excluded if they had high collinearity with global scores. The number of predictors was restricted based on the total number of outcomes in the development dataset10. Analysis was conducted for the following nine variables: age, hypertension, diabetes, SpO2, systolic blood pressure, albumin, lactate dehydrogenase (LDH), lymphocyte count and platelet count. Organ-specific markers, such as alanine transaminase (ALT) and blood urea nitrogen (BUN) were not included in the original development model because of collinearity with global markers, particularly LDH. SOFA score was not included for this reason, as explained in Results. D-dimer, interleukin (IL)-6 and cardiac troponin I (cTnI) were excluded from the analysis because they were not routinely tested in the cohorts. LDH was selected as a general marker for cell injury or death and since blood gas analysis were not commonly used in these hospitals, SpO2 was selected to reflect respiratory function and was modelled as a continuous variable to maximize statistical power11. Albumin was selected as a marker for the acute phase response instead of C reactive protein (CRP) because of CRP collinearity with LDH. Age is known risk factor for mortality from severe respiratory infections, including COVID-19. Lymphocyte count and platelet counts were reported in a previous publication of COVID-19 and selected7,12. In this study, lymphocyte count was log transformed due to extreme values. Systolic blood pressure (SBP), diabetes and hypertension were selected based on previous publications7. All nine selected variables were input into the multivariable model and backwards stepwise selection was performed with improvement in goodness-of-fit assessed by a reduction in the Akaike Information Criterion (AIC). A nomogram of the selected final model was used for graphical representation of the prediction model and was produced with R function regplot – this was chosen to offer maximum use of the prediction model in clinical practice without the need for internet access13,14.
Model performance was assessed via measures of discrimination and calibration. To assess discrimination, the C statistic was used. For calibration, patients in the development database were split into deciles, ordered by their probability of death. For each decile, the mean of the predicted probability of death was calculated and compared with the mean observed probability of death. Calibration plots were constructed including locally estimated scatterplot smoothing (loess) regression lines15.
Internal validity was assessed with bootstrapping (1000 replications) of the entire model building process including backwards stepwise selection of all potential predictors. Bootstrapped samples were created by drawing random samples with replacement from the development database. The prediction model was fitted on each bootstrap sample and tested on the original sample. Overfitting was assessed in each bootstrap replication. This allowed calculation of optimism and optimism adjusted discrimination and calibration statistics. The final development model coefficients were then adjusted for optimism using the optimism adjusted calibration slope for shrinkage15.
To assess external validity, the final optimism-adjusted model was applied to an independent data set, from Jinyintan hospital. External validity of the model was assessed using the C statistic, calibration-in-the-large, calibration slope and the calibration plot. Recalibration was conducted updating the intercept, and the intercept and slope.
Results
Patient cohorts
The development cohort included 299 patients who were admitted in Tongji Hospital. Data completeness is presented in supplementary Table 1. There were 155 deaths, the median age of patients was 65 years (IQR: 54-73) and 48·2% were male (Table 1). The validation cohort that included 145 patients was younger: age 56 years (IQR 47-68) with a higher percentage of males (65·5%) than in the developmental cohort (P<0·001). The median (IQR) time from onset of symptoms to hospital admission was no different in the developmental 10 [7-14] and validation 10 [7-13] cohorts. More patients had hypertension (42·6% vs 26·9%) and heart failure (4·4% vs 0.7%) in the development than validation cohorts, whilst diabetes (18·5% vs 11·7%), chronic obstructive pulmonary disease (COPD) (5·0% vs 3·4%) and others (6·4% vs 4·8) showed no statistical difference. SOFA scores were statistically higher in the development than validation cohorts (2·0 [2·0-4·0] vs 1·0 [0·0-3·0]) but both were very low compared to common critical illnesses. Table 2 shows that non-survivors were significantly older than survivors (69 years (IQR 61·75 to 75·00) vs. 52·50 (IQR 43·75-64·00), p<0·001). Patients with hypertension, showed a significantly higher mortality rate (p<0·001).
Model development
In the developmental cohort (Tongji Hospital), 268/299 patients had complete data for all nine variables. The final multivariable model included age (adjusted OR 1·054; 95% CI 1·028 to 1·083), LDH (adjusted OR 1·004; 95% CI 1·002 to 1·006), log lymphocyte count (adjusted OR 0·296; 95% CI 0·148 to 0·541) and SpO2(adjusted OR 0·897; 95% CI 0·831 to 0·958) (Table 3). The final model showed excellent discrimination (C statistic = 0·89) and was well calibrated (slope =1 and calibration-in-the-large =0·00), as shown in the calibration plot (Table 4 and Figure 1). The final model for mortality is presented as a nomogram and an example of how to use the nomogram is presented for a patient aged 59, with LDH of 482, SpO2 of 85% and lymphocyte count of 0·64 (Figure 2).
Internal validation
Interval validation was run using bootstrap resampling and showed a small amount of overfitting. Bootstrapping with backwards stepwise selection provided a shrinkage factor of 0.900. After adjusting for overfitting the final model retained high discrimination (C statistic= 0·880) and calibration-in-the-large of 0·006 (Table 4). The shrinkage factor was applied to the β coefficients of the multivariable model to provide optimism adjusted coefficients (Table 3).
External validation
External validation is required because the accuracy of a predictive model will always be high if it is validated on the development cohort used to generate the model16,17. In this study, we applied our final prediction model with optimism adjusted coefficients to the validation dataset. There were 145 patients in the validation dataset of whom 69 patients (47·6%) died. Of the 145 patients, 127 had complete data for the four variables in the final developmental model (age, lymphocyte count, SpO2 and LDH) The C statistic for the discrimination of the developed model in the independent data was 0·980 (0·958 to 1·000) and the model had reasonable calibration (slope and calibration-in-the-large) (Table 5). Figure 3 suggests some mis-calibration of the model in the independent data with over-estimation of risk of mortality in low-risk individuals and under-estimation of risk in high-risk individuals. After recalibration of the intercept, the discrimination of the model was the same (as expected given that recalibration does not influence the ranking of individuals according to the model) as was the calibration slope. Calibration-in-the-large was 0·00 (−0·477 to 0·471). The calibration plots still suggest mis-calibration of the model (Figure 3c). When both intercept and slope were recalibrated the calibration slope was 1·00 (0·675 to 1·477) and the plot then demonstrated good calibration (Figure 3d).
Discussion
Most routine scoring systems, such as SOFA, which are very useful in the intensive care unit (ICU) for sepsis and other critical illnesses18-20, were very low in COVID-19 patients on admission with median SOFA score of 2. However, 155/299 and 69/145 patients in the development and validation cohorts died in the hospital, respectively. This poor outcome strongly indicates that these routine scoring systems used in general wards and ICU cannot accurately assess the severity and predict the mortality of patients with COVID-19. We have therefore used a hospitalized study population from Wuhan Tongji Hospital to develop a clinical prediction model using available demographic and clinical indicators. The most informative factors in COVID-19 positive patients were age, LDH, lymphocyte count and SpO2. The developed statistical model had good discrimination and minimal model optimism. In external validation using an independent cohort from Jinyintan hospital, the model had excellent discrimination but needed recalibrating to account for the different underlying risk profile in the independent dataset. Specifically, the development and validation cohorts are temporally distinct as the validation cohort were from earlier in the COVID-19 outbreak. In addition, they are from different sites within Wuhan and the demographics are distinct, with patients admitted to the Jinyintan hospital being younger with a male majority and less patients with hypertension and low SOFA scores (1.0 [0-3] vs 2.0 [2.0-4.0]. However, the nomogram of the final model after optimization can be used by clinicians to give a prediction of mortality and thus inform treatment choice and guide patient and family counselling.
Old age is a major factor of mortality from infection21. This is consistent with the current data from China on COVID-19 with less than 1% in children <10 years old, increasing to 2%, 4% and 10% in age 11-40, 41-70, and >70, respectively. However, the majority of people infected are 30-70 years old (approximately 80% of the total cases), thereby making it difficult to stratify the majority of cases according to age. Patient comorbidities, including hypertension, diabetes, did not significantly contribute to the model, although mortality rates were significantly higher in patients with hypertension.
The pathological feature of COVID-19 is mainly that of a viral pneumonia with alveolar oedema and blockage of small bronchi to compromise gas exchange within the lungs. Therefore, PaO2 will drop when severe pneumonia develops. Since arterial blood gas (ABG) analysis is an invasive and complex process, which requires arterial blood taking and use of a ABG analyzer, SpO2 measurement is much more used for continuous analysis of blood oxygen saturation in patients to estimate the arterial oxygen partial pressure (PaO2)22. Moreover, many mobile phone brands contain built-in sensors and SpO2 can be measured by patients themselves. In our cohorts, many patients had no blood gas data and therefore SpO2 was used as a variable in our model. Although SpO2 is not as accurate as PaO2 in critical illnesses, it could give the first indication of compromised lung gas exchange in the early stages of COVID-19, which would then be verified by ABG measurement upon admission to hospital.
Lymphopenia is very common in patients with influenza virus infection and bacterial infection, particularly in sepsis23,24. CD4 and CD8 T cell apoptosis causes prolonged infection by promoting virus survival25, and induces immune suppression to increase mortality in patients with sepsis26. In this study, we found a dramatic difference in lymphocyte counts between survivors and non-survivors. In non-survivors, the peripheral lymphocyte counts were very low on admission, indicating that extensive lymphocyte death occurred following infection. LDH is released into circulation after cell injury or death27 and serves as an index of the extent of cellular damage by the virus or the host immune response28.
This study has several limitations. At this stage in the global outbreak of COVID-19, no published studies have developed and validated prediction models for assessing the probability of mortality in hospitalized patients. A most recent report proposed SOFA score, age and D-dimers as the important prognostic markers7. However, in our cohorts, the SOFA score was very low although there is statistical difference between survivors and non-survivors. This difference may be due to some patients being treated in other hospitals prior to transfer in the previously reported study7. In addition, SOFA score relies on blood gas analysis, which cannot be easily done continually, in clinics and in under-resourced settings, thus making it difficult for clinicians to use readily. In addition, since D-dimer is not routinely requested, too many patients in our cohorts had no test performed, making it is difficult for us to verify published results. Organ-specific injury markers were not included in the model, including cTnI due to missing data (not routinely measured), as well as ALT and BUN due to high co-linearity with other variables within the model. Therefore, the choice of potential prognostic factors could not be informed by a systematic review of existing prediction models. Similarly, there are no published estimates of discrimination and calibration to compare this study to. Additionally, the event rate is limited, especially in the independent dataset where at least 100 events would be preferable29. A multicenter study with a larger cohort for model development and also for validation would add more power, reduce bias, and make the findings more generically applicable.
There is also inherent bias in the recruitment of patients because the Jinyintan hospital was the first officially designated receiving hospital in Wuhan for all COVID-19 cases at the start of the outbreak. The recruited patients were mainly self-referrals, but some were referred by other hospitals immediately after a positive diagnostic test. Self-referred patients are more likely to have severe symptoms that cause them to seek emergency medical help. Likewise, other hospitals are more likely to test patients who have more pronounced symptoms, thus creating an inherent bias in the patient sample. Therefore, the cohort may not accurately represent patients with mild or asymptomatic COVID-19. These results need to be further validated with patients who are hospitalized for different severities of illness.
Despite these limitations, the obvious advantage of the model developed here is that the panel of variables used to build the model are basic demographics and values from accessible clinical tests that are recorded routinely in the acute clinical setting. With the vivid nomogram, this model will enable the majority of clinicians to assess the severity of patient illness without difficulty, particular in very busy clinical settings. We were also able to externally validate our model in an independent dataset. The discrimination of the model was high in both internal and external validation showing that the model is able to accurately differentiate high and low risk individuals. The prediction models are based on and validated in Chinese hospitalized COVID-19 positive populations and should therefore be applicable to other sites within China and may also be translatable into other sites during the global outbreak. However, separate validation will be required in other patient populations before widespread application.
In conclusion, traditional early warning scores and organ injury markers do not help to predict severity and mortality in patients with COVID-19. Here, we have developed and validated a clinical prediction model using readily available clinical markers which is able to accurately predict mortality in people with COVID-19. This model can therefore be used in current pandemic regions to identify patients who are at risk of severe disease at an early enough stage to initialize intensive supportive treatment and to guide discussions between clinicians, patients and families. It has the potential to be used throughout the world following additional validation in relevant worldwide datasets, with recalibration as necessary to account for differing risk profiles of patients.
Data Availability
Aggregated data will be made available on request.
Author Contribution
JX, HC, SL, GW, BD and HQ had the idea for and designed the study; JX, HC, SL, YW, HK, RZ, XL and ZT collected the clinical data. DH, STA and LB did the statistical analysis and wrote part of the manuscript. GW and CHT wrote the manuscript. All authors contributed to acquisition, analysis, or interpretation of data. All authors revised the report and approved the final version before submission.
Declaration of Interests
All the authors declare no conflict of interest.
Role of the funding source
The sponsors had not been involved in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding authors had full access to all the data and had final responsibility for this study. DH is funded outside the scope of work by an NIHR Post-Doctoral Fellowship (PDF-2018-11-ST2-006).
Acknowledgements
This study was supported by following funding: Key Research and Development Plan of Jiangsu Province (BE2018743 and BE2019749), National Institute for Health Research (NIHR) (PDF-2018-11-ST2-006), British Heart Foundation (BHF) (PG/16/65/32313) and Liverpool University Hospitals NHS Foundation Trust in UK. We also thank all the patients and medical staff involved.