Abstract
Objectives We aimed to derive and validate a triage tool, based on clinical assessment alone, for predicting adverse outcome in acutely ill adults with suspected COVID-19 infection.
Methods We undertook a mixed prospective and retrospective observational cohort study in 70 emergency departments across the United Kingdom (UK). We collected presenting data from 22445 people attending with suspected COVID-19 between 26 March 2020 and 28 May 2020. The primary outcome was death or organ support (respiratory, cardiovascular, or renal) by record review at 30 days. We split the cohort into derivation and validation sets, developed a clinical score based on the coefficients from multivariable analysis using the derivation set, and the estimated discriminant performance using the validation set.
Results We analysed 11773 derivation and 9118 validation cases. Multivariable analysis identified that age, sex, respiratory rate, systolic blood pressure, oxygen saturation/inspired oxygen ratio, performance status, consciousness, history of renal impairment, and respiratory distress were retained in analyses restricted to the ten or fewer predictors. We used findings from multivariable analysis and clinical judgement to develop a score based on the NEWS2 score, age, sex, and performance status. This had a c-statistic of 0.80 (95% confidence interval 0.79-0.81) in the validation cohort and predicted adverse outcome with sensitivity 0.98 (0.97-0.98) and specificity 0.34 (0.34-0.35) for scores above four points.
Conclusion A clinical score based on NEWS2, age, sex, and performance status predicts adverse outcome with good discrimination in adults with suspected COVID-19 and can be used to support decision-making in emergency care.
Registration ISRCTN registry, ISRCTN28342533, http://www.isrctn.com/ISRCTN28342533
Introduction
The initial management of acutely ill people with suspected COVID-19 involves assessing the risk of adverse outcome and the need for life-saving intervention, to then determine decisions around hospital admission and inpatient referral.[1-5] Triage tools can assist decision-making by combining information from clinical assessment in a structured manner to predict the risk of adverse outcome. They can take the form of a score that increases with the predicted risk of adverse outcome or a rule that categorises patients into groups according to their risk or their intended management. Inclusion of laboratory and radiological information can improve prediction but requires hospital attendance, increases emergency department (ED) length of stay, and increases the infection risk related to repeated patient contacts. An appropriate triage tool for suspected COVID-19 needs to be based on clinical assessment alone and applicable to people with suspected COVID-19.
We designed the Pandemic Influenza Triage in the Emergency Department (PAINTED) study following the 2009 H1N1 influenza pandemic to develop and evaluate triage tools in any future influenza pandemic.[6] We changed PAINTED to the Pandemic Respiratory Infection Emergency System Triage (PRIEST) study in January 2020 to address any pandemic respiratory infection, including COVID-19. The United Kingdom (UK) Department of Health and Social Care activated PRIEST on 20 March 2020 to develop and evaluate triage tools in the COVID-19 pandemic. Initial descriptive analysis of the PRIEST data showed that adults presenting to the ED with suspected COVID-19 have much higher rates of COVID-19 positivity, hospital admission and adverse outcome than children.[7] We therefore decided to undertake separate studies in adults and children, and only develop a new triage tools in adults, which we present here.
Evaluation of existing triage tools using the PRIEST study data suggested that CURB-65, [8] the National Early Warning Score version 2 (NEWS2) [9] and the Pandemic Modified Early Warning Score (PMEWS) [10] provide reasonable prediction for adverse outcome in suspected COVID-19 (c-statistics 0.75 to 0.77). [11] Scope therefore existed to develop a specific triage tool for COVID-19 with better prediction for adverse outcome.
Aims and objectives
We aimed to derive and validate a triage tool in the form of an illness severity score, based on clinical assessment alone, for predicting adverse outcome in acutely ill adults with suspected COVID-19 infection.
Methods
Design
We designed PRIEST as an observational study to collect standardised predictor variables recorded in the ED, which we would then use to derive and validate new tools for predicting adverse outcome up to 30 days after initial hospital presentation. The study did not involve any change to patient care. Hospital admission and discharge decisions were made according to usual practice, informed by local and national guidance.
Setting and population
We identified consecutive patients presenting to the ED of participating hospitals with suspected COVID-19 infection. Patients were eligible if they met the clinical diagnostic criteria [12] of fever (≥ 37.8°C) and acute onset of persistent cough (with or without sputum), hoarseness, nasal discharge or congestion, shortness of breath, sore throat, wheezing, or sneezing. This was determined on the basis of the assessing clinician recording that the patient had suspected COVID-19 or completing a standardised assessment form designed for suspected pandemic respiratory infection [6].
Interventions
For this study we planned to develop a triage tool in the form of an illness severity score based on clinical assessment and routine observations that any health care professional could use to rapidly estimate the risk of adverse outcome. The score would be based on a number of categorised variables, with points allocated to each category of each variable, which would then be summed to give a total score reflecting the predicted risk of adverse outcome. To enhance usability, we planned to (a) use a restricted number of variables, rather than all potentially predictive variables, and (b) categorise variables in accordance with currently used scores, unless there was clear evidence that these categories provided suboptimal prediction.
Data collection
Data collection was both prospective and retrospective. Participating EDs were provided with a standardised data collection form (Appendix 1) that included variables used in existing triage tools or considered to be potentially useful predictors of adverse outcome. Participating sites could adapt the form to their local circumstances, including integrating it into electronic or paper clinical records to facilitate prospective data collection, or using it as a template for research staff to retrospectively extract data from clinical records. We did not seek consent to collect data but information about the study was provided in the ED and patients could withdraw their data at their request. Patients with multiple presentations to hospital were only included once, using data from the first presentation identified by research staff.
Outcome measurement
Research staff at participating hospitals reviewed patient records at 30 days after initial attendance and recorded outcomes using the follow-up form in Appendix 2. The primary outcome was death or major organ support (respiratory, cardiovascular, or renal) up to 30 days after initial attendance. Death and major organ support were also analysed separately as secondary outcomes. Our primary outcome definition reflected the need for triage tools to identify patients at risk of adverse outcome or requiring life-saving intervention to prevent adverse outcome. Respiratory support was defined as any intervention to protect the patient’s airway or assist their ventilation, including non-invasive ventilation or acute administration of continuous positive airway pressure. It did not include supplemental oxygen alone or nebulised bronchodilators. Cardiovascular support was defined as any intervention to maintain organ perfusion, such as inotropic drugs, or invasively monitor cardiovascular status, such as central venous pressure or pulmonary artery pressure monitoring, or arterial blood pressure monitoring. It did not include peripheral intravenous cannulation or fluid administration. Renal support was defined as any intervention to assist renal function, such as haemofiltration, haemodialysis, or peritoneal dialysis. It did not include intravenous fluid administration.
Analysis
We randomly split the study population into derivation and validation cohorts by randomly allocating the participating sites to one or other cohort. We developed a score based on the prognostic value of predictor variables in multivariable analysis of the derivation cohort and expert judgements regarding clinical usability. Candidate predictors were combined in a multivariable regression with Least Absolute Shrinkage and Selection Operator (LASSO) using ten sample cross validation to select the model. The LASSO begins with a full model of candidate predictors and simultaneously performs predictor selection and penalisation during model development to avoid overfitting. The LASSO was performed twice: once where the number of predictors were unrestricted, and a second time when the LASSO was restricted to pick ten predictors. Fractional polynomials were used to model non-linear relationships for continuous variables.
We excluded cases from all analyses if age or outcome data were missing. We undertook three multivariable analyses, using different approaches to missing predictor variable data in the derivation cohort: (1) Complete case; (2) Multiple imputation using chained equations; (3) Deterministic imputation with missing predictor data assumed to be normal, where applicable. We did not consider any predictor with more than 50% missing data across the cohort for inclusion in the predictive model.
Clinical members of the research team reviewed the models and selected variables for inclusion in the triage tool, based on their prognostic value in the model, the clinical credibility of their association with adverse outcome, and their availability in routine clinical care. We categorised continuous variables, using recognised categories from existing scores where appropriate, while checking that categorisation reflected the relationship between the variable and adverse outcome in the derivation data. We then assigned integer values to each category of predictor variable, taking into account the points allocated to the category in existing scores, and the coefficient derived from a multivariable logistic regression model using categorised continuous predictors. This generated a composite clinical score in which risk of adverse outcome increased with the total score.
We applied the clinical score to the validation cohort, calculating diagnostic parameters at each threshold of the score, constructing a receiver-operating characteristic (ROC) curve, calculating the area under the ROC curve (c-statistic) and calculating the proportion with an adverse outcome at each level of the score. We used deterministic imputation to handle missing data in the validation cohort, assuming missing predictor variable data were normal, but excluding cases with more than a pre-specified number of predictor variables missing. We also undertook a complete case sensitivity analysis.
The sample size was dependent on the size and severity of the pandemic, but based on a previous study in the 2009 H1N1 influenza pandemic we estimated we would need to collect data from 20,000 patients across 40-50 hospitals to identify 200 with an adverse outcome, giving sufficient power for model derivation. In the event, the adverse outcome rate in adults was much higher in the COVID-19 pandemic, giving us adequate power to undertake derivation and validation of triage tools to predict all three outcomes.
Patient and public involvement
The Sheffield Emergency Care Forum (SECF) is a public representative group interested in emergency care research. [13] Members of SECF advised on the development of the PRIEST study and two members joined the Study Steering Committee. Patients were not involved in the recruitment to and conduct of the study. We are unable to disseminate the findings to study participants directly.
Ethical approval
The North West - Haydock Research Ethics Committee gave a favourable opinion on the PAINTED study on 25 June 2012 (reference 12/NW/0303) and on the updated PRIEST study on 23rd March 2020. The Confidentiality Advisory Group of the Health Research Authority granted approval to collect data without patient consent in line with Section 251 of the National Health Service Act 2006.
Results
The PRIEST study recruited 22485 patients from 70 EDs across 53 sites between 26 March 2020 and 28 May 2020. We included 20889 in the analysis after excluding 39 who requested withdrawal of their data, 1530 children, 20 with missing outcome data, and seven with missing age. The derivation cohort included 11773 patients and the validation cohort 9118. Table 1 shows the characteristics of the derivation and validation cohorts.
Table 2 shows summary statistics for each predictor variable in those with and without adverse outcome in the derivation sample, and univariate odds ratios for prediction of adverse outcome. Physiological variables were categorised to reflect their expected relationships with adverse outcome.
Supplementary Tables 1 to 3 show the results of multivariable analysis using complete case analysis, multiple imputation and deterministic imputation. Unrestricted LASSO on multiply imputed data included more predictors, with a higher c-statistic for the model (0.85, 95% CI 0.84 to 0.86), than the LASSO on deterministically imputed data or complete cases (c-statistics both 0.83, 95% CI 0.82 to 0.84). When restricted, there were nine predictors that were retained by LASSO in all three analyses (age, sex, respiratory rate, systolic BP, oxygen saturation/inspired oxygen ratio, history of renal impairment, performance status, consciousness and respiratory distress). C-statistics for the restricted models using deterministic imputation and complete case analysis (0.82, 95% CI 0.81 to 0.83) were slightly lower than c-statistics for the respective unrestricted models.
We developed a score through the following steps:
Clinical review judged that the nine predictors are clinically credible; that age, sex, respiratory rate, systolic BP, consciousness, oxygen saturation and inspired oxygen are routinely recorded in administrative systems and early warning scores (although the ratio of oxygen saturation to inspired oxygen is not routinely recorded); and that many EDs routinely record a measure of performance status for suspected COVID-19 cases that could be mapped onto our scale.
We decided to include temperature and heart rate, as these are routinely recorded alongside other physiological variables in early warning scores, and added prognostic value in the full models.
We created categories for age based on the observed multivariate association between age and outcome in our data, and categories for respiratory rate, heart rate, oxygen saturation, inspired oxygen, systolic BP, consciousness and temperature based on those used in the NEWS2 early warning score.
We created a multivariable logistic regression model using categorised predictor variables (Supplementary Table 4) and compared the coefficients for each category of predictor variable in the NEWS2 score to the points allocated in the NEWS2 score. We judged that the inconsistencies between the coefficients and the points used in NEWS2 were insufficient to justify allocating alternative points in our score. We allocated points to categories of age, sex, performance status, renal history, and respiratory distress, based on the coefficients in the model.
We removed renal history and respiratory distress from the multivariable model (Supplementary Table 5), noted that this made no meaningful difference to the c-statistic (0.82 in both models) and, given concerns about subjectivity and lack of routine recording, decided not to include them in the score.
The developed score is shown in Figure 1. We applied the score to the validation cohort. Figure 2 shows the ROC curve, with a c-statistic of 0.80 (95% CI 0.79 to 0.81) for the score. Sensitivity analysis using only complete cases gave a c-statistic of 0.79 (95% CI 0.77 to 0.80). Supplementary Figures 1 and 2 show the calibration plots for the unrestricted and restricted LASSO models applied to the validation cohort. The c-statistics (0.82 and 0.81 respectively, compared with 0.80 for the score) indicate the effect of restricting the number of variables and then developing a score had upon discrimination. Figure 3 shows the probability of adverse outcome for each value of the score. Table 3 shows the sensitivity and specificity for predicting outcome at each threshold of the triage tool.
Supplementary Figures 3 and 4 show the ROC curves, and Supplementary Tables 6 and 7 show the predictive performance of the score when applied to the secondary outcomes of organ support and death without organ support in the validation cohort. The score provided better prognostic discrimination for death without organ support (c-statistic 0.83, 95% CI 0.82 to 0.84) than for organ support (0.68, 95% CI 0.67 to 0.69).
Discussion
We have developed a clinical illness severity score for acutely ill patients with suspected COVID-19 that combines the NEWS2 score, age, sex, and performance status to predict the risk of death or receipt of organ support. The score ranges from zero to 29 points, with a score greater than four predicting adverse outcome with high sensitivity. In developing the score, we tried to optimise usability without compromising performance. Usability was optimised by basing the score on the existing NEWS2 score and only adding easily available information. The c-statistic of the score on the validation cohort was 0.80, compared with 0.82 and 0.81 when the unrestricted and restricted models were applied to the validation cohort, suggesting that simplifying the tool did not excessively compromise prediction.
We previously analysed the performance of triage tools that have been recommended in guidelines for the initial assessment of acutely ill people with sepsis, and showed that CURB-65 (c-statistic 0.75), PMEWS (0.77) and NEWS2 (0.77) offer good prediction for adverse outcome. Our new triage tool offers improved prediction with the addition of three variables.
Previous research
A living systematic review [14] has identified 50 prognostic models for adverse outcome in people with diagnosed COVID-19. C-statistics ranged from 0.68 to 0.99, and the most frequently used predictor variables were age, sex, comorbidities, temperature, lymphocyte count, C reactive protein, creatinine, and imaging features. Recently the ISARIC WHO Clinical Characterisation Protocol developed and validated the 4C Mortality Score [15] that predicts the mortality risk for people admitted with COVID-19 with better discriminant performance than 15 pre-existing risk stratification scores (c-statistic 0.77 versus 0.61-0.76).
Mortality prediction scores have an important role predicting mortality in hospital admissions but have limitations as triage tools. The inclusion of laboratory data as predictor variables usually requires hospital attendance, prolongs ED stay and prevents the rapid assessment required in ED or prehospital settings. Furthermore triage tools need to predict need for life-saving intervention rather than just mortality, and ideally need to be developed and evaluated in a relevant cohort, i.e. those with suspected COVID-19, including those not admitted to hospital after assessment.
Rapid clinical scores have been proposed or evaluated in several studies. Liao et al [16] proposed adding age>65 years to the NEWS2 score to aid decision-making, based on early experience of the pandemic in China. Myrstad et al [17] reported a c-statistic of 0.822 (95% CI 0.690 to 0.953) for NEWS2 predicting death or severe disease in a small study (N=66) of people hospitalised with confirmed COVID-19. Hu et al [18] reported c-statistics of 0.833 (0.737 to 0.928) for the Rapid Emergency Medicine Score (REMS) and 0.677 (0.541 to 0.813) for the Modified Emergency Medicine Score (MEWS) for predicting mortality in critically ill patients with COVID-19. Haimovich et al [19] developed the quick COVID-19 severity index, consisting of respiratory rate, oxygen saturation, and oxygen flow rate, which predicted respiratory failure within 24 hours in adults admitted with COVID-19 requiring supplemental oxygen with a c-statistic of 0.81 (0.73 to 0.89).
Strengths and limitations
We collected data from a clinically relevant population of patients presenting with suspected COVID-19 across a varied range of EDs. The large sample size and high rate of adverse outcome provided good statistical power to support analysis of many predictor variables in multivariable analysis and allowed us to estimate parameters with a high degree of precision. An important limitation is that retrospective data collection resulted in some missing and may have resulted in some inaccuracy of predictor variable recording. Recording of inspired oxygen concentration was subject to a particularly high rate of missing data. We anticipated this problem and pre-specified analyses involving multiple imputation, deterministic imputation, and complete case analysis to explore the impact of missing data. There was reasonable concordance between the models. Another potential limitation is that we may have missed adverse outcomes if patients attended a different hospital after initial hospital discharge. This is arguably less likely in the context of a pandemic, in which movements between regions were curtailed, but cannot be discounted. The 5-point scale we used for determining performance status has not been widely used or evaluated, although the 9-point clinical frailty index maps onto it reasonably well. Finally, although our triage tool can be used in the prehospital or community setting, we recommend caution in extrapolating our findings to settings where there is likely to be a lower prevalence of adverse outcome.
Implications for practice
Our clinical score could be used to support ED decision-making around hospital admission and inpatient referral. A score greater than four could provide an appropriate balance of sensitivity and specificity to support hospital admission decisions, while a higher threshold could be used to select patients for critical care referral. However, triage tools should support and not replace clinical decision-making, and patient preferences and values must be considered. Our triage tool could also be used to support prehospital and community decision-making around decisions to refer for hospital assessment. However, further validation is required to determine the performance of the tool in these settings.
In summary, we have developed a clinical score that can provide a rapid and accurate assessment of the risk of adverse outcome in adults who are acutely ill with suspected COVID-19.
Data Availability
Anonymised data are available from the corresponding author upon reasonable request
Competing interests
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: grant funding to their employing institutions from the National Institute for Health Research; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Contributor and guarantor information
SG, AB, KC, CF, TH, FL, ALe, IM and DW conceived and designed the study. BT, KB, ALo, SW, RS, JS, SC, ES, JH and EY acquired the data. LS, MBu, MBr, EL, SG and BT analysed the data. SG, LS, MBu, MBr, ES, AB, KC, CF, TH, FL, ALe, IM, DW, BT, KB and CM interpreted the data. All authors contributed to drafting the manuscript. Steve Goodacre is the guarantor of the paper. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Acknowledgements
We thank Katie Ridsdale for clerical assistance with the study, Erica Wallis (Sponsor representative, all members of the Study Steering Committee (Appendix 3) and the site research teams who delivered the data for the study (Appendix 4), and the research team at the University of Sheffield past and present (Appendix 5).
Data sharing
Anonymised data are available from the corresponding author upon reasonable request (contact details on first page).
Role of the funding source
The PRIEST study was funded by the United Kingdom National Institute for Health Research Health Technology Assessment (HTA) programme (project reference 11/46/07). The funder played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; and in the decision to submit the article for publication. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.
Footnotes
s.goodacre{at}sheffield.ac.uk, b.d.thomas{at}sheffield.ac.uk, l.j.sutton{at}sheffield.ac.uk, m.bursnall{at}sheffield.ac.uk, e.lee{at}sheffield.ac.uk, m.bradburn{at}sheffield.ac.uk, a.loban{at}sheffield.ac.uk, r.p.simmonds{at}sheffield.ac.uk, c.e.biggs{at}sheffield.ac.uk, c.marincowitz{at}sheffield.ac.uk, j.schutter{at}sheffield.ac.uk, s.l.connelly{at}sheffield.ac.uk, e.m.sheldon{at}sheffield.ac.uk, jamie.hall{at}sheffield.ac.uk, e.l.young{at}sheffield.ac.uk, andrew.bentley{at}manchester.ac.uk, kirsty.challen{at}lthtr.nhs.uk, c.fitzsimmons{at}nhs.net, Tim.Harris{at}bartshealth.nhs.uk, f.e.lecky{at}sheffield.ac.uk, andrew.lee{at}sheffield.ac.uk, I.maconochie{at}ic.ac.uk, darren.walter{at}manchester.ac.uk
(1) We have corrected a typo in the abstract. The specificity for scores above 4 points was reported as 0.38 and has been corrected to 0.34, consistent with the main results in table 3. (2) We identified 61 cases that had an adverse outcome (death) after 30 days and had been misclassified as having an adverse outcome when the protocol stated that only events within 30 days would be classified as adverse outcomes in the analysis. We have corrected the results. This produced minor changes to the numbers reported in the tables, and consciousness being consistently included in the restricted models, but there was no change to the score or its accuracy.