ABSTRACT
Objectives To (a) derive and validate risk prediction algorithms (QCovid4) to estimate risk of COVID-19 mortality and hospitalisation in UK adults with a SARS-CoV-2 positive test during the ‘Omicron’ pandemic wave in England and (b) evaluate performance with earlier versions of algorithms developed in previous pandemic waves and the high-risk cohort identified by NHS Digital in England.
Design Population-based cohort study using the QResearch database linked to national data on COVID-19 vaccination, high risk patients prioritised for COVID-19 therapeutics, SARS-CoV-2 results, hospitalisation, cancer registry, systemic anticancer treatment, radiotherapy and the national death registry.
Settings and study period 1.3 million adults in the derivation cohort and 0.15 million adults in the validation cohort aged 18-100 years with a SARS-CoV-2 positive test between 11th December 2021 and 31st March 2022 with follow up to 30th June 2022.
Main outcome measures Our primary outcome was COVID-19 death. The secondary outcome of interest was COVID-19 hospital admission. Models fitted in the derivation cohort to derive risk equations using a range of predictor variables. Performance evaluated in a separate validation cohort.
Results Of 1,297,984 people with a SARS-CoV-2 positive test in the derivation cohort, 18,756 (1.45%) had a COVID-19 related hospital admission and 3,878 (0.3%) had a COVID-19 death during follow-up. Of the 145,404 people in the validation cohort, there were 2,124 (1.46%) COVID-19 admissions and 461 (0.3%) COVID-19 deaths.
The COVID-19 mortality rate in men increased with age and deprivation. In the QCovid4 model in men hazard ratios were highest for those with the following conditions (for 95% CI see Figure 1): kidney transplant (6.1-fold increase); Down’s syndrome (4.9-fold); radiotherapy (3.1-fold); type 1 diabetes (3.4-fold); chemotherapy grade A (3.8-fold), grade B (5.8-fold); grade C (10.9-fold); solid organ transplant ever (2.4-fold); dementia (1.62-fold); Parkinson’s disease (2.2-fold); liver cirrhosis (2.5-fold). Other conditions associated with increased COVID-19 mortality included learning disability, chronic kidney disease (stages 4 and 5), blood cancer, respiratory cancer, immunosuppressants, oral steroids, COPD, coronary heart disease, stroke, atrial fibrillation, heart failure, thromboembolism, rheumatoid/SLE, schizophrenia/bipolar disease sickle cell/HIV/SCID; type 2 diabetes. Results were similar in the model in women.
COVID-19 mortality risk was lower among those who had received COVID-19 vaccination compared with unvaccinated individuals with evidence of a dose response relationship. The reduced mortality rates associated with prior SARS-CoV-2 infection were similar in men (adjusted hazard ratio (HR) 0.51 (95% CI 0.40, 0.64)) and women (adjusted HR 0.55 (95%CI 0.45, 0.67)).
The QCOVID4 algorithm explained 76.6% (95%CI 74.4 to 78.8) of the variation in time to COVID-19 death (R2) in women. The D statistic was 3.70 (95%CI 3.48 to 3.93) and the Harrell’s C statistic was 0.965 (95%CI 0.951 to 0.978). The corresponding results for COVID-19 death in men were similar with R2 76.0% (95% 73.9 to 78.2); D statistic 3.65 (95%CI 3.43 to 3.86) and C statistic of 0.970 (95%CI 0.962 to 0.979). QCOVID4 discrimination for mortality was slightly higher than that for QCOVID1 and QCOVID2, but calibration was much improved.
Conclusion The QCovid4 risk algorithm modelled from data during the UK’s Omicron wave now includes vaccination dose and prior SARS-CoV-2 infection and predicts COVID-19 mortality among people with a positive test. It has excellent performance and could be used for targeting COVID-19 vaccination and therapeutics. Although large disparities in risks of severe COVID-19 outcomes among ethnic minority groups were observed during the early waves of the pandemic, these are much reduced now with no increased risk of mortality by ethnic group.
What is known
The QCOVID risk assessment algorithm for predicting risk of COVID-19 death or hospital admission based on individual characteristics has been used in England to identify people at high risk of severe COVID-19 outcomes, adding an additional 1.5 million people to the national shielded patient list in England and in the UK for prioritising people for COVID-19 vaccination.
There are ethnic disparities in severe COVID-19 outcomes which were most marked in the first pandemic wave in 2020.
COVID-19 vaccinations and therapeutics (monoclonal antibodies and antivirals) are available but need to be targeted to those at highest risk of severe outcomes.
What this study adds
The QCOVID4 risk algorithm using data from the Omicron wave now includes number of vaccination doses and prior SARS-CoV-2 infection. It has excellent performance both for ranking individuals (discrimination) and predicting levels of absolute risk (calibration) and can be used for targeting COVID-19 vaccination and therapeutics as well as individualised risk assessment.
QCOVID4 more accurately identifies individuals at highest levels of absolute risk for targeted interventions than the ‘conditions-based’ approach adopted by NHS Digital based on relative risk of a list of medical conditions.
Although large disparities in risks of severe COVID-19 outcomes among ethnic minority groups were observed during the early waves of the pandemic, these are much reduced now with no increased risk of mortality by ethnic group.
INTRODUCTION
During the first waves of the COVID-19 pandemic, before the introduction of vaccines, it was essential to be able to identify people at highest risk of severe COVID-19 outcomes if they were infected with SARS-CoV-2 virus. The QCOVID risk assessment tool for predicting risk of COVID-19 death or hospital admission based on individual characteristics was developed,1 independently externally validated in England,2 Wales3 and Scotland,4 and found to have excellent performance for identifying those at high risk of severe outcomes from COVID-19. QCOVID was used in England in February 2021 to identify patients at high risk of severe COVID-19 outcomes, adding an additional 1.5 million people to the national shielded patient list. It was also used for prioritising people for vaccination across the UK (if they had not already been offered the vaccine on account of their age or other risk classification)5. The QCOVID model was updated following the second and third waves of the pandemic to create two new versions of the model, QCOVID2 based on unvaccinated patients6 and QCOVID3 based on partially vaccinated patients.6 These models accounted for changes that had occurred both in the virus as well as the deployment of the vaccination programme.6
In December 2021, the UK experienced a new wave of COVID-19 infections with the Omicron variant rapidly replacing high circulating levels of the previous Delta variant. Whilst the Omicron variant (BA1) was associated with a lower risk of COVID-19 death than the Delta variant,7 further mutations have occurred and there are concerns that COVID-19 vaccines may become less effective. Additional therapeutic agents are likely to be needed to protect vulnerable individuals such as antivirals and neutralising monoclonal antibodies (nMABS8). On 9th December 2021, nMABs became available in the UK for high-risk non-hospitalised patients with a symptomatic SARS-CoV-2 infection.9 Since nMABs are limited resources, they have been targeted to those at highest risk of poor COVID-19 outcomes who are most likely to benefit.9,10 This was done based on a set of clinical conditions associated with a high relative risk of severe outcomes from the published literature6,11 combined with clinical judgement regarding likelihood of clinical benefit based on the biological mechanism for nMABS 9. Patients with these conditions were then identified from centrally held electronic health records and contacted by NHS Digital in December 2021 to inform them of their potential eligibility for nMABS should they develop a symptomatic SARS-CoV-2 infection. The guidance did not however, account for the cumulative absolute risk associated with multiple co-morbidities, age, prior infection, vaccination status or the new variants.
The aims of this study, commissioned by the UK’s Department of Health and Social Care, were to develop and validate a new QCOVID risk algorithm (QCOVID 4) based on new data from the Omicron pandemic wave in England, accounting for prior SARS-CoV-2 infection and number of COVID-19 vaccination doses. We also evaluated the performance of the QCOVID4 algorithm with earlier versions of the risk model developed in the first two pandemic waves and the ‘high risk’ cohort identified by NHS Digital on the basis of relative risk of a list of conditions. The results can then be used to inform ongoing strategies for targeting therapeutics and other public health interventions, designed to protect those most at risk from COVID-19 death and hospitalisation.
METHODS
Data sources
We used the QResearch database (version 47) of 12 million current patients with demographic, clinical and medication data which is used for epidemiological12,1 and drug safety research.13,14 QResearch is linked to multiple datasets at individual patient level. For this analysis, we used the following linked datasets
National Immunisation (NIMS) Database of COVID-19 vaccinations to identify data on vaccine dates and doses for all people vaccinated in England
Hospital Episode Statistics (HES) dataset supplemented by the more regularly updated Secondary Users Service data (SUS-PLUS).
Civil registration national data for mortality with date and up to 15 causes of death
SARS-CoV-2 infection data, Second Generation Surveillance System (SGSS) and Pillar 2)
Systemic anticancer treatment (SACT) data
NHS Digital ‘high risk’ cohort prioritised for novel COVID-19 therapeutics in December 2021.
Study design and period for cohort
We undertook a cohort study of all individuals aged 18-100 years who had one or more positive SARS-CoV-2 tests from 11 December 2021 (the date of the first notified Omicron case) to 31 March 2022 (the date after which widespread free NHS SARS-CoV-2 tests became unavailable). Individuals were followed from the date of their first SARS-CoV-2 test in the study period, until they had the outcome of interest, died or the end of the study period on 30 June 2022 (the latest date for which mortality and hospital admissions were available).
Outcomes for cohort
The primary outcome was time to COVID-19 death (either in hospital or out of hospital) as recorded in any position on the death certificate or death within 28 days of a SARS-CoV-2 positive test. The secondary outcome was time to hospital admission with COVID-19, defined as either confirmed or suspected COVID-19 on International Classification of Diseases (ICD)-10 code (U071, U072). We used these definitions for the outcomes for consistency with other QCovid algorithms and because these are the ones used for COVID-19 death and hospital admission in the UK.15
Predictor variables
Candidate predictor variables were those previously identified as associated with increased risk of COVID-19 death or hospitalisation from the original QCOVID protocol16 and the published literature1,6,12,17. The variables were: age, sex, ethnicity, Townsend material deprivation (an area level score based on postcode where higher scores indicate higher levels of deprivation 18), number of vaccine doses (none, 1, 2, 3, 4 or more), body mass index (BMI) 17, domicile (care home, homeless, neither); chronic kidney disease (CKD); chemotherapy in previous 12 months; type 1 or type 2 diabetes (with glycosylated haemoglobin, HbA1C <59 or ≥59 mmol/mol); blood cancer; bone marrow transplant in last six months; respiratory cancer; radiotherapy in last six months; solid organ transplant; chronic obstructive pulmonary disease (COPD); asthma; rare lung diseases (cystic fibrosis, bronchiectasis or alveolitis); pulmonary hypertension or pulmonary fibrosis); coronary heart disease; stroke; atrial fibrillation; heart failure; venous thromboembolism; peripheral vascular disease; congenital heart disease; dementia; Parkinson’s disease; epilepsy; Down’s syndrome; rare neurological conditions (motor neurone disease, multiple sclerosis, myasthenia or Huntington’s Chorea); cerebral palsy; osteoporotic fracture; rheumatoid arthritis or systemic lupus erythematosus (SLE); liver cirrhosis; bipolar disorder or schizophrenia; inflammatory bowel disease; sickle cell disease; Human Immunodeficiency Virus (HIV) or Acquired Immunodeficiency Syndrome (AIDS); and Severe Combined Immunodeficiency (SCID).
We defined predictors using information recorded in primary care electronic health records at the start of follow-up (date of first positive SARS-CoV-2 test in the study period), except for data for the number of SARS-CoV-2 infections, COVID-19 vaccinations, chemotherapy, radiotherapy and transplants, which were based on linked secondary care data. For all predictor variables, we used the most recently available value at the cohort entry date.
Model development
As in previous studies, we used 90% of practices to develop the models and the remaining 10% of practices for model validation.6 We developed separate risk models in men and women using Cox proportional hazard models to calculate hazard ratios (HRs) for the two outcomes. We used second degree fractional polynomials to model non-linear relationships for continuous variables including age, BMI and Townsend material deprivation score.18 We used multiple imputation with chained equations to impute missing values for ethnicity, Townsend score, BMI and HBA1C. We carried out five imputations and fitted the prediction models in each imputed dataset. We used Rubin’s rules19 to combine the model parameter estimates across the imputed datasets.
We retained variables in the final models that were significant at the 5% level and where adjusted hazard ratios were > 1.1. We combined clinically similar variables with very low numbers of events. We examined interactions between predictor variables and age. We estimated the baseline survivor function based on zero values of centred continuous variables, with all binary predictor values set to zero. We used the regression coefficients for each variable from the final model as weights which we combined with the baseline survivor function evaluated at 30, 60 and 90 days of follow-up.20
Model evaluation
We evaluated model performance in the validation cohort. We used multiple imputation to replace missing values for ethnicity, HBA1C, BMI and Townsend score using the same imputation model as in the derivation cohort. We applied the final risk equations to calculate the risk scores for each outcome. We calculated Harrell’s C statistics 21, R2 values and D statistics22. We assessed model calibration in the validation cohort by comparing mean predicted risks at 90 days with the observed risks by twentieths of predicted risk23.
We calculated each performance metric in the whole validation cohort and in subgroups for age and ethnic group (where numbers allowed). We compared model discrimination with risk scores calculated using earlier versions of QCOVID developed on unvaccinated populations:
during the first pandemic wave (QCOVID1, developed on the total unvaccinated population between 24 January 2020 and 30 April 2020) and
during the second pandemic wave (QCOVID2, developed on the unvaccinated patients with a SARS-CoV-2 positive test between 8 December 2020 to 21 June 2021 during which the Alpha and Delta variants were dominant).
We decided not to include QCOVID3 as that was developed on a partially vaccinated population during the rapid roll-out of the vaccination program. We also determined whether there were identifiable high-risk groups based on their QCOVID4 predicted risks who had an equivalent or higher observed risk of COVID-19 death than the current ‘high risk cohort’ by comparing groups of the same size (high risk vs QCOVID4 threshold).
Risk stratification
We applied the QCOVID4 algorithms to the validation cohort to define the centile thresholds based on absolute predicted risk. We calculated sensitivity as the total number of patients with a risk score above the risk threshold with a COVID-19 death out of the total number of COVID-19 deaths.
We also compared risk stratification using (a) QCOVID4 to identify the top 2.5% of patients at highest absolute risk in the validation cohort with (b) the current recommended guidelines which have selected a high-risk cohort based on relative risk of patients with selected medical conditions.
Reporting
We adhered to the RECORD24 and TRIPOD statements for reporting 25. We used all the available data on the database to maximise the power and generalisability of the results. We used STATA (version 17) for analyses.
Patient and public involvement
Patients were involved in framing research question, identifying predictors and in developing plans for design and implementation of the QCOVID risk tool. A citizen’s jury convened by the Scottish government, evaluated earlier versions of the QCovid algorithm and highlighted the importance of keeping it up-to-date and maintaining transparency over its use 26. Patients will be invited to advise on disseminating the results including the development of infographics and its translation into different languages.
RESULTS
Baseline characteristics of study cohorts
Overall, there were 1,430 practices in the QResearch database (version 47). We allocated 1,287 practices to the derivation cohort and 143 to the validation cohort. Of the 9,526,580 patients aged 18-100 years in the derivation cohort, 1,297,922 (13.6%) had a SARS-CoV-2 positive test in the study period. Of these, 18,756 (1.5%) had a COVID-19 hospital admission and 3878 (0.3%) had a COVID-19 death during follow-up.
Of the 1,064,255 patients in the validation cohort, 145,397 (13.7%) had a SARS-CoV-2 positive test and were included in the analysis. Of these, 2,124 (1.5%) had a COVID-19 hospital admission and 461 (0.3%) had a COVID-19 death.
Table 1 shows the baseline characteristics of those who tested positive, those with a COVID-19 death and those with a COVID-19 admission in the derivation cohort. Supplementary table 1 shows the corresponding results for the validation cohort. The mean age in the derivation cohort for those with a SARS-CoV-2 positive test was 42.4 years (SD 16.4), COVID-19 admission 55.6 years (SD 22.1), COVID-19 death, 80.9 years (SD 12.3)
Factors associated with increased/decreased risk of severe COVID-19 outcomes
Figure 1 shows the adjusted HRs for variables in the final QCOVID4 model for COVID-19 death in men. Figure 2 show the corresponding results for women. Figures 3 and 4 show the adjusted HR for variables in the final model for COVID-19 hospital admission in men and women. The adjusted HRs for fractional polynomial terms for each of the models for age and BMI can be found in supplementary figure 1.
The COVID-19 mortality rate in men increased very steeply with age and there was also an association with deprivation. In the final model in men adjusted HRs were highest for those with the following conditions (for 95% CI see Figure 1): kidney transplant (6.1-fold); Down’s syndrome (4.9-fold increase); radiotherapy (3.1-fold); type 1 diabetes (3.4-fold); chemotherapy grade A (3.8-fold), grade B (5.8-fold); grade C (10.9-fold); solid organ transplant ever (2.4-fold); dementia (1.6-fold); Parkinson’s disease (2.2-fold); liver cirrhosis (2.5-fold). Other conditions associated with increased COVID-19 mortality included learning disability, chronic kidney disease (stages 4 and 5), blood cancer, respiratory cancer, immunosuppressants, oral steroids, COPD, coronary heart disease, stroke, atrial fibrillation, heart failure, thromboembolism, rheumatoid/SLE, schizophrenia/bipolar disease sickle cell/HIV/SCID; type 2 diabetes. Unlike QCOVID2, there was no association by residential status or with asthma, rare pulmonary conditions, cerebral palsy or congenital heart disease. There was no difference in risk according to HBAC1C levels, so we included type 1 and type 2 diabetes as binary rather than categorical variables. These results were generally similar in women (Figure 2).
The increased risks of both COVID-19 death and admission in ethnic minority groups observed in previous pandemic waves were either not observed or were much less marked in QCOVID4 compared with earlier analyses. There were no significantly increased risks of COVID-19 death by ethnic group in men or women compared with the white group. There was an increased risk of COVID-19 admission among Bangladeshi, Pakistani and Other Asian men and women and an increased risk of admission for Black African women. There were no other increased risks for Indian, Other Asian, Black ethnicities or Chinese for admission.
The COVID-19 mortality rate was lower among those who had received COVID-19 vaccination compared with unvaccinated individuals with evidence of a dose response relationship as summarised in Supplementary table 2. For example, compared with unvaccinated men, there was a 42% risk reduction associated with one vaccination dose (adjusted HR 0.58 (95% CI 0.43, 0.79)) and a 92% reduction in risk for men with 4 or more doses (adjusted HR 0.08 (95% CI 0.06, 0.12)). The reduced COVID-19 mortality risks associated with COVID-19 vaccination doses were similar in women. COVID-19 admission risk was also reduced in vaccinated men and women.
Prior SARS-CoV-2 infection was associated with a 49% reduced risk of COVID-19 death in men (adjusted HR 0.51 (95% CI 0.40, 0.64)) and a 45% reduced risk in women (adjusted HR 0.55 (95% 0.45 to 0.67)), independent of age, ethnicity, vaccination status and other factors included in the final QCOVID4 models. The 39% reduction in admission risk for men (adjusted HR 0.61 (95% 0.56, 0.68)) was similar to the 33% reduction in admission risk for women (adjusted HR 0.67 (95% 0.63, 0.72)).
Discrimination
Table 2 shows the explained variation and discrimination of QCOVID1, QCOVID2 and QCOVID4 models in the validation cohort for women and men overall for COVID-19 death and hospital admission. The QCOVID4 algorithm explained 76.6% (95%CI 74.4 to 78.8) of the variation (R2) in time to COVID-19 death in women. The D statistic was 3.70 (95%CI 3.48 to 3.93) and the Harrell’s C statistic was 0.965 (95%CI 0.951 to 0.978). The corresponding results for COVID-19 death in men were similar with R of 76.0% (95% 73.9 to 78.2); D statistic 3.65 (95%CI 3.43 to 3.86) and a C statistic of 0.970 (95%CI 0.962 to 0.979).
The performance of QCOVID4 for COVID-19 mortality was slightly improved compared with both QCOVID1 and QCOVID2. For COVID-19 hospital admissions, however, QCOVID4 had significantly improved performance compared with QCOVID2, which in turn had improved performance compared with QCOVID1. For example, Harrell’s C statistic in men for QCOVID4 was 0.970 (95%CI 0.963 to 0.977) compared with 0.932 (95%CI 0.918 to 0.945) for QCOVID2 and 0.798 (95%CI 0.781 to 0.814), for QCOVID1.
Supplementary table 3 shows performance of QCOVID4 in subgroups by age and ethnicity. Performance measures were generally higher in the younger age groups and similar across ethnic groups (where there were sufficient numbers in the subgroup to enable an analysis).
Calibration
Figures 5 and 6 show the mean predicted risks and the observed risks for COVID-19 mortality and admission using QCOVID4 to assess calibration in the validation cohort. There was close correspondence between the mean predicted risks and the observed risks within each model twentieth in women and men indicating the algorithms are well calibrated.
Supplementary figures 2 and 3 show the corresponding results for QCOVID2 which has a degree of miscalibration indicating over prediction.
Thresholds
Table 3 shows the classification statistics in the validation cohort for men and women by twentieths of predicted mortality risk using QCOVID4. For example, for the 20% of the cohort at highest predicted risk (i.e. those with a 90-day predicted risk score of 0.075% or higher), the sensitivity was 97.8%, specificity was 80.2% and the observed risk was 1.54%. The corresponding figures for the top 5% at highest predicted risk were a sensitivity of 87.6%, specificity of 95.3% and observed 90-day risk of 5.52%.
We identified 34,864 patients in the NHS Digital high-risk cohort in the QResearch cohorts of whom 3,600 were in the validation cohort. Supplementary Table 4 shows the characteristics of these patients compared with the characteristics of 3,600 patients (top 2.48%) with the highest predicted risks of COVID-19 mortality using the QCOVID4 models. Patients in the QCOVID4 high risk group tended to be much older (mean 85.0 years compared with 55.4 years) with higher levels of co-morbidities, with some exceptions (e.g. CKD5, blood cancer, grade B chemotherapy, rare neurological conditions). There were 520 patients included in both high-risk groups. Of the 461 COVID19 deaths which occurred in the validation cohort, 333 (72.2%) occurred in the QCOVID4 high risk group and 95 (20.6%) in the NHS Digital high-risk group. Uptake of the COVID-19 therapeutics (both antivirals and nMABS) was very low with 504 (14.0%) of the NHS digital high-risk group receiving treatment and 131 (3.6%) of the QCovid4 high risk cohort.
DISCUSSION
Principal findings
We have developed and validated a new QCOVID model (QCOVID4) using data recorded during the Omicron wave in England. QCOVID4 more accurately identifies individuals at highest levels of absolute risk for targeted interventions than the ‘conditions-based’ approach adopted by NHS Digital based on relative risk of a list of medical conditions.
We also compared performance with earlier versions of the QCOVID algorithms on this dataset. The earlier QCOVID models were developed in the first wave of the original variant (QCOVID1) and second waves of Alpha and Delta variants (QCOVID2). Overall, the factors associated with increased risk in earlier models 1,6, were still associated with increased risk in the QCOVID4 model. An exception was ethnic minority groups where the previously elevated risks, particularly associated with South Asian and Black ethnicities for COVID-19 death in QCOVID11 and QCOVID2 6, were no longer apparent in QCOVID4. There was however, a residual increased risk for Pakistani and Bangladeshi groups for COVID-19 admission compared with the White group for both men and women, and an increased admission risk for Black African and other Asian women after adjustment for age, ethnicity, deprivation, co-morbidity and vaccination status.
We have demonstrated that infection with SARS-CoV-2 prior to the study period was associated with approximately 50% lower risk of COVID-19 mortality in both men and women. This was independent of age, ethnicity, deprivation, co-morbidity and vaccination status. Similarly, there was a dose-dependent reduction in mortality risk in men and women following COVID-19 vaccination with each subsequent dose conferring additional benefits.
The validation shows that all three models (QCOVID4, QCOVID1 and QCOVID2) have high levels of discrimination for COVID-19 mortality and explained variation in this dataset. The QCOVID4 model has substantially improved discrimination and explained variation for predicting risk of COVID-19 hospital admission. Of those identified by NHS Digital as a high risk group for targeted therapeutics (2.5% of total), only 14% were also identified in an equivalently sized high-risk group using QCOVID4. Nearly three quarters of the total COVID-19 deaths in the validation cohort occurred in the QCOVID4 high risk group compared with one fifth in the NHS Digital cohort. This difference was not explained by the use of therapeutic interventions which was low in both groups.
The validation results also demonstrate that the QCOVID4 model is well-calibrated to the current contemporaneous validation dataset. QCOVID1 was developed in a general population to estimate the combined risk of ‘catching and dying’ from COVID-19 due to lack of testing data available in the first pandemic and so estimates of absolute risk would not be valid for prediction of COVID-19 death in people with a SARS-CoV-2 positive test. There was over-prediction associated with QCOVID2 which is likely to mainly reflect higher levels of vaccination, better treatments and differences in the variant type with Omicron now generally considered to be less severe than earlier variants 7,27. Taken together, QCOVID1 and QCOVID2 have acceptable ongoing utility for ranking those at highest risk of death for interventions but are less robust for predicting the absolute of risk of each outcome than QCOVID4.
Strengths and limitations of this study
Our study has some major strengths, but some important limitations. These include specific issues related to COVID-19 along with others similar to those for a range of other widely used clinical risk prediction algorithms developed using the QResearch database 28,29,30. Key strengths include the use of very large representative population-based contemporaneous data sources which have been used to develop other widely used risk prediction tools 28,29, the wealth of candidate risk predictors, the prospective recording of outcomes and their ascertainment using multiple national-level database linkage, lack of selection, recall and respondent biases, and robust statistical analysis. We have used non-linear terms to model association with BMI and age and multiple imputation to handle missing data.
Limitations include relatively small numbers of events in some of the subgroups which is an inevitable consequence of undertaking an analysis involving multiple subgroups during a relatively short pandemic wave. Whilst we have accounted for many risk factors for severe COVID-19 outcomes, there may be risks conferred by some rare medical conditions or other factors associated with exposure such as occupation that are poorly recorded in general practice or hospital records and which may be being proxied to some extent by the covariates included. Also our study does not address outcomes related to the very recent Omicron BA.4/BA.5 wave in England which was identified as a variant of concern on 18th May 2022.
Whilst we have reported a validation using practices from QResearch, these practices were completely separate to those used to develop the model. Previously we have used this approach to develop and validate other widely used prediction models. When these models have been validated on data from different clinical computer systems, the results have been very similar 31-33. Work has now been completed to evaluate earlier QCOVID models in external datasets including the English national dataset hosted by the Office of National Statistics 2, Scotland4 and Wales3 data including data which has not been used to derive the algorithm and these evaluations also showed similar levels of performance to the validation in QResearch practices.
Implications for clinical practice, policy and research
The utility of a risk prediction model crucially depends on the purpose for which it has been designed and the setting in which it has been developed and that where it might be deployed. The speed at which new COVID-19 variants of concern have emerged and become dominant inevitably means that prediction models could become out of date almost as soon as they have been developed and implemented. This study, with its comparisons across the last three major pandemic waves, each associated with different variant types, provides evidence that the performance of QCOVID1 and QCOVID2 algorithms remains good and therefore are likely to be effective for risk stratifying or ranking individuals for interventions. Algorithms may need recalibration before being used to calculate absolute risks in settings or time periods with different mortality or admission rates.
Although large disparities among ethnic minority groups were observed during the early waves of the pandemic, there were no increased mortality risks for any ethnic group. We observed increased risks of admission for Bangladeshi, Pakistani and Other Asian men and women and for Black African women. There were no other increased risks for Indian, Other Asian, Black ethnicities or Chinese for admission. Reports from the second wave showed higher rates of COVID-19 mortality among Pakistani and Bangladeshi groups34 suggesting ethnic disparities may have been improved by widespread vaccination and other public health interventions to reduce risk of exposure or infection.
Conclusion
In conclusion, the QCOVID4 algorithm developed using data from the Omicron wave has excellent performance for identifying those at highest risk of severe COVID-19 outcomes. This can be used to risk stratify patients for intervention (such as COVID-19 therapeutics) and inform clinical decision making on individualised risk management with patients and this could be more effective than an approach based on relative risks of individual medical conditions.
Data Availability
To guarantee the confidentiality of personal and health information only the authors have had access to the data during the study in accordance with the relevant licence agreements. Access to the QResearch data is according to the information on the QResearch website (www.qresearch.org). The full model, model coefficients, functional form and cumulative incidence function, is published on the qcovid.org website.
CONTRIBUTORS
Study conceptualisation was led by JHC and CC. JHC specified the data, organised data approvals and data linkage. JHC and CC, designed the statistical analysis plan. JHC and CC undertook the analyses. JHC wrote the first draft of the paper. AS, KK and JVT contributed to development of study ideas, analysis plan and provided critical comments on the manuscript. The Office of the Chief Medical Officer contributed to the development of the study question and facilitated access to relevant national datasets, contributed to interpretation of data, drafting of the report. All authors contributed to the interpretation of the results and revision of the manuscript and approved the final version of the manuscript.
FUNDING
This study is funded by the National Institute for Health Research (NIHR) following a commission by Department of Health and Social Care. The researchers are independent from the NIHR. The QResearch is supported by funds from the John Fell Oxford University Press Research Fund, grants from Cancer Research UK (CR-UK) grant number C5255/A18085, through the Cancer Research UK Oxford Centre, grants from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z), during the conduct of the study.
COMPETING INTERESTS
All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf and declare: JHC reports grants from National Institute for Health Research (NIHR) Biomedical Research Centre, Oxford, grants from John Fell Oxford University Press Research Fund, grants from Cancer Research UK (CR-UK) grant number C5255/A18085, through the Cancer Research UK Oxford Centre, grants from the Oxford Wellcome Institutional Strategic Support Fund (204826/Z/16/Z) and other research councils, during the conduct of the study. JHC is an unpaid director of QResearch, a not-for-profit organisation which is a partnership between the University of Oxford and EMIS Health who supply the QResearch database used for this work. JHC is a founder and shareholder of ClinRisk ltd and was its medical director until 31st May 2019. ClinRisk Ltd produces open and closed source software to implement clinical risk algorithms (outside this work) into clinical computer systems. JHC is chair of the NERVTAG risk stratification subgroup and a member of SAGE COVID-19 groups and the NHS group advising on prioritisation of use of monoclonal antibodies in COVID-19 infection. CC reports receiving personal fees from ClinRisk Ltd, outside this work and is a member of the NERVTAG risk stratification subgroup. KK is Chair of the Ethnic Subgroup of the UK Scientific Advisory Group for Emergencies (SAGE) and a member of SAGE. AS has received research grants from NIHR, MRC, HDRUK, CSO, NCS, GSK for COVID-19 work and has provided advice for AstraZeneca’s Thrombotic Thrombocytopenic Taskforce and the UK and Scottish Government COVID-19 Advisory Groups (roles unremunerated)
ETHICS APPROVAL
The QResearch® ethics approval was provided on 8th June 2020 by the East Midlands-Derby Research Ethics Committee [reference 18/EM/0400].
DATA SHARING
To guarantee the confidentiality of personal and health information only the authors have had access to the data during the study in accordance with the relevant licence agreements. Access to the QResearch data is according to the information on the QResearch website (www.qresearch.org). The full model, model coefficients, functional form and cumulative incidence function, is published on the qcovid.org website. https://www.qcovid.org/Home/Algorithm.
TRANSPARENCY
JHC had full access to all data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. JHC is the guarantor for the study and affirms that the manuscript is an honest, accurate, and transparent account of the study reported; that no important aspects of the study have been omitted and that any discrepancies from the study as planned have been explained.
COPYRIGHT STATEMENT
“I, the Submitting Author has the right to grant and does grant on behalf of all authors of the Work (as defined in the author licence), an exclusive licence on a worldwide, perpetual, irrevocable, royalty-free basis to BMJ Publishing Group Ltd (“BMJ”) its licensees. The Submitting Author accepts and understands that any supply made under these terms is made by BMJ to the Submitting Author unless I am acting as an employee on behalf of my employer which is paying any applicable article publishing charge (“APC”) for Open Access articles. Where the Submitting Author wishes to make the Work available on an Open Access basis (and intends to pay the relevant APC), the terms of reuse of such Open Access shall be governed by a Creative Commons licence – details of these licences and which licence will apply to this Work are set out in our licence referred to above.”
LEGENDS FOR FIGURES
Supplementary Figure 1a Adjusted hazard ratio (95% CI) by age for risk of COVID-19 mortality
Supplementary Figure 1b Adjusted hazard ratio (95% CI) by BMI for COVID-19 mortality
Supplementary Figure 1c Adjusted hazard ratio (95% CI) by age for risk of COVID-19 admission
Supplementary Figure 1d Adjusted hazard ratio (95% CI) by BMI for COVID-19 admission
Supplementary Figure 2 Calibration of the QCOVID2 risk model to predict COVID-19 death following vaccination
Supplementary Figure 3 Calibration of the QCOVID2 risk model to predict COVID-19 admission following vaccination
ACKNOWLEDGMENTS
We acknowledge the contribution of EMIS practices who contribute to QResearch® and EMIS Health and the Universities of Nottingham and Oxford for expertise in establishing, developing or supporting the QResearch database. This project involves data derived from patient-level information collected by the NHS, as part of the care and support of cancer patients. The data is collated, maintained and quality assured by the National Cancer Registration and Analysis Service, which is now part NHS Digital. The Hospital Episode Statistics data, SARS-Cov-2 results and civil registration mortality data are used by permission from NHS Digital who retain the copyright in that data. NHS Digital and Public Health England bears no responsibility for the analysis or interpretation of the data. KK is supported by NIHR Applied Research Collaboration East Midlands (ARC-EM) and the NIHR BRC.