Abstract
VOC 202012/01, a SARS-CoV-2 variant first detected in the United Kingdom in September 2020, has spread to multiple countries worldwide. Several studies have established that this novel variant is more transmissible than preexisting variants, but have not identified whether it leads to any change in disease severity. We analyse a large database of SARS-CoV-2 community test results and COVID-19 deaths, representing 52% of all SARS-CoV-2 community tests in England from 1 September 2020 to 5 February 2021. This subset of SARS-CoV-2 tests can identify VOC 202012/01 because mutations in this lineage prevent PCR amplification of the spike gene target (S gene target failure, SGTF). We estimate that the hazard of death among SGTF cases is 58% (95% CI 40–79%) higher than among non-SGTF cases after adjustment for age, sex, ethnicity, deprivation level, care home residence, local authority of residence and test date. This corresponds to the absolute risk of death for a male aged 55–69 increasing from 0.6% to 0.9% (95% CI 0.8–1.0%) over the 28 days following a positive test in the community. Correcting for misclassification of SGTF and missingness in SGTF status, we estimate a 71% (48–97%) higher hazard of death associated with VOC 202012/01. Our analysis suggests that VOC 202012/01 is not only more transmissible than preexisting SARS-CoV-2 variants but may also cause more severe illness.
Most community SARS-CoV-2 PCR tests in England are processed by one of six national “Lighthouse” laboratories. Among the mutations carried by Variant of Concern (VOC) 202012/01 is a 6-nucleotide deletion which prevents amplification of the S gene target by the commercial PCR assay used in three of the Lighthouse labs1. By linking individual records of positive community tests with and without S gene target failure (SGTF) to a comprehensive line list of COVID-19 deaths in England, we estimate the relative hazard of death associated with infection by VOC 202012/01. We define confirmed SGTF as a compatible PCR result with cycle threshold (Ct) < 30 for ORF1ab, Ct < 30 for N, and no detectable S (Ct > 40); confirmed non-SGTF as any compatible PCR result with Ct < 30 for each of ORF1ab, N, and S; and an inconclusive (missing) result as any other positive community test, including tests processed by a laboratory incapable of assessing SGTF. We address missing SGTF status in our analysis.
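As an illustration, the classification rule above can be sketched in Python. This is a minimal sketch rather than the analysis code: the function name, the constant names, and the use of `None` to represent a target that was not detected (or a test not run on a compatible assay) are assumptions made for the example.

```python
from typing import Optional

CT_CONFIRMED = 30.0   # a target counts as confirmed when its Ct is below this value
CT_UNDETECTED = 40.0  # S is treated as undetected when its Ct is above this value


def classify_sgtf(ct_orf1ab: Optional[float],
                  ct_n: Optional[float],
                  ct_s: Optional[float]) -> str:
    """Classify one positive test as 'SGTF', 'non-SGTF' or 'missing'.

    None stands for "target not detected / no compatible PCR result"
    (an assumption of this sketch).
    """
    orf_ok = ct_orf1ab is not None and ct_orf1ab < CT_CONFIRMED
    n_ok = ct_n is not None and ct_n < CT_CONFIRMED
    if not (orf_ok and n_ok):
        # Covers weak positives and tests from labs that cannot assess SGTF.
        return "missing"
    if ct_s is None or ct_s > CT_UNDETECTED:
        return "SGTF"
    if ct_s < CT_CONFIRMED:
        return "non-SGTF"
    # S detected but with Ct between 30 and 40: inconclusive.
    return "missing"


if __name__ == "__main__":
    print(classify_sgtf(22.1, 23.4, None))  # strong ORF1ab and N, no S signal
    print(classify_sgtf(22.1, 23.4, 21.0))  # all three targets amplify
    print(classify_sgtf(22.1, 23.4, 35.0))  # weak S signal
```

Treating every non-qualifying result as "missing" mirrors the definition in the text, where any other positive community test is inconclusive.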
Characteristics of the study population
The study sample (Table 1) includes a total of 1,994,449 individuals who had a positive community (“Pillar 2”) test between 1 November 2020 and 25 January 2021. Just over half of those tested (1,028,296, 52%) had a conclusive SGTF reading and, of these, 48% had SGTF. Females comprised 53.7% of the total sample; 44.4% were aged 1–34 years, 34.3% aged 35–54, 15.1% aged 55–69, 4.3% aged 70–84 and 1.9% aged 85 or older. The majority of individuals (93.7%) lived in residential accommodation (defined as residing in a house, flat, sheltered accommodation, or house in multiple occupancy), with 3.1% living in a care or nursing home. Based on self-identified ethnicity, 73.8% were White, 13.7% Asian, 4.7% Black and 7.8% of other, mixed or unknown ethnicity. The data include tests performed in all 7 NHS England regions, with the London region contributing 23.4% of tests and the South West 5.8%. The first two weeks of the study period (1–14 Nov) contributed 12.6% of the total tests, and the final two weeks (10–25 Jan) 22.2%. The period between 27 Dec and 9 Jan contributed 30.5% of tests.
In those with SGTF status measured, SGTF prevalence was similar in males and females but lower in the older age groups: 54.9% in those aged 1–34 compared with 48.6% in those aged 85 and older. In keeping with these age patterns, SGTF prevalence was lower in individuals living in a care or nursing home (45.2%, compared to 54.7% among those in residential accommodation). SGTF prevalence by self-identified ethnicity was 53.5% in the White group, 54.0% in the Asian group, 67.2% in the Black group, and 61.6% in the other, mixed, or unknown ethnicity group. SGTF prevalence was lowest in the most deprived index of multiple deprivation1 (IMD) decile (43.9%) and highest in the least deprived decile (58.7%). The highest prevalences of SGTF over the study period were observed in the East of England (75.7%), South East (75.6%) and London (74.0%) NHS England regions, and prevalence of SGTF was lowest in the North East and Yorkshire region (32.5%). The prevalence of SGTF also increased steeply over time (Fig. 1a), ranging from 4.9% during 1–14 November 2020 to 87.4% during 10–25 January 2021.
Having missing SGTF status was strongly associated with age and place of residence. The proportion with SGTF status missing was similar in age groups 1-34 (47.9%), 35-54 (47.1%) and 55-69 (47.7%), and then rose to 54.3% in the 70-84 age group and to 78.6% in the 85 and older age group. SGTF status was missing in 89.1% of tests for individuals living in a care or nursing home, compared to 46.9% of tests among individuals in residential accommodation. This is partly due to more extensive use of lateral flow immunoassay tests in care homes, which do not yield an SGTF reading. Missingness in SGTF status also differed substantially by NHS England region, ranging from 21.5% in the North West to 70.8% in the South West. Missingness also depended on specimen date, with the percentage missing being lower for the earlier specimen dates and highest (55.4%) in the 2 week period that contributed the most tests (27 December-9 January). There were also some more minor differences in the percentages of missingness of SGTF status by ethnicity and IMD. Of the 48% of tests with missing SGTF status, 9% were inconclusive due to high Ct values and the remaining 39% were not analysed in one of the three Lighthouse labs capable of producing an SGTF result.
The most commonly used definition of a COVID-19 death in England is any death occurring within 28 days of a positive SARS-CoV-2 test. Table 2 presents crude death rates within 28 days of a positive test per 10,000 person-days of follow-up. Death rates for unlimited follow-up (i.e. not restricted to 28 days) are shown in Table S1; the maximum observed follow-up was 85 days. A total of 13,860 individuals out of the 1,994,449 in the study sample are known to have died (0.69%), 12,967 of whom (92.8%) died within 28 days of their first positive test (Fig. 1b). As expected, crude death rates were substantially higher in the elderly and in those living in a care or nursing home.
Crude survival assessed by Kaplan-Meier curves was lower in the SGTF group (Fig. 1c). Stratifying by broad age groups and looking at death rates by sex, place of residence, ethnicity, IMD, NHS England region, and specimen date, it can be seen that death rates within 28 days of a positive SARS-CoV-2 test are higher among SGTF than non-SGTF cases in 99 of the 108 strata assessed (92%; Figs. 1d–i).
Cox regression analyses
To estimate the effect of SGTF on mortality while controlling for observed confounding, we fitted a series of Cox proportional hazards models2 to the data. We stratified the analysis by lower tier local authority (LTLA) and specimen date to control for geographical and temporal differences in the baseline hazard—for example, due to changes in hospital pressure during the study period—and used spline terms for age and IMD and fixed effects for sex, ethnicity, and residence type. All models were fitted twice, once using complete cases only, i.e. by simply excluding individuals with missing SGTF status, and once using inverse probability weighting (IPW), i.e. accounting for missingness by upweighting individuals whose characteristics—age, sex, IMD, ethnicity, residence type, NHS England region of residence and sampling week—are underrepresented among complete cases.
For the complete-cases analysis, the estimated hazard ratio for SGTF was 1.58 (95% CI 1.40–1.79), indicating that the hazard of death within 28 days of a positive test is 58% (40–79%) higher in those with SGTF compared to non-SGTF (Fig. 2a). We included an interaction term between SGTF and time since positive test in the model to assess the proportional hazards assumption. There was strong evidence of non-proportionality of hazards (likelihood ratio test; Fig. 2a; Fig. S11). The estimated time-varying hazard ratio increases over time: 1.19 (0.94–1.52) one day after the positive test, 1.66 (1.46–1.88) on day 14, and 2.36 (1.71–3.25) on day 28. There was no evidence that adding higher-order functions of time into the interaction terms improved model fit (likelihood ratio test), and no evidence of a significant interaction between time and age, time and sex, time and IMD, time and ethnicity, or time and residence type.
We found no evidence of a significant interaction between SGTF and age group or ethnicity (likelihood ratio tests). There was some evidence of an interaction between SGTF and residence type, with the associated hazard ratio for SGTF being 1.53 (1.35–1.74) in standard residential accommodation, 2.43 (1.72–3.45) in care/nursing homes, and 1.64 (0.80–3.38) in “other” residence types (i.e. residential institutions including residential education, prisons and detention centres, medical facilities, no fixed abode and other/unknown).
In the investigation of a model for the probability of missingness in SGTF status, the cauchit model was found to provide a good fit and to result in less extreme weights than the logistic model. The IPW analysis was therefore performed using weights derived from the cauchit model. The IPW analysis yielded similar results to the complete-cases analysis, generally with marginally higher hazard ratios and wider CIs (Fig. 2e); the hazard ratio associated with SGTF for the IPW analysis was 1.67 (1.46–1.90). While the IPW analysis recovered a similarly time-varying hazard ratio to the complete-cases analysis, the increase was less marked (Fig. 2c) and the inclusion of a time-varying term did not significantly improve model fit.
Misclassification analysis
Prior to the emergence of VOC 202012/01, a number of minor circulating SARS-CoV-2 lineages with spike mutations could also cause SGTF3. Our main analyses are restricted to specimens from 1 November 2020 onwards to minimise the number of these non-VOC 202012/01 lineages among SGTF-positive samples. However, the appearance of non-VOC 202012/01 samples in SGTF may dilute the estimated effect of VOC 202012/01 on the hazard of mortality. We therefore undertook a misclassification analysis4, modelling the relative frequency of SGTF over time for each NHS England region as a combination of a low, time-invariant frequency of non-VOC 202012/01 samples with SGTF plus a logistically growing5 frequency of VOC 202012/01 samples with SGTF, which allows us to assign to each SGTF sample a probability pVOC that the sample is VOC 202012/01 based upon its specimen date and NHS England region (Fig. S9). Again restricting the analysis to specimens from 1 November 2020 onward, we find a hazard ratio associated with pVOC of 1.63 (1.44–1.86) for the complete-cases analysis and 1.71 (1.48–1.97) for the IPW analysis.
Absolute risks
To put these results into context, we estimated how the absolute risk of death due to COVID-19 might differ if an individual were infected with VOC 202012/01 rather than with a preexisting variant. We calculated absolute risks by applying 28-day hazard ratios for SGTF to the baseline risk of death estimated among individuals tested in the community between August and October 2020 (expected to be representative of the CFR associated with preexisting variants of SARS-CoV-2; Table 3). The risk of death due to COVID-19 following a positive test in the community remains below 1% in most individuals younger than 70 years old. For the complete-cases analysis, in females aged 70–84, the estimated risk of death within 28 days of a positive SARS-CoV-2 test with SGTF increases from 2.9% to 4.5% (95% CI 4.0–5.1%) and for females 85 or older increases from 13% to 20% (17–22%). For males aged 70–84 the risk of death within 28 days increases from 4.7% to 7.3% (6.4–8.2%) and for males 85 or older it increases from 17% to 26% (23–28%). Estimates based on the IPW analysis were marginally higher. These estimates reflect a substantial increase in absolute risk amongst older age groups. Note that these estimates do not reflect the infection fatality ratio, but the fatality ratio among people tested in the community, and are thus likely to be higher than the infection fatality ratio as many infected individuals will not have been tested.
Further investigations
We conducted a number of sensitivity analyses to verify the robustness of our results. Our main results were largely insensitive to: restriction to death-certificate-confirmed COVID-19 deaths only; any follow-up time of 21 days or longer; coarseness of geographical and temporal stratification; use of linear versus spline terms for age and IMD; analysis start date; follow-up time–covariate interactions; removal of the 10-day death registration cutoff; and restriction of the analysis to individuals with a full 28-day follow-up period (Fig. 2e; Table S2). Pillar 2 testing data include an indicator for whether the subject was tested because of symptoms or due to asymptomatic screening. Although symptomatic status may lie on the causal pathway between SGTF status and death, we adjusted for symptomatic status as a further sensitivity analysis and found that it had no effect on the relative hazard of SGTF (1.58 [1.39–1.79], complete-cases analysis).
Discussion
Our analysis identifies an increased hazard of death associated with VOC 202012/01 infection relative to infection by preexisting SARS-CoV-2 variants. We controlled for several factors that we hypothesised could confound the association between VOC 202012/01 infection status and mortality. By controlling for test time and geographical location, via stratified analysis, mimicking matching on these variables, we aimed to account for the fact that VOC 202012/01 infection increased rapidly over time and differed substantially by region, and also that the hospitals in which some individuals will have required care were subject to pressure on health services that changed over time and by region.
We do not attempt to identify the mechanism for an increased mortality rate in this analysis. There is some evidence that infections with VOC 202012/01 may be associated with higher viral loads, as measured by Ct values detected during PCR testing of specimens (Fig. S10). Higher viral loads resulting from infection with VOC 202012/01 may be partly responsible for the observed increase in mortality, partly because they may reduce the efficacy of standard antiviral treatments for COVID-19. The impact of viral load on observed SGTF mortality could be assessed using a mediation analysis, which is outside the remit of this study.
Another potential explanation for an increased mortality rate among individuals testing positive for VOC 202012/01 may be that this variant leads to changes in testing behaviour. If individuals infected with this variant are less likely to show symptoms, then only relatively more severe cases may get tested, and consequently our study would overestimate the infection fatality rate. However, comparison to random population testing carried out by the Office for National Statistics suggests no clear difference in the proportion of SGTF among Pillar 2 tests relative to the population at large (Fig. S12).
We previously identified that the novel SARS-CoV-2 lineage VOC 202012/01 appears to have a substantially greater transmission rate than preexisting variants of SARS-CoV-25, but could not robustly estimate any increase or decrease in associated disease severity from ecological analysis. The individual-level linked community testing data analysed here suggest that the fatality rate among individuals infected with VOC 202012/01 is higher than that associated with infection by preexisting variants. Crucially, due to the nature of the data currently available, we were only able to assess mortality among individuals who received a positive test for SARS-CoV-2 in the community. Indicators for VOC 202012/01 are not currently available for the vast majority of individuals who die due to COVID-19, as they are first tested in hospital. Accordingly, the evidence we provide here must be contextualised with further study of a larger population sample, and in other settings. Nonetheless, by focusing on individuals tested in the community, our analysis captures any combined effect of an altered risk of hospitalisation given positive test and an altered risk of death given hospitalisation, which would not be fully captured by a study focusing on hospitalised patients only.
Our findings are consistent with those identified by other groups using different methods to verify the increased risk of death among community-tested individuals with SGTF6. Estimates of increased mortality based upon Pillar 2 data will become more robust as test results and mortality outcomes continue to accumulate over time. However, our approach of comparing outcomes between individuals with and without SGTF who were tested in the same place and at the same time would no longer accrue additional information at the point when SGTF becomes effectively fixed in England, which may occur as soon as February 2021 if current trends continue5.
Methods
Data sources
We linked three datasets provided by Public Health England: a line list of all positive tests in England’s “Pillar 2” (community) testing for SARS-CoV-2, containing specimen date and demographic information on the test subject; a line list of cycle threshold (Ct) values for the ORF1ab, N (nucleocapsid), and S (spike) genes for positive tests that were processed in one of the three national laboratories (Alderley Park, Glasgow, or Milton Keynes) utilising the Thermo Fisher TaqPath COVID-19 assay; and a line list of all deaths due to COVID-19 in England, which combines and deduplicates deaths reported by hospitals in England, by the Office for National Statistics, via direct reporting from Public Health England Health Protection Team, and via Demographic Batch Service tracing of laboratory-confirmed cases 7. We link these datasets using a numeric identifier for Pillar 2 tests (‘FINALID’) common to all three datasets. We define S gene target failure (SGTF) as any test with Ct < 30 for ORF1ab and N targets but no detectable S gene, and non-SGTF as any test with Ct < 30 for ORF1ab, N, and S targets. A small proportion (9%) of SGTF tests are inconclusive. The study population of interest is defined as all individuals who received a positive Pillar 2 test between 1 November 2020 and 25 January 2021. For our main analysis, we included only tests from 1 November 2020 onwards to avoid including an excess of tests with SGTF not resulting from infection by VOC 202012/01. In sensitivity analyses, we also consider extending the population to include tests performed between 1 September and 31 October 2020.
The linked dataset available for analysis excludes individuals who first tested positive in hospital, that is, those who presented to hospital after symptom onset without first being tested in the community. This is because cycle threshold values used to ascertain SGTF status are not available for individuals who were not tested in the community. Our study sample comprises all community tests between 1 November 2020 and 25 January 2021, but only 7% of the total number of COVID-19 deaths were recorded within 28 days following a positive test in either the community or in hospital during this period. This is explained by differing mortality rates among individuals who first test positive in a hospital compared to those who first receive a community test.
There was a small amount of missing data for sex (n = 13, <0.01%), age (n = 151, <0.01%), and IMD and regional covariates (n = 3,428, 0.15%). There were no missing specimen dates. Individuals with missing age, sex, or geographical location were excluded. We also excluded individuals from the dataset whose age was recorded as zero, as there were 16,936 age-0 individuals compared to 8,867 age-1 individuals in the dataset, suggesting that many of these age-0 individuals may have been miscoded. There was some missing data on ethnicity (n = 43,032, 2%) and we created a category that combines missing values with “Other” and “Mixed”. Missing values for residence type (n = 67,458, 3%) were also combined with an “Other” category. The data set used for the main analysis comprises 1,994,449 individuals, and SGTF status is missing for 966,153 (48%). In addition, the SGTF status of 97,461 individuals (9%) with an inconclusive SGTF test was set to missing. Missing data on the exposure is addressed in the analysis, described below.
We grouped residence types into three categories: Residential, which included the “Residential dwelling (including houses, flats, sheltered accommodation)” and “House in multiple occupancy (HMO)” groups; Care/Nursing home; and Other/Unknown, which included the “Medical facilities (including hospitals and hospices, and mental health)”, “No fixed abode”, “Other property classifications”, “Overseas address”, “Prisons, detention centres, secure units”, “Residential institution (including residential education)”, and “Undetermined” groups, as well as unspecified residence type. We grouped ethnicities into four categories according to the broad categories used in the 2011 UK Census: Asian, which included the “Bangladeshi (Asian or Asian British)”, “Chinese (other ethnic group)”, “Indian (Asian or Asian British)”, “Pakistani (Asian or Asian British)”, and “Any other Asian background” groups; Black, which included the “African (Black or Black British)”, “Caribbean (Black or Black British)”, and “Any other Black background” groups; White, which included the “British (White)”, “Irish (White)”, and “Any other White background” groups; and Other / Mixed / Unknown, which included the “Any other ethnic group”, “White and Asian (Mixed)”, “White and Black African (Mixed)”, “White and Black Caribbean (Mixed)”, “Any other Mixed background”, and “Unknown” groups.
Statistical methods
There are several factors that we expect to be associated with both SGTF and with risk of death, thus confounding the association between SGTF and risk of death in those tested. Area of residence and specimen date were expected to be potentially strong confounders. Area of residence is expected to be strongly associated with SGTF status due to different virus variants circulating in different areas, and specimen date because the prevalence of SGTF is known to have greatly increased over time. Area of residence and specimen date are also expected to be associated with risk of death following a test, including due to differential pressure on hospital resources by area and time. The following variables were also identified as potential confounders: sex, age, place of residence (Residential, Care/Nursing home, or Other/Unknown), ethnicity (White, Asian, Black, or Other/Mixed/Unknown), index of multiple deprivation (IMD). The potential confounders are referred to collectively as the covariates. For descriptive analyses, age (in years) was categorised as 1-34, 35-54, 55-69, 70-84, 85 and older.
Descriptive analyses were performed. We tabulated the distribution of the covariates in the whole study sample, and the association between each covariate and SGTF status in the subset with SGTF measured (Table 1). We also summarised the association between each covariate and missing data in SGTF status (Table 1). The subset with SGTF status measured is referred to as the complete cases. The unadjusted association between SGTF and mortality in the complete cases was assessed using a Kaplan-Meier plot (Fig. 1c), and Kaplan-Meier plots and crude mortality rates (Table 2) are also presented separately according to categories of the covariates (Figs. S1–S7). Crude overall mortality rates were obtained for the whole sample, by SGTF status in the complete cases, and in those with missing SGTF status, according to categories of each covariate (Table 2). We also obtained mortality rates by SGTF status (in the complete cases) for categories of each covariate stratified by age group. Exact Poisson CIs are used for mortality rates, assuming a constant rate.
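To make the crude-rate calculation concrete, the following Python sketch computes a death rate per 10,000 person-days with an exact (Garwood-type) Poisson confidence interval. It is an illustration only: the function names are hypothetical, the interval is found by simple bisection on the Poisson distribution function rather than by any particular statistics package, and the input numbers in the usage example are made up.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with terms computed in log space."""
    if k < 0:
        return 0.0
    if mu == 0.0:
        return 1.0
    total = sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
                for i in range(k + 1))
    return min(1.0, total)


def _bisect_increasing(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_ci(deaths: int, alpha: float = 0.05):
    """Exact two-sided CI for the Poisson mean given an observed count."""
    search_max = deaths + 10 * math.sqrt(deaths + 1) + 20
    # Lower limit: the mean at which P(X >= deaths) equals alpha / 2.
    lower = 0.0 if deaths == 0 else _bisect_increasing(
        lambda mu: (1 - poisson_cdf(deaths - 1, mu)) - alpha / 2, 0.0, search_max)
    # Upper limit: the mean at which P(X <= deaths) equals alpha / 2.
    upper = _bisect_increasing(
        lambda mu: alpha / 2 - poisson_cdf(deaths, mu), 0.0, search_max)
    return lower, upper


def rate_per_10k(deaths: int, person_days: float, alpha: float = 0.05):
    """Crude rate per 10,000 person-days with its exact Poisson CI."""
    lo, hi = exact_poisson_ci(deaths, alpha)
    scale = 10_000 / person_days
    return deaths * scale, lo * scale, hi * scale


if __name__ == "__main__":
    # Made-up inputs: 10 deaths over 50,000 person-days of follow-up.
    print(rate_per_10k(10, 50_000))
```

The constant-rate assumption enters through treating the death count as Poisson with mean equal to rate times person-days.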
Approximately 48% of individuals in the study sample are missing data on SGTF status, due to their test not being sent to one of the three laboratories utilising the Thermo Fisher TaqPath COVID-19 assay or the test being inconclusive. We performed a complete-cases analysis, restricted to the subset with SGTF status measured. This analysis assumes that missingness of SGTF status is independent of the outcome of interest, given the variables included in the models. This is a specific type of missing-not-at-random assumption, since missingness is allowed to depend on the underlying value of SGTF. We also performed an analysis of the complete cases using inverse probability weights8 (IPW) to address the missing data on SGTF, under a missing at random (MAR) assumption. In this analysis, each individual with SGTF status measured is weighted by the inverse of their probability of having SGTF status measured based on their covariates. For the IPW, the missingness model estimated the probability of missingness using logistic regression with age (restricted cubic spline), sex, IMD decile (restricted cubic spline), ethnicity, residence type, and NHS region by specimen week as predictors. We also considered a cauchit and a Gosset link for the missingness model, including the same predictors, as this was expected to provide better stability for the weights9. The fit of the missingness model was assessed using a Q-Q plot (Fig. S11), and Hosmer-Lemeshow and Hinkley tests were used to choose the most appropriate model.
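The effect of the link function on the weights can be illustrated with a short Python sketch. It is not the fitted missingness model: the function names are hypothetical, and the same linear predictor value is fed to both links purely for illustration, whereas in practice each model would be fitted separately and would yield its own linear predictor.

```python
import math


def inv_logit(eta: float) -> float:
    """Inverse logit link: probability of SGTF being measured."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cauchit(eta: float) -> float:
    """Inverse cauchit link (standard Cauchy distribution function)."""
    return 0.5 + math.atan(eta) / math.pi


def ipw_weight(p_measured: float) -> float:
    """Inverse probability weight for an individual with SGTF measured."""
    if not 0.0 < p_measured <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return 1.0 / p_measured


if __name__ == "__main__":
    # Illustrative linear predictor values only, not fitted quantities.
    for eta in (-4.0, -2.0, 0.0, 2.0):
        w_logit = ipw_weight(inv_logit(eta))
        w_cauchit = ipw_weight(inv_cauchit(eta))
        print(f"eta={eta:+.1f}  logit weight={w_logit:7.2f}  "
              f"cauchit weight={w_cauchit:7.2f}")
```

Because the Cauchy distribution has heavier tails, a strongly negative linear predictor maps to a probability further from zero under the cauchit link, which gives a smaller weight for individuals whose characteristics are rarely observed among complete cases.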
Cox regression2 was used to estimate the association between SGTF and the hazard for mortality, conditioning on the potential confounders listed above. The analyses described here were applied to the complete cases and using IPW. For IPW analyses, the standard errors (SEs) accounted for the weights, though the fact that the weights were estimated was not accounted for. This results in conservative SEs. The baseline hazard in the Cox model was stratified by both specimen date and LTLA, therefore finely controlling for these variables. The stratification gives a large number of strata matched by specimen date and LTLA. Only those strata that contain individuals who die and individuals who survive contribute to the analysis. The analysis is therefore similar to that which would be performed had we created a matched nested case-control sample. The remaining variables were included as covariates in the model (sex, age, place of residence, ethnicity, IMD decile). Age and IMD were included as restricted cubic splines with 3 knots. The time origin for the analysis was specimen date and we considered deaths up to 28 days after the specimen date. Individuals who did not die within 28 days were censored at the earlier of 28 days post specimen date and the administrative censoring date, which we chose as the date of the most recent death linkable to SGTF status minus 10 days (i.e., 25 January 2021) in order to minimise any potential bias due to late reporting of deaths. We began by assuming proportionality of hazards for SGTF and the covariates included in the model. The proportional hazards assumption was assessed by including in the model an interaction between each covariate and time, which was performed separately for SGTF and for each other covariate. Schoenfeld residual plots were also obtained for each covariate (Fig. S8). We assessed whether the association between SGTF and the hazard was modified by age, sex, IMD, ethnicity, and place of residence.
Models with and without interactions were compared using likelihood ratio tests for the complete cases analyses. For the analysis using IPW we used Wald tests based on robust standard errors10.
The analysis assumes that censoring is uninformative, which is plausible as all censoring is administrative.
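The follow-up and censoring rule described above can be sketched in Python as follows. This is a minimal illustration with hypothetical function and constant names; in particular, treating a death recorded after the end of an individual's follow-up window as censored is an assumption of the sketch.

```python
from datetime import date
from typing import Optional, Tuple

FOLLOW_UP_DAYS = 28
ADMIN_CENSOR_DATE = date(2021, 1, 25)  # administrative censoring date from the text


def follow_up(specimen_date: date,
              death_date: Optional[date],
              admin_censor: date = ADMIN_CENSOR_DATE) -> Tuple[int, int]:
    """Return (time in days since specimen date, event indicator).

    event = 1 if death is observed within the follow-up window, else 0.
    """
    # Follow-up ends at 28 days or at administrative censoring, whichever is earlier.
    window_end = min(FOLLOW_UP_DAYS, (admin_censor - specimen_date).days)
    if death_date is not None:
        days_to_death = (death_date - specimen_date).days
        if days_to_death <= window_end:
            return days_to_death, 1
    # No death observed inside the window: censored at its end (assumption).
    return window_end, 0


if __name__ == "__main__":
    print(follow_up(date(2020, 12, 1), date(2020, 12, 11)))  # death on day 10
    print(follow_up(date(2020, 12, 1), None))                # full 28-day window
    print(follow_up(date(2021, 1, 20), None))                # cut short on 25 January
```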
Misclassification analysis
The exposure of SGTF is subject to misclassification, because a number of minor circulating variants of SARS-CoV-2 in addition to VOC 202012/01 are also associated with failure to amplify the spike gene target. Accordingly, a positive test with SGTF is not necessarily indicative of infection with VOC 202012/01. A negative test of SGTF is assumed to be indicative of absence of infection with VOC 202012/01. Misclassification of an exposure can result in bias in its estimated association with the outcome. We fitted a logistic model to Pillar 2 SGTF frequencies by NHS region to estimate a “background” rate of SGTF in the absence of VOC 202012/01, assuming a beta binomial prior. This model is then used to estimate the probability that an individual testing positive with SGTF is infected with VOC 202012/01, separately for individuals in each NHS region. These probabilities can then be used in place of the indicator of SGTF exposure in the Cox models. This is the regression calibration approach4 to correcting for bias due to measurement error in an exposure.
We fitted models accounting for false positives (modelled as regionally-varying background rates of SGTF associated with non-VOC 202012/01 variants) to the SGTF data. Our logistic model for VOC 202012/01 growth over time is as follows:
Here, f(t) is the predicted frequency of VOC 202012/01 among positive tests at time t (in days since 1 September 2020) based on the terms slope and intercept; s(t) is the predicted frequency of S gene target failure at time t due to the combination of VOC 202012/01 and a background false positive rate falsepos, conc is the “concentration” parameter (= α + β) of a beta distribution with mode s(t); kt is the number of S gene target failures detected at time t; and nt is the total number of tests at time t. All priors above are chosen to be vague, and the truncation of conc to values greater than 2 ensures a unimodal distribution for the proportion of tests that are SGTF. The model above is fitted separately for each NHS England region. Then, pVOC for a test with SGTF = 1 at time t is equal to f(t)/s(t), and pVOC = 0 for all tests with SGTF = 0.
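The quantities defined above can be illustrated with a short Python sketch. Several parts are assumptions made for the example and are labelled as such in the code: the linear predictor is written as intercept + slope × t, the SGTF frequency is written as s(t) = f(t) + falsepos × (1 − f(t)) (background SGTF occurring only among non-VOC tests), the beta shape parameters use the standard mode and concentration parameterisation, and the parameter values in the usage example are invented rather than estimated. Prior distributions are not specified here.

```python
import math


def voc_frequency(t: float, slope: float, intercept: float) -> float:
    """f(t): logistic growth of VOC frequency among positive tests.

    ASSUMPTION: linear predictor is intercept + slope * t.
    """
    return 1.0 / (1.0 + math.exp(-(intercept + slope * t)))


def sgtf_frequency(t: float, slope: float, intercept: float,
                   falsepos: float) -> float:
    """s(t): SGTF frequency from the VOC plus a background rate.

    ASSUMPTION: background SGTF arises only among non-VOC tests.
    """
    f = voc_frequency(t, slope, intercept)
    return f + falsepos * (1.0 - f)


def p_voc(t: float, slope: float, intercept: float, falsepos: float) -> float:
    """Probability that a test with SGTF = 1 at time t is the VOC: f(t) / s(t)."""
    return (voc_frequency(t, slope, intercept)
            / sgtf_frequency(t, slope, intercept, falsepos))


def beta_shape(mode: float, conc: float):
    """Beta shape parameters (alpha, beta) with the given mode and alpha + beta = conc."""
    if conc <= 2.0:
        raise ValueError("conc must exceed 2 for a unimodal beta distribution")
    alpha = mode * (conc - 2.0) + 1.0
    beta = (1.0 - mode) * (conc - 2.0) + 1.0
    return alpha, beta


if __name__ == "__main__":
    # Invented parameter values for illustration; t is days since 1 September 2020.
    params = dict(slope=0.1, intercept=-10.0, falsepos=0.01)
    for t in (61, 91, 146):  # 1 November, 1 December, 25 January
        print(t, round(p_voc(t, **params), 3))
```

Under these assumptions pVOC rises towards 1 as the VOC comes to dominate SGTF samples, and equals 1 throughout when the background rate is zero.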
The model above was fitted using the same data source (i.e. SGTF frequencies among Pillar 2 community tests for SARS-CoV-2) as our survival analysis. To verify the robustness of this model, we performed a sensitivity analysis using sequencing data from the COVID-19 UK Genomics Consortium11 downloaded from the Microreact platform12 on 11 January 2021 to estimate pVOC. In this alternative analysis we estimated pVOC for each NHS England region and date as the number of samples that were VOC 202012/01 (i.e. lineage B.1.1.7 with mutations Δ69/Δ70 and N501Y in Spike) divided by the number of samples that were SGTF (i.e. any lineage with Δ69/Δ70, the deletion that causes SGTF) for that NHS England region and date, setting pVOC = 1 for all dates later than 31 December 2020 as there were no sequencing data available past this date, and filling any gaps in the data using linear interpolation. This yielded nearly identical results to our modelled probability of VOC (Fig. 2e).
Absolute risks
Estimates from the final Cox models were used to obtain estimates of absolute risk of death for 28 and 60 days with SGTF and pVOC. Given the strong influence of age on risk of death, we present absolute risks by sex and age group (1–34, 35–54, 55–69, 70–84, 85+). Absolute risks of death (case fatality rate) within 28 and 60 days were estimated by age group and sex using data on individuals tested during September 2020; this is referred to as the baseline risk. The absolute risks of death for individuals with SGTF were then estimated as follows. If the baseline absolute risk of death in a given age group is (1 − A), then the estimated absolute risk of death with SGTF is (1 − A^HR), where HR denotes the estimated hazard ratio obtained from the Cox model assuming proportional hazards. We applied the hazard ratio from 28 days to the baseline risk for 28 days, and the hazard ratio for 60 days to the baseline risk for 60 days, to estimate absolute risks of death for individuals with SGTF and uncertainty of these estimates. Standard errors are obtained via the delta method, and CIs based on normal approximations.
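As a worked illustration of the formula above, the following Python sketch applies a hazard ratio to a baseline risk. The function name is hypothetical. The interval in the usage example is obtained by transforming the endpoints of the hazard ratio CI, which is a shortcut for illustration and not the delta-method calculation described in the text.

```python
def absolute_risk(baseline_risk: float, hazard_ratio: float) -> float:
    """Absolute risk with SGTF: 1 - A**HR, where A = 1 - baseline risk."""
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must lie in [0, 1]")
    survival = 1.0 - baseline_risk  # A: baseline probability of surviving the period
    return 1.0 - survival ** hazard_ratio


if __name__ == "__main__":
    # Females aged 70-84, complete-cases analysis: baseline 2.9%, HR 1.58 (1.40-1.79).
    baseline = 0.029
    for label, hr in (("estimate", 1.58), ("lower", 1.40), ("upper", 1.79)):
        print(f"{label}: {100 * absolute_risk(baseline, hr):.1f}%")
```

With a baseline risk of 2.9% and a hazard ratio of 1.58, this gives approximately 4.5%, in line with the figure reported for females aged 70–84.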
Sensitivity analyses
Several sensitivity analyses were performed. After establishing the final model using the process outlined above, we investigated the impact of using different variables for stratification of the baseline hazard: measuring region at a coarser level (UTLA or NHS England region), and measuring test specimen time at a coarser level (week rather than exact date).
Adjusting for these variables instead of using stratification was also explored. We also repeated the main analysis restricting data to specimens collected from September onwards, October onwards, November onwards, or December onwards.
To assess the impact of imposing an administrative cutoff on follow-up time of 10 days prior to data extraction, we reanalysed the data without this cutoff, and also reanalysed the data restricted to individuals with at least 28 days’ follow-up.
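The effect of the administrative cutoff on each individual's follow-up can be sketched in Python as follows. This is an illustrative reading of the text rather than the analysis code: the function names, the 28-day horizon argument, the example dates, and the rule that "at least 28 days' follow-up" is measured up to the cutoff date are all assumptions.

```python
from datetime import date, timedelta

def follow_up(test_date, death_date, extraction_date,
              cutoff_days=10, horizon_days=28):
    """Return (days_at_risk, died) for one individual, or None if excluded.

    Follow-up ends at the earliest of death, the analysis horizon, and the
    administrative cutoff (extraction date minus `cutoff_days`).
    Set cutoff_days=0 to reanalyse without the cutoff.
    """
    end = extraction_date - timedelta(days=cutoff_days)
    max_days = min((end - test_date).days, horizon_days)
    if max_days < 0:
        return None                         # tested after the cutoff: excluded
    if death_date is not None:
        days_to_death = (death_date - test_date).days
        if days_to_death <= max_days:
            return days_to_death, True      # death observed within follow-up
    return max_days, False                  # censored at cutoff or horizon

def has_full_follow_up(test_date, extraction_date,
                       cutoff_days=10, min_days=28):
    """True if the individual could be followed for at least `min_days`."""
    end = extraction_date - timedelta(days=cutoff_days)
    return (end - test_date).days >= min_days

# Hypothetical extraction date and individuals.
extraction = date(2021, 2, 15)
early_case = follow_up(date(2021, 1, 1), date(2021, 1, 20), extraction)
late_case = follow_up(date(2021, 1, 30), date(2021, 2, 10), extraction)
```

In this example the second individual's death falls after the cutoff, so that individual is treated as censored rather than as having died, which is the situation the reanalysis without the cutoff is designed to probe.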
Finally, we adjusted for symptomatic status associated with the test (asymptomatic, symptomatic, or unknown), which relates to whether the test was given for asymptomatic screening purposes or on the basis of a request by a (presumed symptomatic) individual, as only symptomatic individuals may request a community SARS-CoV-2 test in England.
Data Availability
Analysis code is available at https://github.com/nicholasdavies/cfrvoc. An anonymised data set allowing replication of the analysis is available at the same URL.
Funding statement
NGD: UK Research and Innovation (UKRI) Research England; NIHR Health Protection Research Unit in Immunisation (NIHR200929); UK Medical Research Council (MC_PC_19065). CIJ: Global Challenges Research Fund project ‘RECAP’ managed through Research Councils UK and the Economic and Social Research Council (ES/P010873/1). WJE: European Commission (EpiPose 101003688), National Institute for Health Research (NIHR200908). NPJ: National Institutes of Health / National Institute of Allergy and Infectious Diseases (R01AI148127). KDO: Royal Society-Wellcome Trust Sir Henry Dale Fellowship 218554/Z/19/Z. RHK: UKRI Future Leaders Fellowship (MR/S017968/1).
Working group authors and acknowledgements
The CMMID COVID-19 working group is (randomized order) Kevin van Zandvoort, Samuel Clifford, Fiona Yueqian Sun, Sebastian Funk, Graham Medley, Yalda Jafari, Sophie R Meakin, Rachel Lowe, W John Edmunds, Matthew Quaife, Naomi R Waterlow, Rosalind M Eggo, Nicholas G. Davies, Jiayao Lei, Mihaly Koltai, Fabienne Krauer, Damien C Tully, James D Munday, Alicia Showering, Anna M Foss, Kiesha Prem, Stefan Flasche, Adam J Kucharski, Sam Abbott, Billy J Quilty, Thibaut Jombart, Alicia Rosello, Gwenan M Knight, Mark Jit, Yang Liu, Jack Williams, Joel Hellewell, Kathleen O’Reilly, Yung-Wai Desmond Chan, Timothy W Russell, Christopher I Jarvis, Simon R Procter, Akira Endo, Emily S Nightingale, Nikos I Bosse, C Julian Villabona-Arenas, Frank G Sandmann, Amy Gimma, Kaja Abbas, William Waites, Katherine E. Atkins, Rosanna C Barnard, Petra Klepac, Hamish P Gibbs, Carl A B Pearson, and Oliver Brady.
Funding statements for the CMMID COVID-19 working group are as follows. KvZ: KvZ is supported by the UK Foreign, Commonwealth and Development Office (FCDO)/Wellcome Trust Epidemic Preparedness Coronavirus research programme (ref. 221303/Z/20/Z), and Elrha’s Research for Health in Humanitarian Crises (R2HC) Programme, which aims to improve health outcomes by strengthening the evidence base for public health interventions in humanitarian crises. The R2HC programme is funded by the UK Government (FCDO), the Wellcome Trust, and the UK National Institute for Health Research (NIHR). SC: Wellcome Trust (grant: 208812/Z/17/Z). FYS: NIHR EPIC grant (16/137/109). SFunk: Wellcome Trust (grant: 210758/Z/18/Z), NIHR (NIHR200908). GFM: NTD Modelling Consortium by the Bill and Melinda Gates Foundation (OPP1184344). YJ: LSHTM, DHSC/UKRI COVID-19 Rapid Response Initiative. SRM: Wellcome Trust (grant: 210758/Z/18/Z). RL: Royal Society Dorothy Hodgkin Fellowship. WJE: European Commission (EpiPose 101003688), NIHR (NIHR200908). MQ: European Research Council Starting Grant (Action Number #757699); Bill and Melinda Gates Foundation (INV-001754). NRW: Medical Research Council (grant number MR/N013638/1). RME: HDR UK (grant: MR/S003975/1), MRC (grant: MC_PC 19065), NIHR (grant: NIHR200908). NGD: UKRI Research England; NIHR Health Protection Research Unit in Immunisation (NIHR200929); UK MRC (MC_PC_19065). JYL: Bill & Melinda Gates Foundation (INV-003174). MK: Foreign, Commonwealth and Development Office / Wellcome Trust. FK: Innovation Fund of the Joint Federal Committee (Grant number 01VSF18015), Wellcome Trust (UNS110424). DCT: No funding declared. JDM: Wellcome Trust (grant: 210758/Z/18/Z). AS: No funding declared. AMF: No funding declared. KP: Gates (INV-003174), European Commission (101003688). SFlasche: Wellcome Trust (grant: 208812/Z/17/Z). AJK: Wellcome Trust (grant: 206250/Z/17/Z), NIHR (NIHR200908). SA: Wellcome Trust (grant: 210758/Z/18/Z). 
BJQ: This research was partly funded by the National Institute for Health Research (NIHR) (16/137/109 & 16/136/46) using UK aid from the UK Government to support global health research. The views expressed in this publication are those of the author(s) and not necessarily those of the NIHR or the UK Department of Health and Social Care. BJQ is supported in part by a grant from the Bill and Melinda Gates Foundation (OPP1139859). TJ: RCUK/ESRC (grant: ES/P010873/1); UK PH RST; NIHR HPRU Modelling & Health Economics (NIHR200908). AR: NIHR (grant: PR-OD-1017-20002). GMK: UK Medical Research Council (grant: MR/P014658/1). MJ: Gates (INV-003174, INV-016832), NIHR (16/137/109, NIHR200929, NIHR200908), European Commission (EpiPose 101003688). YL: Gates (INV-003174), NIHR (16/137/109), European Commission (101003688). JW: NIHR Health Protection Research Unit and NIHR HTA. JH: Wellcome Trust (grant: 210758/Z/18/Z). KO’R: Bill and Melinda Gates Foundation (OPP1191821). YWDC: No funding declared. TWR: Wellcome Trust (grant: 206250/Z/17/Z). CIJ: Global Challenges Research Fund (GCRF) project ‘RECAP’ managed through RCUK and ESRC (ES/P010873/1). SRP: Bill and Melinda Gates Foundation (INV-016832). AE: The Nakajima Foundation. ESN: Gates (OPP1183986). NIB: Health Protection Research Unit (grant code NIHR200908). CJVA: European Research Council Starting Grant (Action number 757688). FGS: NIHR Health Protection Research Unit in Modelling & Health Economics, and in Immunisation. AG: European Commission (EpiPose 101003688). KA: Bill & Melinda Gates Foundation (OPP1157270, INV-016832). WW: MRC (grant MR/V027956/1). KEA: European Research Council Starting Grant (Action number 757688). RCB: European Commission (EpiPose 101003688). PK: This research was partly funded by the Royal Society under award RP\EA\180004, European Commission (101003688), Bill & Melinda Gates Foundation (INV-003174). 
HPG: This research was produced by CSIGN which is part of the EDCTP2 programme supported by the European Union (grant number RIA2020EF-2983-CSIGN). The views and opinions of authors expressed herein do not necessarily state or reflect those of EDCTP. This research is funded by the Department of Health and Social Care using UK Aid funding and is managed by the NIHR. The views expressed in this publication are those of the author(s) and not necessarily those of the Department of Health and Social Care (PR-OD-1017-20001). CABP: CABP is supported by the Bill & Melinda Gates Foundation (OPP1184344) and the UK Foreign, Commonwealth and Development Office (FCDO)/Wellcome Trust Epidemic Preparedness Coronavirus research programme (ref. 221303/Z/20/Z). OJB: Wellcome Trust (grant: 206471/Z/17/Z).
Ethical approval
Approved by the Observational / Interventions Research Ethics Committee at the London School of Hygiene and Tropical Medicine (reference number 24020). Subject consent is not required for national infectious disease notification data sets in England.
Acknowledgements
We gratefully acknowledge the assistance of Public Health England in providing the analysis data and authorising release of an anonymised data set.