The SARS-CoV-2 Alpha variant is associated with increased clinical severity of disease

Background: The Alpha (B.1.1.7) SARS-CoV-2 variant of concern has been associated with increased transmission and increased 28-day mortality. We aimed to investigate the impact of B.1.1.7 infection on clinical severity of illness, including the need for oxygen or ventilation, in a national cohort study. Methods: In this prospective clinical cohort study, 1475 SARS-CoV-2 sequences were obtained from patients infected in Scotland, UK, between 1st November 2020 and 30th January 2021 and were matched to clinical outcomes as the lineage became dominant in Scotland. We modelled the association between B.1.1.7 infection and severe disease using a cumulative generalised linear mixed model with a 4-point scale of maximum severity at 28 days, based on the requirement for respiratory support. We also estimated the growth rate of B.1.1.7-associated infections as the lineage emerged in Scotland using a phylogenetic exponential growth rate population model. Results: The B.1.1.7 lineage was responsible for a third wave of SARS-CoV-2 infection in Scotland, in association with a growth rate 5-fold higher than that of the preceding second-wave B.1.177 lineage. Of 1475 patients, 364 were infected with B.1.1.7, 1030 with B.1.177 and 81 with other lineages. Our analysis found a positive association between increased clinical severity and lineage (B.1.1.7 versus non-B.1.1.7; cumulative odds ratio: 1.40, 95% CI: 1.02, 1.93). Viral load was higher in B.1.1.7 samples than in non-B.1.1.7 samples, as measured by cycle threshold (Ct) value (mean Ct change: -2.46, 95% CI: -4.22, -0.70). Conclusions: The B.1.1.7 lineage was associated with more severe clinical disease in Scottish patients than co-circulating lineages.


Background
The B.1.1.7 (Alpha) SARS-CoV-2 variant of concern was associated with increased transmission relative to other variants present at the time of its emergence, and several studies have shown an association between B.1.1.7 lineage infection and increased 28-day mortality. However, to date none have addressed the impact of infection on severity of illness or the need for oxygen or ventilation.

Methods
In this prospective clinical cohort sub-study of the COG-UK consortium, 1475 samples from hospitalised and community cases were collected between 1st November 2020 and 30th January 2021. These samples were sequenced in local laboratories and analysed for the presence of B.1.1.7-defining mutations. We prospectively matched sequence data to clinical outcomes as the lineage became dominant in Scotland and modelled the association between B.1.1.7 infection and severe disease using a 4-point scale of maximum severity by 28 days: 1. no support, 2. oxygen, 3. ventilation and 4. death. Additionally, we estimated the growth rate of B.1.1.7-associated infections following introduction into Scotland using phylogenetic data.

Conclusions
The B.1.1.7 lineage was associated with more severe clinical disease in Scottish patients than co-circulating lineages.

Introduction
The B.1.1.7 SARS-CoV-2 Pango lineage (termed the Alpha variant by the World Health Organisation) was first identified in the UK in September 2020 and, at the time of writing, has been reported in 150 countries (1). It is defined by 21 genomic mutations or deletions, including 8 characteristic changes within the spike gene (Table S1) (2). These are associated with increased ACE-2 receptor binding affinity and with innate and adaptive immune evasion (3-6). The B.1.1.7 lineage, the first variant of concern (VOC), was estimated to be 50-100% more transmissible than other lineages present at the time of its emergence (7), explaining the transient global dominance of this lineage. The presence of a spike gene deletion (Δ69-70) results in spike-gene target failure (SGTF) in real-time reverse transcriptase polymerase chain reaction (RT-PCR) diagnostic assays and provides a useful proxy for the presence of B.1.1.7 in epidemiological analyses (2). Recently, three large community analyses have shown a positive association between 28-day mortality and the presence of SGTF, with hazard ratios of 1.55 (CI 1.39-1.72), 1.64 (CI 1.32-2.04) and 1.67 (CI 1.34-2.09) (8-10). Two other large-scale analyses found a greater risk of hospitalisation in cases with SGTF (hazard ratio 1.52; CI 1.47-1.57) or confirmed B.1.1.7 infection (hazard ratio 1.34; CI 1.07-1.66) (11,12). In contrast, a smaller analysis of 341 hospitalised patients with confirmed COVID-19 and matched sequences found no association between B.1.1.7 and increased clinical severity, assessed as a composite of severe COVID-19 at day 14 and 28-day mortality (PR 1.02, CI 0.76-1.38, p=0.88) (13). Limited data are available on the full clinical course of disease with B.1.1.7 relative to other variants.
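To make the SGTF proxy concrete, the sketch below classifies a single three-target RT-PCR result. It assumes an assay reporting ORF1ab, S and N targets (as the TaqPath assay does) and uses a hypothetical Ct cut-off of 30 to decide whether a target was detected; published SGTF definitions differ in their exact thresholds, so the cut-off and the function names are illustrative only.

```python
from typing import Optional

# Hypothetical detection cut-off (assumption): a target with no Ct, or a Ct
# above this value, is treated as "not detected".
DETECTION_CUTOFF = 30.0


def detected(ct: Optional[float], cutoff: float = DETECTION_CUTOFF) -> bool:
    """Return True if the target amplified with a Ct at or below the cut-off."""
    return ct is not None and ct <= cutoff


def is_sgtf(orf1ab_ct: Optional[float], s_ct: Optional[float],
            n_ct: Optional[float]) -> bool:
    """Spike-gene target failure: ORF1ab and N detected, S gene not detected."""
    return detected(orf1ab_ct) and detected(n_ct) and not detected(s_ct)


print(is_sgtf(22.1, None, 23.4))   # S target dropped out
print(is_sgtf(22.1, 21.8, 23.4))   # all three targets detected
```

SGTF remains a proxy rather than a lineage call, because Δ69-70 is not unique to B.1.1.7; whole genome sequencing, as used in this study, assigns lineage directly.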
Understanding the clinical pattern of disease with B.1.1.7 infection is important for a number of reasons. Firstly, if B.1.1.7 is more pathogenic in younger people than previous variants, this has implications for easing lockdown in partially vaccinated populations, especially where vaccination has prioritised older age groups. Secondly, much of the world, particularly low- and middle-income countries, is unlikely to achieve widespread vaccination coverage until well into 2022. A better understanding of a lineage with increased severity is important for modelling the impact of unmitigated infection in these settings. Finally, a clear understanding of the behaviour of this lineage, which has emerged as a dominant variant, is needed as a baseline against which to compare the clinical phenotype of newly emerging variants such as B.1.351 (Beta variant) and the B.1.617 sublineages (particularly B.1.617.2, Delta variant), which may be better able to evade vaccine-induced immunity than B.1.1.7 and may therefore have the potential to spread even in immunised populations (14).
We aimed to quantify the clinical features and rate of spread of B.1.1.7-lineage infections in Scotland in a comprehensive national dataset. We used whole genome sequencing data to analyse patient presentations between 1st November 2020 and 30th January 2021 as the lineage emerged in Scotland, and used cumulative generalised additive mixed models to compare 28-day maximum clinical severity for B.1.1.7 against other lineages over the same period.

METHODS
Sequencing - Sequencing was performed using amplicon-based next-generation sequencing, as previously described (15), as part of the COG-UK consortium (16).
Bioinformatics - Sequence alignment, lineage assignment, tree generation and estimates of growth rate were performed using the COG-UK data pipeline (https://github.com/COG-UK/datapipe) and phylogenetic pipeline (https://github.com/cov-ert/phylopipe) with pangolin lineage assignment (https://github.com/cov-lineages/pangolin) (17). Lineage assignments were performed on 18/03/2021 and phylogenetic analysis was performed using the COG-UK tree generated on 25/02/2021. Estimates of growth rates of major lineages in Scotland were calculated from time-resolved phylogenies for lineages B. […] 2020 - March 2021 in BEAST with an exponential growth rate population model, a strict molecular clock model and the TN93 substitution model with four gamma rate categories. Each lineage was randomly subsampled to a maximum of 5 sequences per epiweek (resulting in 52 to 103 sequences per subsample, depending on the lineage), and 10 subsample replicates were analysed per lineage in a joint exponential growth rate population model.

medRxiv preprint doi: https://doi.org/10.1101/2021.08.17.21260128; this version posted August 24, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
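The per-epiweek subsampling step in the Bioinformatics paragraph can be sketched as follows. This is not the COG-UK or BEAST tooling; it is a minimal Python illustration that assumes each record is a (sequence ID, sample date) pair and approximates epiweeks with ISO calendar weeks (which start on Monday, whereas some epiweek definitions start on Sunday). The helper names are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[str, date]  # (sequence ID, sample collection date)


def week_key(d: date) -> Tuple[int, int]:
    """ISO (year, week), used here as an approximation of the epiweek."""
    iso = d.isocalendar()
    return (iso[0], iso[1])


def subsample_per_week(records: Iterable[Record], max_per_week: int = 5,
                       rng: Optional[random.Random] = None) -> List[str]:
    """Keep at most max_per_week randomly chosen sequence IDs from each week."""
    rng = rng or random.Random()
    by_week: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for seq_id, sample_date in records:
        by_week[week_key(sample_date)].append(seq_id)
    kept: List[str] = []
    for week in sorted(by_week):
        ids = by_week[week]
        kept.extend(rng.sample(ids, min(max_per_week, len(ids))))
    return kept


def make_replicates(records: List[Record], n_replicates: int = 10,
                    max_per_week: int = 5, seed: int = 1) -> List[List[str]]:
    """Draw independent subsample replicates from one seeded generator."""
    rng = random.Random(seed)
    return [subsample_per_week(records, max_per_week, rng)
            for _ in range(n_replicates)]
```

Each replicate caps a lineage at 5 × (number of sampled weeks) sequences, and weeks with fewer than 5 sequences contribute all of them, which is why subsample sizes vary between lineages.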
Clinical data - We included all Scottish COG-UK pillar 1 samples sequenced at the MRC-University of Glasgow Centre for Virus Research (CVR) and the Royal Infirmary of Edinburgh (RIE) between 1st November 2020 and 30th January 2021. These samples were derived from hospitalised patients (59%) and from community testing (41%). Core demographic data (age, sex, partial postcode) were collected via linkage to electronic patient records, and a full prospective review of case notes was undertaken. Collected data included residence in a care home; occupation in a care home or healthcare setting; admission to hospital; dates of admission, discharge and/or death; and maximum clinical severity within 28 days of the sample collection date, recorded on a 4-point ordinal scale (1. No respiratory support; 2. Supplemental oxygen; 3. Intubation and ventilation, non-invasive ventilation or high-flow nasal cannula; 4. Death), as previously used in Volz et al 2020 and Thomson et al 2021 (18,19). Where available, the PCR cycle threshold (Ct) and PCR testing platform were recorded. Hospital-acquired COVID-19 in patients admitted to hospital was defined as a first positive PCR more than 48 hours after admission. Discharge status was followed up until 15th April 2021 for the hospital stay analysis. For the comorbidity sub-analysis, delegated research ethics approval was granted for linkage to National Health Service (NHS) patient data by the Local Privacy and Advisory Committee at NHS Greater Glasgow and Clyde. Cohorts and de-identified linked data were prepared by the West of Scotland Safe Haven at NHS Greater Glasgow and Clyde.
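The two derived definitions in the Clinical data paragraph, maximum severity within 28 days and hospital acquisition, can be written as small functions. This is a sketch under stated assumptions: the event record format and labels are hypothetical, and we assume that only events from sample collection up to day 28 inclusive contribute to the maximum. The case-note review itself is not reproduced.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# 4-point ordinal scale from the text (the string keys are hypothetical labels).
SEVERITY = {
    "no_support": 1,     # no respiratory support
    "oxygen": 2,         # supplemental oxygen
    "ventilation": 3,    # intubation/ventilation, NIV or high-flow nasal cannula
    "death": 4,
}


@dataclass
class Event:
    kind: str          # one of the SEVERITY keys
    when: datetime


def max_severity_28d(sample_date: datetime, events: List[Event]) -> int:
    """Highest scale level reached from sample collection to day 28 (inclusive)."""
    window_end = sample_date + timedelta(days=28)
    levels = [SEVERITY[e.kind] for e in events
              if sample_date <= e.when <= window_end]
    return max(levels, default=SEVERITY["no_support"])


def hospital_acquired(admitted: Optional[datetime], first_positive: datetime) -> bool:
    """First positive PCR more than 48 hours after hospital admission."""
    if admitted is None:
        return False
    return first_positive - admitted > timedelta(hours=48)
```

For example, a patient sampled on 1 December who needed oxygen on 5 December and died on 10 January would score 2, because the death falls outside the 28-day window.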
Severity analyses - Four-level severity data were analysed using cumulative (as defined by Bürkner and Vuorre (2019)) generalised additive mixed models (GAMMs) with logit links, following Volz et al (2020) (18,20). We analysed three subsets of the data: 1. the full dataset, 2. the dataset excluding care home patients, and 3. the hospitalised population only. Further details of these analyses are provided in Supplementary Appendix 1.
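For readers unfamiliar with cumulative models, the general form of a cumulative logit model as parameterised in brms (following Bürkner and Vuorre) is given below. The linear predictor is a simplified sketch based on the covariates listed in Supplementary Appendix 1, not the authors' exact model specification. Here Y_i is patient i's 4-level severity, τ_k are ordered thresholds, x_i indicates B.1.1.7 infection, f_1 and f_2 are penalised splines, and u and v are random intercepts.

```latex
\[
\Pr(Y_i \le k) = \operatorname{logit}^{-1}(\tau_k - \eta_i), \qquad k = 1, 2, 3, \qquad \tau_1 < \tau_2 < \tau_3
\]
\[
\eta_i = \beta\, x_i + \gamma\, \mathrm{sex}_i + f_1(\mathrm{age}_i) + f_2(\mathrm{day}_i) + u_{\mathrm{county}[i]} + v_{\mathrm{postcode}[i]}
\]
\[
\frac{\Pr(Y_i > k \mid x_i = 1) \,/\, \Pr(Y_i \le k \mid x_i = 1)}{\Pr(Y_i > k \mid x_i = 0) \,/\, \Pr(Y_i \le k \mid x_i = 0)} = e^{\beta} \quad \text{for every } k
\]
```

Because the same β applies at every threshold, e^β is a single cumulative odds ratio: the multiplicative change in the odds of being above any given severity level, holding the other covariates fixed.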
Ct analysis - Ct values were compared between B.1.1.7 and non-B.1.1.7 lineage infections for patients tested on the TaqPath assay (Applied Biosystems). Only this platform was used because different platforms output systematically different Ct values, and it was the most frequently used platform in our dataset (n = 154; B.1.1.7 = 38, non-B.1.1.7 = 116). We modelled Ct value using a generalised additive model with a Gaussian error structure and identity link, and the same covariates as in the severity analysis. The model was fitted using the brms (v. 2.14.4) R package (22). The presented model had no divergent transitions and effective sample sizes of over 200 for all parameters. The intercept of the model was given a t-distribution (location = 20, scale = 10, df = 3) prior, the fixed effect coefficients were given normal (mean = 0, standard deviation = 5) priors, and random effect and spline standard deviations were given exponential (mean = 5) priors.
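As a rough aid to interpreting Ct differences, the sketch below converts a Ct shift into an approximate fold change in starting template. It assumes a constant amplification efficiency, by default perfect doubling per cycle; this idealisation is ours, not the study's, and real assays depart from it, so the numbers are indicative only and are not estimates from the paper.

```python
def ct_shift_to_fold_change(delta_ct: float, efficiency: float = 1.0) -> float:
    """Approximate fold change in starting template implied by a Ct difference.

    Assumes each PCR cycle multiplies the product by (1 + efficiency); an
    efficiency of 1.0 means perfect doubling. A negative delta_ct (earlier
    amplification) corresponds to more starting template, i.e. a fold change > 1.
    """
    return (1.0 + efficiency) ** (-delta_ct)


# Mean Ct change and 95% CI bounds reported for B.1.1.7 vs non-B.1.1.7.
for delta in (-2.46, -4.22, -0.70):
    print(f"delta Ct {delta:+.2f} -> ~{ct_shift_to_fold_change(delta):.1f}-fold")
```

Under perfect doubling, the mean shift of -2.46 corresponds to roughly 5.5-fold more starting template in B.1.1.7 samples.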

RESULTS

Emergence of the B.1.1.7 lineage in Scotland
Between 01/11/2020 and 31/01/2021, 1863 samples from individuals tested in pillar 1 facilities underwent whole genome sequencing for SARS-CoV-2. Of these, 1475 (79%) could be linked to patient records and were included in the analysis. The proportion of patients infected with the B.1.1.7 variant increased over the course of the study, in line with its spread across the UK during the study period (Figure 1a and 1b). Two peaks of SARS-CoV-2 infection have occurred in the UK to date: the first (wave 1) in March 2020 (13) and the second in summer 2020 (26), both in association with hundreds of importations following travel to Central Europe (27). The second peak incorporated two variant waves (waves 2 and 3), initially of B.1.177 (Figure 1c) and then of B.1.1.7, radiating from the South of England (Figure 1e). This B.1.1.7 "takeover" (Figure 1d) corresponded to a five-fold increase in growth rate on an epidemiological scale relative to non-B.1.1.7 lineages (Figure 1f).
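Growth rates from an exponential model translate directly into doubling times, which can make a relative growth rate easier to read. The sketch assumes simple exponential growth N(t) = N0·exp(r·t); the rates in the example are arbitrary placeholders, not the BEAST estimates behind Figure 1f.

```python
import math


def doubling_time(r: float) -> float:
    """Doubling time for exponential growth N(t) = N0 * exp(r * t), with r > 0.

    The result is in the same time unit as r (e.g. days if r is per day).
    """
    if r <= 0:
        raise ValueError("doubling time is only defined for a positive growth rate")
    return math.log(2) / r


# Placeholder rates: a lineage growing five times faster than a comparator.
r_comparator = 0.02
r_faster = 5 * r_comparator
print(doubling_time(r_comparator) / doubling_time(r_faster))
```

Because doubling time is inversely proportional to r, a five-fold higher growth rate implies a doubling time one fifth as long, whatever the time unit.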

Demographics of the clinical cohort
The age of the clinical cohort ranged from 0 to 105 years (mean 66.8 years) and was slightly lower in the B.1.1.7 group (65.6 years vs. 67.2 years). Overall, 59.1% were female; this female preponderance occurred in both subgroups and was slightly greater in the B.1.1.7 subgroup (60.4% vs 58.6%). In the full cohort, 3.0% were care home workers and 10.4% were NHS healthcare workers. Of those infected with the B.1.1.7 variant, 5.5% were care home workers and 5.8% other healthcare workers, compared with 2.2% and 12.0%, respectively, of those infected with non-B.1.1.7 lineages. Of those in the B.1.1.7 subgroup, 12.9% were care home residents, compared with 21.7% in the non-B.1.1.7 subgroup. The proportion of cases admitted to intensive care units also differed: 6.3% in the B.1.1.7 group compared with 3.4% in the non-B.1.1.7 group. Full demographic details of the cohort are given in Table 1, and full lineage assignments in Table S2.

Clinical severity analysis
Within the clinical severity cohort there were 364 B.1.1.7 infections, 1030 B.1.177 infections and 81 infections with 19 other lineages (Figure 2). Consistent with previous research comparing mortality and hospitalisation in cases with and without SGTF on PCR, we found that B.1.1.7 lineage viruses were associated with more severe disease on average than those from other lineages circulating during the same period. In the full dataset, we observed a positive association with severity (median cumulative odds ratio: 1.40, 95% CI: 1.02, 1.93). In both the subset excluding care home patients and the subset limited to hospitalised patients, the mean estimate of the increase in severity of B.1.1.7 lineage viruses was smaller, and the variance of the posterior distribution higher, likely due to the smaller sample sizes. Given this uncertainty, we cannot determine whether the association of B.1.1.7 with severity in the populations corresponding to these subsets is the same as that in the population described by the full dataset, but in all cases the most likely direction of the effect is positive. Estimates from the severity models for all subsets can be found in Tables S3-5. Bernoulli models of each severity category individually suggested that, in our cohort, there was no evidence that B.1.1.7 was associated with increased mortality at 28 days (median odds ratio: 1.04; 95% central credible interval: 0.67, 1.59), but that infection with B.1.1.7 lineage viruses was associated with a moderate increase in the risk of requiring supplemental oxygen (median odds ratio: 1.77; 95% central credible interval: 1.12, 2.83). A separate model for high-flow oxygen/ventilation could not be fitted due to the low numbers of events in some cells.
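To give a feel for what a cumulative odds ratio of 1.40 means, the sketch below applies it to a baseline distribution over the four severity levels under the proportional-odds assumption of the cumulative model. The baseline probabilities are invented for illustration and are not the cohort's observed distribution; the function name is hypothetical.

```python
from typing import List


def shift_by_cumulative_or(baseline: List[float], odds_ratio: float) -> List[float]:
    """Category probabilities after multiplying every cumulative odds of being
    above level k, P(Y > k) / P(Y <= k), by odds_ratio (proportional odds)."""
    if any(p <= 0 for p in baseline) or abs(sum(baseline) - 1.0) > 1e-9:
        raise ValueError("baseline must be positive probabilities summing to 1")
    shifted_cum = []          # new P(Y <= k) for k = 1 .. K-1
    running = 0.0
    for p in baseline[:-1]:
        running += p
        odds_above = (1.0 - running) / running * odds_ratio
        shifted_cum.append(1.0 / (1.0 + odds_above))
    probs, previous = [], 0.0
    for c in shifted_cum:
        probs.append(c - previous)
        previous = c
    probs.append(1.0 - previous)
    return probs


# Invented baseline over levels 1-4 (no support, oxygen, ventilation, death).
baseline = [0.55, 0.25, 0.08, 0.12]
shifted = shift_by_cumulative_or(baseline, 1.40)
print(" ".join(f"{p:.3f}" for p in shifted))
```

Probability mass moves out of the lowest level and into the higher levels, but because the odds ratio is modest, the absolute shift in any one category stays small.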
Estimates of severity across the phylogeny are shown in Figure 3. […] comorbidities for the subset of patients where they were available implied that the inclusion of comorbidities had no impact on the results obtained; see Supplementary Appendices 1 and 3.
Model estimates for all parameters can be found in Table S6.
We found no evidence that B.1.1.7 was associated with longer hospital stays after controlling for age and sex (HR: -0.02; 95% CI: -0.23, 0.20; p = 0.89).

Figure 1 (partial caption): […] introductions from England. Wave three has a single origin in Kent, so Scotland lags behind England in numbers of cases. F) Estimates of growth rates of major lineages in Scotland from time-resolved phylogenies. Estimates were carried out on a subsample of the named lineages using sequences from Scotland only from November 2020 - March 2021 using BEAST and an exponential growth effective population size model.

Figure 3: The estimated maximum likelihood phylogenetic tree and a measure of the estimated severity of infection. Estimated severities for each viral isolate are means and 95% credible intervals of the linear predictor change under infection with that viral genotype, from the phylogenetic random effect in the cumulative severity model under a Brownian motion model of evolution. This model constrains genetically identical isolates to have identical effects, so changes should be interpreted across the phylogeny rather than between closely related isolates, which necessarily have similar estimated severities. The dataset was downsampled to 100 random samples for this figure to aid readability. The figure was generated using ggtree (28).

Discussion
In this prospective analysis of hospitalised and community patients with B.1.1.7 and non-B.1.1.7 lineage SARS-CoV-2 infection, carried out as B.1.1.7 became dominant in Scotland, we provide evidence of increased clinical severity associated with this variant. This was observed across all adult age groups and across the spectrum of COVID-19 disease, from no requirement for supportive care, through supplemental oxygen and invasive or non-invasive ventilation, to death. This analysis is the first to assess the full clinical severity spectrum of B.1.1.7 infection relative to other prevalent lineages circulating during the same period.
Our study supports recent community testing analyses that have reported increased 28-day mortality associated with SGTF as a proxy for B.1.1.7 status (8-10). A smaller study found no effect of the lineage on 28-day mortality (13); however, we note that we would not have detected an effect in a population of the size used in Frampton et al. 2021, indicating that while there is evidence for an effect, it is not large enough to be observed in smaller, detailed studies.

The association between higher viral load, higher transmission and lineage may reflect changes in the biology of the virus; for example, the B.1.1.7 asparagine (N) to tyrosine (Y) substitution at position 501 of the spike protein receptor binding domain (RBD) (N501Y) is associated with increased binding affinity for the human ACE2 receptor (29). In addition, the deletion at positions 69-70 may increase virus infectivity (30). The P681H mutation at the furin cleavage site is associated with more efficient furin cleavage, enhancing cell entry (31). An alternative explanation for the higher viral loads observed in B.1.1.7 infection may be that clinical presentation occurs earlier in the illness. Further modelling, animal experiments and studies in healthy volunteers may help to unravel the mechanisms behind this phenomenon.
Our data indicate an association between B.1.1.7 and an increased risk of requiring supplemental oxygen and ventilation, two critical determinants of healthcare capacity during periods of high SARS-CoV-2 incidence. Countries where B.1.1.7 is not yet dominant, in particular those with weaker public health control of the virus, will therefore need to factor the requirement for supportive treatment into models of clinical severity and into pandemic response planning. In regions where B.1.1.7 is dominant, it should be used as the comparison lineage for clinical severity analyses of emergent variants of concern, such as B.1.351 and B.1.617.2.
There are some limitations to our study. Our dataset is drawn from first-line local NHS diagnostic (pillar 1) testing, which over-represents patients presenting for hospital care (59%), while those sampled in the community represented 41% of the dataset. Further, sampling was not standardised across the study period, as sequencing was carried out both as systematic randomised national surveillance and following outbreaks of interest. Finally, the cumulative model used in this analysis assumes homogeneous application of therapeutic interventions across the population. Despite these limitations, our results are consistent with previous work on the mortality of Alpha, and this study provides new information on differences in infection severity.
In summary, the B.1.1.7 lineage was associated with a rapid increase in SARS-CoV-2 cases in Scotland and an increased risk of severe infection requiring supportive care. This has implications for outbreak planning in countries with low vaccine uptake where the B.1.1.7 lineage is not yet dominant. Our study shows the value of collecting higher-resolution patient outcome data linked to genetic sequences when looking for clinically relevant differences between viral variants.

Tables

Appendix 1 - Further methods
Four-level severity data were analysed using cumulative (as defined by Bürkner and Vuorre (2019)) generalised additive mixed models (GAMMs) with logit links, following Volz et al (2020) (18,20). We analysed three subsets of the data: 1. the full dataset, 2. the dataset excluding care home patients, and 3. the hospitalised population only. These GAMMs included B.1.1.7 status and patient sex as fixed effects, with county and partial postcode included as random effects. We included patient age and the number of days since the first diagnosis in the dataset as non-linear penalised regression splines. The k parameter of each penalised regression spline was set to the maximum possible value, with the intention that regularisation occur through the prior. The full dataset was additionally analysed using a phylogenetic cumulative generalised additive mixed model (PGAMM). The PGAMM was a modification of the GAMMs described above in which, instead of including B.1.1.7 status as a fixed effect, we included a random effect of the phylogenetic relationship between viral isolates (using a variance-covariance matrix calculated from the virus phylogeny under a Brownian motion assumption with the vcv.phylo function in ape (v. 5.5) (21)). All severity models were fitted using the brms (v. 2.14.4) R package (22). All presented models had no divergent transitions and effective sample sizes of over 200 for all parameters. Additionally, we fitted Bernoulli models with the same covariate set as the cumulative model for supplemental oxygen and mortality individually (an individual model for high-flow oxygen/ventilation was attempted but could not be fitted due to the low numbers of events in some cells).
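The Brownian motion variance-covariance matrix used for the phylogenetic random effect has a simple construction: the covariance between two tips is the branch length they share from the root, and each tip's variance is its root-to-tip distance. The following is a minimal pure-Python sketch of that construction for a toy tree in a hypothetical child-to-parent encoding. The analysis itself used vcv.phylo in ape, which this does not reproduce in full (for example, it does not rescale to a correlation matrix).

```python
from typing import Dict, List, Tuple

# Hypothetical tree encoding: child node -> (parent node, branch length).
# The root is the one node that never appears as a key.
Tree = Dict[str, Tuple[str, float]]


def root_path(tree: Tree, node: str) -> Dict[str, float]:
    """Edges on the path from node to the root, keyed by each edge's child node."""
    edges: Dict[str, float] = {}
    while node in tree:
        parent, length = tree[node]
        edges[node] = length
        node = parent
    return edges


def brownian_vcv(tree: Tree, tips: List[str]) -> List[List[float]]:
    """Brownian-motion covariance (unit rate): summed length of shared root-path edges."""
    paths = {tip: root_path(tree, tip) for tip in tips}
    matrix = []
    for a in tips:
        row = []
        for b in tips:
            shared = paths[a].keys() & paths[b].keys()
            row.append(sum((paths[a][edge] for edge in shared), 0.0))
        matrix.append(row)
    return matrix


# Toy tree ((A:1, B:1):2, C:3); "n1" is the internal node joining A and B.
toy = {"A": ("n1", 1.0), "B": ("n1", 1.0), "n1": ("root", 2.0), "C": ("root", 3.0)}
print(brownian_vcv(toy, ["A", "B", "C"]))
```

A and B share the 2-unit branch above their common ancestor, so their covariance is 2, while C shares nothing with either and has covariance 0; isolates with identical genomes would share their whole path and be perfectly correlated, as noted in the Figure 3 caption.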
Comorbidities were only available for patients from the Greater Glasgow and Clyde health board (n = 639). The comorbidities used were those previously identified as important for COVID-19 severity by the ISARIC4C consortium (23). To test whether the lack of comorbidity data for the rest of the sample was biasing estimates of the impact of B.1.1.7 lineage infection, we performed three analyses on the Greater Glasgow and Clyde patient population. First, we fitted the above model with the number of comorbidities a patient exhibited included as a non-linear penalised regression spline. While the exact form of the relationship between severity of infection and the number of comorbidities is unknown, we would expect it to be monotonically increasing; however, for mathematical simplicity, we did not enforce this constraint on the spline. We also fitted the model to this patient population without comorbidities and with the comorbidities permuted, in order to estimate the change in the B.1.1.7 effect estimate attributable to the inclusion of comorbidities. As the inclusion of comorbidities was found not to change the estimated effect of B.1.1.7, this analysis is presented in Supplementary Appendix 3.
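The permutation control in this paragraph rests on a simple idea: shuffling a covariate across patients keeps its distribution intact but severs its link to each patient's outcome, so any difference between the permuted and unpermuted fits reflects the covariate's information rather than the mere presence of an extra model term. A minimal sketch, with a hypothetical helper name, invented example values and a fixed seed for reproducibility:

```python
import random
from typing import List, Sequence


def permuted_covariate(values: Sequence[int], seed: int = 0) -> List[int]:
    """Return a shuffled copy of a per-patient covariate column.

    The multiset of values (and hence its mean and distribution) is unchanged,
    but the pairing between each patient and their value is randomised.
    """
    shuffled = list(values)
    random.Random(seed).shuffle(shuffled)
    return shuffled


comorbidity_counts = [0, 2, 1, 0, 3, 1, 0, 4]   # invented example values
print(permuted_covariate(comorbidity_counts))
```

In the analysis described above, the permuted column would take the place of the real comorbidity count in an otherwise identical model fit.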
Priors were defined over classes of parameters and were designed to be informative about the scale of the parameters, but not about their precise values. The same classes received the same priors in each model. The intercepts of the models were given t-distribution (location = 0, scale = 2.5, df = 3) priors, fixed effects were given normal (mean = 0, standard deviation = 2.5) priors, and random effect and spline standard deviations were given exponential (mean = 2.5) priors.
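One way to see what "informative about the scale" means is to translate the normal(0, 2.5) prior on fixed-effect log-odds coefficients into an interval for the corresponding odds ratio. The sketch below does this arithmetic as an illustration; it is not part of the published workflow. For reference, an exponential prior with mean 2.5 corresponds to a rate of 0.4, which is how it would be written in Stan or brms syntax (exponential(0.4)).

```python
import math
from typing import Tuple


def prior_odds_ratio_interval(sd: float, z: float = 1.959964) -> Tuple[float, float]:
    """Central ~95% interval of exp(beta) when beta ~ Normal(0, sd)."""
    return math.exp(-z * sd), math.exp(z * sd)


low, high = prior_odds_ratio_interval(2.5)
print(f"{low:.4f} to {high:.1f}")
```

The prior therefore admits odds ratios spanning several orders of magnitude, roughly 1/134 to 134, while still discouraging implausibly extreme values, and comfortably contains estimates such as the 1.40 reported in the Results.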

Appendix 2 - Phylogenetic severity model
The per-isolate severity estimates shown in Figure 3 were generated by a model that makes several assumptions, some of which are violated. This appendix discusses the key assumptions and their impacts (see (1) for a deeper discussion of some of the issues involved). Despite these violations, the results were consistent with the non-phylogenetic method and the output is illustrative, so the results are included in the main text, though not emphasised.
The first major assumption is that the source phylogeny is known without error. In practice this can be broken into two assumptions. Firstly, that tree-like evolution is the correct description of the underlying evolutionary process, i.e. that horizontal gene transfer is unimportant. This appears to be a relatively safe assumption for SARS-CoV-2. Secondly, that the phylogenetic tree is correctly estimated. This is likely to be violated, as there may be error in both the discrete branching structure (or topology) and the real-valued branch lengths. While the topology may be correctly estimated, the probability of estimating all the branch lengths correctly is vanishingly small. This is unlikely to be a large practical issue, however, as small errors in branch lengths are unlikely to have large impacts relative to the model misspecification issues present in all statistical analyses.
If we are willing to assume that the estimated phylogeny is good enough for our purposes, we must then assume some model of the evolution of the trait of interest across that phylogeny. This model of change in the trait (severity) across the phylogeny is what allows the phylogenetic tree to be converted into a variance-covariance matrix, which describes the expected covariances (rescaled to correlations) between the severities associated with infection by different genetic variants. Here we made a common, simple choice and assumed Brownian motion evolution of the trait across the phylogeny. However, this model has been acknowledged as often suboptimal since its inception (1), and we consider it particularly so here. The number of observed changes across SARS-CoV-2 genomes is relatively small, the number of amino acid changes smaller still, and some mutations occur repeatedly in different lineages. Few mutations combined with semi-frequent homoplasy represent a particularly problematic case for this model: severity would be expected to change discretely with mutations, and in consistent directions when convergent changes occur (in the absence of extreme epistatic effects on severity), neither of which simple Brownian motion allows. In principle, model extensions using Lévy processes may allow discrete jumps in trait value along a phylogenetic tree; however, implementing such a model was beyond the scope of this study. Future work will explore more realistic evolutionary models for change in severity with genomes, which will reduce the error potentially introduced by this assumption.

Appendix 3 - Comorbidities
In the Greater Glasgow and Clyde population, for which comorbidity data were available, the model without comorbidities estimated the odds ratio for the impact of B.1.1.7 on severity as 1.06 (95% CI: 0.70, 1.58). When the number of relevant comorbidities was included but permuted, so as to break any relationship with the response, a similar odds ratio was estimated (1.06; 95% CI: 0.70, 1.60). Including the unpermuted number of relevant comorbidities did not substantially change this result (odds ratio for the impact of B.1.1.7 lineage viruses: 1.13; 95% CI: 0.73, 1.72). This is not unexpected, as the distribution of comorbidities was similar between patients infected with B.1.1.7 lineage viruses and those infected with non-B.1.1.7 lineage viruses.