Abstract
The Beta variant of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) emerged in South Africa in late 2020 and rapidly became the dominant variant, causing over 95% of infections in the country during and after the second epidemic wave. Here we show rapid replacement of the Beta variant by the Delta variant, a highly transmissible variant of concern (VOC) that emerged in India and subsequently spread around the world. The Delta variant was imported to South Africa primarily from India, spread rapidly in large monophyletic clusters to all provinces, and became dominant within three months of introduction. This was associated with a resurgence in community transmission, leading to a third wave which was associated with a high number of deaths. We estimated a growth advantage for the Delta variant in South Africa of 0.089 (95% confidence interval [CI] 0.084-0.093) per day which corresponds to a transmission advantage of 46% (95% CI 44-48) compared to the Beta variant. These data provide additional support for the increased transmissibility of the Delta variant relative to other VOC and highlight how dynamic shifts in the distribution of variants contribute to the ongoing public health threat.
Main text
By the beginning of 2021, the emergence and international spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants of concern (VOC) with increased transmissibility or partial immune evasion properties completely altered the dynamics of the coronavirus disease 2019 (COVID-19) pandemic. The detection of Alpha, Beta and Gamma variants in the UK, SA and Brazil respectively resulted in a renewed worldwide race to contain new waves of infection and mortality and re-affirmed the need for global access to vaccines1–3. In these VOC, the accumulation of amino acid changes in the spike glycoprotein and other viral proteins have significant phenotypic impact, most notably increased transmissibility4. The Beta variant was also associated with resistance to neutralizing antibodies, contributing to reduced vaccine protection against symptomatic COVID-195–7.
On 11 May 2021, the World Health Organization (WHO) designated a fourth VOC as Delta (initial Pango lineage B.1.617.2). The Delta variant was first sampled in October 2020 in India8 and was later associated with a massive resurgence of infections throughout that country from March 20219. The Delta variant has since spread globally (sampled in 180 countries as of 14 September 2021), where in many cases it has competitively replaced previously circulating variants. This has been demonstrated most clearly with the replacement of the previously dominant Alpha variant in countries such as the United Kingdom10. Here, we describe in the context of South Africa how the Delta variant rapidly replaced the previously dominant Beta variant, fueling a third wave in South Africa associated with a high number of cases and deaths.
The South African epidemic has been characterized by three distinct waves, causing almost 250 000 excess deaths by the end of August 2021, of which 85-95% are estimated to be COVID-19 deaths11–13. The first wave from March 2020 to September 2020 was characterized by multiple co-circulating SARS-CoV-2 lineages14, whereas the second wave between November 2020 and February 2021 was dominated by the Beta variant1. Initial estimates suggested that the Beta variant was associated with a transmission advantage of 23-50% compared to ancestral lineages, although there remains uncertainty around whether this was predominantly related to an inherent increase in the transmissibility of the Beta variant, or partly to reduced cross-protection from natural immunity following prior infection with a different lineage4,15,16. After the second wave, estimates of seroprevalence from blood donor surveys ranged from 32% to 62% across the nine provinces of South Africa, with a weighted national estimate of 47%17. Between March and May 2021, restrictions were eased (adjusted level 1 lockdown18), daily case counts stabilized at below 1500 per day and the effective reproduction number (Re) hovered around 119. The national vaccination rollout in South Africa only began on 17 May 2021 (around 480 000 health care workers had been vaccinated before then as part of a phase 3b clinical trial20). Starting in May, with minimal restrictions still in place (adjusted level 1 lockdown18), a resurgence in cases began in the inland provinces such as the Northern Cape, Free State and, Gauteng, which was followed by all other provinces. Despite the re-introduction of higher-level restrictions at the end of May18, incidence continued to increase, peaking at around 20 000 cases per day in early July (Fig. 1a). By the end of August, there had been over 90 000 excess deaths in the third wave, similar to the number observed in the second wave11 (Extended Data Fig. 1).
Figure 1b shows the distribution of variants over time at a national level based on genomic surveillance throughout the second and third waves. Beta remained dominant from November 2020 through the second wave and the post-wave period of lower-level transmission. We detected several additional variants of interest and variants of concern sporadically during this post wave period, including the Alpha variant from January 2021 onwards, but it remained at a low frequency nationally and was not associated with a significant resurgence of cases in any province (Fig 1c). In April 2021, through intensified sampling of cases associated with recent travel from India, we started to detect the Delta variant.
However, retrospective sequencing of samples collected earlier in the Northern Cape demonstrated the presence of Delta in that province from as early as 10 March 2021, although Beta remained dominant as the incidence of SARS-CoV-2 increased in that province in April and May (Fig 1c). Through May and June 2021, Delta rapidly displaced Beta and this shift was associated with a rapid increase in SARS-CoV-2 incidence (Fig. 1a, b). The dominance of Delta was consistently observed through increased genomic surveillance during the third wave, with the detection of Beta drastically decreasing to almost none in the last weeks (Extended Data Fig. 2a). We estimated that Delta has a growth advantage of 0.089 (95% confidence interval [CI] 0.084-0.093) per day compared to Beta (Fig. 1d). Assuming the same generation time of 5.2 days for both variants21, this corresponds to a growth advantage of 46% (95% CI 44-48%) per generation of viral transmission. Even though Delta has been shown to exhibit earlier viral growth and high viral load in vivo compared to earlier variants22, the serial interval of Delta (and consequently the viral generation time) does not seem to differ substantially23. However, the degree to which the growth advantage of Delta is mediated by inherent increased transmissibility and/or immune evasion cannot be determined16.
Phylogenetic analysis of 5602 Delta genomes from South Africa (sampled between 10 March and 20 August 2021) against a globally representative set of other Delta genomes (n=4983) revealed at least 72 introductions of the Delta variant into South Africa (Fig 2b), with just under half (43%) originating from India (Extended Data Fig. 4). Between January and May 2021, India was in the top five leading countries in terms of overseas international travelers entering South Africa, with an average of 1000 travelers entering per month 24. Following introductions into South Africa, the Delta variant appears to have spread in several monophyletic clusters spanning multiple provinces (Fig. 2a and Extended Data Fig. 3a), including one particularly large transmission cluster at the top of the tree consisting of 2680 sequences (Fig. 2a). In fact, cluster analysis revealed that 38.5% (2161/5602) of Delta genomes from South Africa belonged to monophyletic clusters of five or more sequences (Extended Data Fig. 3a). Phylogeographic reconstruction of the largest cluster (Fig. 2a, Cluster A) shows its origin in the Gauteng province dated back to 8 May 2021 (95% highest posterior density ranging from 1-12 May 2021) and followed by dissemination to all provinces (Fig. 2d). This illustrates nationwide monophyletic transmission of the Delta variant, characteristic of high transmissibility. Delta sequences in South Africa comprised various sub-lineages with the most represented ones being B.1.617.2, AY.4 and AY.12 (Extended Data Fig. 2b).
These findings provide more support that, at this stage in the pandemic, the inherent increased transmissibility of the Delta variant compared to ancestral strains and other VOC, gives it a transmission fitness advantage over other VOC. Although genomic surveillance in Africa remains heterogeneous25,26, there is good evidence that Delta has also rapidly replaced Beta in many other African countries27. This has also been observed in other countries outside Africa such as Bangladesh, where Beta dominated a second wave and then Delta replaced Beta, driving a more severe third wave27. This also suggests that regions where other VOC or variants of interest (VOI) have dominated, especially South America with the dominance of Gamma and Lambda, could remain susceptible to further resurgences driven by the Delta variant.
Whether partial immune evasion also contributed to the transmission fitness advantage of Delta is difficult to establish at the moment. We previously showed that infection with the Beta variant elicited humoral immune responses that offered strong neutralizing antibody cross-protection against ancestral lineages and some other VOC, but this was prior to the emergence of the Delta variant5,28. Beta and Delta are antigenically distinct29, and it remains possible that, in the context of the South African epidemic, reduced cross-protection may have contributed to the transmission fitness advantage of Delta.
In conclusion, the Delta variant rapidly replaced the Beta variant in South Africa and fueled a third wave throughout the country which was associated with a high number of deaths. This again highlights the importance of strengthening genomic surveillance to support the ongoing pandemic response. In this phase of the epidemic, as population immunity will have increased further (from a mix of natural infection and vaccination), we are now closely monitoring for the emergence of variants that combine partial immune evasion with enhanced transmissibility30. To this end, we recently reported the emergence and functional characterization one such lineage (C.1.2) and we are monitoring this closely to assess how it competes with the dominant Delta variant31. The continued evolution of SARS-CoV-2 is a reminder of the ongoing public health threat and highlights the importance of addressing vaccine inequity and accelerating vaccine delivery in all parts of the world.
Data Availability
All sequence data is public at GISAID. Short reads (i.e. FastQ files) are available at the Short Read Archive of the NCBI.
Data availability
All of the SARS-CoV-2 Delta genomes generated by NGS-SA and presented in this article are publicly accessible through the GISAID platform (https://www.gisaid.org/). The GISAID accession identifiers of the Delta sequences analysed in this study are provided as part of Extended Data Table 1. Other raw data for this study are provided as a supplementary dataset at https://github.com/krisp-kwazulu-natal/SARSCoV2_Delta_South_Africa. The reference SARS-CoV-2 genome (MN908947.3) was downloaded from the NCBI database (https://www.ncbi.nlm.nih.gov/).
Code availability
All custom scripts to reproduce the analyses and figures presented in this article are available at https://github.com/krisp-kwazulu-natal/SARSCoV2_Delta_South_Africa.
Author Contributions
Produced SARS-CoV-2 genomic data: J.G., S.P., Y.N., U.R., H.T., J.E.S, D.G.A, J.E., T.M., A.N., B.M., N.N., Z.T.K., Z.M., N.W., C.S., A.Ismail, D.D., R.J., A.S., A.M., M.D., S.H.M., Y.R., A.M., W.A.K., D.T., U.J.A, L.S., S.E., T.M., G.v.Z, G.M., A.Iranzadeh, P.A.B., M.M.N., K.S.
Collected samples and curated metadata: N.M., K.Mlisana, K.Marais, S.K., D.H., N-y.H.,
Analysed the data: H.T., E.W., C.L.A., M.G., J.E.S, V.F., R.J.L, and T.dO.
Helped with study design and data interpretation. H.T., E.W., D.M., L.C.J.A, F.T., M.V., D.G., W.P., J.N.B., A.v.G., C.W., R.J.L and T.dO
Wrote the initial manuscript, which was reviewed by all authors. H.T., E.W., J.E.S, R.J.L, and T.dO
Materials and Methods
Ethics statement
Delta SARS-CoV-2 sequences that were used in the present study were generated by the six laboratory hubs in the Network for Genomic Surveillance in South Africa (NGS-SA - https://www.ngs-sa.org/)32. The genomic surveillance was approved by the University of KwaZulu–Natal Biomedical Research Ethics Committee (ref. BREC/00001510/2020), the University of the Witwatersrand Human Research Ethics Committee (HREC, ref. M180832, M210159, M210752), Stellenbosch University HREC (ref. N20/04/008_COVID-19), and the University of Cape Town HREC (ref. 383/2020), the University of Pretoria HREC (100/2017), and the University of Free State Health Sciences Research Ethics Committee (ref. UFS-HSD2020/1860/2710). Individual participant consent was not required for genomic surveillance-this requirement was waived by the Research Ethics Committees. Following sequencing, all consensus sequences are uploaded to GISAID33 and their use is subject to the database terms and conditions.
Epidemiological data
We analyzed daily cases of SARS-CoV-2 in South Africa up to 2nd September 2021, from publicly released data provided by the National Department of Health and the National Institute for Communicable Diseases. This was accessible through the repository of the Data Science for Social Impact Research Group at the University of Pretoria (https://github.com/dsfsi/covid19za)34,35. The National Department of Health releases daily updates on the number of confirmed new cases and reported COVID-19 deaths with a breakdown by province. Population size estimated for each province was based on 2021 mid-year population estimates 36. We retrieved estimates of the effective reproduction number (Re) of SARS-CoV-2 in South Africa from the ‘covid-19-Re’ data repository (https://github.com/covid-19-Re/dailyRe-Data), as of 2 September 202119. Data on weekly excess deaths were retrieved from the South African Medical Research Council Burden of Disease Research Unit report on weekly deaths in South Africa11–13.
Genomic data
NGS-SA partner laboratories primarily use two main sequencing technologies, namely Illumina (Illumina, San Diego, CA) and Oxford Nanopore (Oxford Nanopore Technologies, United Kingdom).
Tiling-based Polymerase Chain Reaction
Ribonucleic acid (RNA) is extracted from residual nasopharyngeal and oropharyngeal specimens of quantitative polymerase chain reaction (qPCR)-confirmed COVID-19 cases using either manual or automated extraction methods. Complementary DNA synthesis was performed on extracted RNA using either SuperScript IV reverse transcriptase (Life Technologies, Carslbad, CA) or LunaScript RT SuperMix Kit (New England Biolabs), followed by gene-specific multiplex PCR, using the ARTIC protocol, as described previously37,38. In summary, SARS-CoV-2 whole-genome amplification by multiplex PCR was performed using primers designed on Primal Scheme (http://primal.zibraproject.org/) to generate 400 base pair (bp) amplicons with 70bp overlaps, covering the 30 kilobase SARS-CoV-2 genome. Some labs used the Illumina COVIDSeq Kit (Illumina, San Diego, CA) to generate amplicons as per the manufacturer’s recommendations. PCR products were purified in a 1:1 ratio, using AmpureXP purification beads (Beckman Coulter, High Wycombe, UK), and were quantified using the Qubit double strand DNA (dsDNA) High Sensitivity assay kit on a Qubit fluorometer (Life Technologies, Carslbad, CA).
Library Preparation and Next-Generation Sequencing
Depending on the partner institution, library preparation and sequencing was done either on the Illumina or Oxford Nanopore Platform.
Illumina Nextera Flex DNA Library Preparation and Sequencing
Depending on the lab, libraries were prepared using either the Nextera DNA Flex Library Prep kit with Nextera CD indexes (Illumina, San Diego) or the Illumina COVIDSeq kit (Illumina, San Diego), according to the manufacturer’s instructions. The libraries were then cleaned using 0.9x sample purification beads (Beckman Coulter, Brea, CA) and eluted in 32 μL resuspension buffer. Libraries were quantified by using the Qubit dsDNA High Sensitivity assay kit on a Qubit fluorometer (Life Technologies, Carslbad, CA). Each sample library was normalized to 4 nM concentration, and denatured with 5 μL of 0.2 N NaOH. The final library was diluted to 8 pM and spiked with 1% PhiX Control v3 (Illumina, San Diego) prior to sequencing on an Illumina MiSeq platform (Illumina), using a MiSeq Reagent Kit v2 (500 cycles). All primer sequences are provided in Table S1. This protocol is available at protocols.io coronavirus-method-development community website (https://www.protocols.io/view/illumina-nextera-dna-flex-library-construction-and-bhjgj4jw) since 17 June 202039.
Oxford Nanopore Library Preparation and Sequencing
Sequencing libraries were generated from the barcoded products using the Genomic DNA Sequencing Kit SQK-LSK109 (Oxford Nanopore Technologies). Briefly, tiling PCR amplicons were generated as described above. Ligation was carried out using UltraII End Prep Reaction Mix and UltraII End Prep Enzyme Mix and barcoded using the Native Barcoding Kit (Oxford Nanopore Technologies, Oxford, UK). Ninety-six barcodes were used in each run. This included 94 samples and two controls. The libraries were cleaned up using AmpureXP purification beads (Beckman Coulter, High Wycombe, UK) in a 1:1 ratio and eluted in 15μl of elution buffer. Quantification was done using the Qubit dsDNA High Sensitivity assay on the Qubit flourometer (Life Technologies, Carslbad, CA). Sequencing libraries were loaded onto a R9.4 flow cell and data were collected for up to 21 hours.
Genome assembly and Quality control
Sequences generated on the Illumina platform were assembled using Genome Detective 1.132/3 while sequences generated using the Nanopore sequencing technology were assembled with the Artic-nCoV2019 SARS-CoV-2 assembly pipeline (https://github.com/connor-lab/ncov2019-artic-nf).
Estimating relative transmission advantage
We analyzed 16,471 SARS-CoV-2 South African sequences from GISAID (with sample collection dates from 1 August 2020 to 5 September 202133. We used a multinomial logistic regression model to estimate the growth advantage of Delta compared to Beta in South Africa4,40. We added splines to account for time-varying growth rates in the model fit and estimated the overall growth advantage of Delta compared to Beta at the time point where the proportion of Delta reached 50%. We fitted the model using the multinom function of the nnet package41 and estimated the growth advantage using the package emmeans42 in R.
Phylogenetic analysis
We analyzed 5602 Delta variants from South Africa, publicly available on GISAID33 as of 2 September 2021, in a phylogenetic context against a globally representative (n=4983) set of other SARS-CoV-2 Delta variants from around the world (selection obtained from Nextstrain publicly available Delta build at the time of analysis: https://nextstrain.org/groups/neherlab/ncov/21A.Delta). The full set of sequences were aligned with NextAlign43 to obtain a good codon quality alignment against the Wuhan-Hu-1 universal reference. The subsequent alignment was then used to infer a Maximum Likelihood tree topology in IQTREE244 (-m GTR, -b 100). Transfer bootstrap support for splits in the topology was inferred using Booster45. The resulting consensus ML tree topology was assessed for molecular clock signal in TempEst46 (Extended Data Fig. 3b). Potential outlier sequences or sequences lacking required metadata (e.g. date and location of sampling) were pruned off the topology with the ape package47 in R prior to dating. The branches in the ML-tree topology were then converted into units of calendar time in TreeTime48 using a constant rate of 0.0008 substitutions/site/year with a clock standard deviation of 0.0004 substitutions/site/year. Following the dating of the phylogeny, we annotated the tips and internal nodes using the “mugration” package, an extension of TreeTime and then counted the state changes from one country to another and their inferred time points. This gave us the number and timing of SARS-CoV-2 Delta viral exchanges between South Africa and the rest of the world. To infer some measure of confidence in the time and source of viral transitions, we performed the discrete ancestral state reconstruction on 10 bootstrap replicate trees.
Cluster analysis was performed with the phylotype software package49 to identify monophyletic clades of South African Delta sequences within the ML-tree. Sequences of the largest monophyletic clade (n=286) were extracted to infer continuous phylogeography histories in BEAST v1.1050. Briefly, sequences from the selected cluster were aligned in NextAlign and ML-tree topologies were inferred in IQTREE2, as previously described. The temporal signal of each cluster was assessed in TempEst v1.5.346 followed by the removal of potential outlier sequences that may violate the molecular clock assumption. Linear regression of root-to-tip genetic distances against sampling dates indicated that the SARS-CoV-2 sequences in that cluster evolved in a relatively strong clock-like manner (correlation coefficient = 0.37, R2 = 0.13) (Extended Data Fig. 3c). Duplicate Markov Chain Monte Carlo (MCMC) analyses for each cluster were executed in BEAST v1.10.4 for 100 million iterations with sampling every 10000 steps in the chain. Convergence of runs was assessed in Tracer v.1.7.151 based on high effective sample sizes and good mixing. Maximum clade credibility trees for each run were summarized using TreeAnnotator after discarding the first 10% of the chain as burn-in. The R package “seraphim”52 was used to extract and map the spatiotemporal information embedded in the MCC trees.
Extended Data Figures
Acknowledgements
We thank the global laboratories that generated and made public the SARS-CoV-2 sequences (through GISAID) used as reference dataset in this study (a complete list of individual contributors of sequences is provided in Extended Data Table 1). This research reported in this publication was supported by the Strategic Health Innovation Partnerships Unit of the South African Medical Research Council, with funds received from the South African Department of Science and Innovation (DSI). Genomics Surveillance in South Africa was supported in part through National Institutes of Health USA grant U01 AI151698 for the United World Antiviral Research Network (UWARN) and by the Rockefeller Foundation (Prof. Tulio de Oliveira and Dr. Eduan Wilkinson). CERI and KRISP have received donations from Chan Soon-Shiong Family Foundation (CSSFF) and Illumina . CW is funded by the South African MRC; Wellcome Trust (2222574/Z/21/Z) and the EDCTP (RADIATES Consortium; RIA2020EF-3030)
Footnotes
↵* These authors jointly supervised this work
Funding: CA received funding from the European Union’s Horizon 2020 research and innovation programme - project EpiPose (No 101003688).