Summary
We report the first local transmission of the Delta SARS-CoV-2 variant in mainland China. All 167 infections could be traced back to the first index case. Daily sequential PCR testing of quarantined subjects indicated that the viral load at the first positive test in Delta infections was ∼1000 times higher than in the 19A/19B strain infections of the initial epidemic wave in 2020, suggesting a potentially faster viral replication rate and greater infectiousness of the Delta variant at the early stage of infection. The 126 high-quality genome sequences, together with reliable epidemiological data, indicated that some minor intra-host single nucleotide variants (iSNVs) could be transmitted between hosts and eventually fixed in the virus population during the outbreak. Transmission of minor iSNVs between donor and recipient contributed at least 4 of the 31 substitutions identified in the outbreak, suggesting that some iSNVs can quickly arise and reach fixation when the virus spreads rapidly. Disease control measures, including the frequency of population testing, quarantine during the pre-symptomatic phase, and enhanced genetic surveillance, should be adjusted to account for the increasing prevalence of the Delta variant at the global level.
During the global spread of SARS-CoV-2, genetic variants of the virus have emerged, and some have proved to be more transmissible or able to escape host immunity, posing an increased risk to global public health [1–3]. An emerging genetic lineage, B.1.617, has been dominant in the largest outbreak of COVID-19 in India since March 2021, gaining global attention. One sublineage, B.1.617.2, with spike protein mutations L452R, T478K and P681R, accounts for ∼28% of sequenced cases in India and has rapidly replaced other lineages to become dominant in multiple regions and countries (https://outbreak.info/) [4]. B.1.617.2 has been labeled a Variant of Concern (VOC), Delta (https://www.who.int/activities/tracking-SARS-CoV-2-variants). The virological profile of this VOC urgently needs to be characterized.
On May 21, 2021, the first local infection with the Delta variant in mainland China was identified. As in the early epidemic in January 2020 [5], strict interventions were carried out, including population screening, active contact tracing, and centralized quarantine/isolation. However, in contrast to the limited transmission in 2020 [5], successive intergenerational transmission was observed in the 2021 epidemic. Here, we investigated epidemiological, genetic, and serological data from this well-traced outbreak to characterize the virological profile of the Delta SARS-CoV-2 variant and to discuss how intervention strategies need to be improved in the race against this emerging variant.
Results
The viral loads in the Delta infections were ∼1000 times higher than those in the earlier 19A/19B strain infections on the day when viruses were first detected
From the first index case identified on May 21, 2021 to the last case reported on June 18, 2021, a total of 167 local infections were identified (Figure 1A). All of these cases could be epidemiologically or genetically traced back to the first index case. One notable epidemiological feature of the Delta variant is its shorter serial interval compared with the early Wuhan strains or other VOCs [6–8]. However, critical parameters before illness onset remain elusive, including when the virus can be detected in a subject after exposure and how infectious the subject is at that point. Here, we investigated data from the quarantined subjects in this outbreak and compared them with the previous 2020 epidemic caused by the 19A/19B genetic strains. The centrally quarantined subjects were close contacts of confirmed cases or asymptomatic infections. Once a new infection was identified, the close contacts were immediately traced and centrally isolated, and daily PCR testing was performed. The dataset from quarantined subjects allowed us to determine the time interval between exposure and a PCR-detectable viral load in the infected subjects. Because the exact exposure time in intra-family transmissions was difficult to pinpoint, we excluded intra-family transmission pairs from the time interval analysis. The time interval from exposure to the first positive PCR test in the quarantined population was 6.00 (IQR 5.00-8.00) days in the 2020 epidemic (n=29; peak at 5.61 days) and 4.00 (IQR 3.00-5.00) days in the 2021 epidemic (n=34; peak at 3.71 days) (Figure 1B). We next evaluated the relative viral loads at the time SARS-CoV-2 was first detected in hosts.
The relative viral loads in the Delta variant infections (62 cases, Ct value 24.00 (IQR 19.00∼29.00) for ORF1ab gene) were 1260 times higher than those in the 19A/19B strain infections (63 cases, Ct value 34.31 (IQR 31.00∼36.00) for ORF1ab gene) on the day when viruses were first detected (Figure 1C). Because the centrally isolated subjects were tested daily from the beginning of quarantine, we propose that the Delta variant has a higher within-host growth rate, which leads to higher viral loads at the time point when viral nucleic acid first exceeds the PCR detection threshold (Figure 1D). Similar to the study by Roman et al. [9], we found that samples with Ct values above 30 (<6×10^5 copies/mL) never yielded an infectious isolate in vitro. For the Delta variant infections, 80.65% of samples contained >6×10^5 copies/mL in oropharyngeal swabs when viruses were first detected, compared with 19.05% of samples in 19A/19B infections. These data highlight that the Delta variant could be more infectious during the early stage of infection (Figure 1D).
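The relationship between the Ct gap and the reported fold difference can be checked with a back-of-the-envelope calculation. The sketch below assumes ideal PCR efficiency (template doubling in every cycle), so that each Ct unit corresponds to a twofold change in template abundance. This is an illustrative assumption rather than the calibration used in the study, and the function name is hypothetical. Under that assumption, the gap between Ct 34.31 and Ct 24.00 corresponds to a ratio of roughly 1.3×10^3, the same order of magnitude as the reported 1260-fold difference.

```python
def fold_change_from_ct(ct_reference: float, ct_sample: float,
                        efficiency: float = 2.0) -> float:
    """Relative template abundance of `ct_sample` versus `ct_reference`.

    Assumes the amount of template is multiplied by `efficiency` in every
    PCR cycle (2.0 = perfect doubling, an idealisation), so a lower Ct
    means more starting template.
    """
    if efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 1")
    return efficiency ** (ct_reference - ct_sample)


# Ct values for the ORF1ab gene as reported in the text
ratio = fold_change_from_ct(ct_reference=34.31, ct_sample=24.00)
print(f"Approximate fold difference: {ratio:.0f}")
```

A real assay would normally convert Ct to copy number through a standard curve, so the idealised ratio should be read only as a consistency check.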
After infection, individuals undergo a latent period during which viral titers are too low to be detected. As viral proliferation continues within the host, the viral load eventually reaches a detectable level and the host becomes infectious. Knowing when an infected person can spread the virus is essential for designing intervention strategies to break chains of transmission. However, this is difficult to study through clinical investigations, since over 50% of transmission occurs during the pre-symptomatic phase [10]. Our investigation of the quarantined subjects suggested that, for the Delta variant, the time window from exposure to detection of the virus peaked at ∼3.7 days, and that infections presented a higher infectiousness/transmission risk when the virus was first detected. In response to this notable viral parameter, the government required people leaving Guangzhou city from airports, train stations and shuttle bus stations to show proof of a negative COVID-19 test within 72 hours on June 6, further shortened to 48 hours on June 7, in contrast to the seven days required in the 2020 epidemic.
Minor iSNVs transmitted between hosts and fixed in the virus population
The non-pharmaceutical interventions in Guangdong mainly focused on epidemiological investigation, contact tracing and mass testing. Approximately 30 million PCR tests were performed from May 26, 2021, to June 8, 2021. The intense testing and screening of high-risk populations made cryptic transmission unlikely, and all identified infections could be directly (via direct contact) or indirectly (staying in or visiting the same area) connected. In addition, all sequences could be genetically traced back to the first index case. This provided a unique opportunity for us to characterize viral transmission at a finer scale, particularly the extent of virus genetic diversity transmitted among hosts. Whole-genome deep sequencing was performed on all identified infections using the ARTIC primer set on the Illumina platform, and 126 high-quality viral genomes (coverage >95%) were obtained, covering 75% of all identified infections (Figure 1A).
Phylogenetic analysis was performed including 346 sequences from imported cases who travelled from 66 countries to Guangdong between March 2020 and June 2021. As reference sequences, 50 genome sequences were randomly selected from each defined clade (13 clades) based on the Nextstrain classification (https://nextstrain.org/) and the notified VOCs (Alpha, Beta, Gamma, Delta). The dynamics of the viral lineage distribution in these imported cases roughly reflected the circulation of SARS-CoV-2 genetic lineages at the global level and also highlighted the challenges for disease control and prevention in Guangdong, China (Figure 2A).
Viral phylogenies of the Guangzhou outbreak were constructed from the consensus sequence of each sample, with branch lengths indicating the number of mutations in the consensus sequence among samples and the x-axis indicating the total number of mutations relative to the reference sequence (Wuhan-Hu-1, MN908947). The consensus sequence for each sample was generated from the majority variant (>50%) at each position. All Guangzhou outbreak sequences segregated into a single cluster (Figure 2A). Compared with the first index case (XG5137_GZ_2021/5/21), 31 substitutions were identified among 126 cases during the 26-day outbreak (Figure 2B). The most distant sequence differed from the index case sample by only four nucleotides. This suggests that, during an outbreak, the relatively low substitution rate of SARS-CoV-2 makes it challenging to infer transmission chains purely from consensus sequences [11–13]. To trace minor sequence variation as the virus spread, we estimated within-host virus diversity for each sample by mapping polymorphic sites onto the consensus genome of the index case (XG5137_GZ_2021/5/21) to generate a list of intra-host single-nucleotide variants (iSNVs). Minor iSNVs were called with a minor allele frequency threshold of 3% to exclude potential PCR and sequencing errors [12,14,15]. Some sequences carried minor iSNVs at 10 of the 31 substituted positions, which may shed light on how variants emerged, grew and were finally fixed during the epidemic. We listed the sequences carrying these minor iSNVs and the sequences in which the corresponding variants were fixed (Figure 2B). The core question we sought to answer is whether the accumulated genetic variation observed in the SARS-CoV-2 epidemic was dominated by de novo mutations arising within an individual or by iSNV transmission followed by fixation in the new host.
Contact tracing and epidemiological investigation enabled us to assign the epidemiological relationships among these sequences with high confidence. As shown in Figure 2C, at least some minor iSNVs could be successfully transmitted from the donor to the recipient(s). The minor iSNV C27086T was transmitted from the index case to 2 of 3 recipients. As the virus spread, this substitution became fixed in the virus population of the outbreak (Figure 2B). For the substitutions C925T, T21673C, and G27265T, found in cases 6191, 5371 and 6486, the corresponding iSNVs could be traced back to the possible donors (Figure 3C). However, we also noticed some minor iSNVs, including G11083T in samples 5851 and 5859 and G21137A and T25082G in sample 5851, that could not be called (at the 3% threshold) because of their low frequency but could be observed in the reads mapped to the reference genome (Figure S1). The high-density sampling and sequencing indicate that the genetic variation generated during the epidemic was at least partly due to iSNV transmission between donor and recipient, although successive iSNV transmissions were not observed, possibly because of the interventions on virus transmission. More importantly, this transmission could be observed repeatedly when one donor had multiple recipients (Figure 2C, site 27086). Under these circumstances, some transmission-enhancing or immune-escape SARS-CoV-2 iSNVs may be more likely to arise and reach fixation when the virus spreads rapidly.
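The donor-recipient comparison described above can be illustrated with a small sketch. It assumes that per-sample allele frequencies are already available as a mapping from variant name to frequency, and it reuses the two thresholds stated in the text (3% for minor iSNVs, >50% for the consensus). The function names, variant labels and frequencies are hypothetical and invented for illustration only. They are not data from the outbreak, and this is not the authors' analysis code.

```python
from typing import Dict

MINOR_MIN = 0.03      # minor-allele frequency threshold stated in the text (3%)
CONSENSUS_MIN = 0.50  # a variant above 50% becomes the consensus base


def classify(freq: float) -> str:
    """Label a variant allele frequency as 'consensus', 'minor' or 'absent'."""
    if freq > CONSENSUS_MIN:
        return "consensus"
    if freq >= MINOR_MIN:
        return "minor"
    return "absent"


def transmitted_minor_isnvs(donor: Dict[str, float],
                            recipient: Dict[str, float]) -> Dict[str, str]:
    """Variants that are minor in the donor and detectable in the recipient.

    Returns a mapping from variant name to its state in the recipient
    ('minor' or 'consensus'). Variants missing from the recipient are
    treated as frequency 0.
    """
    shared = {}
    for variant, donor_freq in donor.items():
        if classify(donor_freq) != "minor":
            continue
        state = classify(recipient.get(variant, 0.0))
        if state != "absent":
            shared[variant] = state
    return shared


# Hypothetical frequencies, invented purely for illustration
donor = {"C100T": 0.12, "G200A": 0.01, "A300G": 0.08}
recipient = {"C100T": 0.97, "G200A": 0.02}
print(transmitted_minor_isnvs(donor, recipient))
```

In this toy example only the first variant is reported: it is a minor iSNV in the donor and the consensus base in the recipient, which is the pattern the text describes for a transmitted iSNV that goes on to fixation.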
In this study, we characterized a large transmission chain originating from the first local infection of the SARS-CoV-2 Delta variant in mainland China. We propose a potentially higher viral replication rate for the Delta variant, which leads to viral loads in Delta infections that are ∼1000 times higher than those in 19A/19B strain infections on the day when testing first turns positive. This suggests that the Delta variant is very likely more infectious during the early stage of infection, and that the frequency of population screening should be optimized accordingly [16]. The greater infectiousness of Delta variant infections in the pre-symptomatic phase highlights the need for timely quarantine of suspected cases and close contacts before clinical onset or PCR screening. Although intra-host SNVs are present at a low level, transmission of minor iSNVs was observed and accounted for part of the substitutions fixed in the virus population during the outbreak. These data indicate that some advantageous or neutral mutations, even at low frequency, could rise and be fixed within one generation of transmission, and could go on to become predominant in the virus population if the epidemic is not well contained.
Data Availability
Some or all data, models, or code generated or used during the study are proprietary or confidential in nature and may only be provided with restrictions.
Competing interests
The views expressed in this article are those of the authors and not necessarily those of the Guangdong Provincial Center for Diseases Control and Prevention, or the Guangdong Provincial Institute of Public Health.
Methods
Ethics
This study was approved by the institutional ethics committee of the Guangdong Provincial Center for Disease Control and Prevention (GDCDC). Written consent was obtained from patients or their guardian(s) when samples were collected. Patients were informed about the surveillance before providing written consent, and data directly related to disease control were collected and anonymized for analysis.
Sample collection, clinical surveillance and epidemiological data
After the first local SARS-CoV-2 infection was reported on May 21 in the capital city of Guangdong, enhanced surveillance was performed by Guangdong CDC and local CDCs to detect suspected infections. Epidemiological investigations were carried out on all confirmed cases. Population screening was performed by third-party detection institutions. Once virus-positive samples were confirmed by local CDCs or other institutions, the samples were required to be sent to Guangdong CDC immediately. To make the results comparable, real-time reverse transcription PCR (RT-PCR) at Guangdong CDC was performed using the same commercial kit (DaAn Gene) and RT-PCR machine (CFX96) as in the previous studies [5,17]. The exposure history of positive cases and their close contacts was obtained through interviews, public video monitoring systems, cell phone apps, and other sources. Information regarding the demographic and geographic distribution of SARS-CoV-2 cases can be found on the website of the Health Commission of Guangdong Province (http://wsjkw.gd.gov.cn/xxgzbdfk/yqtb/). The surge in population screening ensured that all possible infections were identified and that the donor-recipient transmission pairs were assigned with high confidence.
Virus amplification and sequencing
Virus genomes were generated by two different approaches: (i) using the commercial BGI sequencing kit (ATOPlex 1000021625) with sequencing on the BGI MGISEQ-2000 (n=25), and (ii) using version 3 of the ARTIC COVID-19 multiplex PCR primers (https://artic.network/ncov-2019) for genome amplification, followed by library construction with the Illumina Nextera XT DNA Library Preparation Kit and sequencing with PE150 (n=63) or SE100 (n=38) on the Illumina MiniSeq. We report only high-quality genome sequences for which we were able to generate >95% genome coverage.
Total RNA was extracted from oropharyngeal swab samples using the QIAamp Viral RNA Mini Kit (Qiagen, Cat. No. 52904). Multiplex PCR amplification was performed using the commercial BGI kit or following the general multiplex PCR method described at https://artic.network/ncov-2019. The bioinformatics pipeline for the BGI platform (https://github.com/MGI-tech-bioinformatics/SARS-CoV-2_Multi-PCR_v1.0) was used to generate consensus sequences and call single nucleotide variants relative to the reference sequence. For sequence data from the MiniSeq, the raw data were first quality controlled (QC) using fastp [18] to trim artificial sequences (adapters) and to cut low-quality bases (quality scores < 20). PCR primers were trimmed using cutadapt version 3.1 [19] or another published method [20]. Cleaned reads were mapped against the genome of the first index case (5137_GZ_2021/5/21) using BWA 0.7.17 [21]. Consensus sequences were determined with iVar 1.2.1 [22], taking the most common base as the consensus (>50% frequency). An N was placed at positions along the reference where the sequencing depth was ≤ 10. We identified intrahost single nucleotide variants relative to the reference genome (5137_GZ_2021/5/21) with iVar 1.2.1 using the following parameters: alternate allele frequency at the SNV site ≥ 3%; alternate allele depth at the SNV site ≥ 30; iVar p-value < 0.0001. The Nextstrain pipeline [23] was used to analyze and visualize the genetic distribution of SARS-CoV-2 infections and its dynamic change in Guangdong between January 2020 and June 2021. A maximum likelihood (ML) tree was estimated with PhyML [24] using the HKY+Γ4 substitution model with gamma-distributed rate variation [25]. Branch lengths were recalculated as the number of mutations relative to the reference sequence of the first index case. The tree was visualized with the R package ggtree [26].
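The variant-calling thresholds and the low-depth masking rule listed above can be expressed as a short post-filtering sketch. It assumes the per-site calls have already been produced by a variant caller and loaded into memory. The `VariantCall` record and its field names are hypothetical and do not correspond to iVar's output columns, and the example calls are invented for illustration. Only the numeric cut-offs are taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VariantCall:
    """One candidate intra-host variant (hypothetical record layout)."""
    pos: int          # 1-based position on the reference genome
    ref: str          # reference base
    alt: str          # alternate base
    alt_freq: float   # alternate allele frequency, 0..1
    alt_depth: int    # number of reads supporting the alternate allele
    p_value: float    # p-value reported by the variant caller


MIN_ALT_FREQ = 0.03   # alternate allele frequency >= 3%
MIN_ALT_DEPTH = 30    # alternate allele depth >= 30
MAX_P_VALUE = 1e-4    # p-value < 0.0001
MASK_DEPTH = 10       # positions with depth <= 10 become N


def passes_isnv_filters(call: VariantCall) -> bool:
    """Apply the three iSNV thresholds stated in the Methods."""
    return (call.alt_freq >= MIN_ALT_FREQ
            and call.alt_depth >= MIN_ALT_DEPTH
            and call.p_value < MAX_P_VALUE)


def mask_low_depth(consensus: str, depth: List[int]) -> str:
    """Replace consensus bases at low-depth positions with N."""
    if len(consensus) != len(depth):
        raise ValueError("consensus and depth must have the same length")
    return "".join("N" if d <= MASK_DEPTH else base
                   for base, d in zip(consensus, depth))


# Invented example calls, for illustration only
calls = [
    VariantCall(100, "C", "T", alt_freq=0.05, alt_depth=40, p_value=1e-6),
    VariantCall(200, "G", "A", alt_freq=0.02, alt_depth=100, p_value=1e-8),
    VariantCall(300, "A", "G", alt_freq=0.10, alt_depth=12, p_value=1e-5),
]
kept = [c.pos for c in calls if passes_isnv_filters(c)]
print(kept)
print(mask_low_depth("ACGT", [50, 10, 11, 0]))
```

Keeping the thresholds as named constants makes it easy to test how sensitive the list of iSNVs is to the chosen cut-offs, for example when inspecting variants that fall just below the 3% threshold.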
Data availability
All sequencing reads after primer trimming and mapped to the reference sequence (the sequence of the first index case, XG5137_GZ_2021/5/21) have been submitted to the National Genomics Data Center (https://bigd.big.ac.cn/) with submission number CRA004484. The generated consensus sequences were submitted with accession number GWHBDIM01000000 – GWHBDNH01000000.
Acknowledgements
We gratefully acknowledge the efforts of China national CDC, Guangdong local CDCs, hospitals, and the third-party detection institutions in epidemiological investigations, sample collection, and detection. This work was supported by grants from Science and Technology Planning Project of Guangdong (2018B020207006), the Key Research and Development Program of Guangdong Province (2019B111103001), and Guangdong Workstation for Emerging infectious Disease Control and Prevention, Chinese Academy of Medical Sciences (2020-PT330-004).
Footnotes
# Joint first authors.