Abstract
SARS-CoV-2, the causative agent of COVID-19, emerged in late 2020. The highly contagious B.1.617.2 (Delta) Variant of Concern (VOC) was first identified in October 2020 in India and subsequently disseminated worldwide, later becoming the dominant lineage in the U.S. Despite considerable genomic analysis of SARS-CoV-2 in the U.S., several gaps in the understanding of the local dynamics during early introductions remain, which when elucidated could translate the results of viral genomic epidemiology to actionable mitigation efforts. Here, we explore the early emergence of the Delta variant in Florida, U.S. using phylogenetic analysis of representative Florida and globally sampled genomes. We find multiple independent introductions into Florida primarily from North America and Europe, with a minority originating from Asia. These introductions lead to three distinct clades that demonstrated varying relative rates of transmission and possessed five distinct substitutions that were 3-21 times more prevalent in the Florida sample as compared to the global sample. Our results underscore the benefits of routine viral genomic surveillance to monitor epidemic spread and support the need for more comprehensive genomic epidemiology studies of emerging variants. In addition, we provide a model of epidemic spread of newly emerging VOCs that can inform future public health responses.
Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19 disease, emerged in Wuhan city, China at the end of 2019 [1]. The virus subsequently spread globally, resulting in over 260 million infections and five million deaths. This global account included more than 45 million cases in the United States (U.S.) as of November 30th, 2021. Several SARS-CoV-2 variants have since emerged globally through continuous viral genome evolution [2]. The emergence of novel SARS-CoV-2 variants of concern (VOC) with increased infectivity, transmissibility, and/or virulence potential as compared to their progenitor lineage poses serious public health concerns. In particular, it is difficult to predict how VOCs will affect pandemic dynamics in the context of varying population immunity. Most concerning are VOCs that evade vaccine-induced immunity due to immune pressure[3].
Emergence of VOCs in the U.S. began in the fall of 2020 [4,5]. The SARS-CoV-2 Delta variant (B.1.617.2) was first identified in India in October, 2020 and became the dominant variant with a ferocious wave of infection in April-May of 2021. After April, 2021, the variant had spread in and among 65 countries across six continents globally [6] and was designated as a VOC by the U.S. Centers for Disease Control and Prevention (CDC)[7] and World Health Organization (WHO)[8] on May, 2021. In the United Kingdom (UK), the variant spread rapidly due to travel importation followed by community transmission and became the dominant variant responsible for 90% of the cases by June, 2021, overtaking the previously predominant Alpha variant (B.1.1.7) [9] [10]. In the U.S., the Delta variant was first detected by mid-March, 2021, and rapidly led to a new wave of infection in many locations. A nationwide study in the U.S. reported a drop of Alpha variant cases from 70% in late April to 42% in mid-June, with the B.1.617.2 variant leading much of the shift [11]. Beginning in the summer of 2021, Delta became the predominant lineage in North America, as well as in South Africa, India, UK and Canada [10,12–15].
Compared to its predecessor Alpha, the Delta variant possesses 12 notable mutations in the genomic region encoding the spike protein that provide a fitness advantage. In particular, the D614G substitution, which is shared with other VOCs including Alpha, Beta, Gamma and the more recently emerging Omicron, has been associated with increased transmissibility, infectivity, and severity relative to the original circulating strains [13,16–20], and experimental studies showed reduced neutralization of B.1.617.2 by vaccine/convalescent sera [21,22]. Multiple observational studies also reported on the reduced efficacy of the COVID-19 vaccines [19,23,24], higher replication efficiency with an average doubling time of 25 days [25], and increased attack rate among younger, unvaccinated age groups [26]. The prevalence of Delta variants in unvaccinated individuals was three times that in fully vaccinated individuals in the UK, and social mixing had the potential to increase in infections, even with high levels of vaccination [26]. A recent study in South Korea indicated that pre-symptomatic transmission, superspreading potential, and the higher transmissibility of the Delta variant likely contributed to the secondary infection rate [27].
Florida is the 3rd most populous state in the U.S., a popular tourist destination, and a conduit to the rest of the country through its 26 major airports and eight major ports. Historically, Florida has been the epicenter of notable public health events and responses such as the emergence of Zika Virus in 2016 [28]. As such, the dynamics of introduction, spread, and exportation are important for the region. These dynamics are of particular importance since the emergence and subsequent spread of Delta varied considerably across geographic locations based on host demographic structure, circulating variants, historical incidence, and vaccination rates[29–31]. At the time Delta was introduced to Florida, there were no enforced statewide travel restrictions or mask requirements, schools were conducted in-person, and vaccination was limited to individuals over 40 years of age or in target risk groups. Here, we sought to investigate the early emergence of Delta in Florida during a period with high background circulation of another variant (Alpha) by analyzing SARS-CoV-2 viral genome sequence data from Florida in the context of global data. Our findings provide insight into the introduction dynamics of future SARS-CoV-2 variants.
Materials and Methods
Dataset
All available genomes assigned as Delta were downloaded, excluding the low coverage and incomplete records, from GISAID (www.gisaid.org) as of July 13th, 2021. Subsampling was performed to reduce the dataset for feasible computational analysis. A subsampling strategy was chosen to minimize the unbalanced representation of location-specific data while maximizing the genetic diversity and temporal distribution of sampling times for each geographic location using the Temporal And diveRsity Distribution Sampler for Phylogenetics (TARDiS) [32]. Each country was considered a distinct geographic location, with Florida considered distinct from the remainder of the United States. Subsampling 213 genomic sequences from each of these 18 locations resulted in 5,500 total sequences.
Viral genome sequences were aligned using ViralMSA with default parameters [33] using Wuhan-1 (MN908947.3) as reference. Manual curation using Aliview was performed to specify start site and remove artifacts at the terminal regions [34]. Maximum likelihood (ML) tree reconstruction using IQTREE (version 1.6.10) was employed, using the best-fit model of nucleotide substitution according to the Bayesian Information Criterion (BIC) as indicated by the Model Finder function [35]. The statistical robustness of individual nodes was determined using 1,000 bootstrap replicates. The final dataset comprised 5,500 complete or near-complete SARS-CoV-2 genome sequences.
Phylodynamic Analysis
We then sought to estimate the change in virus effective population size (Ne) over time and dynamic aspects of population structure in the context of Florida transmission. The ML tree was first assessed for the presence of temporal signal by evaluating the linear relationship between genetic distance of each tip from the best-fitting root of the tree (rooted on the branch that minimizes the mean square of the residuals) with its respective sampling time using TempEst v1.5.3 [36]. Temporal signal was evidenced by a positive relationship, allowing for subsequent dating of internal nodes via molecular clock calibration of tree tips using the treedater package in R v3.6.0 [37,38]. A constant evolutionary rate of 8.0×10−4 nucleotide substitutions per site per year[12,39] was used to re-scale branch lengths (substitutions/site) of the ML tree in units of time within TimeTree[40]. To estimate an upper limit of the number of and source of viral introductions into Florida, we then fit a migration model in TreeTime for which geographical locations were assigned to external (known based on sampling location) and internal nodes (inferred) [40]. Using the resulting annotated tree topology, we quantified transitions between internal nodes assigned to Florida and the other geographic locations included in the dataset, which represented putative importations and exportations.
The population structure of Florida Delta genomes was assessed to determine dominant sub-lineages indicative of distinct epidemic foci. Focused Bayesian molecular clock analysis was then conducted on the resulting individual clades using BEAST 1.10.4 [41]. Simultaneous molecular clock calibration and coalescent Ne estimation was modeled using a strict molecular clock (constant evolutionary rate), HKY model of nucleotide substitution [42], and exponential growth in population size [43]. We computed Markov chain Monte Carlo (MCMC) triplicate runs of 100 million states each, sampling every 10,000 steps for each data set. Tracer v.1.7.1 was used to evaluate effective sampling sizes for relevant parameters, using 200 as the minimum threshold for sufficient MCMC sampling [44]. Maximum clade credibility trees were summarized from the MCMC samples using TreeAnnotator after discarding 10% as burn-in.
Last, we assessed variation in relative transmission dynamics among individual clades as well as local Florida sub-trees within each clade in the context of mutational patterns. For each cluster, we calculated the Oster statistic, which is a function of the size in tips, the sum of the branch lengths, and the length of the longest branch within an individual subtree [45,46]. This method was recently developed to infer transmission rates using human immunodeficiency virus (HIV) sequencing data.
Results
By July 2021, 2.4 million COVID-19 cases were reported in Florida and an estimated 11.3 million people had received at least one dose of the vaccine. A significant reduction in the number of infections in Florida was observed at the beginning of 2021, during a period that was dominated by the Alpha variant (B.1.1.7, Figure 1A). After a period of seemingly stable epidemic recession, with few new cases detected between February – June 2021, a new epidemic wave impacted the state, resulting in a significant increase in incidence. This wave coincides with the introduction of the Delta variant to the state, increasing from 28.7% to 54.3% of genomes submitted to GISAID (July 13th, 2021, Figure 1 A). Within 3 months of its introduction, the Delta variant had completely supplanted Alpha as the most prevalent variant, accounting for 7.5% of all Florida sequenced genomes. When the Delta wave began in March 2021, vaccination was limited to healthcare professionals, individuals >40 years of age, and persons at high risk for severe disease (e.g., immune-compromised). As a result, only 28% of the Florida population had received one vaccine dose and ∼10% were fully vaccinated (Figure 1B). On April 5, vaccine eligibility was extend to individuals >18 years of age [47] and the Food & Drug Administration approved the Pfizer COVID-19 vaccine for individuals 12 years of age or older on May 12th (Figure 1) [48].
A: Distribution (y-axes) of previously considered variants of concern (VOC) over time (x-axis) in terms of number of weekly confirmed cases (top), count of sequenced isolates stratified by VOC (middle), and the proportion of VOC (bottom). Remaining lineages are grouped into “Other”. B: Vaccinated individuals (defined as incomplete for a single dose of Moderna or Pfizer and fully for two doses of Moderna or Pfizer and a single dose of Johnson&Johnson, top) and cumulative number of weekly confirmed cases (bottom) over time (x-axis).
Analysis of 5,500 publicly available genomes (446 Florida genomes with a representative subset of 5,054 genomes sampled globally) showed that Florida samples possessed characteristic Delta substitutions in the region encoding the spike protein except for a E156G substitution, which was not found among early Florida genomes (Figure 2). Five, less common substitutions emerged among Delta variants from Florida, including K77T, A222V, V289I, and V1264L, which were 3-21 times more prevalent in the Florida sample as compared to the global sample. In addition, T95I, common also among Iota and Mu variants, was found to be at a lower prevalence than the global sample (8.5% instead of 22.1%).
The x-axis shows the coordinates of the region encoding the spike protein with its regions and the y-axis shows the proportion of genomes carrying the substitution. The substitutions with a prevalence greater than 2% are annotated. The inlaid table includes the Delta variant characterizing substitutions in spike protein region, the prevalence among Florida genomes, and the prevalence among worldwide Delta genomes sampled from GISAID (*) (https://outbreak.info/compare-lineages?pango=B.1.617.2&gene=S&threshold=0.2).
Linear regression of root-to-tip genetic distances against sampling dates indicated that the SARS-CoV-2 sequences evolve in a clock-like manner (r = 0.30) (Supplementary Figure 1).
Root-to-tip genetic divergence for the whole dataset against time of sampling.
Ancestral location reconstruction of the time-scaled tree elucidated the number and timing of viral migrations between Florida and the rest of the world (Figure 3). A total of 88 transitions from internal nodes assigned to allopatric locations to those assigned to Florida were observed in the phylogeny. Multiple inferred introductions to Florida occurred in late March to early April 2021 from India, followed by onward migration to the U.S. starting in May (Figure 3A). Subsequent inferred introductions from other geographical locations were observed during the end of June 2021 as the Delta epidemic expanded. Overall, the greatest number of introduction events to Florida were observed to originate from the U.S., followed by Europe (lines marked in black, Figure 3B). Taxa from Florida were also intermediate in transitions between allopatric locations and the remaining U.S., accounting for nearly two-thirds of the number of migration events (double the number of inferred introductions into Florida, Figure 3B).
Movement was inferred using ancestral reconstruction of geographic states within the phylogenetic tree of genomic samples. A. Total number of viral introductions over time into Florida. B. Graphical representation of the estimated number of migration events between the geographic areas; introductions into the state of Florida are outlined in black.
Independent concurrent migrations of Delta to Florida from North America, Europe, and Asia, occurring as early as April 2021, can be visualized on the ML time-tree (Figure 4). Based on internal support for divergence events within the tree (bootstrap values > 0.70), three distinct clades were identified: Subset I (n = 1026) comprised of 188 FL genomes; Subset II (n = 327) comprised 101 FL genomes, and Subset III (n = 470) comprised 83 FL genomes (Figure 4A, clade highlighted in grey and marked as I-III). These well-supported clades represent distinct foci of the local epidemic seeded by multiple introductions.
A. Full time-resolved Maximum Likelihood tree for all sequence data. The three clades identified as being well-supported using bootstrap (BS) analysis (BS>70) are highlighted in grey (referred to as clades I-III). A pie chart indicating the location distribution is located on top of each of the three clades. B-D. MCC tree reconstruction for the three clades (I-III) individually. Tree tips are colored according to their location (continent) and the color legend is on the top right of the figure.
To assess these clades in more detail, we performed a separate Bayesian coalescent analysis (Figure 4B-D). The maximum clade credibility (MCC) trees showed four subtrees comprised predominately of Florida taxa, representing local spread (Figure 4B-D); all Florida subtrees presented with similar upper estimates of the timing of introduction - April 2021 (Figure 4B-D). However, the Florida clusters in panel C and D have very low statistical support due to the low genetic diversity of SARS-CoV-2 genomes sampled within a short time period [49–52]. When assessing clade-defining substitutions, we identified that spike A222V and V289I were isolated to Subset I genomes and K77T associated with Subset II. Further, a subclade of Subset I possessed both A222V and V289I, while another only carried A222V.
Comparison of transmission dynamics of each three clades and their respective subtrees found that Subset II exhibited the highest relative rate of transmission, followed by Subset III, then Subset I; however, there was no clear association between the above-described mutations and relative transmission rates. Further, there was no significant difference in geographic composition of each clade when assessed in context of transmission rates (Figure 4). Florida subtrees exhibited a broader range of estimated transmission rates (0.62 vs. 0.14 units), which were generally higher than the background Subsets (Table 1), except for Florida Subtree C and its Subset II. Florida Subtree D showed evidence of the highest relative rate of transmission (2.13 units) as compared to its progenitor clade, Subtree III.
Subset/subtree statistics (number of taxa, percentage of FL taxa, minor mutations associated and absolute Oster value).
Discussion
Genomic epidemiology has been integral in understanding SARS-CoV-2 emergence and spread and for tracking the evolutionary dynamics of viral transmission. In view of this, we investigated the introduction of the SARS-CoV-2 Delta variant in Florida, U.S. from the early identification and the main peak of infection that occurred in July 2021. By June 2021, the Delta variant had become the dominant lineage in the US, unseating the Alpha variant as the most prevalent lineage in Florida, U.S., and abroad [12,14]. Subsequently, Delta received considerable attention due to its rapid dispersal, high transmission rate, and association with vaccine failure through vaccine breakthrough cases [15].
These findings provide insight into the introduction and emergence of a novel SARS-CoV-2 variant (Delta) to a region with moderate population-level immunity and ongoing transmission of a resident variant (Alpha). The introduction of Delta variant to Florida was observed in early and late April 2021 primarily from India, which is consistent with the previously reported origins [12]. Multiple international introductions were observed at the end of June 2021 when about 45% of the state’s eligible population was fully vaccinated, and about 53% had received at least one dose of vaccine in Florida [53].
Early Florida Delta isolates were interspersed among the isolates from other countries in the phylogeny, suggesting multiple introductions with most dating to early April 2021. These introductions coalesced into three distinct clades that dominated the early period of the Delta wave and resulted in the near complete replacement of the resident Alpha variant. A similar dynamic of multiple repeated introduction followed by rapid interstate transmission was observed in the U.S. after introduction of the Alpha variant [54,55]. While international and domestic travel was muted at the time Delta was introduced to Florida, there were no explicit restrictions. Likewise, as Delta completely displaced Alpha, Omicron could potentially displace Delta worldwide [56]; however, several scenarios could be considered, such as a long-term co-circulation or an Omicron wave followed by resurgence of Delta. The outcome of these scenarios may be affected by the combination of immune escape and intrinsic transmissibility. This will influence the endemic state of the virus, public health response, and changes to vaccination rates or other social practices. Last, as evidenced by our analysis of relative rates of transmission, there was considerable variation in transmission dynamics of dominant Florida Delta clades independent of mutational profile or geographic composition of the progenitor lineages that seeded the local epidemic. This points to differences in host factors or public health measures that warrant further investigation.
As vaccine coverage rates remain low globally, the continued emergence of variants of concern remains likely, thus emphasizing the need for targeted viral genomic surveillance and expansion of novel surveillance tools. Two promising approaches are the use of wastewater surveillance at both the municipality level and targeted settings (e.g., airports) as well as the application of air collection monitors to detects SARS-CoV-2 in the environment (DH O’Connor, 2022, In Preparation). When deployed, these tools can be used to assess the risk of infection in a setting, detect the emergence of novel variants, and track viral diversity and coupled with expanded viral genomic surveillance of COVID-19 cases create a robust network for situational awareness of the evolving pandemic.
Although the sequence data alone do not represent the true number of epidemiologically linked transmission chains, our phylogenetic findings clearly elucidate multiple introductions of a novel variant to a population with moderate levels of population immunity, a prevailing resident variant, and the absence of mandates. When interpreting our results, it’s worth noting that importations and exportations were inferred from available sequence data available in GISAID. The availability of these data is influence by several factors including SARS-CoV-2 testing and genomic surveillance capacity among countries. Further, whereas increased rates for Florida subtrees likely reflects local transmission dynamics, additional case-level data is required for more resolved investigation into the relative contribution of host and pathogen factors associated with differences. Nonetheless, our results further underscore the benefits of routine pathogen genomic surveillance to monitor outbreak investigations and support the need for more comprehensive epidemiological studies of newly emerging variants. They also provide a likely scenario for the trajectory of Omicron and subsequent variants whether emerging domestically or imported, reinforcing the necessity of continued genomic surveillance efforts globally.
Data Availability
All data produced in the present study are available upon reasonable request to the authors
Author Contributions
Conceptualization, MS and TA; Analysed the data: EC, BR, SM; Helped with study design and data interpretation: EC, SA, and TA; Writing – Original Draft Preparation, EC, SA, BR. Writing – Review & Editing, SES, MS, JB, TA. Supervision, MS, TA.
Funding
This work was funded in part by a grant from The Rockefeller Foundation to the University of Florida and the University of Central Florida in order to accelerate regional genomic surveillance. The views and findings are those of the author and not those of the funder (The Rockefeller Foundation). This work was funded in part by Epidemiology and Laboratory Capacity award from the CDC Office of Advanced Molecular Detection to the Florida Department of Health Bureau of Public Health Laboratories.
Institutional Review Board Statement
Not applicable
Informed Consent Statement
Not applicable.
Data Availability Statement
All input files along with all resulting output files and scripts used in the present study will be made available upon request.
Conflicts of Interest
The authors declare no conflict of interest.
Acknowledgments
We thank the global laboratories that generated and made public the SARS-CoV-2 sequences (through GISAID) used as reference dataset in this study (Supplementary table 1).