1 Introduction

1.1 The SARS-CoV-2 genome

COVID-19 is caused by the RNA virus SARS-CoV-2, a betacoronavirus with a nearly 30 kb positive-sense, single-stranded RNA genome that encodes 29 proteins (Wu et al. 2020). These include structural proteins that the virus uses to package its RNA, as well as proteins that enable its entry into host cells and its propagation by hijacking the host cellular machinery for viral replication. SARS-CoV-2 is an enveloped virus with a host-derived lipid membrane. Virion assembly is mediated by several virus-encoded structural proteins, the most important being S (the spike protein, which forms a crown-like structure), M (a hydrophobic membrane protein), E (an integral membrane protein, also called the envelope protein) and N (an abundant nucleocapsid protein that binds the RNA genome) (figure 1). The Spike protein encoded by the S gene has a receptor-binding domain that binds the human angiotensin-converting enzyme 2 (ACE2) receptor, which is found on the surface of many human cell types, including those of the nasal cavity, lungs, kidneys, intestines, brain, heart and blood vessels (Li et al. 2020a, b). The primary route of infection is respiratory: when infected individuals are in close contact with uninfected people, they spread viral particles that enter the new hosts through the nose and mouth and bind to their epithelial cells. A few studies suggest a correlation between the level of ACE2 expression in individuals and the clinical outcome of SARS-CoV-2 infection, especially in elderly populations and those with comorbidities (Li et al. 2020a, b; Wang et al. 2020).

Figure 1

The SARS-CoV-2 genome is ~30 kb and consists of genes encoding structural and non-structural proteins. The structural proteins are the nucleocapsid (N), spike (S), membrane (M) and envelope (E) proteins. Each box indicates a gene. The numbers on the axis indicate genome coordinates.

So far in the pandemic, the elderly, people with comorbidities (including hypertension, diabetes, asthma and chronic lung disorders) and immunocompromised individuals have been the most susceptible to the adverse effects of COVID-19 (Mueller et al. 2020; Moderbacher et al. 2020). The demographics of the most affected populations may, however, change as the viral genes adapt, or may depend on other host and environmental factors. Studies correlating the incidence and severity of COVID-19 with host genetic make-up in Indian populations are still underway. Acquiring immunity to the virus through unchecked exposure can lead to unacceptable levels of mortality in susceptible populations, as seen repeatedly in countries across the globe (Azkur et al. 2020; Catanzaro et al. 2020). At the time of writing, over 86 million people have been infected worldwide, with nearly 1.9 million deaths reported. Early diagnosis and treatment of COVID-19 are crucial, and the recent success in developing vaccines is promising (Anderson et al. 2020; Polack et al. 2020; Voysey et al. 2020). Prevention measures, however, are likely to prove the most effective way to mitigate the worldwide spread of the disease and to reduce the scope for harmful mutants to evolve. A key aspect of prevention is focused surveillance of mutations in the virus through constant and widespread analysis of its genome sequence.

1.2 Genome sequencing-based phylogenetic analysis

The first genome sequences of the novel betacoronavirus became available around 10 January 2020 on the global public repository GISAID, the Global Initiative on Sharing All Influenza Data (https://www.gisaid.org/; Elbe and Buckland-Merrett 2017), with the original virus from Wuhan designated as the reference (WIV04, or hCoV-19/Wuhan/WIV04/2019). Since then, the repository has amassed over 320,000 sequences from all over the globe. India was the 5th country in the world to sequence the viral genome (isolated from the first patients in Kerala) for inclusion in GISAID (ICMR 2020). Sequencing efforts across many labs in India have since led to the submission of more than 6000 SARS-CoV-2 genomes. Over time, viruses accumulate mutations that alter their genomic sequence, either through random replication errors or through a host defense mechanism called RNA editing (Van Dorp et al. 2020a, b). A mutation is called synonymous when it does not change the encoded amino acid, and non-synonymous when it changes the amino acid and therefore the protein. So far, SARS-CoV-2 has acquired new mutations at a rate of ~2 changes per month. Consequently, the viral sequences seen today differ from the Wuhan variant at around 20 positions in their genomes.
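The synonymous/non-synonymous distinction comes down to translating the reference and mutated codons and comparing the amino acids they encode. What follows is a minimal sketch, assuming an in-frame coding sequence and a single-nucleotide substitution; the function `classify_snv` and the toy sequence are hypothetical illustrations, not part of any published pipeline.

```python
from typing import Tuple

# Standard genetic code (NCBI translation table 1), indexed in T, C, A, G order
# for the first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_snv(cds: str, nt_pos: int, alt: str) -> Tuple[str, str]:
    """Classify a single-nucleotide substitution in an in-frame coding sequence.

    cds    -- coding sequence (DNA alphabet) starting at the start codon
    nt_pos -- 1-based position of the substitution within cds
    alt    -- the substituted base
    Returns (kind, label), where label uses REF<codon number>ALT notation.
    """
    cds, alt = cds.upper(), alt.upper()
    codon_start = (nt_pos - 1) // 3 * 3   # 0-based start of the affected codon
    offset = (nt_pos - 1) % 3             # position of the change inside the codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{codon_start // 3 + 1}{alt_aa}"
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, label


# Toy CDS: ATG (M) GAT (D) TAA (stop)
print(classify_snv("ATGGATTAA", 5, "G"))  # ('non-synonymous', 'D2G')
print(classify_snv("ATGGATTAA", 6, "C"))  # ('synonymous', 'D2D')
```

The first call is the same kind of change that produces D614G: an A→G substitution at the second position of an aspartate codon (GAY) yields a glycine codon (GGY). The second changes only the third codon position and leaves the amino acid unchanged.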

Phylogenetic analysis of the GISAID sequences highlights multiple clusters of related genomes, called clades, which are grouped on the basis of shared mutations. The nomenclature of SARS-CoV-2 lineages is explained in table 1. As shown in figure 2, Clade O was the ancestral type, which originated in Wuhan (Wu et al. 2020; Zhou et al. 2020). In January and early February, this diversified into Clades 19A and 19B (also known as L and S) (Tang et al. 2020). The L-type was more prevalent (~70%) in the early stages of the outbreak in Wuhan, even though the S-type was closer to the ancestral type; the frequency of the L-type then decreased over the next few months. A new clade, A2a or Clade G, the ancestor of clades 20A–C, was then identified in February; it is characterized by a specific non-synonymous mutation (D614G) in the Spike protein, encoded by gene S.
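Assigning a genome to a clade amounts to checking which branches' defining mutations it carries, walking outward from the root. A minimal sketch follows, assuming mutations have already been called against the Wuhan reference and are written as `PROTEIN:REF<pos>ALT` strings. Apart from D614G, the rules below (GISAID-style G, GR and GH, using N:G204R and ORF3a:Q57H) are illustrative assumptions that should be checked against current clade definitions; dedicated assignment tools also handle ambiguous and missing data, which this sketch ignores.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Illustrative rule set: clade -> (parent clade, mutations added on that branch).
# Only the G branch is sketched; treat these definitions as assumptions.
CLADE_RULES: Dict[str, Tuple[Optional[str], FrozenSet[str]]] = {
    "G": (None, frozenset({"S:D614G"})),
    "GR": ("G", frozenset({"N:G204R"})),
    "GH": ("G", frozenset({"ORF3a:Q57H"})),
}


def path_to_root(clade: str) -> List[str]:
    """Clades from `clade` back to the root of the rule tree."""
    path: List[str] = []
    current: Optional[str] = clade
    while current is not None:
        path.append(current)
        current = CLADE_RULES[current][0]
    return path


def required_mutations(clade: str) -> Set[str]:
    """All defining mutations accumulated along the path from the root to `clade`."""
    needed: Set[str] = set()
    for node in path_to_root(clade):
        needed |= CLADE_RULES[node][1]
    return needed


def assign_clade(sample_mutations: Set[str]) -> str:
    """Return the deepest clade whose defining mutations are all present in the sample."""
    matches = [c for c in CLADE_RULES if required_mutations(c) <= sample_mutations]
    if not matches:
        return "unassigned"
    return max(matches, key=lambda c: len(path_to_root(c)))


print(assign_clade({"S:D614G", "N:R203K", "N:G204R"}))  # GR
print(assign_clade({"S:D614G"}))                        # G
print(assign_clade({"ORF8:L84S"}))                      # unassigned
```

Requiring the full root-to-clade path, rather than only a clade's own mutations, keeps a sample from being placed in a derived clade when it lacks the parent's defining change.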

Table 1 Various nomenclatures of SARS-CoV-2 clades*
Figure 2

A simplified phylogenetic tree showing the divergence of clades from the ancestral root (Wuhan) and their corresponding clade-defining mutations. The boxes indicate clades; boxes of the same color indicate derived clades sharing the same mutations as the parent. The clade name as specified by Nextstrain (19 and 20 series) is given within each box, with the GISAID clade in parentheses. The text on the right lists the clade-defining mutations of each clade, written as the protein, the original amino acid and its position, followed by the substituted amino acid. The month above each branch indicates the inferred month of emergence of the clade. Currently, 20A, 20B and 20C are the globally dominant clades.
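The mutation notation used in the caption (protein, original amino acid, position, substituted amino acid) is regular enough to parse mechanically. The sketch below assumes a `PROTEIN:REF<position>ALT` string form such as `S:D614G`; that string format and the helper names `parse_aa_change` and `apply_aa_change` are illustrative choices, not a standard. Deletions are not handled.

```python
import re
from typing import NamedTuple


class AAChange(NamedTuple):
    protein: str
    ref: str
    position: int
    alt: str


# Assumed textual form "PROTEIN:REF<position>ALT", e.g. "S:D614G".
# Deletions such as "S:69-70del" are deliberately rejected.
_PATTERN = re.compile(r"^(?P<protein>[A-Za-z0-9]+):(?P<ref>[A-Z*])(?P<pos>\d+)(?P<alt>[A-Z*])$")


def parse_aa_change(text: str) -> AAChange:
    """Split a substitution string into protein, reference residue, position and new residue."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized mutation string: {text!r}")
    return AAChange(m["protein"], m["ref"], int(m["pos"]), m["alt"])


def apply_aa_change(protein_seq: str, change: AAChange) -> str:
    """Apply a substitution after checking that the reference residue matches."""
    idx = change.position - 1  # positions are 1-based in the notation
    if protein_seq[idx] != change.ref:
        raise ValueError(f"expected {change.ref} at {change.position}, found {protein_seq[idx]}")
    return protein_seq[:idx] + change.alt + protein_seq[idx + 1:]


print(parse_aa_change("S:D614G"))
print(apply_aa_change("MFVDL", parse_aa_change("toy:D4G")))  # MFVGL
```

Checking the reference residue before substituting catches the most common error in this kind of bookkeeping: a position numbered against a different reference or reading frame.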

The D614G mutation replaces the 614th amino acid of the Spike protein, D (aspartic acid), with G (glycine). This position lies in the S1 subunit, outside the Receptor Binding Domain (RBD). Because glycine is less bulky than aspartic acid, the substitution is believed to contribute to a more flexible hinge region in the Spike, which may facilitate more efficient cleavage and receptor binding (Korber et al. 2020; Turoňová et al. 2020). This appears to have offered the virus a selective advantage in infection and transmission, making it predominant all over the world (Zhang et al. 2020). At present, almost all new COVID-19 infections, in India as well as globally, are caused by viruses carrying this mutation. Monitoring such mutations is critical in the context of the vaccines and therapeutics being developed globally. Importantly, the D614G mutation falls outside the region responsible for raising neutralizing antibodies and has not been a cause for concern with respect to vaccine efficacy and therapeutics (Li et al. 2020a, b).

Over the last 10 months, we have analyzed over five thousand SARS-CoV-2 genomes isolated from Indian COVID-19 patients to build a phylogeny with 6888 mutation events (Singh et al. 2020). In the following sections, we review the rise and spread of different variants of the SARS-CoV-2 virus across India.

2 Spread of SARS-CoV-2 in India

2.1 Summary of early spread: Rise and decline of an India-specific variant

In late March, a unique cluster of sequences was identified in India that could not be classified into any previously annotated global clade. This cluster, named clade I/A3i, is characterized by a set of four mutations, as described in our earlier work (Banu et al. 2020). Clade I/A3i potentially arose from a single outbreak, spread rapidly across the country and has a lower mutation rate than other clades. The evolution of the I/A3i clade is largely defined by changes in the Nucleocapsid (N) and Membrane (M) genes, in contrast with the predominant A2a clade, which is characterized by changes in the Spike (S) gene. When the clade was first characterized in late May, 42% of all genomes sequenced in India belonged to it. Members of clade I/A3i formed the predominant class of isolates from the states of Delhi, Telangana, Maharashtra, Karnataka and Tamil Nadu, and the second largest class in Haryana, Madhya Pradesh, West Bengal, Odisha, Uttar Pradesh and Bihar. Globally, around 300 genomes sampled from Singapore, Malaysia, Australia, the United States, Canada, Taiwan, Japan, Thailand, the Philippines, Oman, Guam and Saudi Arabia belonged to this clade, and a few of them had sampling dates earlier than the earliest Indian sample in this cluster. Although the clade was originally dominant, it has disappeared from recent samples, as predicted from its mutation profile (Banu et al. 2020). Considering all the genomic data currently available from India, 547 genomes (10%) from 17 of the 20 states from which genomes originated fall under clade I/A3i.

2.2 Current SARS-CoV-2 variants in India

The takeover of A3i and other minor variants by A2a in India is summarized in figure 3. The A2a variant was first identified in samples collected in early March, and its prevalence increased over the following months. Currently, two subtypes of A2a are dominant across India; in addition to the D614G Spike mutation, they carry differing mutations in the N gene and other ORFs.

Figure 3

Timeline of clade distribution in India (top) and across different states (state abbreviations are indicated on the left). Clades are differentiated by color as per the legend at the top, while the size of each bubble indicates prevalence. As seen in the country track at the top, the A3i clade (green) was prevalent during March–May and was eventually overtaken by the A2a clade (blue).
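The bubble sizes in a timeline of this kind are per-month clade fractions among sequenced samples. A minimal sketch of that tally follows, assuming each sample is available as a (collection date, clade) pair; the function name and the toy records are hypothetical, and per-state tracks would come from filtering the samples by state first. Because sequencing effort is uneven across states and months, these fractions describe sequenced samples rather than all infections.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def monthly_prevalence(samples: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Fraction of sequenced samples in each clade, per collection month.

    samples -- (collection date as 'YYYY-MM-DD', clade label) pairs
    Returns {'YYYY-MM': {clade: fraction}}, months in chronological order.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for date, clade in samples:
        counts[date[:7]][clade] += 1  # bucket by year-month
    prevalence = {}
    for month in sorted(counts):
        total = sum(counts[month].values())
        prevalence[month] = {clade: n / total for clade, n in counts[month].items()}
    return prevalence


# Hypothetical toy metadata, not real data.
toy = [
    ("2020-04-02", "A3i"), ("2020-04-20", "A3i"), ("2020-04-25", "A2a"),
    ("2020-07-01", "A2a"), ("2020-07-15", "A2a"),
]
for month, fractions in monthly_prevalence(toy).items():
    print(month, fractions)
```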

The variant landscape is mostly concordant across states in India. However, specific variants appear to be highly represented in selected states. These include the ORF3a mutation L46F in Telangana and the Spike mutation L54F in Gujarat (Singh et al. 2020; Hassan et al. 2020). Another Spike mutation, N440K, was first identified in late June in Andhra Pradesh and has since been present in ~6% of the samples collected from India (Jolly et al. 2020). The three genes with the most mutations identified are ORF1a (2333), ORF1b (1278) and S (714). Figure 4 shows the frequency of mutations identified in Indian samples across all viral genes, and table 2 summarizes the mutations in Indian variants identified over the last 10 months of the pandemic. The clade analysis and variant information can be explored interactively at our website https://data.ccmb.res.in/gear19/. Most variants appear overrepresented in the states that currently sequence and submit the most samples to GISAID. More sequencing from underrepresented areas is needed for a clearer picture of country-wide trends in viral spread.

Figure 4

Number of mutations identified in each gene of the SARS-CoV-2 genome in Indian samples. ORF1a, ORF1b and S show a higher frequency of mutations than the other genes.
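Per-gene counts like these are obtained by mapping each mutation's genome position onto the reference annotation. The sketch below assumes gene coordinates on the Wuhan-Hu-1 reference (NC_045512.2); treat the coordinate table as an assumption to be checked against the reference annotation before use. ORF1a/ORF1b and ORF7a/ORF7b overlap slightly, and this sketch simply credits an overlapping position to the first gene listed.

```python
from collections import Counter
from typing import Iterable

# Assumed gene coordinates (1-based, inclusive) on NC_045512.2; verify before use.
GENES = [
    ("ORF1a", 266, 13483), ("ORF1b", 13468, 21555), ("S", 21563, 25384),
    ("ORF3a", 25393, 26220), ("E", 26245, 26472), ("M", 26523, 27191),
    ("ORF6", 27202, 27387), ("ORF7a", 27394, 27759), ("ORF7b", 27756, 27887),
    ("ORF8", 27894, 28259), ("N", 28274, 29533), ("ORF10", 29558, 29674),
]


def gene_at(position: int) -> str:
    """Name of the first listed gene containing `position`, else 'non-coding'."""
    for name, start, end in GENES:
        if start <= position <= end:
            return name
    return "non-coding"


def codon_number(gene: str, position: int) -> int:
    """1-based codon (amino acid) number of a nucleotide within a gene's reading frame."""
    starts = {name: start for name, start, _ in GENES}
    return (position - starts[gene]) // 3 + 1


def count_by_gene(positions: Iterable[int]) -> Counter:
    """Tally mutation events (one genome position each) per gene."""
    return Counter(gene_at(p) for p in positions)


print(count_by_gene([23403, 14408, 241, 28881, 28882]))
print(codon_number("S", 23403))  # 614
```

Position 23403, the nucleotide commonly reported for D614G, falls in codon 614 of S under these coordinates, which is a quick sanity check on the table.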

Table 2 Most prevalent SARS-CoV-2 mutations in India

The first positive case in India was reported from Kerala in January 2020, in a patient who had traveled from Wuhan, and the state provides a unique opportunity to study viral diversity (Yadav et al. 2020). A study of 200 samples identified 4 novel genetic variants and 89 variants that were exclusive to Kerala and not present in other parts of the country (Radhakrishnan et al. 2020). This work is currently being scaled up by the state government to gain insights into transmission, and similar work needs to be performed across all Indian states.

In December 2020, a new country-wide consortium named INSACOG was established to identify new and circulating variants by genome sequencing across multiple states (MoHFW 2020). One of its main goals is to sequence 5% of all COVID-19 positive cases in the country.

2.3 New global variants and causes for concern

The mutation landscape of SARS-CoV-2 has been under constant global scrutiny to understand the effect of these changes on the infectivity and antigenicity of the virus. While most mutations are of little or no consequence, the virus sometimes acquires a mutation that gives it an advantage over other strains. The Spike protein is used by the virus to enter human cells via the ACE2 receptor. Spike mutations can therefore potentially increase binding affinity to ACE2 and enable easier entry into the host cell, or otherwise enhance infectivity, as seen in the case of the D614G mutation described in Section 1.2. The receptor-binding domain (RBD) of the Spike protein is the most variable part of the coronavirus genome (Zhou et al. 2020). Mutations can also potentially render the virus resistant to neutralization by host antibodies, and thus need to be identified and monitored to safeguard the efficacy of antibody therapeutics. Figure 5 shows the positions of some of the key Spike mutations that can alter its biology in terms of transmission, infectivity and immune evasion.

Figure 5

Representation of the spike gene indicating key mutations that are a cause for concern and require monitoring in India. The sub-domains of the spike gene are shown as colored boxes and defined in the legend. The position and color of each line indicate the location of the corresponding mutation, named above the gene.
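Placing a Spike substitution on a domain map of this kind is an interval lookup on the residue number. The boundaries below are approximate values assumed for illustration; published boundaries differ by a few residues between sources, and the function name `annotate_spike` is hypothetical. Flagging residues within a few positions of the S1/S2 furin site mirrors the text's concern about mutations adjacent to the cleavage site.

```python
import re
from typing import List

# Approximate Spike (S) sub-domain boundaries in residue numbers (assumed values).
SPIKE_REGIONS = [
    ("S1", 14, 685), ("NTD", 14, 305), ("RBD", 319, 541), ("RBM", 437, 508),
    ("S2", 686, 1273), ("FP", 788, 806), ("HR1", 912, 984), ("HR2", 1163, 1213),
    ("TM", 1214, 1237), ("CT", 1238, 1273),
]
FURIN_SITE = (682, 685)  # assumed span of the S1/S2 furin cleavage motif


def annotate_spike(mutation: str, window: int = 3) -> List[str]:
    """Map a Spike substitution such as 'N501Y' to the regions that contain it."""
    match = re.fullmatch(r"[A-Z](\d+)[A-Z]", mutation)
    if match is None:
        raise ValueError(f"expected a substitution like N501Y, got {mutation!r}")
    pos = int(match.group(1))
    labels = [name for name, start, end in SPIKE_REGIONS if start <= pos <= end]
    lo, hi = FURIN_SITE
    if lo - window <= pos <= hi + window:
        labels.append("near S1/S2 furin site")
    return labels


for m in ["N439K", "E484K", "N501Y", "D614G", "P681H", "M1229I"]:
    print(m, annotate_spike(m))
```

Under these boundaries, the immune-escape candidates N439K, E484K and N501Y fall in the RBD and its receptor-binding motif, D614G sits in S1 outside the RBD, and P681H is flagged as adjacent to the furin site.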

Recently identified Spike mutations of concern include N439K, N440K, Q493K and E484K, which have been associated with immune escape (Andreano et al. 2020; Thomson et al. 2020; Weisblum et al. 2020). Of these, N440K has been found in ~42% of the samples from Andhra Pradesh, and E484K in 3 samples from Maharashtra (Jolly and Scaria 2020; Singh et al. 2020). Most of the other mutations are absent from the Indian isolates sequenced so far and need to be actively monitored. Table 3 highlights the key Spike mutations of global concern that are a priority for surveillance in the Indian landscape.
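Surveillance against a list of mutations of concern reduces to flagging samples that carry any watch-listed change and tallying how often each appears per state. A minimal sketch, assuming each sample is a (sample id, state, set of Spike substitutions) record; the watch list reuses the four mutations named above, while the function name and the toy records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Spike mutations named in the text as associated with immune escape.
WATCHLIST: FrozenSet[str] = frozenset({"N439K", "N440K", "Q493K", "E484K"})


def screen(samples: Iterable[Tuple[str, str, Set[str]]],
           watchlist: FrozenSet[str] = WATCHLIST):
    """Flag samples carrying watch-listed mutations and tally their frequency per state.

    Returns (flagged, freq): flagged maps sample id -> watch-listed mutations found;
    freq maps state -> mutation -> fraction of that state's samples carrying it.
    """
    flagged: Dict[str, Set[str]] = {}
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for sample_id, state, mutations in samples:
        totals[state] += 1
        found = mutations & watchlist
        if found:
            flagged[sample_id] = found
        for mut in found:
            hits[state][mut] += 1
    freq = {state: {mut: n / totals[state] for mut, n in muts.items()}
            for state, muts in hits.items()}
    return flagged, freq


# Hypothetical toy records, not real surveillance data.
toy = [
    ("s1", "AP", {"D614G", "N440K"}),
    ("s2", "AP", {"D614G"}),
    ("s3", "MH", {"D614G", "E484K"}),
    ("s4", "MH", {"D614G"}),
]
flagged, freq = screen(toy)
print(flagged)
print(freq)
```

Dividing by each state's total number of sequenced samples, not by the number of flagged ones, is what makes the per-state figures comparable to prevalence estimates such as those quoted in the text.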

Table 3 S gene mutations of concern that require monitoring via genome sequencing

2.3.1 European lineages identified in Denmark and Spain

SARS-CoV-2 was recently introduced into minks from humans and has since adapted to the mink host. A unique strain called Cluster 5, identified in both hosts, carries three amino acid changes (I692V, M1229I and Y453F) and a two-residue deletion (del 69–70) in the Spike protein (Oude Munnink et al. 2020; Van Dorp et al. 2020a, b). This variant has not been seen in sequenced genomes since September. Recent surveillance studies by European consortia have identified several other strains of SARS-CoV-2 that show increased transmission. Sequencing and analysis efforts in Spain have identified the EU1 and EU2 strains, which harbor the Spike mutations A222V and S477N, respectively (Hodcroft et al. 2020a, b). These strains were associated with the surge of cases in various European countries over the summer. Among Indian isolates, five samples with A222V and one sample with S477N have been identified to date. Additional studies are required to understand their potential implications for diagnostics, therapeutics and vaccines under development.

2.3.2 Lineage B.1.351 identified in S. Africa

The last month of 2020 began with worrying news of new SARS-CoV-2 variants with increased transmissibility, first identified in the UK and South Africa. Recent reports from South Africa describe concerns about lineage B.1.351 (also called 501Y.V2), which carries the mutation N501Y in the RBD of the Spike protein and may be associated with faster transmission and possibly with adverse illness in young and healthy individuals (Tegally et al. 2020). This non-synonymous Spike mutation replaces asparagine (N) with tyrosine (Y) at position 501 of the RBD and is thought to increase viral affinity for the ACE2 receptor on host cells. This might explain the dominant spread of 501Y.V2 in the region over the last couple of months, though further studies are needed to understand its epidemiology.

2.3.3 Lineage B.1.1.7 identified in the UK

Even more concerning has been the latest report by the COVID-19 Genomics UK Consortium (COG-UK) detailing the variant VUI-202012/01 (lineage B.1.1.7), which is associated with fast-growing outbreaks across London, Kent and other parts of the UK (COG-UK 2020; Volz et al. 2020). A four-fold increase in cases in a span of just 10 weeks prompted immediate monitoring and investigation of the new variant, followed by global measures to limit its spread. Lineage B.1.1.7 is believed to be around 70% more transmissible than other strains and appears to have accumulated mutations at a much faster rate than other variants (European Centre for Disease Prevention and Control 2020). Sequencing of this strain has identified a cluster of about 23 mutations (Public Health England 2020), 17 of which change the encoded proteins; these include N501Y and P681H, as well as two deletions in the Spike protein (69–70del and 144del) that have been associated with a capacity to escape previous immune responses (Kemp et al. 2020). Viral fusion with host cells is facilitated by cleavage of S into the S1 and S2 subunits by the host enzyme furin, so the P681H mutation adjacent to the cleavage site is also a cause for concern (figure 5). So far, 54 Indian isolates carry the P681H mutation, and it is present in ~4% of isolates from Maharashtra. Over 12% of samples currently analyzed by COG-UK belong to this lineage. Despite large-scale travel restrictions implemented from the latter half of December 2020, the VUI-202012/01 variant has already spread from the UK to a few other European countries, while the South African variant 501Y.V2 has now been identified in the UK (European Centre for Disease Prevention and Control 2020). Variants with efficient transmission can thus spread very rapidly across the globe without appropriate care and surveillance.
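The four-fold increase in cases over 10 weeks can be translated into a growth rate and a doubling time. This is a back-of-the-envelope calculation that assumes constant exponential growth over the whole 70-day period, which real case counts, shaped by testing and restrictions, need not follow:

$$
C(t) = C_0\,e^{rt}, \qquad \frac{C(70\ \mathrm{d})}{C_0} = 4 \;\Rightarrow\; r = \frac{\ln 4}{70\ \mathrm{d}} \approx 0.020\ \mathrm{d}^{-1}
$$

$$
T_d = \frac{\ln 2}{r} = \frac{70\ \mathrm{d}\cdot\ln 2}{\ln 4} = 35\ \mathrm{d} = 5\ \text{weeks}
$$

Under this assumption, cases doubled roughly every five weeks. This figure describes overall case growth and is not, by itself, an estimate of the variant's relative transmissibility.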

VUI-202012/01 has recently also been identified in India, with the number of infected travelers from the UK testing positive for the B.1.1.7 lineage fast approaching 100. So far there is no evidence of community transmission of the variant in India, but this can only be confirmed once a sufficient number of positive samples have been sequenced across the country. If this strain indeed possesses a transmission advantage, it is likely to overtake the currently dominant D614G lineages and drive the pandemic in 2021, unless strict measures are adopted for global containment. Understanding the genomic epidemiology of the virus in India will be crucial for anticipating variant emergence, tracing transmission networks, discerning selective pressures and evaluating the disease severity of outbreaks.

2.3.4 Lineage P.1 identified in Brazil

A variant circulating in Manaus, Brazil was identified in international travellers in Japan, in viral genomes sampled from mid-to-late December 2020 to early January 2021. The new lineage P.1 (a descendant of B.1.1.28) has 17 amino acid mutations, including the Spike mutations of concern E484K, K417T and N501Y. Its emergence coincided with a resurgence of infections in Manaus and a rapid increase in COVID-19 hospitalizations in January 2021 despite high seroprevalence: an estimated 76% of the population already had antibodies to the virus by October 2020 (Sabino et al. 2021). A new sublineage, P.2, which independently acquired the Spike E484K mutation associated with immune evasion, has now been detected in many locations in Brazil, including Manaus. Three cases of reinfection have recently been identified in Brazil, one involving P.1 and two involving P.2 (Resende et al. 2021; Naveca et al. 2021; Vasques Nonaka et al. 2021). Variants carrying multiple mutations that can drive higher transmission and/or immune escape are strong contenders for enabling reinfections and lowering vaccine efficacy globally.

3 Genomic epidemiology: implications for origin and surveillance

3.1 Zoonotic transmission and origin of the virus

Genomic studies can help identify the origin of a virus and its possible routes of transmission to humans. Zoonoses are infectious diseases transmitted from animals to humans, and they can evolve into infections that transmit efficiently from human to human, such as malaria, SARS, HIV, pandemic influenza and, most recently, COVID-19. The causative pathogens can pass from animals to humans and livestock by many routes, such as infected meat (wet markets, wildlife trade, contaminated feed), direct contact with the pathogen (carcasses, fresh meat), contamination of water and produce, and air-borne and vector-borne (mosquitoes, ticks, rodents) routes. The current pandemic caused by SARS-CoV-2 is believed to have originated in a wildlife food market in Wuhan, China towards the end of 2019 (Wu et al. 2020; Zhou et al. 2020). Current evidence points to its origin from a bat-borne virus, and the global pandemic appears to represent the first transmission of this virus into humans (Andersen et al. 2020; MacKenzie and Smith 2020; Zhou et al. 2020). Consequently, it is a novel pathogen for the human immune system, and many individuals are susceptible to its devastating systemic effects. Even before this pandemic, some earlier known coronaviruses had been transmitted to humans via intermediate hosts, such as civet cats (Severe Acute Respiratory Syndrome (SARS) in 2002) and dromedary camels (Middle East Respiratory Syndrome (MERS) in 2012), although their spread was not as globally prevalent (Song et al. 2005; Hemida et al. 2014). In the last century, the world has seen at least six major outbreaks of novel coronaviruses, causing diseases ranging from a mild cold to infections with high mortality. COVID-19 has been the most devastating pandemic of this century, but it was preceded by recent outbreaks of SARS and MERS, as well as H5N1 bird flu, Ebola, HIV, Lyme disease, Rift Valley fever, Lassa fever and Nipah virus infections.

Genome sequencing has enabled the retrospective dating of the first known COVID-19 cases to December 2019 in Wuhan, and many of the early cases in the city were epidemiologically linked to a local wet market. Some reports have suggested earlier human-to-human transmission in locations other than China, but these need to be examined further and verified by sequencing viral samples to establish their lineage of origin (Apolone et al. 2020; Basavaraju et al. 2020). The zoonotic origin of the SARS-CoV-2 pandemic is still unknown, and the reservoir host species is unproven. The viral genome sequence clusters with SARS-CoV and has been placed within the SARS-related coronaviruses (SARSr-CoVs) found in bats, in the subgenus Sarbecovirus of the genus Betacoronavirus. Given its 96.2% genomic sequence similarity to a horseshoe bat coronavirus (RaTG13; Zhou et al. 2020) and its high degree of relatedness to other bat coronaviruses, SARS-CoV-2 is believed to have originated in a bat reservoir, although information about the intermediate host, if any, remains sparse (Andersen et al. 2020). Typically, pathogens that cause outbreaks in humans, including the current COVID-19 pandemic, evolve in 'reservoir' hosts such as rodents, bats and small mammals (Shereen et al. 2020). Over time they become endemic within these populations, causing little harm to the animals. A few key mutations then allow them to infect humans directly, or via 'intermediate hosts', typically other mammals such as livestock. This pandemic is an urgent example of the increasing danger from zoonotic transmission as humans come into greater conflict with their environment.
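A similarity figure such as the 96.2% identity to RaTG13 is computed over a whole-genome alignment. The sketch below assumes the two sequences have already been aligned by a dedicated aligner and shows only the final counting step. Its gap-handling rule is one common convention among several and is not necessarily the one used by Zhou et al. (2020). At ~30 kb, 96.2% identity corresponds to roughly 1,100 differing positions.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two sequences that are already aligned.

    Columns where both sequences have a gap ('-') are skipped; a gap opposite
    a base counts as a mismatch. Other conventions (e.g. skipping every gapped
    column or ambiguous 'N' bases) give somewhat different numbers.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column absent from both sequences
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


print(percent_identity("ACGT-A", "ACGTTA"))  # one gap-vs-base column out of six
print(percent_identity("AC--GT", "AC--GA"))  # 75.0
```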

The inter-species transmission from humans to minks and back, described in Section 2.3.1, is another example of the creation of host reservoirs that offer conditions for viral evolution and adaptation. Tracking mutations across these genomes can aid in understanding viral diversity and transmission after such zoonotic crossover events. In light of SARS-CoV-2 infection eventually becoming endemic in the human population, it will be important to have mechanisms in place to monitor crossover of the disease to non-human species in contact with human habitation, including pets, livestock and wildlife.

3.2 Vaccine efficacy and immune evasion mutations

SARS-CoV-2 can theoretically evolve to evade immunity when placed under the selective pressure of therapeutic or preventive interventions. N439K, a prevalent mutation of the Spike receptor binding motif (RBM), enhances binding affinity to the hACE2 receptor and can likely help the virus evade neutralizing antibodies, since it lies within the epitope recognized by these antibodies (Thomson et al. 2020). Another RBD mutation, E484K, has been described in lineages from South Africa (501Y.V2, or B.1.351) and Brazil (descended from the B.1.1.28 lineage) and has been shown to reduce the neutralization potency of some human sera by more than 10-fold. Of particular concern, this mutation can reduce neutralization even by potent polyclonal sera targeting multiple neutralizing epitopes (Greaney et al. 2020; Andreano et al. 2020; Weisblum et al. 2020).

The currently approved vaccines raise a host immune response against multiple epitopes of the viral proteins, which decreases the chance that a few mutations will enable efficient vaccine escape; there is therefore hope that immune evasion can be contained before such variants spread widely. Currently, none of the SARS-CoV-2 variants appear to have higher virulence or to cause greater disease severity.

However, mutations that allow immune evasion while maintaining virulence and viral fitness need to be identified and monitored to inform the future of COVID-19 vaccines and therapeutics, so that combinations of antibodies targeting distinct epitopes can be designed and tested in the laboratory for their susceptibility to escape (Weisblum et al. 2020). A recent study has characterized N501Y and other novel Spike mutations for their potential effects on infection and on vaccine-induced immune evasion (Shang and Axelsen 2020). As vaccination gets underway globally, the current vaccines are also being evaluated against new viral mutations as they arise.

3.3 Surveillance and detection

An earlier study based on in silico analysis of 2086 whole-genome sequences from India documented extensive deletion of amino acid residues in the C-terminal region of the envelope (E) protein in some SARS-CoV-2 genomes (Kumar et al. 2021). These deletions map to a region of the E gene just beyond the reverse primer binding site used to detect positive cases, so E gene-based RT-qPCR could still detect these isolates. However, a handful of genomes from the state of Odisha had deletions extending into the primer binding site. This opens yet another front for genomic sequencing and surveillance to ensure accurate testing in the months to come.
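Whether a deletion threatens a PCR assay comes down to whether its genome interval overlaps a primer (or probe) binding interval. The sketch below illustrates that check; the primer names and coordinates are hypothetical placeholders inside the E gene, not the coordinates of any real assay, and must be replaced with the published coordinates of the assay in use.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genome coordinates

# Hypothetical placeholder primer sites; replace with the real assay coordinates.
PRIMERS: Dict[str, Interval] = {
    "E_forward (hypothetical)": (26270, 26295),
    "E_reverse (hypothetical)": (26360, 26380),
}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def primer_conflicts(deletions: List[Interval],
                     primers: Dict[str, Interval] = PRIMERS) -> List[Tuple[Interval, str]]:
    """List (deletion, primer) pairs where a deletion removes part of a primer site."""
    return [(d, name) for d in deletions for name, site in primers.items()
            if overlaps(d, site)]


# Toy deletions: one lying beyond the reverse primer, one reaching into it.
print(primer_conflicts([(26440, 26455), (26370, 26400)]))
```

Running such a check over every newly sequenced genome is a cheap way to spot, early, deletions like those reported from Odisha that could cause a diagnostic target to drop out.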

The UK variant VUI-202012/01 carries a deletion in the Spike gene (69–70del) that prevents amplification of the S-gene target in RT-PCR tests whose S-gene primers span the deleted region, while the other two primer pairs still detect the virus. This can be exploited for a first-pass identification of outbreaks involving this variant using routine RT-PCR testing, followed by confirmation through genome sequencing. A typical whole-genome sequencing strategy involves tiled-primer amplification of the entire viral genome, as described by the ARTIC network (DNA pipelines R&D; Farr et al. 2020). The amplified product is then sequenced on high-throughput platforms such as Illumina (short reads) or Oxford Nanopore (long reads). The entire procedure, from viral RNA to sequence, takes 3–4 days at an average cost of INR 7000, and is invaluable for (i) discovering and tracking new mutations that appear through local transmission and (ii) monitoring the import of harmful variants from elsewhere.
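The first-pass screen amounts to a simple rule on the three RT-PCR targets: the two non-S targets amplify while the S target does not. A minimal sketch, assuming a three-target assay whose targets are labelled ORF1ab, N and S here (the text says only that two other primer pairs remain positive, so these target names, the function names and the Ct cut-off are all assumptions). S-gene target failure is only a proxy: other lineages carrying 69–70del, such as the mink Cluster 5 strain, would also drop out, which is why confirmation by sequencing is needed.

```python
from typing import Optional

# Hypothetical positivity threshold; real cut-offs are assay-specific.
CT_CUTOFF = 37.0


def detected(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """A target counts as detected if it amplified with a Ct at or below the cutoff."""
    return ct is not None and ct <= cutoff


def s_gene_screen(orf1ab_ct: Optional[float], n_ct: Optional[float],
                  s_ct: Optional[float]) -> str:
    """Classify a three-target RT-PCR result for S-gene target failure (SGTF)."""
    orf1ab, n, s = detected(orf1ab_ct), detected(n_ct), detected(s_ct)
    if orf1ab and n and not s:
        return "SGTF: prioritize for sequencing"
    if orf1ab and n and s:
        return "S detected"
    if not orf1ab and not n:
        return "negative or too weak to evaluate"
    return "inconclusive"  # only one of the two non-S targets amplified


print(s_gene_screen(22.1, 23.4, None))
print(s_gene_screen(22.1, 23.4, 24.0))
```

Requiring both non-S targets to be positive avoids mistaking a weak sample, where S drops out simply because of low viral load, for a genuine target failure.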

4 Conclusion

As the COVID-19 pandemic enters its second year, it is crucial to keep a lookout for new and emerging strains and localized disease outbreaks. The evolution of SARS-CoV-2 can make it more infectious through adaptive mutations that increase its binding affinity to host cells, while escape mutations that help it evade the immune response have serious implications for vaccines and therapeutics and can adversely affect the severity and mortality of the disease. As multiple vaccines are rolled out in the year ahead, the virus will be subjected to new selection pressures and modes of evolution. India has so far not been sequencing SARS-CoV-2 isolates to full capacity, having deposited only about 6,400 genomes from over 10.4 million recorded cases (0.06%). Increasing sequencing efforts, particularly following local surges in cases, and exploiting advances in genomic epidemiology will go a long way towards keeping track of mutations of concern while their biology and effects are studied in greater detail.
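The coverage figure can be set against the INSACOG goal of sequencing 5% of positive cases. Applying that target retrospectively to the cumulative case count is done here purely as a yardstick:

$$
\frac{6.4\times10^{3}}{10.4\times10^{6}} \approx 6.2\times10^{-4} \approx 0.06\%
$$

$$
0.05 \times 10.4\times10^{6} = 5.2\times10^{5}\ \text{genomes} \approx 81 \times \left(6.4\times10^{3}\right)
$$

On this measure, meeting the 5% target would have required roughly 80 times as many genomes as have been deposited so far.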

Studying the virus through a genomic lens has played a pivotal role in tackling key challenges in pandemic management so far. Other issues beyond the scope of this article include the role of mutations in reinfection and disease severity. The extent to which genomic surveillance can help answer these questions and control outbreaks is limited chiefly by the availability of data, and such surveillance will be crucial to controlling the pandemic in the future.