Cooccurrence of N501Y, P681R and other key mutations in SARS-CoV-2 Spike

Carol Lee; Shruthi Mangalaganesh; Laurence O.W. Wilson; Michael J. Kuiper; Trevor W. Drew; Seshadri S. Vasan

doi:10.1101/2021.12.25.21268404

Abstract

Analysis of circa 4.2 million severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome sequences on ‘Global Initiative on Sharing All Influenza Data (GISAID)’ shows the spike mutations ‘N501Y’ (common to Alpha, Beta, Gamma, Omicron variants) and ‘P681R’ (central to Delta variant’s spread) have cooccurred 3,678 times between 17 October 2020 and 1 November 2021. In contrast, the N501Y+P681H combination is present in Alpha and Omicron variants and circa 1.1 million entries. Two-thirds of the 3,678 cooccurrences were in France, Turkey or US (East Coast), and the rest across 62 other countries. 55.5% and 4.6% of the cooccurrences were Alpha’s Q.4 and Gamma’s P.1.8 sub-lineages acquiring P681R; 10.7% and 3.8% were Delta’s B.1.617.2 lineage and AY.33 sub-lineage acquiring N501Y; remaining 10.2% were in other variants. Despite the selective advantages individually conferred by N501Y and P681R, the N501Y+P681R combination counterintuitively didn’t outcompete other variants in every instance. Although a relief to worldwide public health efforts, in vitro and in vivo studies are urgently required in the absence of a strong in silico explanation for this phenomenon. This study demonstrates a pipeline to analyse combinations of key mutations from public domain information in a systematic manner and provide early warnings of spread.

1. Introduction

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which causes the ongoing novel coronavirus disease 19 (COVID-19) pandemic, has a single-stranded positive-strand ribonucleic acid genome (ssRNA(+) genome) of size 26–32 kb, with high fidelity replication due to 3′-to-5′ exoribonuclease ‘proof-reading’ mechanism [1,2]. As this virus adapts to its human host, we have seen it evolve and present quasispecies diversity [2,3]. While most of the several thousands of mutations catalogued to date aren’t substantial functional changes they have proven aetiologically useful [4,5].

Two years on since the start of the COVID-19 pandemic, we now have a good idea of the key mutations, especially in the Spike protein, which are punctuations in the evolutionary story of this virus to date. The D614G mutation reported in April 2020 resulted in the ‘G-strain’ with increased infectivity replacing the genomic background of this virus globally, although there was no impact on vaccine efficacy [5,6,7]. However, the N501Y mutation common to the Alpha, Beta, Gamma and Omicron variants of concern (VOC), has contributed to enhanced infection and transmission, reduced vaccine efficacy, and the ability of SARS-CoV-2 to infect new species such as wild type mice [8,9,10,11]. Another key mutation is the P681R which alters the furin cleavage site, and has been responsible for increased infectivity, transmission and global impact of the Delta variant [12,13,14].

Our primary objective is to investigate the risk of the three aforementioned ‘mutations of current interest’ (MOCI) cooccurring naturally due to convergent evolution and resulting in a SARS-CoV-2 variant that is of greater concern than those declared to date, noting that the latest Omicron VOC has the N501Y but not the P681R mutation. Our secondary objective is to demonstrate a methodology and pipeline to analyse the spread of variants containing combinations of important mutations. We have achieved this by mining data from GISAID, the Global Initiative on Sharing All Influenza Data, which is the largest and the most comprehensive repository of SARS-CoV-2 sequences [15,16]. We have used quasispecies theory and in silico modelling to interpret our findings to the extent possible, and recommended future research directions.

2. Results: Frequency and distribution of D614G, N501Y and P681R mutations

The earliest record of these three MOCI coming together was in Slovenia on 17 October 2020, however, no further observations were recorded since then in that country. Mining of GISAID data found 3,678 entries (0.1%) containing the MOCI and a majority of these were in current VOC – Alpha (61.3%), Beta (0.4%), Gamma (6.4%), Delta (21.2%), Omicron (0.0%) – and a small proportion in current or former VOI – Mu (0.3%), Kappa (0.2%) and Iota (0.02%). A small proportion (10.2%) was also observed in other variants not classified as VOC, VOI, Variant Under Monitoring (VUM), or specific sub-lineages (Figure 1).

Figure 1. Summary of SARS-CoV-2 lineages and sub-lineages containing the D614G, N501Y and P681R mutations of current interest (MOCI) on GISAID.

3,678 sequences were found with our MOCI from a total of 4,177,098. Bold numbers in each box shows the number of observations for each VOC/VOI and their sub-lineages (where such information was available). ‘Others’ indicate samples not further classified on GISAID.

Figure 2 illustrates the MOCI frequency in countries with at least 50 recorded sequences – Brazil (5.8%), Denmark (1.9%), France (22.7%), Germany (3.6%), Sweden (5.2%), Turkey (21.4%), UK (3.6%) and USA (22.1%). These eight countries represent 86.4% of the cases containing the MOCI (Figure 3A, which shows a continuous timeline). Fifty-seven other countries recorded less than 50 entries for the period 17 October 2020 to 15 September 2021 (Supplementary Table S1).

View this table:

Supplementary Table S1. Raw data for countries with cooccurring P681R, N501Y and D614G mutations.

Frequencies for each region and continent are provided as sub-totals and totals respectively.

Figure 2. Discrete timeline depicting the start and end which SARS-CoV-2 variants with the P681R, N501Y and D614G mutations were observed in various countries, states and cities between October 2020 and October 2021.

Countries with 50 or more observations, and states/cities with 10 or more observations are deemed significant and thus included in this timeline. Raw data with the full list of countries are provided in Supplementary Table S1.

Three countries – France, USA and Turkey – stand out as they each contribute to over 22% of the total instances of the MOCI cooccurring. It’s unsurprising that these trends overlap with the spread of VOC in these nations (Figures 3B-3D), because there would have been more opportunities for Delta to acquire N501Y, and for Alpha or Gamma to acquire P681R (Figures 3B-3F). Such cooccurrences also appear to have happened over short periods of time, for instance during March to May 2021 in France and USA with the Alpha VOC; and between June and August 2021 in Turkey with the Delta VOC (Figures 2 and 3B-3D). This could be due to founder effects, but we cannot establish this by mining GISAID data alone; it will require detailed epidemiological investigations by national public health authorities, which is beyond the scope of this work. Nevertheless, from Figure 2, we can be reasonably sure that a number of independent events of convergent evolution (i.e. MOCI cooccurring) have taken place. It is worth investigating why we do not see as many instances of Delta acquiring N501Y in France and USA, even though they had a very high number of this VOC from July 2021.

Figure 3. Continuous representation of the timeline during which SARS-CoV-2 variants with the P681R/H, N501Y and D614G mutations were observed between October 2020 and October 2021.

(A) Number of isolates with MOCI, plotted as 14-day rolling average, in the eight countries which had at least 50 observations. The vertical red line shows the first recorded cooccurrence in Slovenia on 17 October 2020; (B-D) Number of VOC sequences observed in the top three countries (France, Turkey, USA respectively); (E-F) Number of isolates with MOCI which are also VOC, plotted as 14-day rolling average, either from the notional start of the pandemic in December 2019, or from September, May, November and October 2020 as the first reported months for Alpha, Beta, Gamma and Delta respectively. (G-H) Number of isolates with the P681H (rather than P681R), N501Y and D614G, plotted as 14-day rolling average, either from the notional start of the pandemic in December 2019, or from the month of first report for each VOC. ‘Other’ indicates MOCI found in variants other than these VOC.

3. Discussion

3.1. Y501-R681 almost always in the G614 background

From Supplementary Table S2 we see that 3,688 entries contain N501Y+P681R, just ten more than the 3,678 entries that contain D614G+N501Y+P681R; in other words, the instances of N501Y cooccurring with P681R have almost always happened on the D614G background as predicted [5,6,7]. Although these three mutations are positioned sufficiently far apart in the 3-dimensional protein structure to suggest they don’t interact (Figure 4), further studies are required to understand whether there may be indirect functional links that could enhance viral efficiency.

View this table:

Supplementary Table S2. Observation of key mutations and their combinations on GISAID.

Live analysis from EpiCoV (as of 3 November 2021) containing the results for each of our MOCI, and counts for VOC/VOI according to GISAID. This is compared with data downloaded from GISAID (as of 1 November 2021) and analysed using our in-house alignment pipeline (complete human-origin genomes) after filtering for correct dates in the metadata (‘YYYY-MM-DD’).

Figure 4. Model of the SARS-CoV-2 Spike trimer protein indicating the relative position of the P681R, N501Y and D614G mutations of current interest (MOCI).

The N501Y mutation occurs in the receptor binding domain (highlighted in cyan) and is associated with increased binding affinity to the ACE2 receptor (shown in yellow). The P681R mutation is implicated with more efficient cleavage of the S1/S2 furin site (required prior to viral fusion to the host cell). Mutation D614G occurs at the S1/S2 interface and has also been implicated with increased replication efficiency.

3.2. Predominance of Alpha Q.4 acquiring P681R and Delta acquiring N501Y

We see that the cooccurrence of the MOCI is largely due to; (a) the Alpha VOC acquiring the P681R (61.3%), and this has happened overwhelmingly in Alpha’s Q.4 sub-lineage (89.9%) by definition; (b) the Delta VOC acquiring the N501Y (21.2%), and this has mainly happened in Delta’s parent lineage (B.1.617.2; 50.6%) and also the sub-lineage AY.33 (17.9%); (c) the Gamma VOC acquiring the P681R (6.4%), predominantly in its P.1.8 sub-lineage (71.3%). It is worth noting that the Q.4 sub-lineage has the P681R by definition (https://outbreak.info/situation-reports?pango=Q.4). The Beta VOC acquiring the P681R only constituted 0.4% of the total occurrences, and this was almost entirely in the B.351.2 parent lineage (93.3%). Supplementary Table S3 shows the results for other key Spike mutations in comparison to those obtained from GISAID. For H69del and Y145del, more samples were identified using our pipeline than on GISAID – 1 138 302 versus 1 123 439 before date filtering and 2 563 versus 1 364 after date filtering

View this table:

Supplementary Table S3. Observation of other key SARS-CoV-2 Spike mutations.

Live analysis from EpiCoV (as of 3 November 2021) containing the results for other key Spike mutations according to GISAID. This is compared with data downloaded from GISAID (as of 1 November 2021) and analysed using our in-house alignment pipeline after filtering for correct dates in the metadata (‘YYYY-MM-DD’).

View this table:

Supplementary Table S5. GISAID data on key SARS-CoV-2 mutations classified by VOC/VOI.

Live analysis from EpiCoV (as of 3 November 2021) containing the results for each of our MOCI, and counts for VOC/VOI according to GISAID. This is compared with data downloaded from GISAID (as of 1 November 2021) and analysed using our in-house alignment pipeline after filtering for correct dates in the metadata (‘YYYY-MM-DD’).

3.3. Alpha acquiring P681R ahead of other VOC could be due to P/H681R

Alpha, Beta and Gamma each have N501Y, however Alpha was the first VOC to acquire P681R, both in terms of absolute (Figure 3E) and relative (Figure 3F) timelines, although Beta was reported four months before Alpha in May 2020. Alpha, especially its Q.4 sub-lineage, also contributes to most instances of the MOCI cooccurring. There is a possible biomolecular basis for this; P681H is present in Alpha, but not in the other two VOC. The Grantham scores associated with P681R (103) and P681H (77) are comparable and the H681R substitution is thus a conservative change (Grantham distance 29). There is only one way to get to Histidine (H) or Arginine (R) from Proline (P); and for H and R there is only one way to get to each other; and we see no structural reason from our in silico model as to why one should be preferred to the other. Thus, we could infer that some of the instances of Alpha acquiring P681R could have been due to a substitution of H with R, which is the signature of the Q.4 sub-lineage. Spike’s 681st position is the fifth substrate sequence for cleavage recognition; both furin and the transmembrane serine protease 2 (TMPRSS2) cleave the Spike at 685/686 position with H (and possibly R) enhancing this process [7,17,18]. It is worth noting that to penetrate host cells, the SARS-CoV-2 Spike protein must be cut twice by host proteins. In the SARS-CoV-1 (SARS), both incisions occur after the virus has locked on to a cell. But with SARS-CoV-2, the presence of the furin cleavage site enables host enzymes like furin to make the first cut as newly formed viral particles emerge from an infected cell. These pre-activated viral particles can then go on to infect cells more efficiently compared to particles requiring two cuts. Thus, the P681R increases the susceptibility of the furin cleavage site in Delta VOC, and allows the exposure of the Spike’s S2 subunit for better cell integration.

3.4. Cooccurrences have been reported predominantly in eight countries

Cooccurrences of the MOCI were observed with Alpha, Beta, Gamma and Delta VOC in 41, 8, 11 and 47 countries respectively, indicating convergent evolution, especially as much of the world was under lockdown during 2020-2021. Curiously, two thirds of the observations have been reported from just France, Turkey and USA; 86.3% from these three countries plus Brazil, Denmark, Sweden, UK and Germany (Supplementary Table S1; Supplementary Figure S4). Several instances involve proximal regions, suggesting multiple founder effects (Figures 2 and 5). For example, in Denmark and Sweden, variants with the MOCI were concentrated in highly populous Hovedstaden and Svealand regions (Sörmland, Stockholm, Uppsala and Västmandland) [19]; the MOCI frequencies were also correlated with the most common lineage in respective countries and regions, suggesting community transmission, although these MOCI did not go on to become the dominant variant in those locations. This is in line with most within-host variants getting lost during transmission, and only a few founding infections being maintained in a given population [20].

Supplementary Figure S4. Frequency of the P681R, N501Y and D614G mutations cooccurring in the eight countries with more than 50 observations between October 2020 and October 2021.

For visual clarity, only the SARS-CoV-2 lineage with the maximum count is labelled.

Figure 5. World map depicting the 65 countries where SARS-CoV-2 variants containing the cooccurring MOCI were recorded between October 2020 and October 2021.

(A) world map, with zoomed in view of the regions observed with MOCI shown for (B) Sweden (C) Denmark and (D) USA. Countries are color-coded based on the number of records of MOCI observed, as shown in the legend, with greyed out regions indicating no observations. For some countries such as Turkey, region/city-level information was not available, but EP _ISL numbers can be provided upon request.

Cooccurrence of the MOCI in the UK occurred mostly with the Alpha VOC’s B.1.1.7 and Q.4 from late 2020 to May 2021; followed by the second wave of the Delta VOC since April 2021, especially in AY.4 that had the highest (58%) prevalence (Figure 2). The former period had ten times the death rate (∼1200 per week between September 2020 and March 2021) compared to the latter (∼120 per week; probably due to lockdowns and 30% of the population already double-vaccinated). In the UK which has a very high rate of sequencing, the overall numbers of cooccurring MOCI were still low (133 records, 88% in England). Of these, there were respectively 52, 39 and 18 instances of the AY.4 acquiring N501Y, and the B.1.1.7 and Q.4 acquiring P681R [21,22]. It is possible that the lockdown protocols and vaccination coverage may have influenced the observed viral transmission dynamics [23]. In the Americas, most of 1,078 cooccurrences of the MOCI were detected either in the US East Coast (673 entries), or in the Sul (Santa Catarina) and Sudeste (São Paulo, Rio de Janeiro) regions in the South and Southeast of Brazil (284 entries). These trends could be due to founder effects and lack of travel restrictions from early 2021, especially in the United States [24] (Figures 2 and 5). In Brazil, the cooccurrences of the MOCI also corresponded with the prevalence of P.1 until August 2021 after which the AY.99.2 became the dominant variant (Supplementary Figure S4). Unfortunately, regional and city level information was not available for Turkey on GISAID, preventing further analysis and insights and highlighting the importance of metadata [16].

3.5. Y501-R681 doesn’t outcompete other variants

From our analysis, SARS-CoV-2 isolates containing Y501-R681 in Spike did not outcompete other variants in every single instance we have examined (Figures 2 and 3), which is both counterintuitive and a relief to worldwide public health efforts. Although individually these two mutations have been reported to confer selective advantage [8,9,10,11,12,13,14], their combination apparently has not and we have not found any compelling biomolecular reason for this (Figure 4). It would be desirable to conduct in vitro and in vivo studies to gain a better understanding, either using infectious clones and reverse genetics [25,26,27,28], or using naturally occurring comparable isolates with and without these mutations [see for instance 7]

3.6. Implications for the Omicron VOC

Alpha and Omicron each have the Y501-H681 combination which is present in 1,085,434 entries out of the 4.2 million we have examined in this study (i.e. over 25%); this contrasts with the Y501-R681 of which we have only found 3,688 observations. Just as the Q.4 sub-lineage with Y501-R681emerged when the Alpha VOC started to spread, it is possible that a new sub-lineage of Omicron with Y501-R681 could emerge given the rapid rise of this latest VOC [29,30,31]. In this context, it is important to highlight the problem of sequencing artefact, for example 65 high coverage sequences of Omicron with P681R recently appeared on GISAID but were subsequently corrected by the submitting laboratory following repeat libraries with fewer cycles and other improvements [32]. The spread of VOC in immunocompromised patients could lead to mutations with a selective advantage for antibody escape and/or transmissibility (e.g. N501Y and P681H/R), in addition to a number of deletions in the N terminal domain [33], and this aspect needs further investigation given Omicron’s large number of mutations (many of them yet to be studied in depth). The adaptation and evolution of this virus in new hosts, for instance mice and rats, also pose additional risks such as additional reservoirs and reinfection of humans [11]. Investigations into the dynamics affecting the SARS-CoV-2 pandemic in each of the eight countries are also warranted, in the wider context of travel restrictions, population demographics and dynamics [34,35]. Lack of patient-deidentified metadata in a consistent format further complicates meaningful analysis as emphasized before [16].

3.7. SARS-CoV-2 evolution and selective pressures on quasispecies

Any virus which is new to a host will undergo a period of adaptation, as numerous selective pressures come to bear. RNA viruses, such as coronaviruses, are error-prone in their replication and exist as a cloud of variants, with the dominant clade representing the majority population, but with a number of diverse variants also represented within the population, known as quasispecies [2,3,36,37]. Among viruses which are established in their host, the dominant clade is generally relatively ‘fit’ in its ability to replicate and out-competes all the other clades. However, when a virus is new to a host, its relative fitness may be quite low, so, in any particular environment, different clades can have similar relative fitness. This means subtle differences might give a particular clade a small advantage enabling it to predominate as the population continues its host adaptation. The environment may comprise of host factors, such as host genetics, tissue tropism, age, immune status or competence, presence of other infections, as well as external factors, such as temperature and humidity. With SARS-CoV-2, the extraordinary public health measures in some regions of the world are likely exerting their own selective pressures on its evolution and creating numerous local environments. This phenomenon, called the ‘survival of the flattest’ [3] can lead to the emergence of multiple clades dominating under different environments. These is also an increasing appreciation that, within the host, different clades may predominate in different tissues at the same time and that there may be interactions among the diverse quasispecies that facilitate the infection [38].

The pathway of adaptation is not always towards severe disease, since this may not lead to maximum replication and transmission. Indeed, many viruses with a long history of acute infection in an established host species, the disease may be mild or inapparent, reflecting a relationship more towards stasis, where the impact on the host does not compromise the chances of the progeny virus being transmitted to a new host. For many coronaviruses, we have seen an evolution towards milder infection, albeit with greater transmission rates than has been seen with SARS-CoV-2 to date. Certainly, some level of reduction in the efficacy of vaccine-induced antibody in neutralising the Delta and Omicron variants has been demonstrated, but the other feature of these viruses, which is less often discussed, is the ability of these VOC to replicate by means of syncytial formation, rather than the simple apoptotic cycle seen with earlier variants [39]. This may confer an important immune evasive aspect to replication, which might offer a distinct advantage, in immune avoidance. It might also explain why these clades are more productive and less pathogenic since systemic infection is not always involved.

3.8. Error catastrophe, Muller’s ratchet and implications for SARS-CoV-2

A phenomenon related to quasispecies evolution is that of error catastrophe, which may occur if the mutational rate exceeds the ability of the fitness landscape to accommodate the resultant evolving clades [40]. The key triggers for this are not fully understood, but it is thought that the application of strict biosecurity measures may play a part by restricting the availability of new hosts. In such cases, the quasispecies becomes increasingly attenuated, with mutations accumulating via a mechanism called ‘Muller’s ratchet’, to the point where the epidemic may die out [41]. This has been seen in many prior outbreaks of Ebola, where the infection became increasingly less pathogenic and less transmissible. While the current environment of SARS-CoV-2 is complex and highly variable, the selective pressure on a virus is towards higher transmissibility and milder disease in all environments, and the current systems of control preferentially select for such. Once a virus develops either/both of these traits, it is very unusual for the reverse to occur because viruses with higher transmissibility out-compete those with lesser transmissibility and those which are milder disable the host less, so they continue to travel and mix with susceptible hosts. It remains to be seen if this is being observed with the Omicron VOC as early reports indicate higher transmissibility and less severe disease.

In comparing variants of SARS-CoV-2, it is tempting to focus on specific mutations and their location – also perhaps with overt attention being paid to the Spike protein, especially its receptor binding domain – and not consider that the observed changes could evolve independently of each other. The secondary and tertiary folding of proteins, as well as post translational modifications, can have a profound effect on function, such that a mutation at one site might require a complimentary change at another site in order to confer advantage, or be wiped out by another, seemingly unrelated change. For the same reason, convergent evolution, might also occur, exhibited by the co-existence of identical co-mutations in virus clades of different lineage [42]. Another phenomenon known to occur in many coronaviruses is recombination, where partial genomic exchange may occur between two coronaviruses when simultaneously infecting the same host. The extent to which this may have occurred in SARS-CoV-2 is the likely subject of future analysis [43,44].

4. Methods

4.1. Bioinformatics analysis

We mined GISAID data on 3 November 2021 containing 4,835,087 entries, 99.9% (4,831,865) sequenced from people who tested COVID-19 positive. Of these, 4,748,462 contained over 29 thousand nucleotides and were deemed ‘complete’. GISAID has also defined a subset of these 4.8 million sequences, containing 3,519,085 entries, as ‘high coverage’. This is because 99% of their bases are defined; there is no unverified deletion or insertion; and unique mutations not seen in other sequences constitute fewer than 0.05% in these entries. Consequently, some of our analysis of the data downloaded on 1 November 2021 (4,766,139 sequences) required manual curation whenever we encountered incorrect dates, or low coverage sequencing that could skew our results. Supplementary Table S5 provides GISAID’s break down of these 4.8 million entries by variants containing the D614G (98.8%), N501Y (27.01%) and P681R (48.2%) mutations; by Alpha (23.9%), Beta (0.8%), Gamma (2.4%), Delta (47.8%), Omicron (0.0%) VOC; and by Lambda (0.2%) and Mu (0.3%) variants of interest (VOI).

Human-origin SARS-CoV-2 sequences over 29 kb length were aligned to the nominal reference (EPI_ISL_402123) using an in-house alignment pipeline that generated a file in the ‘variant call format’ (VCF) containing all the mutations as of 1 November 2021 [1,2]. In the VCF containing 4,294,469 genome sequences, we searched for entries with MOCI, for example, the Spike N501Y was searched at nucleotide position 23403 for an A to G transition. The EPI_ISL sample IDs containing our MOCI were merged with their respective GISAID metadata (location, date of collection, lineage) to create an annotated database. Supplementary Table S5 shows the split up of these 4.3 million sequences by variants containing the D614G (98.8%), N501Y (29.2%) and P681R (44.2%) mutations; and by Alpha (26.3%), Beta (0.9%), Gamma (2.2%), Delta (43.8%), Omicron (0.0%), Lambda (0.2%) and Mu (0.3%) VOC/VOI.

Using a custom Python code, we identified the presence of the D614G, N501Y and P681R mutations, and combinations thereof (c.f. ‘Data Availability Statement’ for our code). Our final dataset contained 4,177,098 entries after filtering out GISAID records with incomplete dates, i.e. retaining those with a ‘YYYY-MM-DD’ format. We ascertained whether these isolates are one of the VOC (and if yes, which sub-lineage), their location (country, state, and city depending on information available), and sample collection date. Using the latter, we calculated rolling averages over 14-days, as this window has been shown from previous experience to be an optimal period to reduce background noise, especially as the data is discrete and highly variable in size across countries [11]. However, unlike our previous study in which we were interested in macroscopic trends, here we have looked at every instance of key mutations cooccurring; therefore, we did not use a threshold or filter based on minimal sample size. Our analysis presents the spread of these mutations from the notional start of the SARS-CoV-2 pandemic (31 December 2019) to the end of our observation period (01 November 2021) with appropriate classifications (e.g. Alpha, Beta, Gamma, Delta).

4.2. Biomolecular modelling

Fully glycosylated in silico models of the SARS-CoV-2 Spike protein were constructed based on the ‘6VSB’ protein databank (PDB) structure [45], minus the transmembrane domain (residues 1161-1272) for additional computational efficiency as described previously [ 7]. Models were simulated in aqueous solution (TIP3, water, 0.15M ions, NVT ensemble, 310K) using NAMD 2.14 software [46]. Models were visualized with the visual molecular dynamics (VMD) program for large biomolecular systems which uses 3-D graphics and built-in scripting to assess the spatial arrangement of mutations and any complementary or inhibitory interactions [47].

5. Conclusions and limitations

SARS-CoV-2 is the most sequenced virus in the world, however, there is an inherent bias introduced by the highly variable sequencing-to-COVID-19 positivity ratio (S:P) across countries (e.g., the UK has one of the highest P:S ratios in the world), and across time (S:P has generally gone up in all countries, e.g. India). Thus, the GISAID data is likely to contain non-random samples with a skew in favour of sequences associated with epidemiologically consequent cases, albeit this is difficult to decipher as most sequences have little or no meaningful patient-deidentified information. The lockdowns and vaccination coverage associated with this viral pandemic have also been unprecedented in human history and spatio-temporally variable, further complicating our analysis. Notwithstanding these limitations, which have led to the proliferation of certain clades in certain environments, we are still able to make broad conclusions thanks to the very large number of sequences available on GISAID. We are also able to demonstrate a process and pipeline to analyse mutations of structural, functional and epidemiological consequence from public domain information in a systematic manner and provide early warnings of certain combinations starting to spread. Though it is unclear which combinations of mutations provide the best selective advantage, there is concern that mutations previously observed in other dominant lineages may arise spontaneously in contemporary strains further increasing their potential for infectivity and adverse clinical outcomes. In this case-study we analysed three key mutations (viz. D614G, N501Y and P681R) cooccurring but found no evidence of this combination spreading. Although 3,678 sequences were found on GISAID between 17 October 2020 to 1 November 2021, mainly in France, Turkey and USA, the Y501-R681 combination hasn’t outcompeted other variants and this warrants further in silico, in vitro and in vivo investigations. With the latest VOC Omicron containing a large number of mutations, many yet to be studied in-depth, our methodology will be very useful to understand whether certain combinations of mutations are more transmissible. If GISAID entries are double-checked for sequencing artefact before submission and strengthened with patient-deidentified metadata, then this approach could enable early epidemiological intelligence, for instance on case severity, mortality, and factors such as age, gender, race and co-morbidities that could increase the infection risk.

Data Availability

This paper only uses publicly accessible data that can be downloaded from GISAID, the largest repository of SARS-CoV-2 genome sequences. The lists of GISAID EPI_ISL_ sample IDs and data used by our analysis can be requested from the corresponding author. Our Python code is available at https://github.com/Carol-Lee-gh/Covid-Mutation-Pipieline.git.

https://github.com/Carol-Lee-gh/Covid-Mutation-Pipieline.git

Supplementary Materials

The following supporting information can be downloaded at: www.mdpi.com/xxx/s1, Table S1: Raw data for countries with cooccurring P681R, N501Y and D614G mutations. www.mdpi.com/xxx/s2, Table S2: Observation of key mutations and their combinations on GISAID. www.mdpi.com/xxx/s3, Table S3: Observation of other key SARS-CoV-2 Spike mutations. www.mdpi.com/xxx/s4, Figure S4: Frequency of the P681R, N501Y and D614G mutations cooccurring in the eight countries with more than 50 observations between October 2020 and October 2021. www.mdpi.com/xxx/s5, Table S5: GISAID data on key SARS-CoV-2 mutations classified by VOC/VOI.

Author Contributions

Conceptualization and methodology, S.S.V.; bioinformatics, C.L. and L.O.W.W.; biomolecular modelling and interpretation, M.J.K. and T.W.D.; data analysis and original draft preparation, S.M., C.L. and S.S.V.; funding acquisition, S.S.V. and L.O.W.W.; manuscript review and editing, all authors.

Funding

This work was supported by funding (Principal Investigator: S.S.V.) from the CSIRO Future Science Platforms, National Health and Medical Research Council (MRF2009092), and United States Food and Drug Administration (FDA) Medical Countermeasures Initiative contract (75F40121C00144); and by funding (Principal Investigator: L.O.W.W.) from the Australian Academy of Science and the Australian Department of Industry, Science, Energy and Resources. The article reflects the views of the authors and does not represent the views or policies of the funding agencies including the FDA.

Institutional Review Board and Informed Consent Statements

Not applicable for the following reasons. This study does not involve any human samples, identifiable human data or patient records. It only involves bioinformatics analysis of novel coronavirus genome sequences openly accessible from the public repository ‘GISAID’, and these sequences do not contain any metadata that can be identified with any individual. This includes sample collection dates and anonymous sample IDs that are openly accessible. Our study does not report any exact age or photographs. Clinical trial registration is not applicable as this is neither an interventional study nor an observational study involving humans.

Data Availability Statement

This paper only uses publicly accessible data that can be downloaded from GISAID, the world’s largest repository of SARS-CoV-2 genome sequences. The lists of GISAID’s EPI_ISL_ sample IDs and data used by our analysis can be requested from the corresponding author. Our Python code is available at https://github.com/Carol-Lee-gh/Covid-Mutation-Pipieline.git.

Conflicts of Interest

The authors declare no conflict of interest.

Graphical Abstract

Acknowledgments

We are grateful for support from our CSIRO colleagues at the Australian Centre for Disease Preparedness (https://www.grid.ac/institutes/grid.413322.5), especially Kim Blasdell, Simran Chahal, Alexander McAuley and Nagendrakumar Singanallur, and the Australian e-Health Research Centre, especially Denis Bauer, David Hansen, Yatish Jain, Brendan Hosking and Aidan Tay. We thank Professor Adriana Heguy of New York University Langone Health for discussions on sequences of Omicron with P681R.

Footnotes

carol.lee{at}csiro.au; laurence.wilson{at}csiro.au
shruthi.mangalaganesh{at}csiro.au; trevor.drew{at}csiro.au;
sman0040{at}student.monash.edu
michael.kuiper{at}data61.csiro.au
prof.vasan{at}york.ac.uk
Added discussion on sequencing artefact under revised sections 3.6 and 5. Fixed typos and updated the manuscript as per the journal template, including a graphical abstract at the end. Redundant supplementary Figure S2 has been removed, and supplementary materials are now in the order: Table S1, Table S2, Table S3, Figure S4 and Table S5. As the journal template requires Results ahead of Method, sections and references have been appropriately reordered.

References

1.↵
Zhou, P.; Yang, X.L.; Wang, X.G.; et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020, 579(7798), 270–273, https://doi.org/10.1038/s41586-020-2012-7.
OpenUrl CrossRef PubMed
2.↵
Bauer, D.C.; Tay, A.P.; Wilson, L.O.W.; et al. Supporting pandemic response using genomics and bioinformatics: A case study on the emergent SARS-CoV-2 outbreak. Transboundary and Emerging Diseases 2020, 67(4), 1453–1462, https://doi.org/10.1111/tbed.13588.
OpenUrl
3.↵
Wilke, C.O.; Wang, J.L.; Ofria, C.; Lenski, R.E.; Adami, C. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 2020, 412(6844), 331–333, https://doi.org/10.1038/35085569.
OpenUrl
4.↵
Grubaugh, N.D.; Petrone, M.E.; Holmes, E.C. We shouldn’t worry when a virus mutates during disease outbreaks. Nature Microbiology 2020, 5(4), 529–530, https://doi.org/10.1038/s41564-020-0690-4.
OpenUrl
5.↵
Callaway, E. The coronavirus is mutating — does it matter? Nature 2020, 585(7824), 174–177, https://doi.org/10.1038/d41586-020-02544-6.
OpenUrl CrossRef PubMed
6.↵
Korber, B.; Fischer, W.M.; Gnanakaran, S.; et al. Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 2020, 182(4), 812-827.e19, https://doi.org/10.1016/j.cell.2020.06.043.
OpenUrl CrossRef PubMed
7.↵
McAuley, A.J.; Kuiper, M.J.; Durr, P.A.; et al. Experimental and in silico evidence suggests vaccines are unlikely to be affected by D614G mutation in SARS-CoV-2 Spike protein. npj Vaccines 2020, 5, 96, https://doi.org/10.1038/s41541-020-00246-8.
OpenUrl
8.↵
Liu, Y.; Liu, J.; Plante, K.S.; et al. The N501Y Spike substitution enhances SARS-CoV-2 infection and transmission. Nature 2021, https://doi.org/10.1038/s41586-021-04245-0.
9.↵
Abdool Karim, S.S.; de Oliveira, T. New SARS-CoV-2 variants — clinical, public health, and vaccine implications. New England Journal of Medicine 2021, 384(19), 1866–1868, http://doi.org/10.1056/NEJMc2100362.
OpenUrl PubMed
10.↵
Riddell, S.; Goldie, S.; McAuley, A.J.; et al. Live virus neutralisation of the 501Y.V1 and 501Y.V2 SARS-CoV-2 variants following INO-4800 vaccination of ferrets. Frontiers in Immunology 2021, 12, 694857, https://doi.org/10.3389/fimmu.2021.694857.
OpenUrl
11.↵
Kuiper, M.J.; Wilson, L.O.W.; Mangalaganesh, S.; Lee, C.; Reti, D.; Vasan, S.S. But Mouse, you are not alone: On some severe acute respiratory syndrome coronavirus 2 variants infecting mice. ILAR Journal 2021, http://dx.doi.org/10.1093/ilar/ilab031.
12.↵
Callaway, E. The mutation that helps Delta spread like wildfire. Nature 2021, 596(7873), 472–473, https://doi.org/10.1038/d41586-021-02275-2.
OpenUrl
13.↵
Liu, Y.; Liu, J.; Johnson, B.A.; et al. Delta Spike P681R mutation enhances SARS-CoV-2 fitness over Alpha variant. bioRxiv 2021, https://doi.org/10.1101/2021.08.12.456173.
14.↵
Peacock, T.P.; Sheppard, C.M.; Brown, J.C.; et al. The SARS-CoV-2 variants associated with infections in India, B.1.617, show enhanced Spike cleavage by furin. bioRxiv 2021, https://doi.org/10.1101/2021.05.28.446163.
15.↵
Elbe, S.; Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges 2017, 1(1), 33–46, https://doi.org/10.1002/gch2.1018.
OpenUrl
16.↵
Bauer, D.C.; Metke-Jimenez, A.; Maurer-Stroh, S.; et al. Interoperable medical data: The missing link for understanding COVID-19. Transboundary and Emerging Diseases 2021, 68(4), 1753–1760, https://doi.org/10.1111/tbed.13892.
OpenUrl
17.↵
Remacle, A.G.; Shiryaev, S.A.; Oh, E.S.; et al. Substrate cleavage analysis of furin and related proprotein convertases: A comparative study. The Journal of Biological Chemistry 2008, 283(30), 20897–20906, https://doi.org/10.1074/jbc.M803762200.
OpenUrl Abstract/FREE Full Text
18.↵
Mahoney, M.; Damalanka, V.C.; Tartell, M.A.; et al. A Novel Class of TMPRSS2 inhibitors potently block SARS-CoV-2 and MERS-CoV viral entry and protect human epithelial Lung Cells. bioRxiv 2021, https://doi.org/10.1101/2021.05.06.442935.
19.↵
Demographics | Statista, 2016-2020. Available online: https://www.statista.com/markets/411/topic/446/demographics/ (accessed on 22 December 2021).
20.↵
Lythgoe, K.A.; Hall, M.; Ferretti, L.; et al. SARS-CoV-2 within-host diversity and transmission. Science 2021, 372(6539), eabg0821, https://doi.org/10.1126/science.abg0821.
OpenUrl Abstract/FREE Full Text
21.↵
Latif, A.A.; Mullen, J.L.; Alkuzweny, M.; et al. outbreak.info. Available online: https://outbreak.info/location-reports?loc=GBR&selected=Delta&selected=Alpha (accessed on 22 December 2021).
22.↵
Ritchie, H.; Mathieu, E.; Rodés-Guirao, L.; et al. Coronavirus pandemic (COVID-19), 2020. OurWorldInData. Available online: https://ourworldindata.org/coronavirus (accessed on 22 December 2021).
23.↵
Timeline of UK Government Coronavirus Lockdowns, 2021. The Institute for Government. Available online: https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns (accessed on 22 December 2021).
24.↵
Travel restrictions issued by states in response to the coronavirus (COVID-19) pandemic, 2020-2021. Ballotpedia. Available online: https://ballotpedia.org/Travel_restrictions_issued_by_states_in_response_to_the_coronavirus_(COVID-19)_pandemic,_2020-2021 (accessed on 22 December 2021).
25.↵
Thao, T.T.N.; Labroussaa, F.; Ebert, N.; et al. Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform. Nature 2020, 582(7813), 561–565, https://doi.org/10.1038/s41586-020-2294-9.
OpenUrl
26.↵
Xie, X.; Muruato, A.; Lokugamage, K.G.; et al. An infectious CDNA clone of SARS-CoV-2. Cell Host & Microbe 2020, 27(5), 841–848, https://doi.org/10.1016/j.chom.2020.04.004.
OpenUrl
27.↵
Hou, Y.J.; Okuda, K.; Edwards, C.E.; et al. SARS-CoV-2 reverse genetics reveals a variable infection gradient in the respiratory tract. Cell 2020, 182(2), 429–446, https://doi.org/10.1016/j.cell.2020.05.042.
OpenUrl PubMed
28.↵
Xie, X.; Lokugamage, K.G.; Zhang, X.; et al. Engineering SARS-CoV-2 using a reverse genetic system. Nature Protocols 2021, 16(3), 1761–1784, https://doi.org/10.1038/s41596-021-00491-8.
OpenUrl
29.↵
Ferguson, N.; Ghani, A.; Cori, A.; Hogan, A.; Hinsley, W.; Volz, E. Report 49 - Growth, population distribution and immune escape of Omicron in England. Imperial College London. Available online: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-49-Omicron/ (accessed on 22 December 2021).
30.↵
Cele, S.; Jackson, L.; Khoury, D.S.; et al. SARS-CoV-2 Omicron has extensive but incomplete escape of Pfizer BNT162b2 elicited neutralization and requires ace2 for infection. medRxiv 2021, https://doi.org/10.1101/2021.12.08.21267417.
31.↵
Pulliam, J.R.C.; Schalkwyk, C.V.; Govender, N.; et al. Increased risk of SARS-CoV-2 reinfection associated with emergence of the Omicron variant in South Africa. medRxiv 2021, https://doi.org/10.1101/2021.11.11.21266068.
32.↵
Heguy, A. (NYU Langone Health, New York, NY, USA). Personal communication to S.S. Vasan, 30 December 2021.
33.↵
Corey, L.; Beyrer, C.; Cohen, M.S.; et al. SARS-CoV-2 variants in patients with immunosuppression. New England Journal of Medicine 2021, 385(6), 562–566, https://doi.org/10.1056/NEJMsb2104756.
OpenUrl CrossRef PubMed
34.↵
Grépin, K.A.; Ho, T.L.; Liu, Z.; Marion, S.; Piper, J.; Worsnop, C.Z.; Lee, L. Evidence of the effectiveness of travel-related measures during the early phase of the COVID-19 pandemic: A rapid systematic review. BMJ Global Health 2021, 6(3), e004537, https://doi.org/10.1136/bmjgh-2020-004537.
OpenUrl Abstract/FREE Full Text
35.↵
Liebig, J.; Najeebullah, K.; Jurdak, R.; Shoghri, A.E.; Paini, D. Should international borders re-open? The impact of travel restrictions on COVID-19 importation risk. BMC Public Health 2021, 21(1), 1573, https://doi.org/10.1186/s12889-021-11616-9.
OpenUrl
36.↵
Domingo, E.; Perales, C. Viral quasispecies. PLoS Genetics 2019, 15(10), e1008271, https://doi.org/10.1371/journal.pgen.1008271.
OpenUrl
37.↵
Drew, T.W. The emergence and evolution of swine viral diseases: To what extent have husbandry systems and global trade contributed to their distribution and diversity. Revue Scientifique et Technique de l’OIE 2011, 30(1), 95–106, https://doi.org/10.20506/rst.30.1.2020.
OpenUrl
38.↵
Sun, F.; Wang, X.; Tan, S.; et al. SARS-CoV-2 quasispecies provides an advantage mutation pool for the epidemic variants. Microbiology Spectrum 2021, 9(1), e00261–21, https://doi.org/10.1128/Spectrum.00261-21.
OpenUrl
39.↵
Li, L.; Li, Q.; Shi, Y. Syncytia formation during SARS-CoV-2 lung infection: a disastrous unity to eliminate lymphocytes. Cell Death & Differentiation 2021, 28, 2019–2021, https://doi.org/10.1038/s41418-021-00795-y
OpenUrl
40.↵
Tejero, H.; Marín, A.; Montero, F. The relationship between the error catastrophe, survival of the flattest, and natural selection. BMC Evolutionary Biology 2011, 11(1), 2, https://doi.org/10.1186/1471-2148-11-2.
OpenUrl
41.↵
de Alcañíz, J.G.G.; López-Rodas, V.; Costas, E. Sword of Damocles or choosing well: Population genetics sheds light into the future of the COVID-19 pandemic and SARS-CoV-2 new mutant strains. medRxiv 2021, https://doi.org/10.1101/2021.01.16.21249924.
42.↵
Brüssow, H. COVID-19: emergence and mutational diversification of SARS-CoV-2. Microbial Biotechnology 2021, 14(3), 756–768, https://doi.org/10.1111/1751-7915.13800.
OpenUrl
43.↵
Haddad, D.; John, S.E.; Mohammad, A.; et al. SARS-CoV-2: Possible recombination and emergence of potentially more virulent strains. PLoS One 2021, https://doi.org/10.1371/journal.pone.0251368.
44.↵
Pollett, S.; Conte, M.A.; Sanborn, M.; et al. A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic. Scientific Reports 2021, 11, 17365, https://doi.org/10.1038/s41598-021-96626-8.
OpenUrl
45.↵
Wrapp, D.; Wang, N.; Corbett, K.S.; et al. Cryo-EM structure of the 2019-nCoV Spike in the prefusion conformation. Science 2020, 367(6483), 1260–1263, https://doi.org/10.1126/science.abb2507.
OpenUrl Abstract/FREE Full Text
46.↵
Phillips, J.C.; Braun, R.; Wang, W.; et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 2005, 26(16), 1781–1802, https://doi.org/10.1002/jcc.20289.
OpenUrl CrossRef PubMed Web of Science
47.↵
Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. Journal of Molecular Graphics 1996, 14(1), 33–38, https://doi.org/10.1016/0263-7855(96)00018-5.
OpenUrl CrossRef PubMed Web of Science

View the discussion thread.

Posted January 02, 2022.

Download PDF

Supplementary Material

Data/Code

Citation Tools

Subject Area

Infectious Diseases (except HIV/AIDS)

Subject Areas

All Articles

Addiction Medicine (316)
Allergy and Immunology (621)
Anesthesia (162)
Cardiovascular Medicine (2298)
Dentistry and Oral Medicine (280)
Dermatology (202)
Emergency Medicine (371)
Endocrinology (including Diabetes Mellitus and Metabolic Disease) (818)
Epidemiology (11623)
Forensic Medicine (10)
Gastroenterology (683)
Genetic and Genomic Medicine (3629)
Geriatric Medicine (341)
Health Economics (622)
Health Informatics (2332)
Health Policy (919)
Health Systems and Quality Improvement (871)
Hematology (336)
HIV/AIDS (758)
Infectious Diseases (except HIV/AIDS) (13202)
Intensive Care and Critical Care Medicine (760)
Medical Education (361)
Medical Ethics (101)
Nephrology (394)
Neurology (3390)
Nursing (193)
Nutrition (512)
Obstetrics and Gynecology (653)
Occupational and Environmental Health (654)
Oncology (1777)
Ophthalmology (526)
Orthopedics (211)
Otolaryngology (284)
Pain Medicine (226)
Palliative Medicine (66)
Pathology (441)
Pediatrics (1013)
Pharmacology and Therapeutics (423)
Primary Care Research (410)
Psychiatry and Clinical Psychology (3103)
Public and Global Health (6021)
Radiology and Imaging (1238)
Rehabilitation Medicine and Physical Therapy (720)
Respiratory Medicine (814)
Rheumatology (370)
Sexual and Reproductive Health (359)
Sports Medicine (319)
Surgery (390)
Toxicology (50)
Transplantation (171)
Urology (143)

[1] 1.↵
Zhou, P.; Yang, X.L.; Wang, X.G.; et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 2020, 579(7798), 270–273, https://doi.org/10.1038/s41586-020-2012-7.
OpenUrl CrossRef PubMed

[2] 2.↵
Bauer, D.C.; Tay, A.P.; Wilson, L.O.W.; et al. Supporting pandemic response using genomics and bioinformatics: A case study on the emergent SARS-CoV-2 outbreak. Transboundary and Emerging Diseases 2020, 67(4), 1453–1462, https://doi.org/10.1111/tbed.13588.
OpenUrl

[3] 3.↵
Wilke, C.O.; Wang, J.L.; Ofria, C.; Lenski, R.E.; Adami, C. Evolution of digital organisms at high mutation rates leads to survival of the flattest. Nature 2020, 412(6844), 331–333, https://doi.org/10.1038/35085569.
OpenUrl

[4] 4.↵
Grubaugh, N.D.; Petrone, M.E.; Holmes, E.C. We shouldn’t worry when a virus mutates during disease outbreaks. Nature Microbiology 2020, 5(4), 529–530, https://doi.org/10.1038/s41564-020-0690-4.
OpenUrl

[5] 5.↵
Callaway, E. The coronavirus is mutating — does it matter? Nature 2020, 585(7824), 174–177, https://doi.org/10.1038/d41586-020-02544-6.
OpenUrl CrossRef PubMed

[6] 6.↵
Korber, B.; Fischer, W.M.; Gnanakaran, S.; et al. Tracking changes in SARS-CoV-2 Spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 2020, 182(4), 812-827.e19, https://doi.org/10.1016/j.cell.2020.06.043.
OpenUrl CrossRef PubMed

[7] 7.↵
McAuley, A.J.; Kuiper, M.J.; Durr, P.A.; et al. Experimental and in silico evidence suggests vaccines are unlikely to be affected by D614G mutation in SARS-CoV-2 Spike protein. npj Vaccines 2020, 5, 96, https://doi.org/10.1038/s41541-020-00246-8.
OpenUrl

[8] 8.↵
Liu, Y.; Liu, J.; Plante, K.S.; et al. The N501Y Spike substitution enhances SARS-CoV-2 infection and transmission. Nature 2021, https://doi.org/10.1038/s41586-021-04245-0.

[9] 9.↵
Abdool Karim, S.S.; de Oliveira, T. New SARS-CoV-2 variants — clinical, public health, and vaccine implications. New England Journal of Medicine 2021, 384(19), 1866–1868, http://doi.org/10.1056/NEJMc2100362.
OpenUrl PubMed

[10] 10.↵
Riddell, S.; Goldie, S.; McAuley, A.J.; et al. Live virus neutralisation of the 501Y.V1 and 501Y.V2 SARS-CoV-2 variants following INO-4800 vaccination of ferrets. Frontiers in Immunology 2021, 12, 694857, https://doi.org/10.3389/fimmu.2021.694857.
OpenUrl

[11] 11.↵
Kuiper, M.J.; Wilson, L.O.W.; Mangalaganesh, S.; Lee, C.; Reti, D.; Vasan, S.S. But Mouse, you are not alone: On some severe acute respiratory syndrome coronavirus 2 variants infecting mice. ILAR Journal 2021, http://dx.doi.org/10.1093/ilar/ilab031.

[12] 12.↵
Callaway, E. The mutation that helps Delta spread like wildfire. Nature 2021, 596(7873), 472–473, https://doi.org/10.1038/d41586-021-02275-2.
OpenUrl

[13] 13.↵
Liu, Y.; Liu, J.; Johnson, B.A.; et al. Delta Spike P681R mutation enhances SARS-CoV-2 fitness over Alpha variant. bioRxiv 2021, https://doi.org/10.1101/2021.08.12.456173.

[14] 14.↵
Peacock, T.P.; Sheppard, C.M.; Brown, J.C.; et al. The SARS-CoV-2 variants associated with infections in India, B.1.617, show enhanced Spike cleavage by furin. bioRxiv 2021, https://doi.org/10.1101/2021.05.28.446163.

[15] 15.↵
Elbe, S.; Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges 2017, 1(1), 33–46, https://doi.org/10.1002/gch2.1018.
OpenUrl

[16] 16.↵
Bauer, D.C.; Metke-Jimenez, A.; Maurer-Stroh, S.; et al. Interoperable medical data: The missing link for understanding COVID-19. Transboundary and Emerging Diseases 2021, 68(4), 1753–1760, https://doi.org/10.1111/tbed.13892.
OpenUrl

[17] 17.↵
Remacle, A.G.; Shiryaev, S.A.; Oh, E.S.; et al. Substrate cleavage analysis of furin and related proprotein convertases: A comparative study. The Journal of Biological Chemistry 2008, 283(30), 20897–20906, https://doi.org/10.1074/jbc.M803762200.
OpenUrl Abstract/FREE Full Text

[18] 18.↵
Mahoney, M.; Damalanka, V.C.; Tartell, M.A.; et al. A Novel Class of TMPRSS2 inhibitors potently block SARS-CoV-2 and MERS-CoV viral entry and protect human epithelial Lung Cells. bioRxiv 2021, https://doi.org/10.1101/2021.05.06.442935.

[19] 19.↵
Demographics | Statista, 2016-2020. Available online: https://www.statista.com/markets/411/topic/446/demographics/ (accessed on 22 December 2021).

[20] 20.↵
Lythgoe, K.A.; Hall, M.; Ferretti, L.; et al. SARS-CoV-2 within-host diversity and transmission. Science 2021, 372(6539), eabg0821, https://doi.org/10.1126/science.abg0821.
OpenUrl Abstract/FREE Full Text

[21] 21.↵
Latif, A.A.; Mullen, J.L.; Alkuzweny, M.; et al. outbreak.info. Available online: https://outbreak.info/location-reports?loc=GBR&selected=Delta&selected=Alpha (accessed on 22 December 2021).

[22] 22.↵
Ritchie, H.; Mathieu, E.; Rodés-Guirao, L.; et al. Coronavirus pandemic (COVID-19), 2020. OurWorldInData. Available online: https://ourworldindata.org/coronavirus (accessed on 22 December 2021).

[23] 23.↵
Timeline of UK Government Coronavirus Lockdowns, 2021. The Institute for Government. Available online: https://www.instituteforgovernment.org.uk/charts/uk-government-coronavirus-lockdowns (accessed on 22 December 2021).

[24] 24.↵
Travel restrictions issued by states in response to the coronavirus (COVID-19) pandemic, 2020-2021. Ballotpedia. Available online: https://ballotpedia.org/Travel_restrictions_issued_by_states_in_response_to_the_coronavirus_(COVID-19)_pandemic,_2020-2021 (accessed on 22 December 2021).

[25] 25.↵
Thao, T.T.N.; Labroussaa, F.; Ebert, N.; et al. Rapid reconstruction of SARS-CoV-2 using a synthetic genomics platform. Nature 2020, 582(7813), 561–565, https://doi.org/10.1038/s41586-020-2294-9.
OpenUrl

[26] 26.↵
Xie, X.; Muruato, A.; Lokugamage, K.G.; et al. An infectious CDNA clone of SARS-CoV-2. Cell Host & Microbe 2020, 27(5), 841–848, https://doi.org/10.1016/j.chom.2020.04.004.
OpenUrl

[27] 27.↵
Hou, Y.J.; Okuda, K.; Edwards, C.E.; et al. SARS-CoV-2 reverse genetics reveals a variable infection gradient in the respiratory tract. Cell 2020, 182(2), 429–446, https://doi.org/10.1016/j.cell.2020.05.042.
OpenUrl PubMed

[28] 28.↵
Xie, X.; Lokugamage, K.G.; Zhang, X.; et al. Engineering SARS-CoV-2 using a reverse genetic system. Nature Protocols 2021, 16(3), 1761–1784, https://doi.org/10.1038/s41596-021-00491-8.
OpenUrl

[29] 29.↵
Ferguson, N.; Ghani, A.; Cori, A.; Hogan, A.; Hinsley, W.; Volz, E. Report 49 - Growth, population distribution and immune escape of Omicron in England. Imperial College London. Available online: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-49-Omicron/ (accessed on 22 December 2021).

[30] 30.↵
Cele, S.; Jackson, L.; Khoury, D.S.; et al. SARS-CoV-2 Omicron has extensive but incomplete escape of Pfizer BNT162b2 elicited neutralization and requires ace2 for infection. medRxiv 2021, https://doi.org/10.1101/2021.12.08.21267417.

[31] 31.↵
Pulliam, J.R.C.; Schalkwyk, C.V.; Govender, N.; et al. Increased risk of SARS-CoV-2 reinfection associated with emergence of the Omicron variant in South Africa. medRxiv 2021, https://doi.org/10.1101/2021.11.11.21266068.

[32] 32.↵
Heguy, A. (NYU Langone Health, New York, NY, USA). Personal communication to S.S. Vasan, 30 December 2021.

[33] 33.↵
Corey, L.; Beyrer, C.; Cohen, M.S.; et al. SARS-CoV-2 variants in patients with immunosuppression. New England Journal of Medicine 2021, 385(6), 562–566, https://doi.org/10.1056/NEJMsb2104756.
OpenUrl CrossRef PubMed

[34] 34.↵
Grépin, K.A.; Ho, T.L.; Liu, Z.; Marion, S.; Piper, J.; Worsnop, C.Z.; Lee, L. Evidence of the effectiveness of travel-related measures during the early phase of the COVID-19 pandemic: A rapid systematic review. BMJ Global Health 2021, 6(3), e004537, https://doi.org/10.1136/bmjgh-2020-004537.
OpenUrl Abstract/FREE Full Text

[35] 35.↵
Liebig, J.; Najeebullah, K.; Jurdak, R.; Shoghri, A.E.; Paini, D. Should international borders re-open? The impact of travel restrictions on COVID-19 importation risk. BMC Public Health 2021, 21(1), 1573, https://doi.org/10.1186/s12889-021-11616-9.
OpenUrl

[36] 36.↵
Domingo, E.; Perales, C. Viral quasispecies. PLoS Genetics 2019, 15(10), e1008271, https://doi.org/10.1371/journal.pgen.1008271.
OpenUrl

[37] 37.↵
Drew, T.W. The emergence and evolution of swine viral diseases: To what extent have husbandry systems and global trade contributed to their distribution and diversity. Revue Scientifique et Technique de l’OIE 2011, 30(1), 95–106, https://doi.org/10.20506/rst.30.1.2020.
OpenUrl

[38] 38.↵
Sun, F.; Wang, X.; Tan, S.; et al. SARS-CoV-2 quasispecies provides an advantage mutation pool for the epidemic variants. Microbiology Spectrum 2021, 9(1), e00261–21, https://doi.org/10.1128/Spectrum.00261-21.
OpenUrl

[39] 39.↵
Li, L.; Li, Q.; Shi, Y. Syncytia formation during SARS-CoV-2 lung infection: a disastrous unity to eliminate lymphocytes. Cell Death & Differentiation 2021, 28, 2019–2021, https://doi.org/10.1038/s41418-021-00795-y
OpenUrl

[40] 40.↵
Tejero, H.; Marín, A.; Montero, F. The relationship between the error catastrophe, survival of the flattest, and natural selection. BMC Evolutionary Biology 2011, 11(1), 2, https://doi.org/10.1186/1471-2148-11-2.
OpenUrl

[41] 41.↵
de Alcañíz, J.G.G.; López-Rodas, V.; Costas, E. Sword of Damocles or choosing well: Population genetics sheds light into the future of the COVID-19 pandemic and SARS-CoV-2 new mutant strains. medRxiv 2021, https://doi.org/10.1101/2021.01.16.21249924.

[42] 42.↵
Brüssow, H. COVID-19: emergence and mutational diversification of SARS-CoV-2. Microbial Biotechnology 2021, 14(3), 756–768, https://doi.org/10.1111/1751-7915.13800.
OpenUrl

[43] 43.↵
Haddad, D.; John, S.E.; Mohammad, A.; et al. SARS-CoV-2: Possible recombination and emergence of potentially more virulent strains. PLoS One 2021, https://doi.org/10.1371/journal.pone.0251368.

[44] 44.↵
Pollett, S.; Conte, M.A.; Sanborn, M.; et al. A comparative recombination analysis of human coronaviruses and implications for the SARS-CoV-2 pandemic. Scientific Reports 2021, 11, 17365, https://doi.org/10.1038/s41598-021-96626-8.
OpenUrl

[45] 45.↵
Wrapp, D.; Wang, N.; Corbett, K.S.; et al. Cryo-EM structure of the 2019-nCoV Spike in the prefusion conformation. Science 2020, 367(6483), 1260–1263, https://doi.org/10.1126/science.abb2507.
OpenUrl Abstract/FREE Full Text

[46] 46.↵
Phillips, J.C.; Braun, R.; Wang, W.; et al. Scalable molecular dynamics with NAMD. Journal of Computational Chemistry 2005, 26(16), 1781–1802, https://doi.org/10.1002/jcc.20289.
OpenUrl CrossRef PubMed Web of Science

[47] 47.↵
Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. Journal of Molecular Graphics 1996, 14(1), 33–38, https://doi.org/10.1016/0263-7855(96)00018-5.
OpenUrl CrossRef PubMed Web of Science

Cooccurrence of N501Y, P681R and other key mutations in SARS-CoV-2 Spike

Abstract

1. Introduction

2. Results: Frequency and distribution of D614G, N501Y and P681R mutations

3. Discussion

3.1. Y501-R681 almost always in the G614 background

3.2. Predominance of Alpha Q.4 acquiring P681R and Delta acquiring N501Y

3.3. Alpha acquiring P681R ahead of other VOC could be due to P/H681R

3.4. Cooccurrences have been reported predominantly in eight countries

3.5. Y501-R681 doesn’t outcompete other variants

3.6. Implications for the Omicron VOC

3.7. SARS-CoV-2 evolution and selective pressures on quasispecies

3.8. Error catastrophe, Muller’s ratchet and implications for SARS-CoV-2

4. Methods

4.1. Bioinformatics analysis

4.2. Biomolecular modelling

5. Conclusions and limitations

Data Availability

Supplementary Materials

Author Contributions

Funding

Institutional Review Board and Informed Consent Statements

Data Availability Statement

Conflicts of Interest

Graphical Abstract

Acknowledgments

Footnotes

References

Citation Manager Formats

Subject Area