Summary
Monovalent vaccines using RNA or adenoviruses have successfully controlled the COVID-19 epidemic in many countries. However, viral mutations have hampered the efficacy of this approach. The Omicron variant, in particular, has caused a pandemic which has put pressure on the healthcare system worldwide. Therefore, administration of booster vaccinations has been initiated; however, there are concerns about their effectiveness, sustainability, and possible dangers. There is also the question of how a variant with such isolated mutations originated and whether this is likely to continue in the future. Here, we compare the mutations in the Omicron variant with others by direct PCA to consider questions pertaining to their evolution and characterisation. The Omicron variant, like the other variants, has mutated in its human vectors. The accumulated mutations exceeded the range of acquired immunity, causing a pandemic, and similar mutations are likely to occur in the future. We also compare Omicron with variants that have infected animals and discuss the possibility of a vaccine using a weaker variant of the virus.
Introduction
The COVID-19 epidemic continues, despite the efforts of many countries to bring it to closure1. It is postulated to have started in Wuhan, China. Then, by April 2020, it had spread to Europe and North America. During this progression, it rapidly mutated to form four major sub-groups, three of which are still prevalent today2. COVID-19 is also known to have spread among several susceptible animal species; a problem that, as in the case of humans, continues to manifest itself3-6. Subsequently, increased surveillance at national borders has slowed the spread of the disease across national lines. Further, more potent variants have emerged in each country independently7. The most infectious sub-types have spread across borders and have been designated as variants of concern (VOC) by the WHO8.
Countries have taken measures to prevent the spread of the disease by surveying and isolating the infected people. Vaccines have been rapidly developed; particularly monovalent vaccines. This has been made possible through the use of new technologies, such as RNA vaccines, that have become popular globally. These have proven very effective, and have led to a significant reduction in the number of people infected for a time; even in countries where detection and isolation did not work well9-10.
However, as the virus continues to mutate, the vaccines' effectiveness is waning. The Delta variant has become emblematic of this situation. More recently, infections by the Omicron sub-type have exploded11-14. Even in Australia, where vaccination rates are high and effective control measures are in place, this variant has caused many cases15. Because it spreads despite rigorous countermeasures, Omicron is thought to be a sub-type carrying mutations in the spike protein that almost abrogate the effect of the monovalent vaccine; it is thus likely to be infectious even after the third dose16.
Here, we characterise the Omicron variant's genetic sequence using direct principal component analysis (PCA)17 and discuss the mechanism underlying its virulence. This is an objective method for evaluating the characteristics of a sample based on sequence differences, with each PCA axis presenting differences in the nucleotide sequences at specific positions. In this analysis, several axes are used to determine factors such as the origin of each variant, how it has changed, and its basic characteristics. These data are compared with those obtained from variants infecting animals to discuss the possibility of developing a vaccine using a weakened variant of the virus.
Results
When we derived PCA axes from the data up to April 2020 and projected the recent data onto these axes, the variants appeared as several groups on three routes (Fig. 1A)2. By that stage, the virus's acclimation to human vectors seems to have been complete. The changes made then are of great importance, which is probably why the current variants retain the same characteristics. Fig. 1A shows, on these axes, 27,000 random samples of variants registered up to 27 December 2021. The WHO VOCs are shown in blue. All the Omicron variants belonged to group 1.
Incidentally, approximately 1500 sequences of the SARS-CoV-2 variants infecting animals have been registered as of 27 December 2021. Notably, they too belong to one of these mentioned groups (Fig. 1B). In particular, many of the sub-forms prevalent in minks, deer, dogs, cats, and zoo animals are thought to have been transmitted by humans; specifically, they likely originated from variants where many human cases have been observed.
The currently prevalent variants have many more mutations. Fig. 2A reflects the magnitude of mutations in the variants. To equalise the weights, two WHO-VOC each were selected to set the axes. The Omicron variants were observed to be distant from the others. The samples from African countries recorded changes in the variant. As the samples in the upper right corner increased, the number of reported cases increased, suggesting that they became more infectious. The upper rightmost variants have a three-amino acid insertion in the spike protein sequence. This is noteworthy because, while many of the newer variants have some deletions, insertions are rare. Thus, this is probably the variant that has mutated the most. However, the spread of mutations is not the process of change observed on a time-series basis. The first two reported cases in South Africa were already heavily mutated (10/12 and 10/24). The earliest Omicron variant, which was still less mutated, would have been located farther down to the left. It is likely that the disease spread elsewhere, matured, and then the most prevalent variant moved to the sequencing countries (Fig. S1).
The global data in Fig. 2A are shown over time in Fig. 2C and 2D. Each epidemic was caused by a single variant, and the change between variants was discontinuous. The gap before the Omicron variant is accentuated by the absence of sufficient African records. This is distinctly different from the case of H1N1 influenza. If the mutations were to accumulate sequentially in one variant, PCs would show sine curves, as seen with H1N1 mutations (Fig. S2). There was one variant of H1N1 per year somewhere in the world, which moved annually while changing itself. After a few years, the variant would change by approximately 15–30 per 1000 bases and then return to the same location to cause another epidemic. This is likely because the flu infects many people who then gain acquired immunity.
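The sine-curve behaviour expected under sequential accumulation can be illustrated with a toy model. The sketch below is not the simulation behind Fig. S2; it assumes, purely for illustration, that each time step fixes exactly one new mutation in a single lineage, and all function names are hypothetical. Under that assumption, the sample scores on PC1 follow a half-period cosine over time, which the code checks numerically with a small pure-Python power iteration.

```python
import math

def sequential_samples(n_steps):
    """Sample t carries the first t mutations (toy assumption: one new site per step)."""
    return [[1.0 if site < t else 0.0 for site in range(n_steps - 1)]
            for t in range(n_steps)]

def centred_gram(rows):
    """Sample-by-sample Gram matrix of the column-centred Boolean data."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in centred]
            for r1 in centred]

def leading_scores(gram, iterations=200):
    """Unit-length sample scores on PC1, found by power iteration."""
    vec = [float(i) for i in range(len(gram))]  # deterministic, non-constant start
    for _ in range(iterations):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

N_STEPS = 20
pc1 = leading_scores(centred_gram(sequential_samples(N_STEPS)))

# Reference curve: half a cosine period across the sampled time points.
half_cosine = [math.cos(math.pi * (t + 0.5) / N_STEPS) for t in range(N_STEPS)]
```

In this toy, PC1 changes monotonically along the time course, so a lineage that mutates step by step traces a smooth curve rather than the discontinuous jumps described above for SARS-CoV-2.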
Omicron was first reported in South Africa; however, Group 1 variants to which Omicron belongs were not prevalent in this country after August 2020 (Fig. 2B). The only Group 1 variant that appeared briefly in July 2021 was C.1.2, which is also quite far from the Omicron variant (Fig. 2A). A closer group 1 variant is B.1.1.519, which was reported by Botswana and Morocco. The relationship between this variant and Omicron and its origin remains unknown because of lack of records.
Omicron is a mutated human variant of COVID-19. However, this variant's mutations did not resemble those of any existing coronavirus (Fig. 3A and 3B)18, nor did they have anything in common with SARS-CoV-2 that had infected animals (Fig. 3C). This eliminates the possibility that it was transmitted from animals19. In particular, the rodent data were completely unrelated to Omicron's mutations (Fig. S4). The animal vector hypothesis originally arose from processing phylogenetic results with PCA. However, phylogenetic trees are a form of one-dimensional data created based on the distances between sequences. Therefore, these sequences are not comparable. Further, given that PCA is a method for observing multidimensional data, processing one-dimensional results is not its original purpose. Artefacts caused by inappropriate data processing were apparently the source of this concern. As seen in Fig. 1A and 2A, this variant gradually changed from group 1.
When SARS-CoV-2 infected animals such as mink, deer, dogs, and cats, a ping-pong effect occurred, thereby increasing the number of infected animals. In these animals, acclimatisation occurred quickly. This is similar to the situation in which the initial SARS-CoV-2 variants had acclimatised to humans by April 2020. For example, mutations in PC21 and PC25 (Fig. 3D) on the animal sample axis suggest acclimation to minks and deer in some countries.
The concern about re-infection from these animals to humans is natural. However, variants that are sufficiently far from the human variants, as shown in Fig. 3D, are not evident. This is why the 27,000 human samples are clustered in the centre. If massive re-emergence should occur in the future, it would be easily confirmed by sequencing. In fact, the only such variant that has ever been prevalent in humans is the one in the Netherlands, indicated by the blue arrow. This variant is far from the human viruses, but it is even farther from the mink viruses. Thus, it was probably in the process of acclimating to mink. During the epidemic phase dominated by this variant, the mortality rate in the Netherlands decreased by a factor commensurate with the variant titer20.
The mutations occurred mainly in spike glycoprotein (S) and nucleocapsid phosphoprotein (N) (Fig. 4). This is very different from influenza, in which all ORFs change simultaneously at the same rate21. The mutations in the Delta variant are larger than those in Alpha. Further, since these mutations lie in opposite directions from the initial variant (Fig. 2A), Delta would have been spared much of the immunity acquired against Alpha. Lambda has more mutations than these, and Omicron has even more. The mutations are mainly in S and N, which are the surface proteins of the virus; therefore, there must be strong selection pressure to avoid immunity22. In Omicron, there was a high density of S mutations, suggesting selection pressure to avoid the acquired immunity imparted by monovalent vaccines. In Omicron, mutations also occur in the smaller ORFs, which are otherwise relatively well preserved. The mutation in the envelope (E) is only one amino acid, but it is very rare. In addition, there are three amino acid mutations in M.
The animal viruses did not show the same concentration of S and N mutations as human viruses such as Alpha. Fig. 5 shows the number of mutations for the variants farthest from the human virus population (Fig. 3D). There were more missense mutations; therefore, some amino acid mutations may have been advantageous for each host's specificity. However, many small ORFs were retained, and none of them showed major mutations such as those seen in Alpha and Omicron. This does not necessarily mean that variants that are more acclimated to humans are less likely to infect animals; rather, the examples shown here infected animals relatively early in the epidemic (so they would have had more time to diverge from the human variants). Newer variants, for example, Delta and Lambda, can also infect animals (Fig. S5).
Discussion
Omicron did not arise in South Africa. Specifically, the parent of this variant was not prevalent in South Africa. Rather, it probably originated in areas without sequence testing, matured sufficiently to overcome vaccine-acquired immunity, and then entered the sequencing countries. By the time the danger was recognised in the South African survey, the variant had probably already spread to other parts of the world. The current global epidemic may be the result of this delay. The Omicron variant, like other variants, has mutated among humans to overcome vaccine-induced immunity. It is likely that mutations that overcome immunity provided by newer vaccines will occur again in the future.
In contrast to the mutations of the influenza H1N1 virus, SARS-CoV-2 mutations were discontinuous. This is because three groups evolved independently in different regions in the early stages and, after the borders were closed, the more infectious variants that had evolved were successively released on a global scale. Even a variant as infectious as Delta, for example, does not infect everyone; this is because people are consciously protecting themselves. However, if a new, more infectious variant arises, it can break through these artificial defenses. It is also possible for a very different variant to overcome acquired immunity. With the widespread use of monovalent vaccines, many people are now immune to certain variants. Omicron has been able to evade this immunity and has spread the disease because it differs so greatly from previous variants.
Africa is home to 1.2 billion people, but there are few areas where sequencing is routinely performed. The number of sequences per population in Africa was only 1/150 of that in Europe (Fig. S3), and 40% were from South Africa. Hence, there is a relative dearth of records compared to other regions. This is also the case in many Asian and Latin American regions. Similar gaps in the records can be seen in the H1N1 influenza viral mutations21; which mutate continuously every year, but still sometimes reveal large gaps. Thus, the gaps in Lambda and Omicron (Fig. 2A) are likely due to this lack of records.
The USA and the UK are the most prolific sequencers. However, considering the COVID-19 situation in these countries, it seems that their huge amount of sequencing is not doing much to prevent the spread of infection. If these countries had been more generous in lending some of their sequencing capacity to developing countries, the new variants could have been detected more quickly. If detection had occurred at an earlier stage, quarantine could have stopped the spread. Thus, there is a need for international cooperation to conduct such surveys.
Monovalent vaccines have been used to combat the COVID-19 epidemic. These targeted the S protein and worked well, but the Omicron variants were more capable of evading this immunity. For this reason, many countries and regions are rushing to administer booster vaccinations. However, repeated vaccinations may not be sustainable23-25. In fact, in many areas, even the first round of vaccination has not been completed26. With regard to Israel, the effectiveness of boosters is said to be questionable27. In fact, there is a report that repeated boosters do not work16. There have also been concerns about the dangers of repeated booster doses28. Therefore, the strategy of quarantine combined with monovalent vaccines must be revised.
I wish to point out the possibility of using animal-adapted variants to develop a multivalent SARS-CoV-2 vaccine, analogous to the use of the vaccinia virus against smallpox. In fact, a half-adapted mink variant was barely able to spread among humans. It probably had low virulence and was quickly replaced by a more infectious variant. A more adapted variant would probably not be able to spread from human to human. Once a weakly toxic variant is selected, it can be maintained and propagated in its host and in cultured cells. Efficacy can be expected because SARS-CoV-2 mutates little, particularly in its small ORFs. Perhaps the virus does not have sufficient flexibility. However, in the body, all proteins are presented as antigens. This is why all the ORFs were altered in the influenza virus, and this virus has been prevalent for decades21. Such vaccines may be less effective in preventing infection than RNA vaccines targeting the S protein. However, they are more resistant to S protein mutations and may hold the potential for preventing severe symptoms.
If a new RNA vaccine becomes available for Omicron, a mutation may occur that overcomes the newly developed immunity and causes the next pandemic. If this cycle repeats itself, SARS-CoV-2 may continue to change in a discontinuous fashion. This is a calamity that is difficult to control and will take many years to overcome. If, on the contrary, a multivalent vaccine is approved for practical use, the selective pressure would not be concentrated on the spike protein (S), even if SARS-CoV-2 continues to mutate, similar to influenza viruses. In this case, the epidemic will probably be small, making it possible to relax preventive measures. The H1N1 influenza haemagglutinin mutated and replaced most of the protein's surface between the 1970s and 200921. If similar degrees of freedom exist in the S and N of SARS-CoV-2, then these should still have a high mutation potential. Omicron did not simply have many variations. Rather, it mutated just like the other variants, and we suspect that this process will continue in the future. It is very important to stop this epidemic in each country so that we do not have another VOC. Hence, this effort must be coordinated on a global scale. The production and transportation of weaker variants are much lower-tech than those of RNA vaccines, and are probably more sustainable.
Materials and methods
2.1 PCA
Sets of nucleotide sequences were downloaded from GISAID29 on 27 December 2021. However, the set did not include samples from African countries other than South Africa. Thus, to increase the number of African samples, those with complete sequences from 1 July 2021 to 15 January 2022 were also downloaded. Only the complete sequences that contained fewer than 1,000 N were selected. The sequences were aligned using DECIPHER30. Subsequently, they were converted to Boolean vectors and subjected to PCA17. Sample and sequence PCs were scaled based on the length of the sequence and the number of samples, respectively31.
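The filtering and Boolean conversion steps can be sketched as follows. This is a minimal illustration in Python rather than the published R code; the function names, the four-column one-hot layout, and the choice to encode gaps and ambiguous bases as all zeros are assumptions made for this sketch and may differ from the encoding used in the direct PCA method17. The toy threshold of 3 stands in for the 1,000 N cut-off.

```python
from typing import Dict, List

BASES = "ACGT"  # assumed column order of the one-hot encoding


def keep_sequence(seq: str, max_n: int = 1000) -> bool:
    """Keep a sequence only if it has fewer than max_n ambiguous 'N' bases."""
    return seq.upper().count("N") < max_n


def to_boolean_vector(aligned_seq: str) -> List[int]:
    """Encode each alignment column as four Boolean entries (A, C, G, T).

    Gaps and ambiguous bases become four zeros (an assumption of this sketch).
    """
    vector: List[int] = []
    for base in aligned_seq.upper():
        vector.extend(1 if base == b else 0 for b in BASES)
    return vector


# Hypothetical, already-aligned toy sequences.
aligned: Dict[str, str] = {"s1": "ACGT-", "s2": "ACGTA", "s3": "NNNNN"}

# Toy threshold of 3 instead of 1,000 so that the filter has a visible effect.
kept = {name: seq for name, seq in aligned.items() if keep_sequence(seq, max_n=3)}
matrix = {name: to_boolean_vector(seq) for name, seq in kept.items()}
```

Each row of `matrix` has four entries per alignment column, so all retained samples share the same length and can be stacked into a sample-by-position matrix for PCA.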
The PCA axis shows differences in a specific set of bases. The axis is determined using our designated search dataset. Therefore, depending on the set of samples used, the observed differences will vary. Depending on one's aim, there are several viable sets of axes available. One is the set of initial axes reflecting human acclimatization, which was created using data up to April 2020; on these axes the samples spread radially across four groups, and the set was used to determine variant origin. The other set was derived using two of each WHO-VOC8, Alpha to Omicron, to avoid weighting errors due to differences in the number of data. In this set, the most highly mutated Omicron variant formed PC1. The remaining variants were divided in PC2. This was used to determine variation in the Omicron variants. In addition, to characterise the samples infecting animals, we used 1500 samples and two WHO-VOCs.
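The dependence of the axes on the reference set can be made concrete with a small sketch. It is written in Python with hypothetical names and toy data, finds only the first axis by power iteration, and omits the scaling of PCs described above, so it should be read as an illustration of the idea rather than the published R implementation. An axis is derived from a reference set alone, and further samples are then centred on the reference mean and projected onto it.

```python
import math


def column_means(rows):
    """Mean of every column of a sample-by-position matrix."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def centre(row, means):
    """Subtract the reference means from one sample."""
    return [x - m for x, m in zip(row, means)]


def leading_axis(reference, iterations=100):
    """First principal axis of the reference set (power iteration on X^T X)."""
    means = column_means(reference)
    centred = [centre(row, means) for row in reference]
    axis = [1.0 + 0.01 * j for j in range(len(means))]  # deterministic start
    for _ in range(iterations):
        scores = [sum(x * a for x, a in zip(row, axis)) for row in centred]
        axis = [sum(s * row[j] for s, row in zip(scores, centred))
                for j in range(len(means))]
        norm = math.sqrt(sum(a * a for a in axis))
        if norm == 0.0:
            raise ValueError("no variation found along the start vector")
        axis = [a / norm for a in axis]
    return means, axis


def project(sample, means, axis):
    """Score of any sample on an axis that was set by the reference data."""
    return sum(x * a for x, a in zip(centre(sample, means), axis))


# Toy Boolean data: columns 0-1 encode one site, columns 2-3 another.
# The reference set varies only at the first site.
reference = [[1, 0, 1, 0], [1, 0, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
means, axis = leading_axis(reference)

# A new sample that also differs at the second site.
new_sample = [0, 1, 0, 1]
```

Here `new_sample` receives the same score as the third reference sample, because the second site does not vary within the reference set and therefore carries no weight on the axis. This is why differences among the Omicron variants only become visible on axes derived from a set that includes them.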
All calculations were performed using R32. The ID, acknowledgements, list of samples used for the WHO-VOC, PCA axes, and scaled PCs of samples and bases can be downloaded from Figshare33. The newest version of the R code is publicly available at GitHub34.
Data Availability
All data produced are available online at Figshare https://doi.org/10.6084/m9.figshare.19029653.v1
Supplement
Footnotes
A timecourse presentation was added to Fig. 2. Supplement for data simulation was added (Fig. S2). A section of results was added to cover these issues. Also, the discussion was added.