Introductory paragraph
SARS-CoV-2 genomic surveillance in Uganda offers an opportunity to describe in detail the evolution of the virus in a small landlocked East African country. Here we show a recent shift in the local epidemic, with a newly emerging lineage A.23 evolving into A.23.1, which now dominates the Uganda cases and has spread to 26 other countries. Although the precise changes in A.23.1 as it has adapted are different from the changes in the variants of concern (VOC), the evolution shows convergence on a similar set of proteins. The A.23.1 spike protein coding region has accumulated changes that resemble many of the changes seen in VOC, including a change at position 613, a change in the furin cleavage site that extends the basic amino acid motif, and multiple changes in the immunogenic N-terminal domain. In addition, the A.23.1 lineage encodes changes in non-spike proteins that are also altered in the VOC (nsp6, ORF8 and ORF9). The clinical impact of the A.23.1 variant is not yet clear; however, it is essential to continue careful monitoring of this variant, as well as rapid assessment of the consequences of the spike protein changes for vaccine efficacy.
Main Text
The novel Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) (1) and the associated disease Coronavirus Disease 2019 (COVID-19) (2)(3) continue to spread throughout the world, causing >120 million infections and >2.6 million deaths (16 Mar 2021, Johns Hopkins COVID-19 Dashboard). Genomic surveillance has played a key role in the response to the pandemic; sequence data from SARS-CoV-2 provide information on the transmission patterns and the evolution of the virus as it enters new regions and spreads. As COVID-19 vaccines become available and are implemented, monitoring SARS-CoV-2 genetic changes, especially changes at epitopes with implications for immune escape, is crucial. A detailed classification system has been defined to help monitor SARS-CoV-2 as it evolves (4), with virus sequences classified into 2 main phylogenetic lineages (Pango lineages), A and B, representing the earliest divergence of SARS-CoV-2 in the pandemic, and then into sub-lineages within these. Several Variants of Concern (VOC) have emerged showing increased transmission and reduced susceptibility to vaccine and/or therapeutic antibody treatments. These VOC include lineage B.1.1.7 first identified in the UK (5), B.1.351 in South Africa (6) and lineage P.1 (B.1.1.28.1) in Brazil (7).
Status of the SARS-CoV-2 epidemic in Uganda
SARS-CoV-2 infection was first detected in Uganda in March 2020, initially among international travellers until passenger flights were stopped in late March 2020. A second route of virus entry with truck drivers from adjacent countries then became apparent (8). Since August 2020, community transmission has dominated the Uganda case numbers. By March 2021 total cases in Uganda were 40,535, with 334 deaths attributed to the virus. We have continued our efforts to generate SARS-CoV-2 genomic sequence data to monitor virus movement and genetic changes, and we report here on a novel sub-lineage of lineage A (A.23.1) that emerged and is dominating the local epidemic. The A.23.1 variant encodes multiple changes in the spike protein as well as in nsp6, ORF8 and ORF9, some predicted to be functionally similar to those observed in VOC in lineage B.
Changes in prevalence of lineage A viruses
The genomes generated here were classified into Pango lineages (4) using the Pangolin module pangoLEARN (https://github.com/cov-lineages/pangolin) and into NextStrain clades using NextClade (9) (https://clades.nextstrain.org/). The distribution of virus lineages circulating in Uganda changed dramatically over the course of the year. A clear feature of the early COVID-19 epidemic in the country was the diversity of viruses found throughout the country, attributed to frequent flights into Uganda from Europe, UK, US and Asia; this is reflected in the 9 lineages seen from March to May 2020, with a mixture of both lineage A and B viruses (Figure 1, panel a). After passenger flights were limited in March 2020, the virus entered via land travel with truck drivers. Uganda is a landlocked country with an important geographical position at the crossing of two main routes of the Trans-Africa Highway in East Africa. The essential nature of produce and goods transport allowed virus movement from/to Kenya, South Sudan, DRC, Rwanda and Tanzania. In the period of June to August 2020, lineage B.1 and B.1.393 strains were abundant, similar to patterns observed in Kenya (10) (Figure 1, panel b), although lineage A viruses did not decline as seen in US and Europe. Lineage A.23 strains were first observed in two prison outbreaks in Amuru and Kitgum, Uganda in August 2020, and by the September-November period A.23 was the major lineage circulating throughout the country (Figure 1, panel c). The A.23 virus continued to evolve into lineage A.23.1, first observed in late October 2020. Given the diversity of virus lineages found in the country from March until November 2020, it was unexpected that by late December 2020 to January 2021, lineage A.23.1 viruses represented 90% (102 of 113 genomes) of all viruses observed in Uganda (Figure 1, panel d).
In all time periods, the SARS-CoV-2 positive samples were obtained from multiple clinical and surveillance locations throughout Uganda, indicating that the differences are unlikely to be due to sampling different subpopulations in the country at different times.
Virus sequence diversity including fatal cases
All newly and previously generated Uganda genomes that were complete and high-coverage (n=322) were used to construct a maximum-likelihood phylogenetic tree (Figure 2).
A number of A and B variant lineages were observed briefly at low frequencies and may have undergone extinction, similar to patterns observed in the UK (11) and Scotland (12). Genomes identified from truck drivers are often observed basal to community clusters (Figure 2), suggesting the importance of this route in the introduction and spread of the virus into Uganda. Most genomes from truck drivers sampled at points of entry (POEs) bordering Kenya belonged to lineages B.1 and B.1.393, consistent with the pattern reported in Kenya (10). However, genomes identified from truck drivers from Tanzania, and from the Elegu POE bordering South Sudan, albeit in small numbers, belonged to both A and B.1 lineages. Continued monitoring of truck drivers coming in and out of Uganda provides a useful description of the inland circulation of strains in this part of the world, where genomic surveillance is not as detailed as in other regions.
Emergence of A.23 and A.23.1
Outbreaks of SARS-CoV-2 infections were reported in the Amuru and Kitgum prisons in August 2020 (13)(14). The SARS-CoV-2 genome sequences from individuals in the prisons belonged exclusively to lineage A (Figure 2), with three amino acid (aa) changes encoded in the spike protein (F157L, V367F and Q613H, Figure 3) that now define lineage A.23 (see below). By October 2020, lineage A.23 viruses were also found outside of the prisons in a community sample from Lira (a town 140 km from Amuru), in two samples from the Kitgum hospital, in several community samples from Kampala, Jinja, Mulago, Tororo and Soroti, as well as in samples from 2 truck drivers collected at POE bordering Kenya. By November 2020, the A.23 viruses had spread further to Gulu and Adjumani in northern Uganda. Lineage A.23 viruses were not seen in Uganda (or anywhere in the world) before August 2020 (Figure 3 panel c), yet A.23 viruses accounted for 32% of the viruses in Uganda (Figure 1) from June to August 2020 and 50% of the observed viruses in September to November 2020. In late October, A.23.1, a variant evolving from A.23 with an additional change in the spike protein (P681R), was observed (Figure 3, panel b, c) and by December 2020-January 2021, 90% of identified genomes (102 out of 113) belonged to the new A.23.1 lineage (Figure 1 and 2). The mutations in A.23.1 were consistent with evolution from an original A.23 virus observed in the Amuru/Kitgum cluster (Figure 2 and Supplemental Figure 1), with additional changes in nsp6 and ORF9 (Supplemental Figure 2 and 4).
Important changes observed in the spike protein
The spike protein is crucial for virus entry into host cells, for tropism, and is a critical component of COVID-19 vaccine development and monitoring. The changes in spike protein observed in Uganda and global A.23 and A.23.1 viruses are shown in Figure 3 panel b.
Many amino acid (aa) changes were single events with no apparent transmission observed. However, the initial lineage A.23 genomes from Amuru and Kitgum encoded three amino acid changes in the exposed S1 domain of spike (F157L, V367F and Q613H, Figure 3 panel b). The V367F change is reported to modestly increase infectivity (15), and the Q613H change may have similar consequences to the D614G change observed in the B.1 lineage found predominantly in Europe and USA; in particular, D614G was reported to increase infectivity, spike trimer stability and furin cleavage (15),(16),(17),(18). These changes were not observed in previously reported genomes from Uganda (8). Of some concern, the E484K and N501Y amino acid changes in the receptor binding domain (RBD) were observed in the A.23 viruses identified in Adjumani cases on 9th to 11th November 2020 (Figure 3, panel b). These two amino acid changes have been reported to substantially compromise vaccine efficacy as well as antibody treatments.
Of concern, the recent Kampala and global A.23.1 virus sequences from December 2020-January 2021 now encoded 4 or 5 amino acid changes in the spike protein (now defining lineage A.23.1, see below) plus additional protein changes in nsp3, nsp6, ORF8 and ORF9 (Figure 3 panel b, Figure 4). The P681R spike change adds a basic amino acid adjacent to the spike furin cleavage site. This same change has been shown in vitro to enhance the fusion activity of the SARS-CoV-2 spike protein, likely due to increased cleavage by the cellular furin protease (20); importantly, a similar change (P681H) is encoded by the recently emerging VOC B.1.1.7, which had spread to 75 countries as of 5 February 2021 (5)(21). There are also changes in the spike N-terminal domain (NTD), a known target of immune selection, observed in samples from the Kampala A.23.1 lineage, including P26S and R102I (Figure 3 panel b). Additionally and importantly, an A.23.1 strain identified in Kampala on 11th December 2020 carried the E484K change in the RBD, which adds further concern about this variant, as it may gain higher transmissibility and enhanced resistance to vaccines and therapeutics. Outside of the spike protein, a single nucleotide change (G27870T) leading to early termination of ORF7b (E39*) was observed in A.23.1 genomes from community cases in Tororo in late December 2020. Although the clinical implication of this change is yet to be determined, it is important to document such changes for further follow-up.
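The relationship between the nucleotide change G27870T and the protein-level annotation E39* can be checked with simple coordinate arithmetic. The sketch below is illustrative only: it assumes that ORF7b begins at nucleotide 27756 of NC_045512.2 (a coordinate not stated in the text), and the helper names are hypothetical. Because the reference codon is not given here, both glutamate codons are tested.

```python
# Assumption (not from the text): ORF7b starts at nucleotide 27756 of NC_045512.2.
ORF7B_START = 27756

STOP_CODONS = {"TAA", "TAG", "TGA"}
GLU_CODONS = ("GAA", "GAG")  # the two codons that encode glutamate (E)


def codon_position(nt_pos, orf_start):
    """Return (1-based codon number, 1-based position within that codon)."""
    offset = nt_pos - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1, offset % 3 + 1


def mutate_codon(codon, pos_in_codon, new_base):
    """Replace the base at a 1-based position within a codon."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos_in_codon = codon_position(27870, ORF7B_START)
# Apply the G->T change to each possible glutamate codon.
mutated = {c: mutate_codon(c, pos_in_codon, "T") for c in GLU_CODONS}
print(codon_number, pos_in_codon, mutated)
```

Under the assumed start coordinate, position 27870 is the first base of codon 39, and a G to T change at that base turns either glutamate codon into a stop codon, consistent with the E39* annotation.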
New lineage A designations
The viruses detected in Amuru and Kitgum met the criteria for a new SARS-CoV-2 lineage (22)(23) by clustering together on a global phylogenetic tree, sharing epidemiological history and source from a single geographical origin, and encoding multiple defining SNPs. These features, in particular the three spike changes F157L, Q613H and V367F, define the new lineage A.23. Continued circulation and evolution of A.23 in Uganda was observed, and two additional changes in spike, R102I and P681R, were observed in December 2020 in Kampala; these SNPs define the sub-lineage A.23.1. Additional changes in non-spike regions also define A.23 and A.23.1, including nsp3: E95K, nsp6: M86I, L98F, ORF8: L84S, E92K and ORF9 N: S202N, Q418H. These new lineages can be assigned using pangolin from version v2.1.10 and pangoLEARN data release 2021-02-01 onwards.
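As a rough illustration of how the spike changes listed above could be used to pre-screen genomes, the following sketch labels a set of observed spike substitutions. It is not how pangolin assigns lineages (pangoLEARN uses a trained model over the whole genome); the function name and the strict rule requiring both R102I and P681R are assumptions made for this example, so a genome carrying only one of the two additional changes would be labelled A.23-like here.

```python
# Spike changes taken from the text; the screening rule itself is a simplification.
A23_SPIKE = frozenset({"F157L", "V367F", "Q613H"})
A23_1_EXTRA = frozenset({"R102I", "P681R"})


def screen_spike_changes(changes):
    """Label a collection of spike substitutions (hypothetical helper).

    Returns "A.23.1-like" if the three A.23 changes and both additional
    changes are present, "A.23-like" if only the three A.23 changes are
    all present, and "other" otherwise.
    """
    observed = set(changes)
    if not A23_SPIKE <= observed:
        return "other"
    if A23_1_EXTRA <= observed:
        return "A.23.1-like"
    return "A.23-like"


print(screen_spike_changes(["F157L", "V367F", "Q613H"]))
print(screen_spike_changes(["F157L", "V367F", "Q613H", "R102I", "P681R"]))
```

Any label produced this way should be treated as a screening hint and confirmed with pangolin.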
Screening of SARS-CoV-2 genomic data from GISAID (March 12, 2021) shows that A.23 and A.23.1 viruses are now found in 26 countries outside of Uganda (Figure 3 panel c). A.23 was first observed in Uganda in August 2020, subsequently in USA in October and in Kenya and Rwanda in December (Figure 3 panel c). A.23.1 was first seen in Uganda in community cases in Mbale on 28th October 2020 and in Jinja on 29th October 2020, and soon spread across the country in early November 2020. Outside of Uganda, A.23.1 was found in England and Cambodia from the end of November and in Rwanda from the beginning of December. Of note, international flights out of Uganda were restarted on 1 October 2020 with flights to Europe, Asia and USA. Phylogenetic analysis supports the close evolution of A.23 to A.23.1 (Supplementary Figure 1).
Additional changes in Ugandan A.23 and A.23.1 genomes compared to other VOC genomes
Although a main focus has been on spike protein changes, there are changes in other genomic regions of the SARS-CoV-2 virus accompanying the adaptation to human infection. We employed profile Hidden Markov Models (pHMMs) prepared from 44 amino acid peptides across the SARS-CoV-2 proteome (24) to detect and visualize protein changes from the early lineage B reference strain NC_045512. Measuring the identity score (bit-score) of each pHMM across a query genome provides a measure of protein changes in 44 amino acid steps across the viral genome (Figure 4 panel a). This method applied to A.23 and A.23.1 genome sequences revealed the changes in spike (discussed above) as well as changes in the transmembrane protein nsp6 and the interferon modulators ORF8 and 9 (Figure 4 panel a).
We asked if a similar pattern of evolution was appearing in VOC as SARS-CoV-2 adapted to human infection. We gathered the sets of genomes described in the initial published descriptions of these VOC (B.1.1.7 (5), B.1.351 (28) or P.1 (7)) and applied the same profileHMM analysis. Similar to A.23/A.23.1, the B.1.1.7 lineage encodes nsp6, spike, ORF8 and 9 changes as well as changes in nsp3 and the RNA-dependent RNA polymerase (RDRP, Figure 4 panel b). Lineage B.1.351 encodes nsp3, nsp6, RDRP, spike and ORF6 changes (Figure 4 panel c) and lineage P.1 encodes nsp3, nsp6, RDRP, nsp13, spike and ORF8 and 9 changes (Figure 4 panel d). Although the exact amino acids and positions of change within the proteins differ in each lineage, there are some striking similarities in the common proteins that have been altered. Of interest, the nsp6 change present in B.1.1.7, B.1.351 and P.1 is a 3 amino acid deletion (106, 107 and 108) in a protein loop of nsp6 predicted to be on the exterior of the autophagy vesicles on which the protein accumulates (29). The three amino acid nsp6 changes of lineage A.23.1 are L98F in the same exterior loop region, and the M86I and M183I changes predicted to be in intramembrane regions but adjacent to where the protein exits the membrane (29) (Supplementary Figure 2). A compilation of the amino acid changes in A.23.1 and the VOC lineages is found in Supplementary Table 2, with proteins that are altered in all 4 lineages marked in red.
Discussion
We report the emergence and spread of a new SARS-CoV-2 variant of the A lineage (A.23.1) with multiple protein changes throughout the viral genome. A similar phenomenon recently occurred with the B.1.1.7 lineage, detected first in the southeast of England (5) and now globally, with the B.1.351 lineage in South Africa (6), and with the P.1 lineage in Brazil (30), suggesting that local evolution (perhaps to avoid the initial population immune responses) and spread may be a common feature of SARS-CoV-2. Importantly, lineage A.23.1 shares many features found in the lineage B VOC, including alteration of key spike protein regions, especially the ACE2 binding region, which is exposed and immunogenic, the furin cleavage site and the 613/614 change that may increase spike multimer formation. The VOC and A.23.1 strains also encode changes in a similar region of the nsp6 protein, which may be important for altering cellular autophagy pathways that promote replication. Changes or disruption of ORF7, 8 and 9 are also present in the VOC and A.23.1. The ORF8 changes or deletion probably indicate that this protein is unnecessary for replication in humans; similar deletions accompanied SARS-CoV-2 adaptation to humans (31),(32).
We suspect that emerging SARS-CoV-2 lineages may be adjusting to infection and replication in humans, and it is notable that the VOCs and lineage A.23.1 share some common features in their evolution. The spike changes are best understood due to the massive global effort to define the receptor and develop vaccines against the infection. The analysis reported in Figure 4 reveals common functions of SARS-CoV-2 that have been altered in all four variants, especially nsp6 and the ORFs 8 and 9. The functional consequences of the additional non-spike changes warrant additional studies, and the current analysis may focus efforts on the proteins that are commonly changed in the variant lineages. Finally, the susceptibility of A.23.1 to vaccine immune responses is of great importance to determine as vaccines become available in this part of Africa.
Methods
Sample collection, whole genome MinION sequencing and genome assembly
Residual nucleic acid extracts from SARS-CoV-2 RT-PCR positive samples were obtained from the Central Public Health Laboratory (Kampala, Uganda). The nucleic acid was converted to cDNA and amplified using SARS-CoV-2-specific 1500 bp amplicons spanning the entire genome as previously described (33). The resulting DNA amplicons were used to prepare sequencing libraries, barcoded individually and then pooled to sequence on MinION R9.4.1 flowcells, following the standard manufacturer’s protocol.
The genome assemblies were performed as previously described (8). Briefly, reads from fast5 files were base-called and demultiplexed using Guppy 3.6 running on the UMIC HPC. Adapter and primer sequences were removed using Porechop (https://github.com/rrwick/Porechop), the resulting reads were mapped to the reference genome Wuhan-1 (GenBank NC_045512.2) using minimap2 (34), and consensus genomes were generated in Geneious (Biomatters Ltd). Genome polishing was performed in Medaka, and SNPs and mismatches were checked and resolved by consulting raw reads.
Phylogenetic analyses
For the local Uganda virus comparison, all newly and previously generated genomes from Uganda (N=322) were aligned using MAFFT (26) and manually checked in AliView (35). The 5’ and 3’ untranslated regions (UTRs) were trimmed. A maximum-likelihood (ML) phylogenetic tree was constructed using RAxML-NG (36) under the GTR+I+G4 model, the best-fitting substitution model according to the Akaike Information Criterion (AIC) as determined by ModelTest-NG (37), and run for 100 pseudo-replicates. The resulting tree was visualised in Figtree (38) and rooted at the split between lineages A and B.
For phylogenetic analyses comparing Uganda lineage A.23 and A.23.1 strains with global A.23/A.23.1 strains, the global SARS-CoV-2 lineage A.23 (N=8) and A.23.1 (N=38) genomes were retrieved from GISAID on 12 March 2021. These global A.23/A.23.1 genomes, combined with Ugandan A.23/A.23.1 genomes (N=191), were aligned using MAFFT and manually checked in AliView, followed by trimming of the 5’ and 3’ UTRs. The global and Ugandan A.23/A.23.1 genomes were used to construct an ML tree under the GTR+I+G4 model, the best-fitting substitution model according to AIC as determined by ModelTest-NG (37), and run for 100 pseudo-replicates using RAxML-NG. The resulting tree was visualised in Figtree and rooted using the A.23 lineage.
Profile Hidden Markov Model (profileHMM) domain analysis of A.23/A.23.1 and VOC genomes was performed as previously described (24) with some changes. A database of profileHMMs was generated from the first 65 lineage B SARS-CoV-2 genome sequences. All 3 forward open reading frames of each genome were translated computationally and then sliced into 44 amino acid segments overlapping by 22 amino acids. All 44 amino acid query peptides were then clustered with uclust (25) and their original identity and coordinates determined by blastp search against a protein database made from the NC_045512 reference strain.
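The translation and slicing step described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, ambiguous codons are written as "X", and stop codons are kept as "*" inside the peptides because the text does not say how they were handled.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(genome, frame):
    """Translate one forward reading frame (frame = 0, 1 or 2)."""
    seq = genome.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def sliding_peptides(protein, window=44, step=22):
    """Cut a translation into 44 aa segments that overlap by 22 aa."""
    last_start = len(protein) - window
    return [(start, protein[start:start + window])
            for start in range(0, last_start + 1, step)]


def genome_to_peptides(genome):
    """Return {frame: [(start, peptide), ...]} for the three forward frames."""
    return {frame: sliding_peptides(translate_frame(genome, frame))
            for frame in range(3)}


# Toy input: 88 codons give segments starting at residues 0, 22 and 44 in frame 0.
toy_genome = "ATG" + "GCT" * 87
print([start for start, _ in genome_to_peptides(toy_genome)[0]])
```

A step of 22 with a window of 44 means every residue away from the sequence ends is covered by two segments, so a single amino acid change lowers the score of two adjacent profileHMMs.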
Query sets of genomes were processed to remove any genomes containing Ns (which disrupt the HMM scoring process). The hmmscan function from HMMER-3 (27) was used with the early B database. Query matches were identified using an E-value cutoff of 0.0001 and the bit-score values for each hit (a measure of the distance between the query 44 amino acid peptide and the B-lineage reference) were collected. Bit-scores for each domain were normalized by dividing each query score by the maximum score for that domain (x/x_max). In all analyses the original B lineage NC_045512 reference genome was included to define the maximum bit-score.
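The filtering and normalisation described above amount to a small calculation, sketched here on already-parsed hits. The input layout (a dictionary of genome, then domain, then an (E-value, bit-score) pair), the function name and the toy numbers are assumptions for illustration; parsing of hmmscan output is not shown.

```python
def normalise_bitscores(hits, evalue_cutoff=1e-4):
    """Normalise per-domain bit-scores as x / x_max.

    `hits` maps genome name -> {domain name: (evalue, bitscore)}.
    Hits with an E-value above the cutoff are discarded. x_max is the
    highest retained score for a domain across all genomes supplied,
    so the reference genome should be included in `hits`.
    """
    kept = {
        genome: {dom: score for dom, (evalue, score) in doms.items()
                 if evalue <= evalue_cutoff}
        for genome, doms in hits.items()
    }
    max_score = {}
    for doms in kept.values():
        for dom, score in doms.items():
            max_score[dom] = max(score, max_score.get(dom, score))
    return {
        genome: {dom: score / max_score[dom]
                 for dom, score in doms.items() if max_score[dom] > 0}
        for genome, doms in kept.items()
    }


# Toy values only; domain names and scores are invented for the example.
toy_hits = {
    "NC_045512": {"spike_0660": (1e-30, 100.0), "nsp6_0088": (1e-25, 80.0)},
    "query_1": {"spike_0660": (1e-30, 100.0), "nsp6_0088": (1e-20, 60.0),
                "orf8_0066": (0.01, 5.0)},
}
print(normalise_bitscores(toy_hits))
```

With this scaling an unchanged domain scores 1.0 and a domain carrying substitutions relative to the reference scores below 1.0, which makes altered regions directly comparable across proteins.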
Ethical approvals
This study was approved by the Uganda Virus Research Institute-Research and Ethics Committee (UVRI-REC Federalwide Assurance [FWA] FWA No. 00001354, study reference. GC/127/20/04/771) and by the Uganda National Council for Science and Technology, reference number HS936ES. The novel reported SARS-CoV-2 genomes are available on GISAID (https://www.gisaid.org/) under the accession numbers EPI_ISL_954226-EPI_ISL_954300 and EPI_ISL_955136. A second tranche of genomes has been submitted and is awaiting accession numbers.
Data Availability
The novel reported SARS-CoV-2 genomes are available on GISAID (https://www.gisaid.org/) under the accession numbers EPI_ISL_954226-EPI_ISL_954300 and EPI_ISL_955136.
Author contribution statement
All authors contributed to the work presented in this paper.
Competing Interests statement
The authors declare no competing interests.
Supplementary Information
Acknowledgements
We thank all global SARS-CoV-2 sequencing groups for their open and rapid sharing of sequence data and GISAID for providing an effective platform for making these data available. We are grateful to Oxford Nanopore Technologies and the ARTIC Network for their support and we thank Pope Moseley for his constructive comments on the manuscript. The SARS-CoV-2 diagnostic and sequencing award is jointly funded by the UK Medical Research Council (MRC/UKRI) and the UK Department for International Development (DFID) under the MRC/DFID Concordat agreement (grant agreement number NC_PC_19060) and is also part of the EDCTP2 programme supported by the European Union. The UMIC high performance computer was supported by MRC (grant number MC_EX_MR/L016273/1) to PK. A.R. acknowledges the support of the Wellcome Trust (Collaborators Award 206298/Z/17/Z ARTIC network) and the European Research Council (grant agreement no. 725422 – ReservoirDOCS). The study is additionally funded by the Wellcome, DFID - Wellcome Epidemic Preparedness – Coronavirus (grant agreement number 220977/Z/20/Z) awarded to MC.