P1 variant and amino acid mutations at Spike gene identified using Sanger protocol

SARS-CoV-2 variants, along with vaccination, mark the second year of the pandemic. The spike region is a focal point in COVID-19 pathogenesis, with different amino acid changes potentially modulating vaccine response and some being part of variant signatures. NGS is the standard tool to sequence the virus, but limitations of various kinds hinder the expansion of genomic surveillance in many places. To improve surveillance capability, we developed a Sanger-based sequencing protocol to obtain coverage of most (>95%) of the spike gene. RNA was extracted from eleven nasopharyngeal swab collections for real-time PCR diagnosis, and leftover RNA had up to 3785 bp sequenced on an ABI 3500 using dye-terminator chemistry on nested PCR products from two one-step RT-PCR reactions. P1 amino acid mutation signatures were present in 18% (2/11) of sequences, and 82% (9/11) had three or more additional amino acid changes (GISAID CoVsurver list). Most sequences from 2021 (86%, 6/7) carry E484K, whereas the mutation was not present in samples collected in 2020 (0/4, p=0.015). The swiftness with which mutations favorable to the virus may prevail, and their potential impact on vaccines and other current interventions, call for broader surveillance and more public health attention.


INTRODUCTION
As the COVID-19 pandemic enters its second year, positive signs from advancing vaccination coverage bring hope amid an unsettled scenario of new infection waves in many areas. New variants have been increasingly detected and have been associated with increased infectivity (Korber, 2020), transmissibility (Kirby, 2021) and reinfection (Naveca, 2021). Severity of disease is more difficult to assess and the data are not conclusive, but the impact of variants on vaccine response is key to pandemic control.
Signs of concern have emerged both from in vitro neutralization studies with convalescent plasma (Liu, 2021; Wilfredo, 2021) and post-vaccination plasma (de Souza, 2021), as well as from vaccine trials (Shabir, 2021). In this setting, molecular epidemiology assumes an even more important role, not only for monitoring the evolution of the virus but also for informing on the potential impact on plasma or monoclonal antibody therapies as well as on vaccine strategies.
Next generation sequencing (NGS) is the standard technique to study SARS-  (Haolin & Liu, 2021). Another key mutation is E484K, an amino acid change that has been shown in vitro to decrease the ability of plasma from vaccinated individuals to block viral entry (Collier, 2021).
The identification of mutation signatures for these new variants may serve as a proxy for the presence of a variant in the population studied, which can eventually be further evaluated by NGS. Therefore, the Spike gene itself seems a reasonable target to monitor the variants already identified and newly emerging mutations that may give rise to new variants. Moreover, Spike region analysis of infections occurring after vaccination will be instrumental to monitor vaccine effectiveness as new variants of the virus evolve.
medRxiv preprint (which was not certified by peer review), this version posted March 24, 2021. doi: https://doi.org/10.1101/2021.03.21.21253158. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
To improve genomic monitoring capability, we developed a simple protocol using one-step RT-PCR to amplify DNA that is then sequenced on classical "BigDye/Sanger" platforms.

PATIENTS AND METHODS
RNA extracted from 11 nasopharyngeal swab (SWNF) samples, processed at the Virology Center or at the Regional Adolfo Lutz Center, was used to conduct the molecular assays. All samples had previously been tested by RT-qPCR (CDC, 2020), which confirmed SARS-CoV-2 infection.
Primers were designed manually, without automated software packages, for a nested reverse transcription-polymerase chain reaction (RT-PCR) protocol.
This protocol allows the amplification of up to 3785 nucleotides of the Spike gene, using two one-step RT-PCRs and four semi-nested PCRs. One fragment spans from the beginning of S1 (complete region, 253/253 a.a.) to partial S2 (561/609 a.a.), and another covers the S2 region (609 a.a./1827 bp), corresponding to 2586 bp of the Spike gene.
For Spike S1 protein amplification the primer sets used were: (i) First round (one-

Nucleic acid extraction
SARS-CoV-2 RNA was extracted from nasopharyngeal (NP) swab samples with the QIAamp® Viral RNA Mini Kit (Qiagen, Hilden, Germany; Biogene, Bioclean, Brazil) according to the manufacturer's protocol. Extraction followed the ongoing diagnostic routines at the laboratories, and leftovers from these routines were kept at -70 °C until use.

RT-PCR and nested PCR protocols
For both the S1 and S2 Spike protein regions, a similar one-step RT-PCR was designed. Extracted RNA was reverse-transcribed and amplified using the SuperScript® III One-Step RT-PCR System with Platinum Taq High Fidelity® (Life Technologies, USA).
RT-PCR conditions for amplification were as follows: reverse transcription at 55 °C for 30 min, initial PCR activation at 94 °C for 2 min, 35 amplification cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s and extension at 68 °C for 2 min 45 s, and a final extension at 68 °C for 10 min. For nested PCR, the RT-PCR product (2.5 µL), 10 pM primers (1 µL each), and RNase-free water (8 µL) were added to GoTaq® Green Master Mix 2X (12.5 µL) (Promega Biosciences, CA). PCR conditions were as follows: initial denaturation at 94 °C for 2 min, 35 cycles of denaturation at 94 °C for 30 s, annealing at 55 °C for 30 s and extension at 72 °C for 2 min, and a final extension at 72 °C for 10 min.
The products of RT-PCR and nested PCR were loaded onto a 1% agarose gel and visualized under ultraviolet light.
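The cycling conditions and the nested PCR mix described above can be written down as data, which makes it easy to check the total reaction volume and to estimate thermal cycler occupancy. The following is a minimal sketch: the dictionary layout, the names `ONE_STEP_RT_PCR`, `NESTED_PCR`, `NESTED_MIX_UL` and `hold_minutes`, and the labels "primer 1" and "primer 2" are illustrative assumptions, and ramp times between steps are ignored.

```python
# Cycling programs transcribed from the text: (step name, temperature in C, hold in seconds).
ONE_STEP_RT_PCR = {
    "pre_cycle": [("reverse transcription", 55, 30 * 60),
                  ("initial PCR activation", 94, 2 * 60)],
    "cycles": 35,
    "per_cycle": [("denaturation", 94, 30),
                  ("annealing", 55, 30),
                  ("extension", 68, 2 * 60 + 45)],
    "post_cycle": [("final extension", 68, 10 * 60)],
}

NESTED_PCR = {
    "pre_cycle": [("initial denaturation", 94, 2 * 60)],
    "cycles": 35,
    "per_cycle": [("denaturation", 94, 30),
                  ("annealing", 55, 30),
                  ("extension", 72, 2 * 60)],
    "post_cycle": [("final extension", 72, 10 * 60)],
}

# Nested PCR mix in microliters, as listed in the text.
# "primer 1" and "primer 2" are placeholder labels for the two primers (1 uL each).
NESTED_MIX_UL = {
    "RT-PCR product": 2.5,
    "primer 1": 1.0,
    "primer 2": 1.0,
    "RNase-free water": 8.0,
    "GoTaq Green Master Mix 2X": 12.5,
}


def hold_minutes(program):
    """Sum of programmed hold times in minutes (ramp times are not included)."""
    seconds = sum(hold for _, _, hold in program["pre_cycle"])
    seconds += program["cycles"] * sum(hold for _, _, hold in program["per_cycle"])
    seconds += sum(hold for _, _, hold in program["post_cycle"])
    return seconds / 60


total_volume_ul = sum(NESTED_MIX_UL.values())
print(f"Nested PCR reaction volume: {total_volume_ul} uL")
print(f"One-step RT-PCR hold time: {hold_minutes(ONE_STEP_RT_PCR)} min")
print(f"Nested PCR hold time: {hold_minutes(NESTED_PCR)} min")
```

The listed volumes add up to 25 µL, so the 12.5 µL of 2X master mix ends up at 1X in the final reaction. Both programs use the same denaturation and annealing temperatures (94 °C and 55 °C).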

Ethical approval
This study was carried out in accordance with the Declaration of Helsinki as revised in 2000, and was approved by the Ethics Committee of the Adolfo Lutz Institute, São Paulo, Brazil. The study was registered at the institute (CTC 18M/2020 and CTC 39M/2020) and at the institutional ethics committee (CAAE: 31924420.8.0000.0059 and CAAE: 43250620.4.1001.0059).
All study participants were tested for SARS-CoV-2 at a public laboratory and had results made available to them through the GAL health system. Data from those who did not provide informed consent were anonymized prior to analysis and used only for surveillance purposes.

Sequencing
The 3.804 kb PCR product (partial Spike gene) was sequenced using four primers for each region (Table 1). Each sequencing reaction was performed using 4 µL of BigDye Terminator v3.1 Cycle Sequencing Kit® (Applied Biosystems) and 3.2 µL for each primer (1 µM), plus water to a final volume of 20 µL per reaction. Dye-labelled products were sequenced on an ABI 3500 Genetic Analyzer (Applied Biosystems). Sequencing chromatograms were edited manually using Sequencher 4.7 software (Gene Codes, USA). Sequences were analyzed in comparison to reference sequences, and the mutation list was generated with the CoVsurver tool for mutation analysis of hCoV-19 at GISAID (https://www.gisaid.org/epiflu-applications/covsurver-mutationsapp/). Nucleotide sequence accession numbers are: EPI_ISL_1182103,
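The comparison of an edited sequence against a reference to list amino acid changes can be illustrated with a short sketch. This is not the CoVsurver implementation. It assumes the two nucleotide sequences are already aligned, in frame and free of gaps, and the function names and the toy sequences below are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate an in-frame nucleotide string; unknown codons (e.g. with N) become 'X'."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X") for i in range(0, usable, 3))


def amino_acid_changes(reference_nt, sample_nt, first_codon=1):
    """List changes such as 'D614G' between two aligned, gap-free, in-frame sequences.

    first_codon is the amino acid position of the first codon in the fragment.
    Codons that could not be translated in the sample are skipped.
    """
    ref_aa = translate(reference_nt)
    sample_aa = translate(sample_nt)
    return [
        f"{r}{first_codon + i}{s}"
        for i, (r, s) in enumerate(zip(ref_aa, sample_aa))
        if r != s and s != "X"
    ]


# Toy three-codon fragment (illustrative only), numbered from codon 614.
print(amino_acid_changes("GATGTTAAC", "GGTGTTAAC", first_codon=614))
```

Real chromatogram data also need base-quality checks and handling of insertions and deletions, which this sketch does not attempt.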


Table 1, primer S2-B (reverse): ACTATGGCAATCAAGCCAGCT, positions 25225-25246

Results
Table 2 depicts the patients' demographic data and the respective cycle threshold (CT) of RT-qPCR. All cases in this study had SWNF samples collected for COVID RT-qPCR testing for diagnostic purposes. At the time of the study, a symptomatic clinical presentation was required for testing. Median time with symptoms was 4 (1-8) days. Table 2 shows age in years, gender (male or female), cycle threshold (CT) obtained for genes E and N, and city of sample collection, all in the State of Sao Paulo.

Sequence results
All 11 sequences had the D614G amino acid change compared to the Wuhan reference sequence, and 82% (9/11) had two or more additional changes.
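The abstract reports p=0.015 for the difference in E484K frequency between samples collected in 2021 and in 2020, without naming the statistical test. The sketch below assumes a two-sided Fisher exact test and assumes counts of 6 of 7 sequences from 2021 and 0 of 4 from 2020 carrying E484K. These counts are chosen to agree with the 11 sequences studied and with the 6 E484K-positive sequences mentioned in the Discussion. The function name is illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        # Probability of observing x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= observed * tolerance)


# Rows: 2021, 2020. Columns: E484K present, E484K absent (assumed counts).
p_value = fisher_exact_two_sided(6, 1, 0, 4)
print(f"p = {p_value:.3f}")
```

Under these assumptions the result rounds to the reported value of 0.015.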

Discussion
In this small study we describe preliminary data from an ongoing effort to develop simple RT-PCR protocols for obtaining Spike region sequences covering key amino acid mutations associated with the most concerning variants recently identified.
The protocol was able to generate good-quality sequences that may cover most (up to 95%) of the spike gene. It does not yet cover the entire Spike protein, but additional segments of amplified DNA will allow full Spike coverage. The protocol can, however, identify the key amino acid mutations that have been associated with increased transmissibility, covers most regions that have been associated with viral pathogenesis, such as the RBD, and may identify the UK, SA and BR variants.
Surveillance initiatives may use these key amino acid mutations, at positions that are present in the three major variants, to identify these variants based on their Spike signatures. Selected samples may be further evaluated with NGS.
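Signature-based screening of this kind can be sketched as a set comparison between the amino acid changes observed in a sample and a list of signature mutations per variant. The signature lists below are short, illustrative subsets of publicly described Spike mutations for the UK (B.1.1.7), SA (B.1.351) and BR (P.1) variants. They are assumptions for this example and are not taken from this study, and the function names are hypothetical.

```python
# Illustrative, non-exhaustive Spike signatures (assumption: not defined in this study).
SIGNATURES = {
    "B.1.1.7 (UK)": {"N501Y", "A570D", "P681H"},
    "B.1.351 (SA)": {"K417N", "E484K", "N501Y"},
    "P.1 (BR)": {"K417T", "E484K", "N501Y"},
}


def match_signatures(observed):
    """Return the variants whose whole signature is contained in the observed changes."""
    observed = set(observed)
    return sorted(name for name, signature in SIGNATURES.items()
                  if signature <= observed)


def missing_mutations(observed):
    """For each variant, list the signature mutations not found in the sample."""
    observed = set(observed)
    return {name: sorted(signature - observed)
            for name, signature in SIGNATURES.items()}


full_signature_sample = ["D614G", "K417T", "E484K", "N501Y"]
e484k_only_sample = ["D614G", "E484K"]

print(match_signatures(full_signature_sample))
print(match_signatures(e484k_only_sample))
print(missing_mutations(e484k_only_sample)["P.1 (BR)"])
```

A sample carrying E484K without the rest of a signature matches no variant here, which is the kind of sequence that could be prioritized for NGS.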
We found a high proportion of the E484K mutation in the Spike protein: 6 of these 11 sequences harbor the mutation, only two of them together with the additional amino acid mutations that are characteristic of the P1 variant. The E484K change may affect recognition of host cells and may weaken "the potency of antibodies that can ordinarily disable the virus" (Ewen 2021). In in vitro studies, B.1.1.7 carrying the E484K mutation increases the amount of serum antibody needed to block cell infection (Collier 2021).
The fact that the other P1 mutations are not present along with E484K in many samples may suggest that new variants are evolving independently in the region, carrying mutations useful for the viral life cycle, for instance through immune escape potential and/or longer binding to cell receptors.
We opted to release these preliminary data to stimulate other groups that may benefit from the simplicity and potential informative power of a simpler approach to genomic surveillance. Although partial sequencing does not allow proper lineage evaluation, it can provide swift access to information on the presence of these key mutations among the samples of a region or a subgroup of the population. If linked to proper contact tracing and preventive measures, it may provide a powerful tool to block variant expansion to new areas and populations. Therefore, this approach may be an alternative that adds to the necessary increase in genomic surveillance. NGS capability is currently limited, especially but not only in resource-limited settings. In Brazil, for example, with over 11 million documented cases, fewer than 3,000 sequences have been registered in sequence databanks.
Shared cycling temperatures used in these protocols may optimize thermal cycler use, and protocol adjustments are being tested to reduce cost.
Small adjustments in sample processing to better adapt to real-world situations may improve surveillance capability. We are currently testing simpler alternative protocols that may be easier to implement in resource-limited settings. This and alternative approaches need to be further tested for feasibility, as simpler protocols that contribute to the monitoring of SARS-CoV-2 evolution may prove valuable.