Abstract
With the emergence of SARS-CoV-2 variants that may increase transmissibility and/or cause escape from immune responses1–3, there is an urgent need for the targeted surveillance of circulating lineages. It was found that the B.1.1.7 (also 501Y.V1) variant first detected in the UK4,5 could be serendipitously detected by the ThermoFisher TaqPath COVID-19 PCR assay because a key deletion in these viruses, spike Δ69-70, would cause a “spike gene target failure” (SGTF) result. However, an SGTF result is not definitive for B.1.1.7, and this assay cannot detect other variants of concern that lack spike Δ69-70, such as B.1.351 (also 501Y.V2) detected in South Africa6 and P.1 (also 501Y.V3) recently detected in Brazil7. We identified a deletion in the ORF1a gene (ORF1a Δ3675-3677) in all three variants, which has not yet been widely detected in other SARS-CoV-2 lineages. Using ORF1a Δ3675-3677 as the primary target and spike Δ69-70 to differentiate, we designed and validated an open source PCR assay to detect SARS-CoV-2 variants of concern8. Our assay can be rapidly deployed in laboratories around the world to enhance surveillance for the local emergence and spread of B.1.1.7, B.1.351, and P.1.
Main
Broadly accessible and inexpensive surveillance methods are needed to track SARS-CoV-2 variants of concern around the world. While sequencing is the gold standard to identify circulating SARS-CoV-2 variants, routine genomic surveillance is not available in most countries, primarily due to a lack of resources and expertise. In the current situation with the identification of the variants of concern B.1.1.7, B.1.351, and P.1, and with the likelihood that more will emerge, a lack of genomic surveillance leaves public health authorities with a patchy and skewed picture to inform decision making. The discovery of B.1.1.7 variants causing SGTF results when tested using the TaqPath PCR assay provided labs in the UK and throughout Europe with a ready-made, simple tool for tracking the frequency of this variant9,10. As B.1.1.7 spread to other countries, TaqPath SGTF results were used as a frontline screening tool for sequencing and an approximation for B.1.1.7 population frequency11. These findings highlight the usefulness of a PCR assay that produces distinctive results when targeting variants in virus genomes for both tracking and sequencing prioritization.
The TaqPath assay was not specifically designed for SARS-CoV-2 variant surveillance, and it has several limitations. The 6 nucleotide deletion in the spike gene at amino acid positions 69 and 70 (spike Δ69-70) that causes the TaqPath SGTF is also present in other SARS-CoV-2 lineages (Fig. 1, Supplementary Table 1), most notably Pango lineages B.1.258 detected throughout Europe and B.1.375 detected primarily in the US12,13, meaning that SGTF results are not definitive for B.1.1.7. Furthermore, too much focus on TaqPath SGTF results will leave blindspots for other emerging SARS-CoV-2 variants of concern that do not have spike Δ69-70. In particular, B.1.351 and P.1, which were recently discovered in South Africa and Brazil, respectively, may also be more transmissible and contain mutations that could help to evade immune responses6,7,14. For all of these reasons, a PCR assay specifically designed for variant surveillance would help to fill in many of the gaps about their distribution and frequency.
We analyzed over 400,000 SARS-CoV-2 genomes on GISAID and used custom Nextstrain builds15 to identify that a 9 nucleotide deletion in the ORF1a gene at amino acid positions 3675-3677 (ORF1a Δ3675-3677) occurs in the B.1.1.7, B.1.351, and P.1 variants, but is only found in 0.03% (103/377,011) of all other genomes (Fig. 1, Supplementary Table 1). Within the B.1.351 lineage, however, 18.4% of the sequences do not have ORF1a Δ3675-3677 (Supplementary Table 1, not shown in Fig. 1E). Therefore, by designing a PCR assay that targets both ORF1a Δ3675-3677 and spike Δ69-70 (Fig. 1A), we can detect most viruses from all three current variants of concern (ORF1a results, Fig. 1B-E), differentiate B.1.1.7 (ORF1a and spike results, Fig. 1D-F), and provide results similar to TaqPath SGTF for comparison across datasets (spike results).
To create a multiplexed RT-qPCR screening assay for the B.1.1.7, B.1.351, and P.1 variants, we designed two sets of primers that flank each of ORF1a Δ3675-3677 and spike Δ69-70 and probes specific to the undeleted “wildtype” sequences. As a control, we included the CDC N1 primer and probe set that will detect both the wildtype and variant viruses. As designed, testing SARS-CoV-2 RNA that contains ORF1a Δ3675-3677 and/or spike Δ69-70 will generate undetected cycle threshold (Ct) values with the specific PCR target sets as the probes cannot anneal to the deleted sequences, but will have “positive” N1 Ct values. This configuration ensures that target failures are likely due to the presence of deletions and that there is sufficient virus RNA for sequencing confirmation. Our RT-qPCR conditions are highly similar to our previously published SARS-CoV-2 multiplex assay16, and a detailed protocol is openly available8.
We evaluated the analytical sensitivity of our multiplexed RT-qPCR assay using synthetic RNA designed based on the original Wuhan-Hu-1 sequence and a B.1.1.7 sequence (England/205041766/2020). As the B.1.1.7 sequence contains both ORF1a Δ3675-3677 and spike Δ69-70 and the Wuhan-Hu-1 sequence contains neither deletion, using these RNAs allows us to fully evaluate the designed primer and probe sets. We tested a two-fold dilution series from 100 copies/µL to 1 copy/µL for both RNA controls in triplicate (Table 1). Using the Wuhan-Hu-1 RNA, we found similar detection (within 1 Ct) across all three N1, ORF1a, and spike targets, and all three could detect virus RNA at our lowest concentration of 1 copy/µL, indicating that our primer and probe sets were efficiently designed. Using the B.1.1.7 RNA, we again could detect the RNA down to 1 copy/µL with the N1 set, but did not detect any concentration of the virus RNA with the ORF1a and spike sets, confirming the expected “target failure” signature when testing viruses containing both ORF1a Δ3675-3677 and spike Δ69-70. Overall, our PCR screening assay could easily differentiate between SARS-CoV-2 RNA with and without the ORF1a and spike deletions by comparing the Ct values to the N1 control.
Next, we validated our multiplex RT-qPCR variant screening assay using known COVID-19 clinical samples that we have previously sequenced (Table 2). We tested 19 samples from SARS-CoV-2 lineages without either ORF1a Δ3675-3677 or spike Δ69-70 (classified as “other” lineage, expected outcome = detection with all three primer/probe sets), 41 samples from lineages B.1.375, B.1.2, and B.1.1.50 (none are current variants of concern) that only have spike Δ69-70 (expected outcome = target failure with the spike set), and 16 samples from B.1.1.7 that have both ORF1a Δ3675-3677 and spike Δ69-70 (expected outcome = target failure with both the ORF1a and spike sets). We found that the expected outcomes were in 100% agreement with the sequence classification. Importantly, unlike the TaqPath assay SGTF results, we could differentiate between B.1.1.7 and other variants that only have the spike deletion, such as B.1.375, which is not currently a variant of concern. Thus, our clinical results demonstrate how our multiplex RT-qPCR assay can detect potential SARS-CoV-2 variants of concern and can be used to prioritize samples for sequencing.
There are some limitations to our study as presented here. First, we have observed autofluorescence of the N1 primer-probe set when testing negative template controls (average Ct = 39.4, with outliers of Ct 33.4). This could potentially lead to a false B.1.1.7 drop-out profile, and therefore we are continuing to optimize our RT-qPCR conditions. Importantly, this PCR assay should only be used to screen known SARS-CoV-2 positive clinical samples for the presence of key deletions found in variants of concern (where autofluorescence will not be a factor), and it should not be used as a primary clinical diagnostic. We also suggest using an N1 threshold Ct of 35 for calling target failures in the ORF1a and spike sets and performing whole genome sequencing to confirm the identity of variants.
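The drop-out logic and the suggested N1 cut-off can be summarized in a short Python sketch. This is an illustration only, not the published protocol: the function name, the use of `None` for an undetected Ct, the returned labels, and the choice to treat an N1 Ct of exactly 35 as acceptable are all assumptions made here. Any "potential" call would still need sequencing confirmation.

```python
from typing import Optional

# N1 cut-off suggested in the text for calling target failures.
N1_CT_THRESHOLD = 35.0


def call_screening_result(
    n1_ct: Optional[float],
    orf1a_ct: Optional[float],
    spike_ct: Optional[float],
    n1_threshold: float = N1_CT_THRESHOLD,
) -> str:
    """Interpret one sample; a Ct of None means the target was undetected.

    Hypothetical helper: labels and the inclusive cut-off are assumptions.
    """
    # Without a sufficiently strong N1 signal, a missing ORF1a or spike
    # signal could simply reflect low RNA input rather than a deletion.
    if n1_ct is None or n1_ct > n1_threshold:
        return "inconclusive"

    orf1a_failed = orf1a_ct is None
    spike_failed = spike_ct is None

    if orf1a_failed and spike_failed:
        return "potential B.1.1.7"
    if orf1a_failed:
        return "potential B.1.351 or P.1"
    if spike_failed:
        return "spike deletion only"
    return "other lineage"


if __name__ == "__main__":
    # Illustrative Ct values, not taken from Table 1 or Table 2.
    print(call_screening_result(24.5, None, None))
    print(call_screening_result(24.5, None, 25.1))
    print(call_screening_result(37.2, None, None))
```

Gating on N1 first reflects the concern raised above: a weak or absent N1 signal makes a drop-out profile uninterpretable, so such samples are flagged rather than classified.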
Second, although our assay is suitable for detection of ORF1a Δ3675-3677 found in the B.1.351 and P.1 variants, we have not been able to empirically test clinical samples with these variants due to access limitations. We are actively seeking additional clinical samples and laboratory partners to expand our clinical validation.
Third, our assay will not be able to detect all B.1.351 viruses. There is a monophyletic clade within the B.1.351 lineage that has ORF1a Δ3675-3677 filled back in, perhaps due to recombination with viruses that did not have the deletion. How often this is expected to occur is not known, but it demonstrates that continuous monitoring for the presence of ORF1a Δ3675-3677 and spike Δ69-70 within the variants of concern will be necessary to ensure that our assay will still be effective.
The rapid emergence of the SARS-CoV-2 variants of concern necessitates an immediate roll out of surveillance tools. Although whole genome sequencing is required to definitively identify specific variants, resource and capacity constraints can limit the number of samples that can be sequenced. The ThermoFisher TaqPath assay has demonstrated the value of PCR for variant surveillance, but it is limited to B.1.1.7 and cannot differentiate between other viruses containing spike Δ69-70. By targeting two different large nucleotide deletions, ORF1a Δ3675-3677 and spike Δ69-70, our multiplex PCR can rapidly screen for B.1.1.7, B.1.351, and P.1 variants and differentiate between non-variants of concern. Thus, our multiplex RT-qPCR variant screening assay can be used to prioritize samples for sequencing and as a surveillance tool to help monitor the distribution and population frequency of suspected variants.
Methods
Ethics
The Institutional Review Board from the Yale University Human Research Protection Program determined that the RT-qPCR testing and sequencing of de-identified remnant COVID-19 clinical samples conducted in this study is not research involving human subjects (IRB Protocol ID: 2000028599). The “Yale ID” numbers displayed in Table 2 are not known outside the research group and cannot be used to re-identify any subject.
Analysis of public SARS-CoV-2 genomes
All available SARS-CoV-2 data (402,899 genomes) were downloaded on 2021-01-22 from GISAID and evaluated for the presence of ORF1a Δ3675-3677 and spike Δ69-70. Phylogenetic analysis of a subset of 4,046 SARS-CoV-2 genomes was performed using Nextstrain15, downsampled as shown using the “global build” on 2021-01-22 (https://nextstrain.org/ncov/global). A list of SARS-CoV-2 genomes used in the analysis is available in Source Data Fig. 1.
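A minimal Python sketch of how a genome could be checked for the two deletions is given below. It is not the analysis code used in this study. It assumes each genome has already been aligned to the Wuhan-Hu-1 reference so that positions share reference coordinates and deletions appear as `-`, and it assumes the deletions map to reference nucleotides 11288-11296 (ORF1a Δ3675-3677) and 21765-21770 (spike Δ69-70). Aligners can place these gaps a few bases away, so a real screen would need to tolerate shifted gap placement.

```python
# Assumed 1-based, inclusive reference coordinates (Wuhan-Hu-1, MN908947.3).
DELETIONS = {
    "ORF1a_del_3675_3677": (11288, 11296),  # 9 nucleotides
    "spike_del_69_70": (21765, 21770),      # 6 nucleotides
}


def has_deletion(aligned_seq: str, start: int, end: int, gap: str = "-") -> bool:
    """Return True if every aligned position in [start, end] is a gap."""
    region = aligned_seq[start - 1:end]
    # Require the full window to be present so truncated sequences are
    # not reported as carrying the deletion.
    return len(region) == end - start + 1 and set(region) == {gap}


def screen_genome(aligned_seq: str) -> dict:
    """Report presence/absence of each deletion for one aligned genome."""
    return {
        name: has_deletion(aligned_seq, start, end)
        for name, (start, end) in DELETIONS.items()
    }


if __name__ == "__main__":
    # Toy sequence of reference length with only the ORF1a window gapped.
    toy_reference = "A" * 29903
    toy_variant = toy_reference[:11287] + "-" * 9 + toy_reference[11296:]
    print(screen_genome(toy_variant))
```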
Multiplex RT-qPCR with probes
A detailed protocol of our multiplexed RT-qPCR to screen for SARS-CoV-2 B.1.1.7, B.1.351, and P.1 variants of concern can be found on protocols.io8. In brief, our multiplex RT-qPCR assay consists of the CDC N117, and the newly designed Yale ORF1a Δ3675-3677 and Yale spike Δ69-70 primer-probe sets (Supplementary Table 2). We used the NEB Luna universal probe one step RT-qPCR kit with 400 nM of primers, 200 nM of probes, and 5 µL of nucleic acid in a total reaction volume of 20 µL. Thermocycler conditions were reverse transcription for 10 minutes at 55°C, initial denaturation for 1 minute at 95°C, followed by 40 cycles of 10 seconds at 95°C and 30 seconds at 55°C. During validation we ran the PCR for 45 cycles. Differentiation between variants of concern is based on drop-out of the Yale ORF1a and/or Yale spike primer-probe sets (Supplementary Table 3).
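As a rough planning aid, the reaction concentrations and thermocycler steps above translate into per-reaction stock volumes and a minimum run time. The Python sketch below uses the final concentrations, reaction volume, and step durations stated in the text. The 20 µM working stock concentration is a hypothetical value chosen for illustration, and the run time ignores ramping and plate reads, so actual instrument times will be longer.

```python
def stock_volume_ul(final_nm: float, stock_um: float, reaction_ul: float = 20.0) -> float:
    """Volume of stock (µL) per reaction from C1 * V1 = C2 * V2.

    final_nm: final concentration in nM (400 for primers, 200 for probes).
    stock_um: working stock concentration in µM (hypothetical input).
    """
    return final_nm * reaction_ul / (stock_um * 1000.0)


def hold_time_minutes(
    cycles: int,
    rt_s: int = 10 * 60,    # reverse transcription, 10 min at 55°C
    denat_s: int = 60,      # initial denaturation, 1 min at 95°C
    cycle_s: int = 10 + 30,  # 10 s at 95°C plus 30 s at 55°C per cycle
) -> float:
    """Sum of programmed hold times in minutes, excluding ramp time."""
    return (rt_s + denat_s + cycles * cycle_s) / 60.0


if __name__ == "__main__":
    # Hypothetical 20 µM working stocks for each primer and probe.
    print("primer volume (µL):", stock_volume_ul(400, 20))
    print("probe volume (µL):", stock_volume_ul(200, 20))
    print("40 cycles (min):", round(hold_time_minutes(40), 1))
    print("45 cycles (min):", round(hold_time_minutes(45), 1))
```

Under these assumptions the programmed hold times come to just under 38 minutes for 40 cycles and 41 minutes for the 45 cycles used during validation.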
Limit of detection
We used Twist synthetic SARS-CoV-2 RNA control 2 (GenBank ID: MN908947.3; GISAID ID: Wuhan-Hu-1) and control 14 (Genbank ID: EPI_ISL_710528; GISAID ID: England/205041766/2020) to determine the limit of detection of the screening RT-qPCR assay. We tested a two-fold dilution series from 100 copies/µL to 1 copy/µL for both RNA controls in triplicate, and confirmed the lowest concentration that was detected in all three replicates by testing 20 additional replicates.
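The first step of this procedure, finding the lowest concentration detected in every replicate, can be sketched in Python as follows. The function name and data layout are assumptions, the Ct values are invented for illustration and are not the results in Table 1, and the follow-up confirmation with 20 additional replicates is not modeled.

```python
from typing import Dict, List, Optional


def lowest_fully_detected(
    results: Dict[float, List[Optional[float]]],
) -> Optional[float]:
    """Return the lowest concentration (copies/µL) detected in all replicates.

    results maps concentration -> replicate Ct values, with None meaning
    the target was undetected in that replicate.
    """
    fully_detected = [
        conc
        for conc, cts in results.items()
        if cts and all(ct is not None for ct in cts)
    ]
    return min(fully_detected) if fully_detected else None


if __name__ == "__main__":
    # Invented triplicate Ct values for a single primer-probe set.
    example = {
        100.0: [28.1, 28.3, 28.0],
        50.0: [29.2, 29.0, 29.4],
        25.0: [30.1, 30.5, 30.2],
        12.0: [31.3, 31.0, 31.6],
        6.0: [32.4, 32.9, 32.2],
        3.0: [33.5, 33.1, 34.0],
        1.0: [35.2, None, 36.0],
    }
    print(lowest_fully_detected(example))
```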
Validation and sequence confirmation
We validated our approach using known SARS-CoV-2 positive clinical samples. Briefly, we extracted nucleic acid from 300 µL viral transport medium from nasopharyngeal swabs and eluted in 75 µL using the MagMAX viral/pathogen nucleic acid isolation kit (ThermoFisher Scientific). Extracted nucleic acid was tested by our multiplexed RT-qPCR assay and then sequenced using a slightly modified ARTIC Network nCoV-2019 sequencing protocol for the Oxford Nanopore MinION19,20. These modifications include extending incubation periods of ligation reactions and including a bead-based clean-up step following dA-tailing. MinION sequencing runs were monitored using RAMPART21. Consensus sequences were generated using the ARTIC Network bioinformatics pipeline and lineages were assigned using Pangolin v.2.018,22. GISAID accession numbers for all SARS-CoV-2 genomes used to validate our approach are listed in Table 2.
Data Availability
Genomic data are available on GISAID (see Table 2 for accession numbers). All RT-qPCR data are included in this article, supplementary files, and source data.
Author information
Contributions
CBFV, RAN, JRF, and NDG designed the study; CEM, GK, JD, MM, JW, CL, PH, SM, CN, EL, MLL, AM, RD, and JR collected and provided clinical samples; CBFV, MB, TA, MEP, AEW, EBH, RAN, JRF, and NDG collected and analyzed data; JRF and NDG supervised the project; CBFV and NDG wrote and edited the manuscript; all authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Acknowledgements
We thank A. Brito, A. Altajar, and D. Comstock for data or clinical support. A list of acknowledgements for the SARS-CoV-2 data used in Fig. 1 can be found in the Source Data Fig. 1. This work was funded by CTSA Grant Number TL1 TR001864 (TA and MEP), Fast Grant from Emergent Ventures at the Mercatus Center at George Mason University (NDG), and CDC Contract # 75D30120C09570 (NDG).
Footnotes
# Senior authors