PT - JOURNAL ARTICLE
AU - Arana, Carlos
AU - Liang, Chaoying
AU - Brock, Matthew
AU - Zhang, Bo
AU - Zhou, Jinchun
AU - Chen, Li
AU - Cantarel, Brandi
AU - SoRelle, Jeffrey
AU - Hooper, Lora V.
AU - Raj, Prithvi
TI - A Short Plus Long-Amplicon Based Sequencing Approach Improves Genomic Coverage and Variant Detection In the SARS-CoV-2 Genome
AID - 10.1101/2021.06.16.21259029
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.06.16.21259029
4099 - http://medrxiv.org/content/early/2021/06/20/2021.06.16.21259029.short
4100 - http://medrxiv.org/content/early/2021/06/20/2021.06.16.21259029.full
AB - High viral transmission in the COVID-19 pandemic has enabled SARS-CoV-2 to acquire new mutations that impact genome sequencing methods. The ARTIC.v3 primer pool that amplifies short amplicons in a multiplex-PCR reaction is one of the most widely used methods for sequencing the SARS-CoV-2 genome. We observed that some genomic intervals are poorly captured with ARTIC primers. To improve the genomic coverage and variant detection across these intervals, we designed long amplicon primers and evaluated the performance of a short (ARTIC) plus long amplicon (MRL) sequencing approach. Sequencing assays were optimized on VR-1986D-ATCC RNA followed by sequencing of nasopharyngeal swab specimens from five COVID-19 positive patients. ARTIC data covered >90% of the virus genome fraction in the positive control and four of the five patient samples. Variant analysis in the ARTIC data detected 67 mutations, including 66 single nucleotide variants (SNVs) and one deletion in ORF10. Of 66 SNVs, five were present in the spike gene, including nt22093 (M177I), nt23042 (S494P), nt23403 (D614G), nt23604 (P681H), and nt23709 (T716I). The D614G mutation is a common variant that has been shown to alter the fitness of SARS-CoV-2. Two spike protein mutations, P681H and T716I, which are represented in the B.1.1.7 lineage of SARS-CoV-2, were also detected in one patient.
Long-amplicon data detected 58 variants, of which 70% were concordant with ARTIC data. Combined analysis of ARTIC + MRL data revealed 22 mutations that were either ambiguous (17) or not called at all (5) in ARTIC data due to poor sequencing coverage. For example, a common mutation in the ORF3a gene at nt25907 (G172V) was missed by the ARTIC assay. Hybrid data analysis improved sequencing coverage overall and identified 59 high-confidence mutations for phylogenetic analysis. Thus, we show that while the short amplicon (ARTIC) assay provides good genomic coverage with high throughput, complementing poorly captured intervals with long amplicon data can significantly improve SARS-CoV-2 genomic coverage and variant detection.

Competing Interest Statement: The authors have declared no competing interest.

Funding Statement: No specific funding for the present study.

Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Review waived by UT Southwestern Institutional Review Board as analyzed specimens were de-identified and were residual material.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes

Data Availability: Sequencing data (FASTQ files) from the present study have been deposited in the NCBI SRA database with accession ID PRJNA729878 for public access.
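
The abstract describes combining short-amplicon (ARTIC) and long-amplicon (MRL) call sets so that variants that are ambiguous or missing in one assay can be recovered from the other. The record does not give the authors' pipeline, so the following is only a minimal sketch of that idea under stated assumptions: the `Variant` type, the `merge_calls` and `concordance` helpers, the depth threshold of 20 reads, and all positions and depths in the toy data are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass

MIN_DEPTH = 20  # hypothetical read-depth cutoff for a confident call


@dataclass(frozen=True)
class Variant:
    pos: int   # 1-based genome position
    ref: str   # reference allele
    alt: str   # alternate allele


def merge_calls(short_calls, long_calls, min_depth=MIN_DEPTH):
    """Label every variant in the union of two call sets.

    Each argument maps Variant -> read depth supporting the call.
    """
    labels = {}
    union = sorted(set(short_calls) | set(long_calls),
                   key=lambda v: (v.pos, v.ref, v.alt))
    for v in union:
        short_ok = short_calls.get(v, 0) >= min_depth
        long_ok = long_calls.get(v, 0) >= min_depth
        if short_ok and long_ok:
            labels[v] = "concordant"
        elif short_ok:
            labels[v] = "short_only"
        elif long_ok:
            # Seen at low depth in the short-amplicon data, or not at all.
            labels[v] = ("rescued_ambiguous" if v in short_calls
                         else "rescued_missing")
        else:
            labels[v] = "low_confidence"
    return labels


def concordance(short_calls, long_calls):
    """Fraction of long-amplicon variants also present in the short set."""
    if not long_calls:
        return 0.0
    return len(set(short_calls) & set(long_calls)) / len(long_calls)


# Toy data: positions, alleles, and depths are invented for illustration.
short_demo = {Variant(100, "A", "G"): 150, Variant(200, "C", "T"): 5}
long_demo = {Variant(100, "A", "G"): 80,
             Variant(200, "C", "T"): 60,
             Variant(300, "G", "T"): 45}

demo_labels = merge_calls(short_demo, long_demo)
for variant, label in demo_labels.items():
    print(variant.pos, variant.ref, variant.alt, label)
```

In this toy run the low-depth short-amplicon call is labeled as rescued from an ambiguous state and the variant absent from the short-amplicon set is labeled as rescued from a missing state, mirroring the two categories (ambiguous and not called) that the abstract reports for the combined analysis.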