PT - JOURNAL ARTICLE
AU - Rebecca Rose
AU - David J. Nolan
AU - Samual Moot
AU - Amy Feehan
AU - Sissy Cross
AU - Julia Garcia-Diaz
AU - Susanna L. Lamers
TI - Intra-host site-specific polymorphisms of SARS-CoV-2 is consistent across multiple samples and methodologies
AID - 10.1101/2020.04.24.20078691
DP - 2020 Jan 01
TA - medRxiv
PG - 2020.04.24.20078691
4099 - http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078691.short
4100 - http://medrxiv.org/content/early/2020/04/29/2020.04.24.20078691.full
AB - Despite the potential relevance to clinical outcome, intra-host dynamics of SARS-CoV-2 are unclear. Here, we quantify and characterize intra-host variation in SARS-CoV-2 raw sequence data uploaded to SRA as of 14 April 2020, and compare results between two sequencing methods (amplicon and RNA-Seq). Raw fastq files were quality filtered and trimmed using Trimmomatic, mapped to the Wuhan-Hu-1 reference genome using Bowtie2, and variants were called with bcftools mpileup. To ensure sufficient coverage, we only included samples with 10X coverage for >90% of the genome (n=406 samples), and only variants with a depth >=10. Derived (i.e., non-reference) alleles were found at 408 sites. The number of polymorphic sites (i.e., sites with multiple alleles) within samples ranged from 0 to 13, with 72% of samples (295/406) having at least one polymorphic site. Correlation between the number of polymorphic sites and coverage was very low for both sequencing methods (R² < 0.1, p < 0.05). Polymorphisms were observed in >1 sample at 66 sites (range: 2-38 samples). The minor allele frequency (MAF) at each shared polymorphic site ranged from 0.03% to 48.5%. 33/66 sites occurred in ORF1a1b, and 37/66 changes were non-synonymous. At 10/66 sites, derived alleles were found in samples sequenced using both methods. Polymorphic amplicon samples were found at 10/10 positions, while polymorphic RNA-Seq samples were found at 7/10 positions.
In conclusion, our results suggest that intra-host variation is prevalent among clinical samples. While mutations resulting from amplification and/or sequencing errors cannot be excluded, the observation of shared polymorphic sites with high MAF across multiple samples and sequencing methods is consistent with true underlying variation. Further investigation into intra-host evolutionary dynamics, particularly with longitudinal sampling, is critical for a broader understanding of disease progression.
Competing Interest Statement: RR, DJN, SM, SC, and SLL are employed by Bioinfoexperts, LLC.
Funding Statement: SLL was the recipient of a National Science Foundation SBIR award (#1830867).
Author Declarations: All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript. Yes. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes. I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes. I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All raw data is available in the SRA database. Results and scripts are provided as supplemental material.
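The inclusion criteria and the MAF statistic described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code (their scripts are in the supplemental material). The function names, the per-position depth list, and the allele-count dictionary are hypothetical inputs chosen for illustration. Two assumptions are made: "10X coverage" is read as a per-position depth of at least 10, and MAF is taken as the read fraction of the second most common allele at a site.

```python
from collections.abc import Mapping, Sequence


def passes_coverage_filter(
    depths: Sequence[int], min_depth: int = 10, min_fraction: float = 0.90
) -> bool:
    """Return True if more than min_fraction of positions reach min_depth.

    Assumption: "10X coverage for >90% of the genome" means depth >= 10
    at strictly more than 90% of reference positions.
    """
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) > min_fraction


def is_polymorphic(allele_counts: Mapping[str, int], min_depth: int = 10) -> bool:
    """Return True if a site has total depth >= min_depth and two or more
    alleles supported by at least one read each."""
    depth = sum(allele_counts.values())
    observed = [c for c in allele_counts.values() if c > 0]
    return depth >= min_depth and len(observed) > 1


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """Return the read fraction of the second most common allele.

    Assumption: this definition of MAF is illustrative; sites with fewer
    than two observed alleles return 0.0.
    """
    counts = sorted((c for c in allele_counts.values() if c > 0), reverse=True)
    if len(counts) < 2:
        return 0.0
    return counts[1] / sum(counts)


if __name__ == "__main__":
    # Hypothetical sample: 95 of 100 positions reach depth 10.
    sample_depths = [12] * 95 + [3] * 5
    print(passes_coverage_filter(sample_depths))

    # Hypothetical site: 97 reads support A, 3 reads support G.
    site = {"A": 97, "G": 3}
    print(is_polymorphic(site), minor_allele_frequency(site))
```

In a real pipeline the depths and allele counts would be derived from the bcftools output rather than typed in by hand; the sketch only makes the thresholds explicit.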