RNA viromics of Southern California wastewater and detection of SARS-CoV-2 single nucleotide variants.

Abstract: Municipal wastewater provides an integrated sample of a diversity of human-associated microbes across a sewershed, including viruses. Wastewater-based epidemiology (WBE) is a promising strategy to detect pathogens and may serve as an early-warning system for disease outbreaks. Notably, WBE has garnered substantial interest during the COVID-19 pandemic to track disease burden through analyses of SARS-CoV-2 RNA. Throughout the COVID-19 outbreak, tracking SARS-CoV-2 in wastewater has been an important tool for understanding the spread of the virus. Unlike traditional sequencing of SARS-CoV-2 isolated from clinical samples, which adds testing burden to the healthcare system, in this study, metatranscriptomics was used to sequence virus directly from wastewater. Here, we present a study in which we explored RNA viral diversity through sequencing 94 wastewater influent samples across seven treatment plants (WTPs), collected August 2020 - January 2021, representing approximately 16 million people in Southern California. Enriched viral libraries identified a wide diversity of RNA viruses that differed between WTPs and over time, with detected viruses including coronaviruses, influenza A, and noroviruses. Furthermore, single nucleotide variants (SNVs) of SARS-CoV-2 were identified in wastewater and we measured proportions of overall virus and SNVs across several months. We detected several SNVs that are markers for clinically-important SARS-CoV-2 variants, along with SNVs of unknown function, prevalence, or epidemiological consequence. Our study shows the potential of WBE to detect viruses in wastewater and to track the diversity and spread of viral variants in urban and suburban locations, which may aid public health efforts to monitor disease outbreaks. Importance: Wastewater based epidemiology (WBE) can detect pathogens across sewersheds, which represents the collective waste of human populations. As there is a wide diversity of RNA viruses in wastewater, monitoring the presence of these viruses is useful for public health, industry, and ecological studies. Specific to public health, WBE has proven valuable during the COVID-19 pandemic to track the spread of SARS-CoV-2 without adding burden to healthcare systems. In this study, we used metatranscriptomics and RT-ddPCR to assay RNA viruses across Southern California wastewater from August 2020 - January 2021, representing approximately 16 million people from Los Angeles, Orange, and San Diego counties. We found that SARS-CoV-2 quantification in wastewater correlates well with county-wide COVID-19 case data, and that we can detect SARS-CoV-2 single nucleotide variants through sequencing. Likewise, WTPs harbored different viromes, and we detected other human pathogens such as noroviruses and adenoviruses, furthering our understanding of wastewater viral ecology.


Introduction 54
Municipal wastewater represents a matrix containing a wide diversity of microbes and is 55 representative of the collective waste of a human population across a catchment area. (1). The 56 microbial content of wastewater can be useful in determining the levels of biological 57 contamination of an area, including the presence of human and animal feces, antimicrobial 58 resistance genes, pathogenic bacteria, and viruses (1)(2)(3)(4)(5)(6). In regard to viruses specifically, 59 wastewater often contains high titers of bacteriophages and plant-infecting viruses, along with 60 generally smaller proportions of viruses that infect animals -including humans (7-10). Many 61 studies have used metagenomics to characterize the viral content of wastewater, but these studies 62 typically rely on extracted DNA, which is unable to capture the wide diversity of RNA-based 63 viruses (11-13). As RNA viruses can be important pathogens of humans and agricultural 64 organisms, using metatranscriptomic sequencing to study these diverse viruses in wastewater is 65 relevant to public health and industry and may allow for a greater understanding of the 66 ecological processes that occur in wastewater (2, 8, 14-16). 67 Wastewater based epidemiology (WBE) is a useful method to detect the presence of 68 human pathogens and may serve as an early-warning system for disease outbreaks (17). For 69 example, WBE has been used to track the prevalence of enteric viruses such as norovirus, 70 rotavirus, adenovirus, poliovirus, influenza, and SARS coronaviruses (18-23), with the added 71 benefit that WBE does not rely on public health and clinical resources (17). Aside from basic 72 detection, it has been shown that the viral load of wastewater often precedes clinical outbreaks, 73 and may offer a forecast of the severity of localized disease outbreaks (18, 24-26). Infectious 74 diseases are often underreported in clinical settings -likely due to asymptomatic cases, 75 avoidance of healthcare, and incorrect diagnoses -which prevents accurate disease surveillance 76 (range 0.06% -19.1%) were human reads. We were able to pair 172 libraries ( 97.6% of the total abundance. The average relative abundances ± standard deviation of these 136 "top ten" viruses were as follows: Tomato brown rugose fruit virus (66.0 ± 7.4%), Pepper mild 137 mottle virus (10.6 ± 3.7%), Cucumber green mottle mosaic virus (10.4 ± 3.9%), Tomato mosaic 138 virus (4.8 ± 2.9%), Tobacco mild green mosaic virus (2.1 ± 1.2%), Tropical soda apple mosaic 139 virus (1.5 ± 0.8%), Tomato mottle mosaic virus (1.4 ± 1.8%), crAssphage (0.4 ± 0.8%), Opuntia 140 virus 2 (0.2 ± 0.4%), and Melon necrotic spot virus (0.2 ± 0.2%) (Fig. 2). 141 We analyzed the data from IRV enriched samples as above, and found sequences from 142 2,215 viruses, with the top ten most proportionally abundant viruses accounting for an average of 143 97.2% of total abundance. The average relative abundances ± standard deviation of these "top 144 ten" viruses were as follows: Tomato brown rugose fruit virus (60.2 ± 8.6%), Pepper mild mottle 145 virus (13.0 ± 4.4%), Cucumber green mottle mosaic virus (11.5 ± 4.3%), Tomato mosaic virus 146 (5.0 ± 3.0%), Tobacco mild green mosaic virus (2.6 ± 1.6%), Tropical soda apple mosaic virus 147 (1.8 ± 1.4%), Tomato mottle mosaic virus (1.5 ± 1.4%), SARS-CoV-2 virus (0.9 ± 2.3%), 148 crAssphage (0.4 ± 0.7%), and Opuntia virus 2 (0.3 ± 0.6%) (Fig. 2). We were able to detect 149 many more reads of SARS-CoV-2 with respiratory virus enrichment, as there were only 337 150 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) We analyzed the Shannon indexes for unenriched samples and found that overall alpha 164 diversity was significantly different between wastewater sampling sites (F(6,87) = 7.5, P < 0.001), 165 then used Tukey's HSD post-hoc pairwise comparison testing to show that only NC-HTP, PL-166 HTP, PL-JWPCP, PL-OC, and PL-SJ alpha diversities were different from each other (Padj < 167 0.05, Fig. 4). We also compared the Bray-Curtis dissimilarities of the samples with Adonis and 168 found that overall treatment plants' beta diversity was significantly different (R 2 = 0.41, P < 169 0.001, Fig. 4). Additionally, we used ANCOM to test for differential abundance of viruses at > 170 0.0001 average relative abundance between treatment plants. This resulted in 16 viruses being 171 differentially abundant between treatment plants (W > 46, Padj < 0.05 each, Fig. 4). In enriched 172 samples, we compared the proportional abundances of human respiratory viruses at > 0.0001 173 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint average relative abundance and show that SARS-CoV-2 and HCoV-OC43 were significantly 174 different across treatment plants (W > 46, Padj < 0.05 each, Fig. 4). 175 As we sampled the treatment plants multiple times over our study (up to 145 days from 176 first sample), we were able to study how the viromes changed longitudinally. We compared 177 diversity measures over time with LMEs using treatment plant as a random effect and show that 178 both alpha and beta diversity remained stable across the sampling periods in unenriched samples 179 (Shannon: t = 0.21, P = 0.84, Bray-Curtis: t = -0.31, P = 0.76). We also used LME on viruses 180 present at > 0.0001 average relative abundance with treatment plant as a random effect and 181 showed that 19 viruses' relative abundances changed over time (Padj = < 0.05, Fig. S1). We 182 specifically were interested in SARS-CoV-2 in enriched samples and used LMEs with treatment 183 plant as a random effect, and found that the relative abundance of this virus increased over 184 August 2020 -January 2021 (t = 4.0, P < 0.001, Fig. S1). samples, however due to the low breadth of coverage in many samples, many of these sites may 195 be spurious or unresolved. After applying a more stringent cutoff of 50% breadth of coverage 196 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The vast majority of viruses present in our samples were plant viruses, mainly those 225 infecting tomatoes, peppers, and cucumbers in the genus Tobamovirus. This result is consistent 226 with other studies, which suggests that these viruses are diverse and widespread in wastewater, 227 likely originating from agricultural runoff or human feces (3, 7, 50). Even though these viruses 228 were ubiquitous throughout our samples, WTPs had significantly different overall viromes, 229 indicating there may be signatures of location and wastewater catchment throughout Southern 230 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. we could compare the effects of enrichment on virus detection, and report that viral enrichment 247 greatly improved our ability to detect influenza A and coronaviruses (especially SARS-CoV-2). 248 We suggest that if researchers and the public health community are specifically interested in 249 respiratory viruses, enrichment and the proper wastewater concentration /extraction methods 250 should be employed for appropriate sensitivity and specificity to viruses of interest (23, 25, 34, 251 46, 53). Alongside sequencing, we also used ddPCR to quantify SARS-CoV-2 in wastewater and 252 show that our results generally correlated well with county-reported 7-day rolling average 253 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint COVID-19 cases which agrees with previous studies (23, 24, 54). We note that SARS-CoV-2 254 viral load and case counts were not significantly correlated at every WTP indicating that there is 255 likely unknown variability in the survival of RNA within wastewater, or that there may be 256 variable influent flow or water quality affecting the viral load at specific WTPs (55, 56). We also 257 note that our comparison is limited to county-level data, rather than being a comparison of the 258 areas contributing to the influent stream and that this could be obscuring the relationship to case 259 data at some treatment plants.  28887C>T (40, 59, 60). We also detected 271 many SNVs also found in wastewater samples from Northern California (46) and SNVs that to 272 the best of our knowledge, have yet to be sequenced (41, 46, 48). As our sequencing data are not 273 quantitative, our study suggests that sequencing wastewater is useful for SNV detection across 274 wide catchment areas, but not the true prevalence of SNVs (46). Furthermore, we recognize that 275 wastewater likely represents a collection of different viruses' RNA rather than one intact virion, 276 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. Kenliworth, NJ) to 20 mL of wastewater as a sample processing control to assess viral RNA 305 recovery. We then added MgCl2 to a final concentration of 25mM and adjusted the pH to < 3.5 306 with 20% HCl on a mixed cellulose ester membrane (Type HA: Millipore, Bedford, MA) in 307 replicates of six. We then transferred the HA filters to pre-loaded 2 mL ZR BashingBead Lysis 308 tubes (Zymo, Irvine, CA), and bead beat the samples with a BioSpec beadbeater (BioSpec 309 Products, Bartlesville, OK) for one minute. We then extracted total nucleic acids with a 310 BioMerieux NucliSENS extraction kit with magnetic bead capture (BioMerieux, Durham, NC) 311 following the manufacturer's protocol. 312

SARS-CoV-2 reverse transcriptase droplet digital PCR quantification and correlation with 314 countywide COVID-19 cases 315
We used one-step reverse transcription droplet digital PCR (RT-ddPCR) to quantify the 316 N1 region of the SARS-CoV-2 N gene with primer and probe sequences designed by the CDC 317 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint (63, 64) and we quantified the bovine coronavirus using previously designed primers (63, 65). 318 We set up RT-ddPCR reactions following the manufacturer's instructions on a Bio-Rad Qx200 319 (Bio-Rad, Hercules, CA). For all assays, a minimum of two reactions and a total of ≥20,000 320 droplets were generated per sample and at least five no template control (NTC) reactions and two 321 positive control reactions were run per 96-well plate as well as extraction-specific NTCs. Each 322 sample was required to have a minimum of three positive droplets (66) to be included in further 323 analyses. We also assessed RNA recovery using the BoCoV exogenous control, and samples 324 with < 3% recovery were excluded from further analyses. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. We used the UC Irvine High Performance Community Computing Cluster (HPC3) for all 359 data processing on the provided FASTQ files. We removed primers, adapter sequences, and low-360 quality bases with "bbduk" in the BBTools software package (69), then removed PCR duplicates 361 with BBTools "dedupe." We downloaded the NCBI Virus RefSeq Genome database and used 362 Bowtie2 (70) to build an index of the viral genomes, then mapped the cleaned FASTQ reads 363 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; against this index. We used "Samtools" (71) to convert the resulting SAM files to sorted BAM 364 files and finally used "inStrain" (72) to calculate all viral abundances and profile viral variants 365 on read pairs with >90% average nucleotide identity to their respective reference genomes. 366 Lastly, we used "mosdepth" (73) and "bedtools" (74) to calculate genome coverage and breadth 367 across all SARS-CoV-2 sequences. 368 For community diversity analyses, we tabulated viral abundances, then normalized the 369 reads into within-sample relative abundances in R (75). We used this table to generate Shannon 370 Diversity indices and Bray-Curtis dissimilarity matrices with the R package "vegan" (76). We 371 also ran Adonis PERMANOVA tests on the distance matrices, and performed non-metric 372 multidimensional scaling on the data, compared the proportional abundances of viruses with 373 ANCOM (77), and compared the relative abundances of viruses over time with "lmerTest" using 374 WTP as a random effect (78). Lastly, we plotted all figures with "ggplot2" (79), "gggenes" (80), 375 or "pheatmap" (81). 376

Data Availability 377
Raw sequencing data has been deposited on the NCBI Sequence Read Archive under 378 accession number PRJNA729801, and representative code can be found at 379 github.com/jasonarothman/wastewater_viromics_sarscov2. 380 381 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. and dates across the "x" axis signify sampling date. 407 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint Boxplots of virus relative abundances that significantly differed between treatment plants as 418 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint tested with ANCOM, with the right bottom two panels (SARS-CoV-2 and HCoV OC43) 419 representing enriched samples only. 420 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted July 22, 2021. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 22, 2021.  genomes. PLoS One 15:e0241535. 578 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted July 22, 2021. ; https://doi.org/10.1101/2021.07.19.21260815 doi: medRxiv preprint