Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants
============================================================================

* Alexander Crits-Christoph
* Rose S. Kantor
* Matthew R. Olm
* Oscar N. Whitney
* Basem Al-Shayeb
* Yue C. Lou
* Avi Flamholz
* Lauren C. Kennedy
* Hannah Greenwald
* Adrian Hinkle
* Jonathan Hetzel
* Sara Spitzer
* Jeffery Koble
* Asako Tan
* Fred Hyde
* Gary Schroth
* Scott Kuersten
* Jillian F. Banfield
* Kara L. Nelson

## Abstract

Viral genome sequencing has guided our understanding of the spread and extent of genetic diversity of SARS-CoV-2 during the COVID-19 pandemic. SARS-CoV-2 viral genomes are usually sequenced from nasopharyngeal swabs of individual patients to track viral spread. Recently, RT-qPCR of municipal wastewater has been used to quantify the abundance of SARS-CoV-2 in several regions globally. However, metatranscriptomic sequencing of wastewater can be used to profile the viral genetic diversity across infected communities. Here, we sequenced RNA directly from sewage collected by municipal utility districts in the San Francisco Bay Area to generate complete and near-complete SARS-CoV-2 genomes. The major consensus SARS-CoV-2 genotypes detected in the sewage were identical to clinical genomes from the region. Using a pipeline for single nucleotide variant (SNV) calling in a metagenomic context, we characterized minor SARS-CoV-2 alleles in the wastewater and detected viral genotypes which were also found within clinical genomes throughout California. Observed wastewater variants were more similar to local California patient-derived genotypes than they were to those from other regions within the US or globally. Additional variants detected in wastewater have only been identified in genomes from patients sampled outside of CA, indicating that wastewater sequencing can provide evidence for recent introductions of viral lineages before they are detected by local clinical sequencing.
These results demonstrate that epidemiological surveillance through wastewater sequencing can aid in tracking exact viral strains in an epidemic context.

## Introduction

The COVID-19 pandemic caused by SARS-CoV-2 reached the United States at the start of 2020, with multiple early introduction events in the states of Washington, California, and New York [1]. Since then, the total number of cases in the country has surpassed 6 million, with over 180,000 deaths and enormous implications for public health [2]. While clinical viral cases have been tracked mostly with quantitative reverse transcriptase PCR (RT-qPCR), there has also been extensive whole viral genome sequencing of clinical cases, generating over 75,000 genomes globally, including 17,000 from the US, and 2,500 from California (GISAID EpiCoV database as of August 23, 2020) [3].

Genomic epidemiology, the analysis of viral genomes in order to make inferences about viral evolution, transmission, and spread, has played an important role in improving our understanding of the transmission dynamics of the SARS-CoV-2 pandemic [4]. Early in the pandemic, this approach revealed multiple introduction events into California and viral lineages present at different abundances across counties in Northern California [5]. Genome sequencing was also used to show that there was unexpectedly frequent community spread of a specific genotype after early introduction in Washington State [6]. Genome sequencing in the New York City area identified multiple viral introduction events from Europe [7] and sequencing in the Mission district of San Francisco identified distinct viral strains in a single neighborhood, with transmission between family clusters [8].

Unlike many respiratory viruses, RNA of SARS-CoV-2 and other coronaviruses can be detected in human feces [9–11].
Before the COVID-19 pandemic, members of the *Coronaviridae* had been identified in municipal wastewater through both RT-qPCR and shotgun metagenomic and metatranscriptomic sequencing [12,13]. Since the start of the COVID-19 pandemic, wastewater RT-qPCR has quantified the amount of SARS-CoV-2 RNA in sewage to estimate the abundance of the virus across many different municipal regions globally [14–22]. Prior work showed that shotgun wastewater sequencing can provide information about many viruses simultaneously [12,23,24] and enable genome-resolved [25] and phylogenetic analyses [26,27]. In one study, a SARS-CoV-2 consensus genome was obtained from sewage via targeted amplification and long-read sequencing, allowing for phylogenetic analysis of the predominant lineage [27]. Here, we show that sequencing of viral concentrates and RNA extracted directly from wastewater can identify multiple SARS-CoV-2 genotypes at varying abundances known to be present in communities, as well as additional genotypic variants not yet observed in local clinical sequencing efforts.

## Results and Discussion

### Metatranscriptomic detection of SARS-CoV-2 and other viruses in wastewater

Twenty-four-hour 1 L composite samples of raw sewage were collected from wastewater treatment facilities in Alameda and Marin Counties in Northern California between May 19, 2020 and July 15, 2020 (**Supplementary Table S1**). We extracted nucleic acids from samples using three methods that enriched for viral particles (ultrafiltration) or total RNA (RNA silica columns or silica milk). SARS-CoV-2 viral RNA was first detected using an RT-qPCR assay (see *Methods*) of the N gene, and Cq-values ranged from 29.5 to 36.2, corresponding to an estimated ∼2 to ∼553 genome copies/μL of RNA. From this we estimate that there were 2.8×10⁵ genome copies/L of wastewater on average across our samples (**Supplementary Table S1**).
For each sample, 40–50 μL of RNA was prepared for sequencing, implying that an estimated ∼4438 viral genome copies on average were contained within each sequencing library. After cDNA synthesis from the total RNA, samples were enriched for a panel of human respiratory viruses using a commercially available oligo-capture approach (Illumina Respiratory Virus Panel; see *Methods*) and sequenced on a NextSeq 550 to produce on average 12 million 2×75 bp reads per sample. Reads were mapped to the human genome to estimate the amount of human RNA/DNA in the samples (0.7–16% of reads per sample). Sequencing reads were then mapped to a de-replicated set of all eukaryotic viruses contained in the RefSeq database, and stringently filtered to include only high-quality reads matching reference sequences with >97% identity (*Methods*). Viral abundances and SNVs (Single Nucleotide Variants) were then calculated using the metagenomic strain-typing program inStrain v1.12.

We detected SARS-CoV-2 at varying abundances (0–14%) across samples (**Fig 1a and 1b**; **Supplementary Table S1**). Sequencing relative abundance of SARS-CoV-2 was not strongly correlated with RT-qPCR genome copy quantification, likely due to the variability introduced by different extraction methods. Viral enrichment by ultrafiltration achieved higher relative abundances of SARS-CoV-2 RNA, although these experiments were time-intensive and often had lower absolute genome copy number recovery according to RT-qPCR.

Additionally, we sequenced replicates from one set of samples with rRNA depletion but no viral enrichment. Without enrichment, we detected fewer than 40 SARS-CoV-2 read pairs in total (**Supplementary Table S1**; **Fig 1c**). While this illustrates the difficulty of detecting specific viruses in wastewater in unenriched sequencing datasets, larger sequencing efforts may overcome this limitation by sequencing more deeply.
![Figure 1:](https://www.medrxiv.org/content/medrxiv/early/2020/09/14/2020.09.13.20193805/F1.medium.gif)

Figure 1: Characterized viruses detected in enriched and unenriched wastewater metatranscriptomes. Relative abundances of viruses with eukaryotic hosts in the RefSeq database as a percentage of total sequencing reads derived from the sample in **(a)** Amicon ultrafiltration (viral fractionation) and **(b)** total RNA column and milk of silica samples. All samples were enriched with the Illumina Respiratory Virus Panel. **(c)** Relative abundances of RefSeq viruses in unenriched metatranscriptomics (left) and the same samples after oligo-enrichment with the Illumina Respiratory Virus Panel. **(d)** The relationship between the quantity of viral genome copies in 40 μL of purified RNA and SARS-CoV-2 genome completeness (measured in breadth of coverage) for each sample. Samples are colored by extraction methodology, and the size of the point corresponds to the mean SARS-CoV-2 depth of coverage.

Other human viruses identified in the wastewater sequencing included Human bocaviruses 2c and 3 (**Fig 1a, 1b**), both of which are respiratory viruses sometimes capable of causing gastroenteritis, and are included in the Illumina Respiratory Virus Panel. Bocaviruses have been identified in sewage samples previously [28,29]. Picorna-like viruses were also detected (**Fig 1**). The most abundant viruses in the data were plant viruses including cucumber green mottle mosaic virus and pepper mild mottle virus (PMMoV) (**Fig 1a**). These viruses are known to be highly abundant in human wastewater [30] and have been used as fecal loading controls in wastewater SARS-CoV-2 quantification [19].
Near-complete (>95% breadth of coverage) genomes were obtained for SARS-CoV-2, bocavirus 3, PMMoV, and other plant viruses (**Supplementary Table S2**), implying that these viruses were at high enough abundance in the dataset for exact genomic analysis.

### Recovery of complete and near-complete SARS-CoV-2 viral genomes from wastewater

Complete consensus viral genomes are required to perform viral lineage tracking for genomic epidemiology. We obtained complete consensus SARS-CoV-2 genomes (breadth of coverage > 99%) from 7 out of 22 samples (31%); for comparison, large-scale patient sequencing efforts have obtained genomes for ∼80% of samples [31]. Only samples with RT-qPCR Cq-values < 33 (∼25 genome copies/μL) yielded complete consensus genomes (**Fig 1d**), but we recovered at least one genome using each of our three extraction methods. Mean depth of coverage for each complete genome ranged from 7x to 107x after filtering and removal of PCR duplicates.

The consensus genomes from Alameda County and the one from Marin County all differed from one another by at most 4 nucleotides. These consensus genomes are unlikely to be chimeric, as a BLAST analysis identified SARS-CoV-2 genomes obtained from patients in Northern California that were 100% identical at all non-gapped positions (**Supplementary Table S3**). Consensus genomes may represent predominant SARS-CoV-2 lineages in the population in the serviced areas during the summer of 2020. The results demonstrate genomic accuracy for recovery of consensus SARS-CoV-2 genomes so long as sufficient coverage is achieved in metatranscriptomic datasets.
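Breadth and mean depth of coverage, the two metrics used in this section to judge genome completeness, can be computed from a per-position read-depth vector. The following is a minimal sketch and not the inStrain implementation; the function name `coverage_stats`, the `min_depth` parameter, and the toy depth values are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple


def coverage_stats(depths: Sequence[int], min_depth: int = 1) -> Tuple[float, float]:
    """Return (breadth, mean_depth) for a per-position read-depth vector.

    breadth    = fraction of reference positions covered by >= min_depth reads
    mean_depth = average number of reads per reference position
    """
    if not depths:
        raise ValueError("depths must contain at least one position")
    covered = sum(1 for d in depths if d >= min_depth)
    breadth = covered / len(depths)
    mean_depth = sum(depths) / len(depths)
    return breadth, mean_depth


# Toy example (hypothetical values): a 10-position reference with one
# uncovered position.
toy_depths = [5, 7, 9, 0, 12, 8, 8, 6, 4, 1]
breadth, mean_depth = coverage_stats(toy_depths)
print(f"breadth={breadth:.2f}, mean depth={mean_depth:.1f}x")
```

Under this definition, a genome with breadth above 0.99 would count as complete in the sense used here, regardless of how deeply each position is covered.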
### Identification of alternative SARS-CoV-2 variants in wastewater populations recovers locally reported clinical genotypes

While consensus genotypes can describe the predominant genotype of a virus in a metatranscriptome, the strength of wastewater-based sampling and sequencing lies in the ability to identify alternative genotypes in the population being sampled. Using a recently developed pipeline for metagenomic SNV calling [32], we identified putative SNVs that are variable within the viral population sampled in each wastewater sample (**Fig 2a; Supplementary Table S4**). Because SARS-CoV-2 has been sequenced at large scale from patients in both Northern California and worldwide, we could determine which of these SNVs had also been detected in genomes from individual patients. Across all samples, 50% of SNVs observed in wastewater samples at greater than 10% frequency were also observed in patient-derived viral genomes from California; 61% were observed in viral genomes from the USA, and 71% were observed in any viral genomes collected worldwide.

SNVs that have been observed in California patients had significantly higher allele frequencies in the wastewater samples than those that were not detected in clinical cases (mean 48% versus 15%, respectively; *p* < 0.01; two-sided t-test) (**Fig 2b**). This is likely because the more abundant a SNV is in the population, the more likely it is to be sampled in wastewater and in the clinic. Further, several of the same SNVs were observed across samples, and these recurrent SNVs were on average 2.3x more likely to be observed in California or USA patient-derived genomes than SNVs observed once (**Fig 2c**). Taken together, these are strong signals that deeper sequencing of wastewater and combining information across samples better recapitulates true viral genomic variation in the sampled population.
![Figure 2:](https://www.medrxiv.org/content/medrxiv/early/2020/09/14/2020.09.13.20193805/F2.medium.gif)

Figure 2: SARS-CoV-2 SNVs in wastewater samples. **(a)** Allele frequencies of SARS-CoV-2 in wastewater metatranscriptomes for each sample. Each point is a SNV by location on the SARS-CoV-2 genome (x-axis), and the height of the bar (y-axis) is the frequency of the alternative allele (relative to the reference genome EPI_ISL_402124) at that position. Wastewater SNVs are colored by whether they have previously been observed in clinical samples from CA, the USA, or neither. **(b)** Wastewater SARS-CoV-2 frequencies grouped by whether they have been observed in clinical samples from different regions. Most highly abundant SNVs have been observed previously in California or elsewhere in the US. **(c)** SARS-CoV-2 SNVs grouped by the number of wastewater samples in which they were observed (out of 7 high quality samples). Most SNVs that were observed in 2 or more samples have been observed clinically in CA. **(d)** Multiple hypothesis adjusted (Bonferroni correction) *p*-value distribution of hypergeometric tests for overlap between all wastewater SNVs observed and the variants clinically observed and reported in each location (a county level designation in the United States). Alameda County was the most significant comparison.

Over 75,000 patient-derived SARS-CoV-2 genomes have been sequenced and deposited into the GISAID database globally, including 2,500 genomes obtained from patients in California. To understand the context of the viral genomic variation we observed within wastewater samples, we used a hypergeometric test to calculate the likelihood of overlap by chance between the set of wastewater variants and the set of variants observed in viruses from patients in a given region.
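A minimal sketch of such an overlap test is given here, assuming that wastewater SNVs are treated as random draws from the pool of all clinically observed SNVs. The Methods name `stats.hypergeom` from scipy; the standard-library version below computes the same upper tail, equivalent to `scipy.stats.hypergeom.sf(overlap - 1, total_snvs, region_snvs, wastewater_snvs)`. The function names and all counts are hypothetical and are not taken from the study.

```python
from fractions import Fraction
from math import comb


def overlap_pvalue(total_snvs: int, region_snvs: int,
                   wastewater_snvs: int, overlap: int) -> float:
    """P(X >= overlap) for a hypergeometric variable X.

    total_snvs      : SNVs observed across all clinical genomes (population)
    region_snvs     : clinical SNVs reported in one region (successes)
    wastewater_snvs : SNVs observed in wastewater (number of draws)
    overlap         : SNVs shared by wastewater and the region
    """
    upper = min(region_snvs, wastewater_snvs)
    denominator = comb(total_snvs, wastewater_snvs)
    # Sum the exact probabilities of every overlap at least as large as observed.
    tail = sum(
        comb(region_snvs, i) * comb(total_snvs - region_snvs, wastewater_snvs - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    return float(Fraction(tail, denominator))


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts for two regions of very different sizes.
p_small_region = overlap_pvalue(10_000, 300, 50, 20)
p_large_region = overlap_pvalue(10_000, 3_000, 50, 20)
print(p_small_region < p_large_region)
```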
This computes the probability of observing a certain amount of overlap in variants by chance, and accounts for the fact that some regions have far more sequenced patient genomes and correspondingly more alleles than others. For example, the probability of the observed overlap between wastewater variants and California clinical variants having occurred by chance was calculated to be *p* < 10⁻¹⁰, indicating a high likelihood of non-random overlap. By further comparing the probabilities of SNV overlap between patient genotypes and wastewater genotypes at the Nextstrain “location” level (corresponding to counties and/or cities), we found the highest likelihood of non-random overlap between all wastewater genotypes observed and clinical genotypes from Alameda County (**Fig 2d**), the location from which the wastewater samples were also derived.

### Identification of potential lineage transmission events previously undetected in local patient-based sequencing at time of sampling

Some clinical SARS-CoV-2 viral strains can be differentiated by more than one SNV. Across the wastewater dataset, we observed one pair and one triplet of SNVs that were shared by clinical isolates. The pair and triplet of SNVs each occurred at similar frequencies, supporting their linkage in wastewater genomes (**Fig 3a and 3b**). In addition to the SNVs that also have been observed clinically in California, there were four SNVs recurrent across wastewater samples that had not been previously observed in CA, but had been observed elsewhere in the United States (**Fig 3c**). Two adjacent SNVs (14222G and 14223C) are associated with a single viral strain that has often been observed in clinical samples in Washington State. Another two SNVs (8083A and 1738T) are not linked, but both have been observed in different clinical genomes of four other states in the US.
Interestingly, these variants appear to have arisen or arrived in the US only during the month of July, suggesting that they may be detected in clinical samples from California in the near future.

![Figure 3:](https://www.medrxiv.org/content/medrxiv/early/2020/09/14/2020.09.13.20193805/F3.medium.gif)

Figure 3: Time series of SARS-CoV-2 genotypes in California wastewater compared to patients. **(a)** Left two plots: Frequencies of two SNVs found in the same viral lineage across California clinical samples (black lines) and within each wastewater sample (orange points). **(b)** Frequencies of three SNVs found in the same viral lineage across California clinical samples (black lines) and within each wastewater sample (green points). **(c)** Time series of detection for recurrent wastewater genotypes in clinical samples versus wastewater samples. Each row on the y-axis is a SNV, and the presence of a point along the x-axis indicates when that SNV was detected in either a clinical sample or a wastewater sample.

Overall, this study demonstrated that wastewater sequencing can accurately identify genotypes of viral strains that are clinically detected in a region, and those not yet detected by clinical sequencing. Another key advantage of this method is that it does not rely on specific PCR primers, which can fail to detect SARS-CoV-2 strains with mutations in the primed sequence [33]. With more intensive wastewater sampling, this approach also has the potential to reveal patterns of virus distribution within communities, helping understand transmission and spread of diseases during epidemics. Perhaps most significantly, the results indicate that wastewater sequencing can detect recent introductions of SARS-CoV-2 genotypes and other disease-causing viruses at a population scale.
## Methods

### Sample collection and extraction

Twenty-four-hour 1 L composite samples were collected at 4 different wastewater interceptors in the San Francisco Bay Area (labeled ‘Berkeley’, ‘Berkeley Hills’, ‘Oakland’, and ‘Marin’, based roughly on the municipal areas each services). Samples were immediately processed by extraction via three different methods.

The first method was ultrafiltration with Amicon Ultra-15 100 kDa Centrifugal Filter Units. Wastewater was heat inactivated in a water bath at 60 °C for 90 minutes. Wastewater samples were then filtered on 0.22 µm SteriFlip filter units. Amicon filter units were prepared by incubation with 1% Bovine Serum Albumin in 1x phosphate buffered saline (PBS) on ice for 1 hour, after which they were spun, loaded with 2 mL PBS, and spun again to rinse. Amicon 100 kDa centrifugal filter units were then loaded with 15 mL of filtered wastewater and spun in a swinging bucket rotor at 4750 × g for 30 min at 4 °C. Flow-through was discarded and the Amicon units were reloaded with sample until all sample volume (40 mL) had been processed. For three samples (**Supplementary Table S1**), we processed more than 40 mL per sample, but found that this did not improve the resulting SARS-CoV-2 genome quality in this specific instance. For all Amicon-concentrated samples, the final volume of the concentrate was ∼250 μL. RNA was then extracted with a Qiagen AllPrep DNA/RNA Mini Kit.

The second extraction method, direct RNA extraction with silica columns, began with viral and bacterial lysis of samples with 9.5 g of NaCl per 40 mL of wastewater and filtration on a 5 µm Polyvinylidene Fluoride (PVDF) filter. The resulting lysate was then loaded onto a Zymo III-P silica spin column via vacuum manifold, and RNA was directly eluted from this column.
Details of this protocol are available at [https://www.protocols.io/view/v-2-direct-wastewater-rna-capture-and-purification-bjr9km96](https://www.protocols.io/view/v-2-direct-wastewater-rna-capture-and-purification-bjr9km96).

The third extraction method, “milk of silica”, began with sample lysis and filtration, as in the second method. Filtered lysate was bound to free silicon dioxide particulate, eluted from the particulate, and concentrated via isopropanol precipitation. This protocol is available at: [https://www.protocols.io/view/direct-wastewater-rna-extraction-via-the-34-milk-o-biwfkfbn](https://www.protocols.io/view/direct-wastewater-rna-extraction-via-the-34-milk-o-biwfkfbn).

### RT-qPCR and genome copy quantification

The number of viral genome copies in each sample was determined via RT-qPCR on an Applied Biosystems QuantStudio 3 Real-Time PCR System with the Thermo Fisher TaqPath 1-Step RT-qPCR Master Mix or TaqMan™ Fast Virus 1-Step Master Mix. The primer set was purchased as part of the 2019-CoV RUO Kit (IDT) and our quantification used the previously published CDC N1 assay [34]. Either 2 μL or 5 μL of sample was used for each reaction (**Supplementary Table S1**) in a 10 μL or 20 μL reaction, respectively. Cycling conditions were 25 °C for 2 minutes, 50 °C for 15 minutes, 95 °C for 2 minutes, and 45 cycles of 95 °C for 3 seconds, 55 °C for 30 seconds. A standard curve for absolute quantification of viral genome copies was generated with synthetic RNA standards of the SARS-CoV-2 genome (Twist Biosciences).

### Library preparation and sequencing

Sequencing for a first set of samples was performed at the Microbial Genome Sequencing Center (Pittsburgh, PA) in three independent sequencing runs. The Maxima ds cDNA RT kit (Thermo Fisher) was used to generate cDNA. The Illumina Flex for Enrichment kit paired with the Illumina Respiratory Virus Oligo Panel (Illumina, Inc.) was used to enrich for respiratory virus cDNA with 15 PCR cycles in the final step.
The libraries were then sequenced on a NextSeq 550 to yield on average 119 Mbp of 2×75 bp paired end sequencing reads.

For a second set of samples (**Supplementary Table S1**), rRNA depletion was performed and oligo-capture enriched and unenriched sequencing strategies were compared. The rRNA depletion was done using RiboZero Plus supplemented with a comprehensive ‘Gut Microbiome’ probe set. Libraries were prepared using the Illumina RNA Prep with Enrichment (L) Tagmentation protocol. The rRNA depleted samples were amplified for 20 cycles. Enrichment was performed using the Illumina Respiratory Virus Oligo Panel.

### Metatranscriptomic viral abundances

The abundances of viruses within wastewater were obtained by mapping reads with Bowtie 2 [35] to an index of all viral genomes downloaded from the RefSeq Database (Release 201). For abundance calculations, mapped read pairs with MAPQ>20 and pair percent identity to the reference >95% were retained using inStrain v1.3.2 [32]. Duplicate reads were removed with the clumpify.sh dedup command from the BBTools software suite (Bushnell 2014). Only viral genomes with at least 10% breadth of genomic coverage were reported.

### SARS-CoV-2 variant analysis

Seven samples with near-complete SARS-CoV-2 breadth of genomic coverage (>99%) were further investigated for a strain-resolved analysis. SNV calling was performed using inStrain v1.3.2 on all read pairs with >90% Average Nucleotide Identity to the SARS-CoV-2 reference. An absolute minimum of 2 read pairs supporting a variant allele was required for any SNV to be considered in further analysis. PCR duplicates were removed with the markdup command in the Sambamba package [36]. All analysis and SNV locations reported are with respect to the reference genome “hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124|2019-12-30|China”. Consensus genomes from each sample were created using a custom Python script that required a minimum of 3 reads supporting each genomic position.
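The consensus-calling rule described above can be sketched as follows. This is not the authors' script: it assumes that "a minimum of 3 reads supporting each genomic position" means the most common base at a position must be backed by at least 3 reads, and that positions failing this threshold are masked with `N`. The function name `call_consensus` and the toy pileup are hypothetical.

```python
from collections import Counter
from typing import Sequence


def call_consensus(base_counts: Sequence[Counter], min_reads: int = 3,
                   mask: str = "N") -> str:
    """Build a consensus sequence from per-position base counts.

    A position is called as its most common base only if that base is
    supported by at least `min_reads` reads; otherwise it is masked.
    """
    consensus = []
    for counts in base_counts:
        if not counts:
            # No reads cover this position at all.
            consensus.append(mask)
            continue
        base, support = counts.most_common(1)[0]
        consensus.append(base if support >= min_reads else mask)
    return "".join(consensus)


# Toy pileup (hypothetical): each Counter tallies the bases observed at
# one reference position.
pileup = [Counter("AAAA"), Counter("CCT"), Counter("GGGT"), Counter()]
print(call_consensus(pileup))
```

Masking low-support positions rather than filling them from the reference keeps unsupported sites from being mistaken for confirmed matches when consensus genomes are compared with clinical sequences.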
A multiple sequence alignment of publicly available SARS-CoV-2 genomes and their metadata were downloaded from the GISAID [3] EpiCoV database on August 23, 2020. The multiple sequence alignment was processed with a custom Python script to obtain a list of variants for each genome with respect to the WIV04 reference sequence. We removed from all analyses the genomic positions recommended to be masked from SARS-CoV-2 alignments by [https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480](https://virological.org/t/masking-strategies-for-sars-cov-2-alignments/480). Hypergeometric distributions were calculated with the stats.hypergeom function in scipy [37] to compare wastewater samples to all clinical data from each Nextstrain “location” with at least 20 genomes deposited. The following parameters were used for hypergeometric distribution testing: the total number of SNVs observed across all clinical SARS-CoV-2 genomes, the number of SNVs observed in wastewater, the number of clinical SNVs in a region, and the observed overlap between the two. Reproducible code is available at [https://github.com/alexcritschristoph/wastewater_sarscov2](https://github.com/alexcritschristoph/wastewater_sarscov2).

## Data Availability

Sequencing data for this project has been released under NCBI BioProject ID PRJNA661613. Processed data, reproducible code, and workflows for the analyses performed have been made available at [https://github.com/alexcritschristoph/wastewater\_sarscov2](https://github.com/alexcritschristoph/wastewater_sarscov2).

[https://www.ncbi.nlm.nih.gov/bioproject/PRJNA661613](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA661613)

## Acknowledgements

We gratefully acknowledge the originating and submitting laboratories of SARS-CoV-2 genomes in the GISAID EpiCoV database ([https://www.gisaid.org](https://www.gisaid.org)) that were used for our comparisons to clinical samples and in particular the Innovative Genomics Institute SARS-CoV-2 Sequencing Group for Alameda County genomes. We also gratefully acknowledge Vinson Fan for assistance with RT-qPCR and the laboratory of Robert Tjian for sharing materials. Funding was provided to KLN and JFB by a Rapid Research Response grant from the Innovative Genomics Institute (IGI) and a seed grant from the Center for Information Technology Research in the Interest of Society (CITRIS) at UC Berkeley.

* Received September 13, 2020.
* Revision received September 13, 2020.
* Accepted September 14, 2020.
* © 2020, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1. CDC COVID-19 Response Team, Jorden MA, Rudman SL, Villarino E, Hoferka S, et al. Evidence for Limited Early Spread of COVID-19 Within the United States, January–February 2020. MMWR. Morbidity and Mortality Weekly Report. 2020. pp. 680–684. doi: 10.15585/mmwr.mm6922e1
2. Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20: 533–534.
[CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/S1473-3099(20)30120-1&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F14%2F2020.09.13.20193805.atom) 3. 3.Elbe S, Buckland-Merrett G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges. 2017. pp. 33–46. doi: 10.1002/gch2.1018 [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/gch2.1018&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=31565258&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F14%2F2020.09.13.20193805.atom) 4. 4.Fauver JR, Petrone ME, Hodcroft EB, Shioda K, Ehrlich HY, Watts AG, et al. Coast-to-coast spread of SARS-CoV-2 in the United States revealed by genomic epidemiology. medRxiv. doi: 10.1101/2020.03.25.20043828 [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6NzoibWVkcnhpdiI7czo1OiJyZXNpZCI7czoyMToiMjAyMC4wMy4yNS4yMDA0MzgyOHYxIjtzOjQ6ImF0b20iO3M6NTA6Ii9tZWRyeGl2L2Vhcmx5LzIwMjAvMDkvMTQvMjAyMC4wOS4xMy4yMDE5MzgwNS5hdG9tIjt9czo4OiJmcmFnbWVudCI7czowOiIiO30=) 5. 5.Deng X, Gu W, Federman S, du Plessis L, Pybus OG, Faria NR, et al. Genomic surveillance reveals multiple introductions of SARS-CoV-2 into Northern California. Science. 2020;369: 582–587. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjkvNjUwMy81ODIiO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8xNC8yMDIwLjA5LjEzLjIwMTkzODA1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 6. 6.Bedford T, Greninger AL, Roychoudhury P, Starita LM, Famulare M, Huang M-L, et al. Cryptic transmission of SARS-CoV-2 in Washington State. 
Science 2020. 7. 7.Gonzalez-Reiche AS, Hernandez MM, Sullivan MJ, Ciferri B, Alshammary H, Obla A, et al. Introductions and early spread of SARS-CoV-2 in the New York City area. Science. 2020;369: 297–301. [Abstract/FREE Full Text](http://medrxiv.org/lookup/ijlink/YTozOntzOjQ6InBhdGgiO3M6MTQ6Ii9sb29rdXAvaWpsaW5rIjtzOjU6InF1ZXJ5IjthOjQ6e3M6ODoibGlua1R5cGUiO3M6NDoiQUJTVCI7czoxMToiam91cm5hbENvZGUiO3M6Mzoic2NpIjtzOjU6InJlc2lkIjtzOjEyOiIzNjkvNjUwMS8yOTciO3M6NDoiYXRvbSI7czo1MDoiL21lZHJ4aXYvZWFybHkvMjAyMC8wOS8xNC8yMDIwLjA5LjEzLjIwMTkzODA1LmF0b20iO31zOjg6ImZyYWdtZW50IjtzOjA6IiI7fQ==) 8. 8.Chamie, Gabriel, Carina Marquez, Emily Crawford, James Peng, Maya Petersen, Daniel Schwab, Joshua Schwab et al. SARS-CoV-2 Community Transmission disproportionately affects Latinx population during Shelter-in-Place in San Francisco. Clinical Infectious Diseases. 2020. 9. 9.Wölfel R, Corman VM, Guggemos W, Seilmaier M, Zange S, Müller MA, et al. Virological assessment of hospitalized patients with COVID-2019. Nature. 2020;581: 465–469. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2196-x&link_type=DOI) [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2020%2F09%2F14%2F2020.09.13.20193805.atom) 10. 10.Jevšnik M, Steyer A, Zrim T, Pokorn M, Mrvič T, Grosek Š, et al. Detection of human coronaviruses in simultaneously collected stool samples and nasopharyngeal swabs from hospitalized children with acute gastroenteritis. Virol J. 2013;10: 46. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1186/1743-422X-10-46&link_type=DOI) 11. 11.Amoah ID, Kumari S, Bux F. Coronaviruses in wastewater processes: Source, fate and potential risks. Environ Int. 2020;143: 105962. [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.envint.2020.105962&link_type=DOI) 12. 12.Bibby K, Peccia J. Identification of viral pathogen diversity in sewage sludge by metagenome analysis. 
Environ Sci Technol. 2013;47: 1945–1951.
13. Wang X-W, Li J-S, Guo T-K, Zhen B, Kong Q-X, Yi B, et al. Concentration and detection of SARS coronavirus in sewage from Xiao Tang Shan Hospital and the 309th Hospital. Journal of Virological Methods. 2005. p. 165. doi: 10.1016/j.jviromet.2005.08.010
14. Medema G, Heijnen L, Elsinga G, Italiaander R, Brouwer A. Presence of SARS-Coronavirus-2 RNA in Sewage and Correlation with Reported COVID-19 Prevalence in the Early Stage of the Epidemic in The Netherlands. Environmental Science & Technology Letters. 2020. pp. 511–516. doi: 10.1021/acs.estlett.0c00357
15. Ahmed W, Angel N, Edson J, Bibby K, Bivins A, O’Brien JW, et al. First confirmed detection of SARS-CoV-2 in untreated wastewater in Australia: A proof of concept for the wastewater surveillance of COVID-19 in the community. Sci Total Environ. 2020;728: 138764.
16. Wu F, Zhang J, Xiao A, Gu X, Lee WL, Armas F, et al. SARS-CoV-2 Titers in Wastewater Are Higher than Expected from Clinically Confirmed Cases. mSystems. 2020;5.
doi: 10.1128/mSystems.00614-20
17. Gonzalez R, Curtis K, Bivins A, Bibby K, Weir MH, Yetka K, et al. COVID-19 surveillance in Southeastern Virginia using wastewater-based epidemiology. Water Res. 2020;186: 116296.
18. Bivins A, North D, Ahmad A, Ahmed W, Alm E, Been F, et al. Wastewater-Based Epidemiology: Global Collaborative to Maximize Contributions in the Fight Against COVID-19. Environ Sci Technol. 2020;54: 7754–7757.
19. Wu F, Xiao A, Zhang J, Moniz K, Endo N, Armas F, et al. SARS-CoV-2 titers in wastewater foreshadow dynamics and clinical presentation of new COVID-19 cases. medRxiv. 2020. doi: 10.1101/2020.06.15.20117747
20. Weidhaas J, Aanderud Z, Roper D, VanDerslice J, Gaddis E, Ostermiller J, et al. Correlation of SARS-CoV-2 RNA in wastewater with COVID-19 disease burden in sewersheds. doi: 10.21203/rs.3.rs-40452/v1
21. Vallejo JA, Rumbo-Feal S, Conde-Perez K, Lopez-Oriona A, Tarrio J, Reif R, et al.
Highly predictive regression model of active cases of COVID-19 in a population by screening wastewater viral load. doi: 10.1101/2020.07.02.20144865
22. Peccia J, Zulli A, Brackney DE, Grubaugh ND, Kaplan EH, Casanovas-Massana A, et al. SARS-CoV-2 RNA concentrations in primary municipal sewage sludge as a leading indicator of COVID-19 outbreak dynamics. doi: 10.1101/2020.05.19.20105999
23. Fernandez-Cassi X, Timoneda N, Martínez-Puchol S, Rusiñol M, Rodriguez-Manzano J, Figuerola N, et al. Metagenomics for the study of viruses in urban sewage as a tool for public health surveillance. Science of The Total Environment. 2018. pp. 870–880. doi: 10.1016/j.scitotenv.2017.08.249
24. Martínez-Puchol S, Rusiñol M, Fernández-Cassi X, Timoneda N, Itarte M, Andrés C, et al. Characterisation of the sewage virome: comparison of NGS tools and occurrence of significant pathogens. Sci Total Environ. 2020;713: 136604.
25. Cantalupo PG, Calgua B, Zhao G, Hundesa A, Wier AD, Katz JP, et al.
Raw sewage harbors diverse viral populations. MBio. 2011;2. doi: 10.1128/mBio.00180-11
26. Ng TFF, Marine R, Wang C, Simmonds P, Kapusinszky B, Bodhidatta L, et al. High Variety of Known and New RNA and DNA Viruses of Diverse Origins in Untreated Sewage. J Virol. 2012;86: 12161–12175.
27. Nemudryi A, Nemudraia A, Wiegand T, Surya K, Buyukyoruk M, Cicha C, Vanderwood KK, Wilkinson R, Wiedenheft B. Temporal detection and phylogenetic assessment of SARS-CoV-2 in municipal wastewater. Cell Reports Medicine. 2020: 100098.
28. Iaconelli M, Divizia M, Della Libera S, Di Bonito P, La Rosa G. Frequent Detection and Genetic Diversity of Human Bocavirus in Urban Sewage Samples. Food Environ Virol. 2016;8: 289–295.
29. Blinkova O, Rosario K, Li L, Kapoor A, Slikas B, Bernardin F, et al. Frequent detection of highly diverse variants of cardiovirus, cosavirus, bocavirus, and circovirus in sewage samples collected in the United States. J Clin Microbiol. 2009;47: 3507–3513.
30. Kitajima M, Sassi HP, Torrey JR. Pepper mild mottle virus as a water quality indicator. npj Clean Water. 2018;1: 1–9.
31. Thielen PM, Wohl S, Mehoke T, Ramakrishnan S, Kirsche M, Falade-Nwulia O, et al. Genomic Diversity of SARS-CoV-2 During Early Introduction into the United States National Capital Region. medRxiv. 2020. doi: 10.1101/2020.08.13.20174136
32. Olm MR, Crits-Christoph A, Bouma-Gregson K, Firek B, Morowitz MJ, Banfield JF. InStrain enables population genomic analysis from metagenomic data and rigorous detection of identical microbial strains. doi: 10.1101/2020.01.22.915579
33. Vanaerschot M, Mann SA, Webber JT, Kamm J, Bell SM, Bell J, et al. Identification of a polymorphism in the N gene of SARS-CoV-2 that adversely impacts detection by a widely-used RT-PCR assay. 2020. p. 2020.08.25.265074.
doi: 10.1101/2020.08.25.265074
34. Centers for Disease Control and Prevention. Research use only 2019-Novel Coronavirus (2019-nCoV) Real-time RT-PCR primers and probes. Reviewed May 20, 2020.
35. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9: 357–359.
36. Tarasov A, Vilella AJ, Cuppen E, Nijman IJ, Prins P. Sambamba: fast processing of NGS alignment formats. Bioinformatics. 2015;31: 2032–2034.
37. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17: 261–272.