RT Journal Article
SR Electronic
T1 Evaluation of variant calling algorithms for wastewater-based epidemiology using mixed populations of SARS-CoV-2 variants in synthetic and wastewater samples
JF medRxiv
FD Cold Spring Harbor Laboratory Press
SP 2022.06.06.22275866
DO 10.1101/2022.06.06.22275866
A1 Bassano, Irene
A1 Ramachandran, Vinoy K.
A1 Khalifa, Mohammad S.
A1 Lilley, Chris J.
A1 Brown, Mathew R.
A1 van Aerle, Ronny
A1 Denise, Hubert
A1 Rowe, William
A1 George, Airey
A1 Cairns, Edward
A1 Wierzbicki, Claudia
A1 Pickwell, Natalie D.
A1 Wilson, Myles
A1 Carlile, Matthew
A1 Holmes, Nadine
A1 Payne, Alexander
A1 Loose, Matthew
A1 Burke, Terry A.
A1 Paterson, Steve
A1 Wade, Matthew J.
A1 Grimsley, Jasmine M.S.
YR 2022
UL http://medrxiv.org/content/early/2022/09/22/2022.06.06.22275866.abstract
AB Wastewater-based epidemiology (WBE) has been used extensively throughout the COVID-19 pandemic to detect and monitor the spread and prevalence of SARS-CoV-2 and its variants. It has proven an excellent complementary tool to clinical sequencing, supporting the insights gained and helping to make informed public health decisions. Consequently, many groups globally have developed bioinformatics pipelines to analyse sequencing data from wastewater. Accurate calling of mutations is critical in this process and in the assignment of circulating variants, yet, to date, the performance of variant-calling algorithms in wastewater samples has not been investigated. To address this, we compared the performance of six variant callers (VarScan, iVar, GATK, FreeBayes, LoFreq and BCFtools), used widely in bioinformatics pipelines, on 19 synthetic samples with known ratios of three different SARS-CoV-2 variants (Alpha, Beta and Delta), as well as 13 wastewater samples collected in London between 15 and 18 December 2021.
We used the fundamental parameters of recall (sensitivity) and precision (specificity) to confirm the presence of mutational profiles defining specific variants across the six variant callers. Our results show that BCFtools, FreeBayes and VarScan found the expected variants with higher precision and recall than GATK or iVar, although the latter identified more expected defining mutations than other callers. LoFreq gave the least reliable results due to the high number of false-positive mutations detected, resulting in lower precision. Similar results were obtained for both the synthetic and wastewater samples.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: Acquisition of the financial support for the project leading to this publication: This work was supported by the UK Health Security Agency, the Natural Environment Research Council (NERC) Environmental Omics Facility (NEOF), and NERC grant NE/V010441/1 to Terry Burke.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All data produced in the present study are available upon reasonable request to the authors.