Abstract
During the COVID-19 pandemic, wastewater-based epidemiology has progressively taken a central role as a pathogen surveillance tool. Tracking viral loads and variant outbreaks in sewage offers advantages over clinical surveillance methods by providing unbiased estimates and enabling early detection. However, wastewater-based epidemiology poses new computational research questions that must be solved before this approach can be implemented broadly and successfully. Here, we address the variant deconvolution problem, in which we aim to estimate the relative abundances of genomic variants from next-generation sequencing data of a mixed wastewater sample. We introduce LolliPop, a computational method to solve the variant deconvolution problem. LolliPop is tailored to wastewater time series sequencing data and applies temporal regularization in the form of a fused ridge penalty. We show that this regularization is equivalent to kernel smoothing and that it makes abundance estimates robust to the very high levels of missing data that are common in wastewater sequencing. We use the bootstrap to produce confidence intervals, and develop analytical standard errors that can produce similar confidence intervals at a fraction of the computational cost. We demonstrate the application of our method on data from the Swiss wastewater surveillance efforts as well as on simulated data.
Author Summary
Wastewater-based epidemiology has become a valuable tool for tracking viruses like SARS-CoV-2 across entire communities. Sequencing wastewater can reveal which viral variants are circulating, offering early and unbiased insights into variant dynamics. A central challenge is to infer the relative abundances of these variants from observed mutation data. This task is complicated by the fact that variant profiles can be highly similar, and the data is often noisy with many missing values, especially when the incidence of the pathogen is low. We developed LolliPop, a statistical method that leverages the time series structure of wastewater data to robustly deconvolve variant abundances and compute fast confidence intervals. Using both simulated data and real data from the Swiss national variant monitoring, we show that LolliPop is accurate and robust to high levels of missing data.
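To make the idea of temporally regularized deconvolution concrete, the following is a minimal sketch and not the LolliPop implementation. It assumes a binary variant profile matrix (which mutation belongs to which variant), a time series of observed mutation frequencies with missing entries marked as None, and a plain least-squares loss with a fused ridge penalty on consecutive time points. The function names, the toy data, and the penalty weight lam are all hypothetical, and the sketch leaves the estimates unconstrained and omits confidence intervals.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: abundances are not identifiable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            if factor:
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fused_ridge_deconvolve(profiles, freqs, lam=1.0):
    """Estimate variant abundances over time (illustrative assumption only).

    profiles: m rows (mutations) x v columns (variants), 1 if the variant
              carries the mutation, else 0.
    freqs:    T rows (time points) x m observed mutation frequencies,
              with None for a missing observation.
    lam:      weight of the fused ridge penalty sum_t ||b_t - b_{t-1}||^2.
    Returns a T x v list of unconstrained abundance estimates.
    """
    m, v, T = len(profiles), len(profiles[0]), len(freqs)
    n = T * v
    normal = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    # Data term: squared error over observed (non-missing) mutations only.
    for t in range(T):
        for i in range(m):
            y = freqs[t][i]
            if y is None:
                continue
            for a in range(v):
                rhs[t * v + a] += profiles[i][a] * y
                for b in range(v):
                    normal[t * v + a][t * v + b] += profiles[i][a] * profiles[i][b]
    # Fusion term: couples each variant's abundance at consecutive time points.
    for t in range(1, T):
        for a in range(v):
            p, q = (t - 1) * v + a, t * v + a
            normal[p][p] += lam
            normal[q][q] += lam
            normal[p][q] -= lam
            normal[q][p] -= lam
    flat = solve_linear_system(normal, rhs)
    return [flat[t * v:(t + 1) * v] for t in range(T)]


# Toy example: two variants, three mutations (the third is shared by both).
profiles = [[1, 0], [0, 1], [1, 1]]
# Noise-free synthetic frequencies; the middle time point is entirely missing.
freqs = [[0.3, 0.7, 1.0], [None, None, None], [0.3, 0.7, 1.0]]
estimates = fused_ridge_deconvolve(profiles, freqs, lam=1.0)
for t, row in enumerate(estimates):
    print(t, [round(b, 3) for b in row])
```

In this toy setting the time point without any observations still receives an estimate, because the penalty ties it to its neighbours. That is the intuition behind robustness to missing data, shown here only under the simplifying assumptions stated above.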
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study was supported by the Swiss National Science Foundation (grant no. CRSII5_205933).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
Added simulation results, updated methods and results.
Data Availability
All data and code used to produce the results presented in this article are available at https://doi.org/10.5281/zenodo.15277339.