Faecal shedding models for SARS-CoV-2 RNA amongst hospitalised patients and implications for wastewater-based epidemiology

The concentration of SARS-CoV-2 RNA in faeces is not well established, posing challenges for wastewater-based surveillance of COVID-19 and risk assessments of environmental transmission. We develop versatile hierarchical models for faecal RNA shedding and apply them to data collected in six studies. We find that the mean number of gene copies per mL of faeces is 1.9x106 (2.3x105--2.0x108 95% credible interval) amongst hospitalised patients. We find no evidence for a subpopulation of patients who do not shed RNA: limits of quantification can account for negative stool samples. Our models indicate that hospitalised patients represent the tail of the shedding profile with a half-life of 34 hours (28--43 95% credible interval), suggesting that wastewater-based surveillance signals are more indicative of incidence than prevalence and can be a leading indicator of clinical presentation. Shedding amongst inpatients cannot explain high RNA concentrations observed in wastewater, consistent with more abundant shedding during the early infection course.


26
The novel virus SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) infected 27 over 80 million people in 2020, and more than 2.6 million people have succumbed to the 28 resultant disease, COVID-19 (coronavirus disease 2019) [1]. Individuals infected by the Days past symptom onset t log SARS-CoV-2 RNA load population-level distribution, describing variation in shedding between patients patient location parameter , describing the amplitude of individual shedding profiles patient-level distribution, describing variation between samples from the same patient sample-level RNA load y quantified by RT-qPCR assays shedding profile g(t), describing the evolution of RNA loads limit of quantification early shedding; currently unconstrained by data late shedding; consistent with exponential profile Figure 1: A flexible hierarchical model can capture a wide range of aspects of real-world data. The population-level distribution (shown in blue) captures variation in shedding behaviour between patients, giving rise to location parameters µ i for each patient i (see section 4.1 for details). The location parameters (blue squares) describe the amplitude of individual shedding profiles as illustrated for three patients. The patient-level distribution (shown in orange for one patient) describes the variation between samples from the same patient, and the shedding profile (solid black line) modulates typical RNA concentrations in faecal samples over time. All samples are analysed using an RT-qPCR assay with a given limit of quantification (LOQ) θ (dashed black line), and concentrations above the LOQ (orange circles) can be quantified. Concentrations below the LOQ (orange cross) cannot be quantified. Currently available data do not allow us to constrain the early shedding profile, and late-stage shedding is consistent with an exponential decay profile.
3 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  Table 1: Six studies providing quantitative data on faecal RNA loads were analysed. Microdata, i.e. sample-level RNA concentrations, provided by the first four studies were used to fit random effects models. The next two studies did not provide microdata, and we validated our models by comparing reported summary statistics with model-based predictions, as discussed in section 4.4. Samples with viral loads below the limit of quantification (LOQ) are considered negative for SARS-CoV-2. The table also reports the total number of patients n and number of samples m, including a breakdown by positivity.    is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The second result is that there is no evidence for a subpopulation of patients who do  To assess the out-of-sample predictive utility of the models, we considered predictions is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint of their protocols to simulate the studies and make predictions by sampling from the 117 posterior predictive distribution (see section 4.4 for details). Since our models have not 118 been fit to these data, their ability to predict summary statistics of those data is an 119 indication of how well the models generalise. Temporal models are preferred by the 120 data, but we do not have any temporal information about the studies conducted by  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 17, 2021. ; the infection likely explains these observations. The rapid decay of the shedding profile also implies that signals from wastewater-based surveillance of SARS-CoV-2 are likely 157 more indicative of incidence, rather than prevalence. Wastewater-based surveillance is 158 thus a promising approach for early detection of cases in the community. A better 159 understanding of the shape of the shedding profile, including prior to symptom onset, where Γ denotes the gamma function (see appendix A for details). is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint Third, the RNA concentration y ij is quantified using an assay with level of quantifi- where f denotes the probability density function of the generalised gamma distribution   In particular, we let λ i (t) = λ i (t = 0)g(t), where t is the number of days since symptom

226
We also considered two further shedding profiles to assess the sensitivity of our infer-227 ences to the choice of g(t). First, we used a gamma shedding profile where α > 0 and β > 0 control the shape of the profile and t 0 is the time at which 229 shedding can first occur, i.e. g(t < t 0 ) = 0. Second, we considered the exponential is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint where α and β control the rise and decay of the profile, respectively.  We know that z i = 1 for any patient with one or more positive samples, and the likeli-241 hood follows eq. (2). However, for any patient i whose m i samples are all below the LOQ, the likelihood is where the first term accounts for a patient who cannot shed RNA (z i = 0) and the 244 second accounts for a patient whose samples are all below the LOQ (z i = 1).  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint ples to assess transmission risks associated with faecal microbiota transplants. These 266 data are not used to fit the hierarchical models but instead to assess the out-of-sample 267 predictive utility of our fitted models.

268
Five studies were excluded from the analysis despite providing quantitative informa-     is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  : Different shedding profiles yield consistent results for late-stage shedding, but early shedding behaviour (prior to day five after symptom onset) cannot be constrained given the available data. Panels (a) and (b) show the the posterior distributions for the time t peak at which the inferred gamma and Teunis shedding profiles peak, respectively. Because the data cannot constrain early shedding, the posterior for t peak has broad support. Panels (c) and (d) show posterior samples of the gamma and Teunis profiles, respectively, in blue. The exponential profile discussed in section 2 is shown as an orange line, and the shaded region corresponds to the 95% credible interval.

13
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 17, 2021. ; https://doi.org/10.1101/2021.03.16.21253603 doi: medRxiv preprint Second, the shape Q controls the tails of the population-level distribution: the larger concentration, i.e. the expected RNA concentration in a random faecal sample from 309 a previously unobserved patient, depends on the tail behaviour because the mean is 310 sensitive to outliers [36]. As shown in fig. 5 (a)  Third, the 95% credible region for the population-level shape Q and scale S is consis-318 tent with the log-normal distribution (Q = 0) or Weibull distribution (Q = 1) for both 319 the constant and temporal models, as shown in fig. 5 (b). However, the special case of  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint Mean RNA copies per mL (a) 9 5 % constant standard temporal standard Weibull gamma sample mean Figure 5: The mean number of RNA copies per mL of faeces is sensitive to the shape of the shedding distribution. Panel (a) shows the conditional distribution of the mean number of RNA copies per mL of faeces given the population shape parameter Q for the constant standard model. The mean tends to be larger for smaller Q due to heavier tails. Panels (b) and (c) show the joint distributions of the shape and scale parameters for the population-level distribution and patient-level distribution, respectively. Parametrisations corresponding to the Weibull distribution (Q = q = 1) and gamma distribution (Q = S and q = σ) are shown as black dot-dashed and dotted lines, respectively. The patient-level scale parameter σ is larger for the constant than the temporal model (both without a non-shedding subpopulation) because the patient-level distribution needs to account for both temporal variability and idiosyncratic sample-tosample variability.

15
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint where s is a gamma random variable with shape q −2 . The probability density function used in eq. (2) is thus f (x; q, µ, σ) = q σxΓ (q −2 ) [s(x)] q −2 exp (−s(x)) , Similarly, the cumulative distribution function is 536 F (x; q, µ, σ) = γ q −2 , s(x) where γ is the lower incomplete gamma function.