Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease
Highlights
► We model multiple pathways of transmission in cholera and waterborne disease.
► We examine the identifiability issues involved in estimating the model parameters.
► The waterborne transmission parameters are practically unidentifiable in the presence of noise.
► This leads to unidentifiability of public health parameters such as the basic reproduction number.
► Adding environmental pathogen data can improve parameter estimates.
Introduction
There is an urgent need to understand the factors influencing waterborne disease dynamics and transmission. Waterborne diseases result in over 3.5 million deaths annually according to World Health Organization (WHO) estimates (Prüss-Üstün et al., 2008), and cholera alone is responsible for 3–5 million cases/year and over 100,000 deaths/year (World Health Organization, 2010). The ongoing epidemic in Haiti and recent severe outbreaks in Zimbabwe (Kapp, 2009), Angola (Sack et al., 2006), and South Africa (Mugero and Hoque, 2001) all emphasize the need for greater understanding of cholera and other waterborne diseases.
One of the most commonly used models for examining disease dynamics is the Susceptible–Infected–Recovered (SIR) model (first introduced by Kermack and McKendrick, 1927). For waterborne diseases such as cholera, infected individuals shed pathogen into the water where it may persist for a significant amount of time. The serial interval between infections thus depends both upon the infectious period of an individual and on the persistence time of pathogen in environmental water sources. This latter persistence time may be highly dependent upon environmental conditions such as salinity, pH, and nutrient availability, ranging from days to several weeks or longer in the case of cholera (Feachem et al., 1983, Xu et al., 1982, Nelson et al., 2009). Indeed, Vibrio cholerae can persist indefinitely outside of human hosts in marine environments in association with plankton (Tamplin et al., 1990). Furthermore, a given outbreak may involve multiple environmental reservoirs, each with different associated pathogen persistence times. For example, contamination of household water storage containers and transmission through food prepared by infected individuals have both been implicated in cholera epidemics (Swerdlow et al., 1992). The relative contributions of these different pathways can have a large impact on the basic reproduction number (Mukandavire et al., 2011). Cholera bacteria also have differential infectivity depending upon the time since the bacteria were shed, with freshly shed bacteria existing in a “hyperinfectious” state (Holmberg et al., 1984, Hartley et al., 2006, Merrell et al., 2002). Identifying these pathways and associated timescales of transmission is thus of great interest for waterborne diseases generally and cholera specifically.
We focus here on the SIWR model introduced by Tien and Earn (2010) (Fig. 1), an SIR-type model with an additional compartment representing pathogen in the water. This model includes two transmission pathways: direct/fast transmission and indirect/slower transmission. Discussion of the timescales of transmission and mathematical analysis of the model are given in Tien and Earn (2010). The relevance of multiple transmission pathways to many different diseases and the suitability of the SIWR model to these diseases are also considered in Tien and Earn (2010). For cholera in particular, a key epidemiological problem is distinguishing the relative contributions of disease transmission from human (direct) vs. environmental (indirect) pathways (Hartley et al., 2006, King et al., 2008, Codeco, 2001). In this paper, we consider whether parameters for the SIWR model can be estimated from outbreak data. This includes consideration of when the basic reproduction number can be estimated from available data, and is relevant for predicting the severity of an epidemic, examining the timescales of an outbreak, and guiding public health interventions. We first present general identifiability results for the SIWR model, followed by specific results on fitting the SIWR model to cholera outbreak data from Angola 2006–2007. Previous applications of the SIWR model to cholera epidemics can be found in Tien et al. (2011) and Tuite et al. (2011).
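For orientation, the structure described above (an SIR-type model plus a water compartment, with direct and waterborne transmission) can be written as the system below. This is a sketch in the spirit of Tien and Earn (2010); the symbol names (beta_I, beta_W, gamma, alpha, xi, mu, N) are assumed here for illustration and may differ from the notation used elsewhere in the paper.

```latex
% Assumed notation: \beta_I direct transmission, \beta_W waterborne transmission,
% \gamma recovery rate, \alpha shedding rate, \xi pathogen decay rate in water,
% \mu natural birth/death rate, N = S + I + R constant total population.
\begin{align}
\dot S &= \mu N - \beta_I S I - \beta_W S W - \mu S, \\
\dot I &= \beta_I S I + \beta_W S W - \gamma I - \mu I, \\
\dot W &= \alpha I - \xi W, \\
\dot R &= \gamma I - \mu R .
\end{align}
% Linearizing about the disease-free state S = N (next-generation approach)
% gives, for these assumed equations,
\begin{equation}
\mathcal{R}_0 = \frac{N\left(\beta_I + \beta_W\,\alpha/\xi\right)}{\gamma + \mu}.
\end{equation}
```

Under these assumed equations the two pathways contribute additively to the basic reproduction number, which is why separating their relative contributions matters for interpreting an estimate of it.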
Although the data-driven results presented here focus on cholera, the theoretical results on structural and practical identifiability for the SIWR model are more broadly applicable to other waterborne diseases, such as Giardia, Cryptosporidium, Campylobacter, hepatitis A and E, norovirus, rotavirus, and Escherichia coli O157:H7 (Ashbolt, 2004, Butzler, 2004, Ford, 1999, Gerba et al., 1996, Karanis et al., 2007, Leclerc et al., 2002, Marshall et al., 1997, Sack et al., 2004, Schuster et al., 2005). These diseases can be transmitted through a range of different pathways, from pathogen ingestion through contaminated water to direct contact with infected individuals. For example, Giardia is transmitted via drinking contaminated water, but direct contact with infected individuals is also an established risk factor (Andersen and Neumann, 2007), whereas hepatitis A transmission occurs primarily through person–person contact, with contaminated water providing a secondary transmission route (Nasser, 1994). The SIWR model may also be used to gain insight into the relative contribution of these alternative transmission pathways, as discussed further in Tien and Earn (2010).
Because many of the model parameters are not directly measurable, connecting disease models with outbreak data to yield predictive results requires a variety of parameter estimation, identifiability, and uncertainty quantification techniques. A key first step of parameter estimation is determining whether the estimation problem is well-posed for a given model and data (Evans et al., 2005, Meshkat et al., 2009). Structural identifiability analysis examines whether the model parameters can be identified in the best-case scenario of noise-free data, a necessary condition for finding solutions to the real noisy data problem. Identifiability and uncertainty quantification methods allow us to address whether or not it is possible to uniquely recover the parameters for a given data set, and with what degree of certainty. If the parameters cannot be determined (denoted unidentifiability), identifiability approaches may reveal ways to reduce the model and determine combinations of parameters that can be estimated even when individual parameters may not (Evans et al., 2005, Meshkat et al., 2009). These issues can be of particular importance in health contexts where the parameter values have biological or public health implications, e.g. estimates of the basic reproduction number in epidemiological models. Waterborne disease models lend themselves particularly well to questions of identifiability because of the public health importance of distinguishing multiple transmission pathways (Ashbolt, 2004, Eisenberg et al., 2002, Hunter et al., 2003, Hartley et al., 2006), which are often quite difficult to measure directly. Indeed, mathematical modeling and parameter estimation are used in public health practice to gain insight into waterborne disease transmission pathways and potential interventions, e.g. in the recent cholera epidemic in Haiti (Abrams et al., 2012, Tuite et al., 2011, Date et al., 2011), making the question of whether these parameters can accurately be identified of key public health importance (Koopman, 2004, Greenland and Robins, 1986, Grad et al., 2012, Alam et al., 2013, Chick et al., 2003).
Identifiability has been studied for several SIR-type models using a variety of approaches (Evans et al., 2005, Meshkat et al., 2009, Chapman and Evans, 2009). Evans et al. showed that the density-independent transmission form of the SIR model, as well as several SIR variants, are unidentifiable for prevalence or incidence data measurements, using a similarity transformation approach (Evans et al., 2005, Chapman and Evans, 2009). However, by examining the identifiable combinations for the SIR model, it can be shown that a commonly used nondimensionalization in terms of these combinations results in identifiability (Evans et al., 2005, Meshkat et al., 2009). Tien and Earn (2010) showed that the SIWR model is unidentifiable in the limit where the water dynamics are fast, so that both direct and indirect transmission have similar timescales. However, SIWR identifiability was not determined for the general case. Tien and Earn also found several examples in which an SIWR model trajectory could be fitted quite closely by an SIR model. Based on Tien and Earn's results, we might expect the SIWR model to be unidentifiable in general, and certainly for large pathogen decay rates.
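The fast-water limit mentioned above can be made concrete with a quasi-steady-state argument. The sketch below assumes water dynamics of the form dW/dt = alpha I - xi W and a force of infection beta_I S I + beta_W S W; these symbols are illustrative assumptions rather than the paper's own notation.

```latex
% Quasi-steady-state sketch (assumed notation): when the decay rate \xi is
% large relative to the other rates, W relaxes quickly onto a multiple of I.
\begin{align}
\dot W = \alpha I - \xi W, \quad \xi \gg \gamma
  &\;\Longrightarrow\; W(t) \approx \frac{\alpha}{\xi}\, I(t), \\
\beta_I S I + \beta_W S W
  &\;\approx\; \Bigl(\beta_I + \frac{\beta_W\,\alpha}{\xi}\Bigr) S I .
\end{align}
```

In this limit the infected trajectory depends on the pathway parameters only through the single combination beta_I + beta_W alpha/xi, so data on I alone cannot separate direct from waterborne transmission, and the model behaves like an SIR model with one effective transmission parameter.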
In this paper, we use an extension (Eisenberg, 2013) of the characteristic set-based differential algebra approach to identifiability (Meshkat et al., 2009, Audoly et al., 2001, Bellu et al., 2007, Pia Saccomani et al., 2003, Ljung and Glad, 1994, Ollivier, 1990) to show that the SIWR model is structurally identifiable. However, further numerical analysis confirms that the two transmission pathway parameters are practically unidentifiable when the indirect/water transmission timescale is fast. We next apply the model to data from a recent 2006 cholera outbreak in Angola (data courtesy of the WHO Cholera Task Force). We estimate the model parameters to examine whether in practice, with real data, we can determine the model parameters. By testing several candidate models, we show that although the waterborne pathway appears more significant, both pathways are necessary to fit the data well. We also explore the local practical identifiability properties for the model under a variety of conditions using simulated data, both with and without noise. Although the model is globally structurally identifiable, the practical identifiability properties may change depending on where the true parameters lie. We examine how the model identifiability varies depending on noise and true parameter values, and establish what practical identifiable combinations emerge as these changes take place. We show that adding water information (such as knowledge of the pathogen lifetime in the water or time series measurements of pathogen concentration in the water) can improve the model identifiability and allow additional information (specifically, the pathogen shedding rate into the water) to be estimated from the data.
Section snippets
Model equations
The SIWR model equations are given by a system of ordinary differential equations, where S represents susceptibles, I infecteds, W the pathogen concentration in the water, and R the recovered/removed population, and we take a constant total population size. The model parameters include the natural birth/death rate for the population, the transmission parameter for direct transmission (where disease is transmitted to susceptibles by contact with infected individuals), and the
Structural identifiability analysis
Our aim in this section is to determine whether parameters for the SIWR model can be identified in the ideal situation of perfect, noise-free data. This is clearly an unrealistic situation, but is nevertheless an important one. For example, the analysis in this section gives insight into model parameters which compensate for one another, leading to the inability to identify epidemiologically important quantities such as the shedding rate of bacteria into the environment. The analysis also shows
Model applications and practical identifiability
Although we have found that the scaled SIWR model (2) is structurally identifiable, we have not yet addressed practical identifiability. In this section we examine these issues in the practical situation of noisy data, using both synthetic data and data from a recent (2006) cholera outbreak in Angola.
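As an informal illustration of why practical identifiability can fail even when structural identifiability holds, the sketch below simulates synthetic infected curves for two different splits of transmission between the direct and waterborne pathways. It is a minimal sketch under stated assumptions, not the authors' procedure: the scaled form dw/dt = xi (i - w), the neglect of births and deaths, the fixed-step Runge-Kutta integrator, and all parameter values and function names are hypothetical choices made for illustration.

```python
# Illustrative sketch only: an ASSUMED scaled SIWR form with births/deaths
# neglected; all parameter values and names here are hypothetical.

def siwr_rhs(state, beta_i, beta_w, gamma, xi):
    """Right-hand side of the assumed scaled SIWR model (population fractions)."""
    s, i, w, r = state
    new_infections = beta_i * s * i + beta_w * s * w
    return (-new_infections,             # ds/dt
            new_infections - gamma * i,  # di/dt
            xi * (i - w),                # dw/dt: water tracks i at rate xi
            gamma * i)                   # dr/dt


def rk4_step(state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def nudge(slope, h):
        return tuple(x + h * k for x, k in zip(state, slope))
    k1 = siwr_rhs(state, *params)
    k2 = siwr_rhs(nudge(k1, dt / 2), *params)
    k3 = siwr_rhs(nudge(k2, dt / 2), *params)
    k4 = siwr_rhs(nudge(k3, dt), *params)
    return tuple(x + dt / 6 * (a + 2 * b + 2 * c + d)
                 for x, a, b, c, d in zip(state, k1, k2, k3, k4))


def infected_curve(beta_i, beta_w, xi, gamma=0.25, days=100, dt=0.01, i0=1e-4):
    """Return the infected fraction sampled once per day (days + 1 values)."""
    state = (1.0 - i0, i0, 0.0, 0.0)
    steps_per_day = round(1 / dt)
    curve = [state[1]]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, beta_i, beta_w, gamma, xi)
        curve.append(state[1])
    return curve


def max_gap(xi):
    """Largest daily gap in infected fraction between two pathway splits."""
    mostly_direct = infected_curve(beta_i=0.4, beta_w=0.1, xi=xi)
    mostly_water = infected_curve(beta_i=0.1, beta_w=0.4, xi=xi)
    return max(abs(a - b) for a, b in zip(mostly_direct, mostly_water))


diff_fast = max_gap(xi=20.0)   # pathogen decays within hours: fast water dynamics
diff_slow = max_gap(xi=0.05)   # pathogen persists for weeks: slow water dynamics
print(f"max gap, fast water: {diff_fast:.4f}")
print(f"max gap, slow water: {diff_slow:.4f}")
```

Both splits share the same sum beta_i + beta_w, and hence the same basic reproduction number in this assumed scaling; only the pathway mix differs. When the gap between the two curves is smaller than the measurement noise, case data alone cannot tell the splits apart, which is the sense in which the transmission parameters become practically unidentifiable for fast water dynamics.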
Discussion
Parameter identifiability is an important question for epidemiological modeling: the ability to estimate model parameters from a given data set will determine the ability to estimate fundamental quantities such as the basic reproduction number, and to assess the efficacy of different intervention strategies. This is particularly relevant for waterborne disease models because of the public health importance of distinguishing multiple transmission pathways, which are often quite difficult to
Acknowledgments
We thank the World Health Organization Cholera Task Force for providing us with data from the 2006 cholera outbreak in Angola. This work was supported by the National Science Foundation through the Mathematical Biosciences Institute (DMS 0931642) and Grant OCE-1115881 (to J.T. and M.E.).
References (78)
et al. Detectable signals of episodic risk effects on acute HIV transmission: strategies for analyzing transmission systems using genetic data. Epidemics (2013)
Microbial contamination of drinking water and disease outcomes in developing regions. Toxicology (2004)
et al. SAAM II: Simulation, analysis, and modeling software for tracer and pharmacokinetic studies. Metabolism (1998)
et al. On structural identifiability. Math. Biosci. (1970)
et al. DAISY: a new software tool to test global identifiability of biological and physiological systems. Comput. Methods Programs Biomed. (2007)
Campylobacter, from obscurity to celebrity. Clin. Microbiol. Infect. (2004)
et al. The structural identifiability of susceptible–infective–recovered type epidemic models with incomplete immunity and birth targeted vaccination. Biomed. Signal Process. Control (2009)
et al. A procedure for generating locally identifiable reparameterisations of unidentifiable non-linear systems by the similarity transformation approach. Math. Biosci. (1998)
Differential-algebraic decision methods and some applications to system theory. Theor. Comput. Sci. (1992)
Complete parameter bounds and quasiidentifiability conditions for a class of unidentifiable linear systems. Math. Biosci. (1983)
The structural identifiability of the susceptible infected recovered model with seasonal forcing. Math. Biosci.
Waterborne rotavirus: a risk assessment. Water Res.
Foodborne transmission of cholera in Micronesian households. Lancet
Numerical parameter identifiability and estimability: integrating identifiability, estimability, and optimal sampling design. Math. Biosci.
Zimbabwe's humanitarian crisis worsens. Lancet
On global identifiability for arbitrary model parameterization. Automatica
Differential algebra methods for the study of the structural identifiability of rational function state-space models in the biosciences. Math. Biosci.
An algorithm for finding globally identifiable parameter combinations of nonlinear ODE models using Groebner bases. Math. Biosci.
Parameter identifiability of nonlinear systems: the role of initial conditions. Automatica
System identifiability based on the power series expansion of the solution. Math. Biosci.
Cholera. Lancet
Waterborne transmission of epidemic cholera in Trujillo, Peru: lessons for a continent at risk. Lancet
Real-time modelling used for outbreak management during a cholera epidemic, Haiti, 2010–2011. Epidemiol. Infect.
Giardia intestinalis: new insights on an old pathogen. Rev. Med. Microbiol.
Global identifiability of nonlinear models of biological systems. IEEE Trans. Biomed. Eng.
Observation and model error effects on parameter estimates in susceptible–infected–recovered epidemic model. Far East J. Theor. Stat.
Inferring infection transmission parameters that influence water treatment decisions. Manage. Sci.
Parameter and structural identifiability concepts and ambiguities: a critical review and analysis. Am. J. Physiol. Regul. Integrative Comp. Physiol.
Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra.
Considerations for oral cholera vaccine use during outbreak after earthquake in Haiti, 2010–2011. Emerging Infect. Dis.
Disease transmission models for public health decision making: analysis of epidemic and endemic conditions caused by waterborne pathogens. Environ. Health Perspect.
L-T4 bioequivalence and hormone replacement studies via feedback control simulations. Thyroid
Extensions, validation, and clinical applications of a feedback control system simulator of the hypothalamo-pituitary-thyroid axis. Thyroid
A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Stat. Soc. Ser. A (Stat. Soc.)
Microbiological safety of drinking water: United States and global perspectives. Environ. Health Perspect.