Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease

https://doi.org/10.1016/j.jtbi.2012.12.021

Abstract

Cholera and many waterborne diseases exhibit multiple characteristic timescales or pathways of infection, which can be modeled as direct and indirect transmission. A major public health issue for waterborne diseases involves understanding the modes of transmission in order to improve control and prevention strategies. An important epidemiological question is: given data for an outbreak, can we determine the role and relative importance of direct vs. environmental/waterborne routes of transmission? We examine whether parameters for a differential equation model of waterborne disease transmission dynamics can be identified, both in the ideal setting of noise-free data (structural identifiability) and in the more realistic setting of noisy data (practical identifiability). We use a differential algebra approach together with several numerical approaches, with a particular emphasis on identifiability of the transmission rates. To examine these issues in a practical public health context, we apply the model to a recent cholera outbreak in Angola (2006). Our results show that the model parameters, including those for both the water and person-to-person transmission routes, are globally structurally identifiable, although they become unidentifiable when the environmental transmission timescale is fast. Even for water dynamics within the identifiable range, when noisy data are considered, only a combination of the water transmission parameters can practically be estimated. This makes the waterborne transmission parameters difficult to estimate, leading to inaccurate estimates of important epidemiological parameters such as the basic reproduction number (R0). However, measurements of pathogen persistence time in environmental water sources or measurements of pathogen concentration in the water can improve model identifiability and allow for more accurate estimation of waterborne transmission pathway parameters as well as R0. Parameter estimates for the Angola outbreak suggest that both transmission pathways are needed to explain the observed cholera dynamics. These results highlight the importance of incorporating environmental data when examining waterborne disease.

Highlights

► We model multiple pathways of transmission in cholera and waterborne disease.
► We examine the identifiability issues involved in estimating the model parameters.
► The waterborne transmission parameters are practically unidentifiable with noise.
► This leads to unidentifiability of public health parameters such as R0.
► Adding environmental pathogen data can improve parameter estimates.

Introduction

There is an urgent need to understand the factors influencing waterborne disease dynamics and transmission. Waterborne diseases result in over 3.5 million deaths annually according to World Health Organization (WHO) estimates (Prüss-Üstün et al., 2008), and cholera alone is responsible for 3–5 million cases/year and over 100,000 deaths/year (World Health Organization, 2010). The ongoing epidemic in Haiti and recent severe outbreaks in Zimbabwe (Kapp, 2009), Angola (Sack et al., 2006), and South Africa (Mugero and Hoque, 2001) all emphasize the need for greater understanding of cholera and other waterborne diseases.

One of the most commonly used models for examining disease dynamics is the Susceptible–Infected–Recovered (SIR) model (first introduced by Kermack and McKendrick, 1927). For waterborne diseases such as cholera, infected individuals shed pathogen into the water where it may persist for a significant amount of time. The serial interval between infections thus depends both upon the infectious period of an individual and on the persistence time of pathogen in environmental water sources. This latter persistence time may be highly dependent upon environmental conditions such as salinity, pH, and nutrient availability, ranging from days to several weeks or longer in the case of cholera (Feachem et al., 1983, Xu et al., 1982, Nelson et al., 2009). Indeed, Vibrio cholerae can persist indefinitely outside of human hosts in marine environments in association with plankton (Tamplin et al., 1990). Furthermore, a given outbreak may involve multiple environmental reservoirs, each with different associated pathogen persistence times. For example, contamination of household water storage containers and transmission through food prepared by infected individuals have both been implicated in cholera epidemics (Swerdlow et al., 1992). The relative contributions of these different pathways can have a large impact on the basic reproduction number R0 (Mukandavire et al., 2011). Cholera bacteria also have differential infectivity depending upon the time since the bacteria were shed, with freshly shed bacteria existing in a “hyperinfectious” state (Holmberg et al., 1984, Hartley et al., 2006, Merrell et al., 2002). Identifying these pathways and associated timescales of transmission is thus of great interest for waterborne diseases generally and cholera specifically.

We focus here on the SIWR model introduced by Tien and Earn (2010) (Fig. 1), an SIR-type model with an additional compartment representing pathogen in the water. This model includes two transmission pathways: direct/fast transmission and indirect/slower transmission. Discussion of the timescales of transmission and mathematical analysis of the model are given in Tien and Earn (2010). The relevance of multiple transmission pathways to many different diseases and the suitability of the SIWR model to these diseases are also considered in Tien and Earn (2010). For cholera in particular, a key epidemiological problem is distinguishing the relative contributions of disease transmission from human (direct) vs. environmental (indirect) pathways (Hartley et al., 2006, King et al., 2008, Codeco, 2001). In this paper, we consider whether parameters for the SIWR model can be estimated from outbreak data. This includes consideration of when R0 can be estimated from available data, and is relevant for predicting the severity of an epidemic, examining the timescales of an outbreak, and guiding public health interventions. We first present general identifiability results for the SIWR model, followed by specific results on fitting the SIWR model to cholera outbreak data from Angola 2006–2007. Previous applications of the SIWR model to cholera epidemics can be found in Tien et al. (2011) and Tuite et al. (2011).

Although the data-driven results presented here focus on cholera, the theoretical results on structural and practical identifiability for the SIWR model are more broadly applicable to other waterborne diseases, such as Giardia, Cryptosporidium, Campylobacter, hepatitis A and E, norovirus, rotavirus, and Escherichia coli O157:H7 (Ashbolt, 2004, Butzler, 2004, Ford, 1999, Gerba et al., 1996, Karanis et al., 2007, Leclerc et al., 2002, Marshall et al., 1997, Sack et al., 2004, Schuster et al., 2005). These diseases can be transmitted through a range of different pathways, from pathogen ingestion through contaminated water to direct contact with infected individuals. For example, Giardia is transmitted via drinking contaminated water, but direct contact with infected individuals is also an established risk factor (Andersen and Neumann, 2007), whereas hepatitis A transmission occurs primarily through person–person contact, with contaminated water providing a secondary transmission route (Nasser, 1994). The SIWR model may also be used to gain insight into the relative contribution of these alternative transmission pathways, as discussed further in Tien and Earn (2010).

Because many of the model parameters are not directly measurable, connecting disease models with outbreak data to yield predictive results requires a variety of parameter estimation, identifiability, and uncertainty quantification techniques. A key first step of parameter estimation is determining whether the estimation problem is well-posed for a given model and data (Evans et al., 2005, Meshkat et al., 2009). Structural identifiability analysis examines whether the model parameters can be identified in the best-case scenario of noise-free data, a necessary condition for finding solutions to the real noisy data problem. Identifiability and uncertainty quantification methods allow us to address whether or not it is possible to uniquely recover the parameters for a given data set, and with what degree of certainty. If the parameters cannot be determined (denoted unidentifiability), identifiability approaches may reveal ways to reduce the model and determine combinations of parameters that can be estimated even when individual parameters may not (Evans et al., 2005, Meshkat et al., 2009). These issues can be of particular importance in health contexts where the parameter values have biological or public health implications, e.g. estimates of R0 in epidemiological models. Waterborne disease models lend themselves particularly well to questions of identifiability because of the public health importance of distinguishing multiple transmission pathways (Ashbolt, 2004, Eisenberg et al., 2002, Hunter et al., 2003, Hartley et al., 2006), which are often quite difficult to measure directly. Indeed, mathematical modeling and parameter estimation are used in public health practice to gain insight into waterborne disease transmission pathways and potential interventions, e.g. in the recent cholera epidemic in Haiti (Abrams et al., 2012, Tuite et al., 2011, Date et al., 2011), making the question of whether these parameters can accurately be identified of key public health importance (Koopman, 2004, Greenland and Robins, 1986, Grad et al., 2012, Alam et al., 2013, Chick et al., 2003).

Identifiability has been studied for several SIR-type models using a variety of approaches (Evans et al., 2005, Meshkat et al., 2009, Chapman and Evans, 2009). Evans et al. showed that the density-independent transmission form of the SIR model, as well as several SIR variants, are unidentifiable for prevalence or incidence data measurements, using a similarity transformation approach (Evans et al., 2005, Chapman and Evans, 2009). However, by examining the identifiable combinations for the SIR model, it can be shown that a commonly used nondimensionalization in terms of these combinations results in identifiability (Evans et al., 2005, Meshkat et al., 2009). Tien and Earn (2010) showed that the SIWR model is unidentifiable in the limit where the water dynamics are fast, so that both direct and indirect transmission have similar timescales. However, SIWR identifiability was not determined for the general case. Tien and Earn also found several examples in which an SIWR model trajectory could be fitted quite closely by an SIR model. Based on Tien and Earn's results, we might expect the SIWR model to be unidentifiable in general, and certainly for large pathogen decay rates.
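The fast-water limit can be illustrated numerically. The Python sketch below is only an illustration under stated assumptions: it uses a rescaled SIWR model without demography (fractions s, i and a scaled water variable w, with dw/dt = ξ(i − w)), and the names `beta_I`, `beta_W`, `xi`, and all numerical values are hypothetical choices, not taken from the paper. When ξ is large, w tracks i closely, so the force of infection is approximately (beta_I + beta_W)·s·i and only the sum of the two transmission rates shapes the epidemic curve.

```python
import numpy as np
from scipy.integrate import solve_ivp


def scaled_siwr(t, y, beta_I, beta_W, gamma, xi):
    """Scaled SIWR model with no births/deaths (assumed form, mu = 0)."""
    s, i, w = y
    force = beta_I * i + beta_W * w          # direct + waterborne force of infection
    return [-force * s, force * s - gamma * i, xi * (i - w)]


def infected_curve(beta_I, beta_W, xi, gamma=0.25, i0=1e-4, t_end=150.0):
    """Infected fraction i(t) on a fixed grid; w starts equal to i."""
    t_eval = np.linspace(0.0, t_end, 301)
    sol = solve_ivp(scaled_siwr, (0.0, t_end), [1.0 - i0, i0, i0],
                    args=(beta_I, beta_W, gamma, xi), t_eval=t_eval,
                    method="LSODA", rtol=1e-8, atol=1e-10)
    return sol.y[1]


def curve_gap(xi):
    """Largest difference between a mostly-direct and a mostly-water split
    that share the same total transmission rate beta_I + beta_W = 0.75."""
    mostly_direct = infected_curve(beta_I=0.60, beta_W=0.15, xi=xi)
    mostly_water = infected_curve(beta_I=0.15, beta_W=0.60, xi=xi)
    return float(np.max(np.abs(mostly_direct - mostly_water)))


gap_fast = curve_gap(xi=1000.0)   # pathogen decays in about 1/1000 day
gap_slow = curve_gap(xi=0.05)     # pathogen persists for about 20 days
print(f"max |difference| in i(t): fast water {gap_fast:.2e}, slow water {gap_slow:.2e}")
```

With fast water dynamics the two splits should give nearly indistinguishable curves, while with slow water dynamics they separate clearly, which is the intuition behind expecting identifiability problems for large pathogen decay rates.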

In this paper, we use an extension (Eisenberg, 2013) of the characteristic set-based differential algebra approach to identifiability (Meshkat et al., 2009, Audoly et al., 2001, Bellu et al., 2007, Pia Saccomani et al., 2003, Ljung and Glad, 1994, Ollivier, 1990) to show that the SIWR model is structurally identifiable. However, further numerical analysis confirms that the two transmission pathway parameters are practically unidentifiable when the indirect/water transmission timescale is fast. We next apply the model to data from a recent 2006 cholera outbreak in Angola (data courtesy of the WHO Cholera Task Force). We estimate the model parameters to examine whether in practice, with real data, we can determine the model parameters. By testing several candidate models, we show that although the waterborne pathway appears more significant, both pathways are necessary to fit the data well. We also explore the local practical identifiability properties for the model under a variety of conditions using simulated data, both with and without noise. Although the model is globally structurally identifiable, the practical identifiability properties may change depending on where the true parameters lie. We examine how the model identifiability varies depending on noise and true parameter values, and establish what practical identifiable combinations emerge as these changes take place. We show that adding water information (such as knowledge of the pathogen lifetime in the water or time series measurements of pathogen concentration in the water) can improve the model identifiability and allow additional information (specifically, the pathogen shedding rate into the water) to be estimated from the data.

Section snippets

Model equations

The SIWR model equations are given by

$$
\begin{aligned}
\dot{S} &= \mu N - b_W S W - b_I S I - \mu S,\\
\dot{I} &= b_W S W + b_I S I - \gamma I - \mu I,\\
\dot{W} &= \alpha I - \xi W,\\
\dot{R} &= \gamma I - \mu R,
\end{aligned}
$$

where $S$ represents susceptibles, $I$ infecteds, $W$ the pathogen concentration in the water, $R$ the recovered/removed population, and we take a constant total population size $N = S + I + R$. The parameter $\mu$ represents the natural birth/death rate for the population, $b_I$ the transmission parameter for direct transmission (where disease is transmitted to susceptibles by contact with infected individuals), and $\gamma$ the
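As a rough numerical illustration of the equations above, the following Python sketch integrates the SIWR system with SciPy and evaluates a basic reproduction number. The expression R0 = N(b_I + b_W α/ξ)/(γ + μ) is not quoted in this excerpt; it is what a next-generation argument at the disease-free state S = N gives for these equations. All parameter values are hypothetical and chosen only for illustration; they are not estimates from the Angola data.

```python
import numpy as np
from scipy.integrate import solve_ivp


def siwr_rhs(t, y, mu, b_I, b_W, gamma, alpha, xi, N):
    """Right-hand side of the SIWR equations for y = (S, I, W, R)."""
    S, I, W, R = y
    new_infections = b_W * S * W + b_I * S * I   # waterborne + direct
    return [
        mu * N - new_infections - mu * S,        # dS/dt
        new_infections - gamma * I - mu * I,     # dI/dt
        alpha * I - xi * W,                      # dW/dt: shedding minus decay
        gamma * I - mu * R,                      # dR/dt
    ]


def basic_reproduction_number(mu, b_I, b_W, gamma, alpha, xi, N):
    """R0 from a next-generation argument at S = N (assumed expression)."""
    return N * (b_I + b_W * alpha / xi) / (gamma + mu)


# Hypothetical per-day rates, for illustration only.
p = dict(mu=1e-4, b_I=2.5e-5, b_W=5e-6, gamma=0.25, alpha=1.0, xi=0.1, N=1e4)
args = (p["mu"], p["b_I"], p["b_W"], p["gamma"], p["alpha"], p["xi"], p["N"])

y0 = [p["N"] - 1.0, 1.0, 0.0, 0.0]               # one initial case, clean water
t_eval = np.linspace(0.0, 200.0, 401)
sol = solve_ivp(siwr_rhs, (0.0, 200.0), y0, args=args,
                t_eval=t_eval, rtol=1e-8, atol=1e-8)
S, I, W, R = sol.y
R0 = basic_reproduction_number(**p)
print(f"R0 = {R0:.3f}, peak infected = {I.max():.0f}")
```

Because births balance deaths, S + I + R should stay equal to N along the computed trajectory, which gives a quick sanity check on any implementation of these equations.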

Structural identifiability analysis

Our aim in this section is to determine whether parameters for the SIWR model can be identified in the ideal situation of perfect, noise-free data. This is clearly an unrealistic situation, but is nevertheless an important one. For example, the analysis in this section gives insight into model parameters which compensate for one another, leading to the inability to identify epidemiologically important quantities such as the shedding rate of bacteria into the environment. The analysis also shows

Model applications and practical identifiability

Although we have found that the scaled SIWR model (2) is structurally identifiable, we have not yet addressed practical identifiability. In this section we examine these issues in the practical situation of noisy data, using both synthetic data and data from a recent (2006) cholera outbreak in Angola.
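One standard local diagnostic for practical identifiability is the Fisher information matrix (FIM) built from output sensitivities. The numerical methods used in this section are not shown in this excerpt, so the Python sketch below is a stand-in rather than the authors' procedure. It assumes a rescaled SIWR model without demography, direct observation of the infected fraction with independent Gaussian noise of standard deviation `sigma`, a known recovery rate `GAMMA`, and hypothetical parameter values.

```python
import numpy as np
from scipy.integrate import solve_ivp

GAMMA = 0.25  # recovery rate per day, treated as known (assumed value)


def prevalence(log_theta, t_obs, i0=1e-4):
    """Infected fraction i(t) of the scaled SIWR model (assumed form, mu = 0)."""
    beta_I, beta_W, xi = np.exp(log_theta)

    def rhs(t, y):
        s, i, w = y
        force = beta_I * i + beta_W * w
        return [-force * s, force * s - GAMMA * i, xi * (i - w)]

    sol = solve_ivp(rhs, (t_obs[0], t_obs[-1]), [1.0 - i0, i0, i0],
                    t_eval=t_obs, method="LSODA", rtol=1e-10, atol=1e-12)
    return sol.y[1]


def fisher_information(log_theta, t_obs, sigma, h=1e-4):
    """FIM for i.i.d. Gaussian noise, from central-difference sensitivities
    of the output with respect to the log-parameters."""
    columns = []
    for k in range(len(log_theta)):
        step = np.zeros_like(log_theta)
        step[k] = h
        up = prevalence(log_theta + step, t_obs)
        down = prevalence(log_theta - step, t_obs)
        columns.append((up - down) / (2.0 * h))
    J = np.column_stack(columns)                 # sensitivity matrix
    return J.T @ J / sigma**2


t_obs = np.arange(0.0, 121.0, 1.0)               # daily observations
log_theta = np.log([0.25, 0.5, 0.1])             # hypothetical beta_I, beta_W, xi
fim = fisher_information(log_theta, t_obs, sigma=0.01)
eigvals, eigvecs = np.linalg.eigh(fim)           # eigenvalues in ascending order

print("FIM eigenvalues:", eigvals)
print("least-informed direction (log beta_I, log beta_W, log xi):", eigvecs[:, 0])
```

A very small leading eigenvalue relative to the largest one signals a direction in parameter space along which noisy data carry little information; the corresponding eigenvector suggests which combination of parameters is poorly determined. This is a local, linearized check, so it complements rather than replaces a global analysis.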

Discussion

Parameter identifiability is an important question for epidemiological modeling: the ability to estimate model parameters from a given data set will determine the ability to estimate fundamental quantities such as the basic reproduction number, and to assess the efficacy of different intervention strategies. This is particularly relevant for waterborne disease models because of the public health importance of distinguishing multiple transmission pathways, which are often quite difficult to

Acknowledgments

We thank the World Health Organization Cholera Task Force for providing us with data from the 2006 cholera outbreak in Angola. This work was supported by the National Science Foundation through the Mathematical Biosciences Institute (DMS 0931642) and Grant OCE-1115881 (to J.T. and M.E.).

References (78)

  • N.D. Evans et al. The structural identifiability of the susceptible infected recovered model with seasonal forcing. Math. Biosci. (2005)
  • C. Gerba et al. Waterborne rotavirus: a risk assessment. Water Res. (1996)
  • S.D. Holmberg et al. Foodborne transmission of cholera in Micronesian households. Lancet (1984)
  • J.A. Jacquez et al. Numerical parameter identifiability and estimability: integrating identifiability, estimability, and optimal sampling design. Math. Biosci. (1985)
  • C. Kapp. Zimbabwe's humanitarian crisis worsens. Lancet (2009)
  • L. Ljung et al. On global identifiability for arbitrary model parameterization. Automatica (1994)
  • G. Margaria et al. Differential algebra methods for the study of the structural identifiability of rational function state-space models in the biosciences. Math. Biosci. (2001)
  • N. Meshkat et al. An algorithm for finding globally identifiable parameter combinations of nonlinear ODE models using Groebner bases. Math. Biosci. (2009)
  • M. Pia Saccomani et al. Parameter identifiability of nonlinear systems: the role of initial conditions. Automatica (2003)
  • H. Pohjanpalo. System identifiability based on the power series expansion of the solution. Math. Biosci. (1978)
  • D. Sack et al. Cholera. Lancet (2004)
  • D. Swerdlow et al. Waterborne transmission of epidemic cholera in Trujillo, Peru: lessons for a continent at risk. Lancet (1992)
  • J.Y. Abrams et al. Real-time modelling used for outbreak management during a cholera epidemic, Haiti, 2010–2011. Epidemiol. Infect. (2012)
  • M.D. Andersen et al. Giardia intestinalis: new insights on an old pathogen. Rev. Med. Microbiol. (2007)
  • S. Audoly et al. Global identifiability of nonlinear models of biological systems. IEEE Trans. Biomed. Eng. (2001)
  • G.C.T.L. Burr. Observation and model error effects on parameter estimates in susceptible–infected–recovered epidemic model. Far East J. Theor. Stat. (2006)
  • S.E. Chick et al. Inferring infection transmission parameters that influence water treatment decisions. Manage. Sci. (2003)
  • CIA. Central Intelligence Agency World Factbook, Haiti, URL:...
  • C. Cobelli et al. Parameter and structural identifiability concepts and ambiguities: a critical review and analysis. Am. J. Physiol. Regul. Integrative Comp. Physiol. (1980)
  • Codeco, C., 2001. Endemic and epidemic dynamics of cholera: the role of the aquatic reservoir. BMC Infect. Dis....
  • D.O.D. Cox et al. Ideals, Varieties, and Algorithms: An Introduction to Computational Algebraic Geometry and Commutative Algebra. (1996)
  • K.A. Date et al. Considerations for oral cholera vaccine use during outbreak after earthquake in Haiti, 2010–2011. Emerging Infect. Dis. (2011)
  • Eisenberg, M., 2013. Generalizing the differential algebra approach to input-output equations in structural...
  • J.N. Eisenberg et al. Disease transmission models for public health decision making: analysis of epidemic and endemic conditions caused by waterborne pathogens. Environ. Health Perspect. (2002)
  • M. Eisenberg et al. L-T4 bioequivalence and hormone replacement studies via feedback control simulations. Thyroid (2006)
  • M. Eisenberg et al. Extensions, validation, and clinical applications of a feedback control system simulator of the hypothalamo-pituitary-thyroid axis. Thyroid (2008)
  • C.P. Farrington et al. A statistical algorithm for the early detection of outbreaks of infectious disease. J. R. Stat. Soc. Ser. A (Stat. Soc.) (1996)
  • Feachem, R., Bradley, D., Garelick, H., Mara, D., 1983. Vibrio cholerae and cholera. In: Sanitation and Disease: Health...
  • T. Ford. Microbiological safety of drinking water: United States and global perspectives. Environ. Health Perspect. (1999)