Abstract
Building realistically complex models of infectious disease transmission that are relevant for informing public health is conceptually challenging and requires knowledge of coding architecture that can implement key modeling conventions. For example, many of the models built to understand COVID-19 dynamics have included stochasticity, transmission dynamics that change throughout the epidemic due to changes in host behavior or public health interventions, and spatial structures that account for important spatio-temporal heterogeneities. Here we introduce an R package, SPARSEMODr, that allows users to simulate disease models that are stochastic and spatially explicit, including a model for COVID-19 that was useful in the early phases of the epidemic. SPARSEMOD stands for SPAtial Resolution-SEnsitive Models of Outbreak Dynamics, and our goal is to demonstrate particular conventions for rapidly simulating the dynamics of more complex, spatial models of infectious disease. In this report, we outline the features and workflows of our software package that allow for user-customized simulations. We believe the example models provided in our package will be useful in educational settings, as the coding conventions are adaptable, and will help new modelers to better understand important assumptions that were built into sophisticated COVID-19 models.
1 Introduction
The emergence of the SARS-CoV-2 pandemic has reinforced the strong role that mathematical models of disease spread play in understanding pathogen transmission and in designing effective public health interventions (Metcalf et al., 2015; Ferguson et al., 2020; Tian et al., 2020; Saad-Roy et al., 2020; Shea et al., 2020). Models of infectious disease transmission vary widely in their structural form and complexity (Keeling & Rohani, 2008; Adiga et al., 2020). Classical models take the form of ordinary differential equations describing “compartments” of the host and/or pathogen population (e.g., susceptible versus infectious hosts) (Anderson & May, 1979), but even these models can become quite complex, containing numerous equations that might, for instance, account for heterogeneities in the host population or for the progression of pathogen-induced disease through various host classes (e.g., hospitalized individuals). Early in the COVID-19 pandemic, many compartment-style models were developed to address hypotheses of how rapidly SARS-CoV-2 was spreading and what impacts non-pharmaceutical interventions might have (Pan et al., 2020; Anderson et al., 2020), if and when the virus might become endemic (Lavine et al., 2021), and even whether regional climate patterns were likely to influence transmission patterns (Baker et al., 2020). Agent-based models, which track the (probabilistic) fate of individuals, were also developed to make short-term forecasts and long-term projections of COVID-19 for public health preparedness (Ferguson et al., 2020; Hinch et al., 2021; Kerr et al., 2021; Cramer et al., 2022). Many models nested compartment-style or agent-based models within a spatial framework to incorporate the impacts of spatial contagion and human mobility on COVID-19 dynamics (Yamana et al., 2020; Pei et al., 2020; Arino, 2022; Zhang et al., 2022; Wardle et al., 2022). These examples highlight the types of dynamics and assumptions modelers might include when constructing mathematical and computational models of infectious disease transmission, and these choices have important implications for the resulting dynamics.
During the COVID-19 pandemic, some key dynamics have been built into many models that influence the insights that emerge from modeling studies, including short-term forecasts of disease or longer-term projections of system behavior. Arguably some of the most important of these model dynamics and assumptions have been: stochastic transmission processes, such as super-spreading events or probabilistic system behavior; time-varying transmission dynamics due to public health interventions and changes to human behavior (e.g., stay-at-home orders or mask-wearing); and, spatial heterogeneity in transmission, driven in part by human mobility. We will briefly highlight examples of how these dynamics have been important for modeling the spread of SARS-CoV-2, especially in the early phases of the pandemic.
Adding stochastic dynamics to models, such as demographic stochasticity and environmental stochasticity, allows us to capture key processes that are especially impactful early in epidemic behavior and in small populations (Lambert et al., 2018). Demographic stochasticity refers to the probabilistic events that befall host individuals, whereas environmental stochasticity refers to random changes to the values of model parameters (e.g., transmission rate) that are due to sources not explicitly included in the model (i.e., environmental sources). Environmental stochasticity is sometimes used to account for heterogeneity in transmission rate among host individuals, such as the impact of super-spreaders (Lloyd-Smith et al., 2005), whereas demographic stochasticity accounts for dynamics such as the probabilistic burn-out of a pathogen. For COVID-19, models that included demographic stochasticity where particularly important for explaining disease patterns in regions with small populations (Engbert et al., 2021). Moreover, models that included both demographic and environmental stochasticity helped to disentangle the effects of both sources of stochasticity on observed COVID-19 case counts during different phases of the outbreak (Hwang et al., 2022). Importantly, accounting for stochastic model behavior also facilitates a more rigorous quantification of uncertainty in model forecasts (Cramer et al., 2022). There are many methods to implement stochastic dynamics within models (Allen, 2017), for instance using Gillespie-style algorithms to iterate the differential equations that describe compartment models (Gillespie, 2001; Ganyani et al., 2021), or through agent-based modeling, which inherently deals with probabilistic events that befall individual hosts.
Almost all of the models of COVID-19 referenced thus far have needed to deal with the fact that SARS-CoV-2 transmission dynamics and the dynamics of COVID-19 hospitalization changed dramatically throughout the pandemic, due for instance to non-pharmaceutical public health interventions meant to slow transmission, or due to changes in hospital practice to treat and triage COVID-19 patients. Many COVID-19 models estimated time-varying transmission rates from case counts or hospitalization data, or models inferred changing transmission rates using estimates of the pathogen’s instantaneous reproductive number (Cramer et al., 2022; Gostic et al., 2020; Pei et al., 2020). Therefore models need to be flexible enough to allow the user to specify time-varying parameter values. For compartment-style differential equation models, allowing time-varying parameters computationally requires that the models are iterated forward, re-defining parameter values at appropriate time-points.
Modeling spatially explicit disease dynamics is important for testing hypotheses about the roles of human mobility on spatial contagion, the effectiveness of interventions that impact movement, and understanding the sources of transmission within and among regions. Spatial models also allow for more refined forecasts within specific locations. Indeed, spatial models are often necessary to explain large-scale patterns of disease transmission that cannot be captured by simpler, non-spatial models (Eggo et al., 2021). There are several approaches towards spatially explicit modeling (Riley, 2007), including agent-based network models that situate individual hosts in specific spatial positions (Ferguson et al., 2020; Kerr et al., 2021; Hinch et al., 2021), or partial differential equations that assume hosts can move continuously across spatial dimensions. One of the most popular approaches, however, is meta-population disease modeling. In meta-population models of disease, distinct host populations are explicitly situated geographically such that movement of host individuals between the populations affects local and regional transmission dynamics (Rohani et al., 1999; Lachiany & Stone, 2012; Ferrari et al., 2010). Important basic theory of spatial epidemiology has emerged from the analysis of meta-population models (Arino, 2009; Wang & Li, 2014), which has in turn popularized the use of these spatial models for understanding and forecasting real public health threats (Kraemer et al., 2019). Many early analyses of COVID-19 used meta-population models to account for spatial heterogeneities in transmission and disease outcomes (Pei et al., 2020; Yamana et al., 2020; Gatto et al., 2020; Yang et al., 2021; Zebrowski et al., 2021; Arino, 2022). The most complex meta-population models of COVID-19 parameterize human movement based on an emerging suite of mobility data, often derived from mobile devices (Hou et al., 2021; Wardle et al., 2022; Hu et al., 2021).
Here, we introduce an R package, SPARSEMODr, that includes two illustrative examples of spatially-explicit and stochastic meta-population disease models, including one of COVID-19. Some agent-based simulation models of COVID-19 have been introduced that develop software for users to experiment with the models (Kerr et al., 2021; Hinch et al., 2021), and several published models provide openly available code for users to download (Cramer et al., 2022; Ferguson et al., 2020). Our package focuses on meta-population models in which users can simulate spatial and temporal heterogeneities in transmission rates, effects of host movement, and other user-defined dynamics that influence local and regional patterns of disease. The COVID-19 model that we provide was used by our group and others (Gel et al., 2020) to characterize spatial and temporal variation in transmission patterns and disease outcomes (e.g., hospitalizations and ICU admissions) in Arizona, USA. While the model is no longer complex enough to capture many of the current critical aspects of COVID-19 transmission and disease (e.g., the model does not include age-structure, vaccination, nor the circulation of multiple variants), the models in SPARSEMODr should nonetheless be useful for educational purposes. Importantly, the package demonstrates some specific conventions of coding stochastic meta-population models that could easily be carried over to different host-pathogen systems in teaching, research, or applied contexts.
2 Available models
Currently we offer two models in the SPARSEMODr package: one is a more classic Susceptible-Exposed-Infectious-Removed (SEIR) model, and the other is an SEIR-style model that more specifically describes the transmission of SARS-CoV-2 and COVID-19 progression from exposure through hospitalization through mortality. Both models are compartmental disease models that are simulated within a meta-population context, which we describe below. We supply detailed vignettes that describe different use-cases of the SPARSEMODr package, which can be found on CRAN: https://cran.r-project.org/web/packages/SPARSEMODr/index.html and the package GitHub repository.
The SEIR model is described by the following set of ordinary differential equations:
Here, the state variables represent the numbers of hosts in each class (e.g., S is the number of susceptible hosts). Following some classic conventions, the model assumes that all host classes can reproduce at rate µ, which is equal to the natural death rate. This has the effect of maintaining total host population size static through time. Moreover, all offspring enter the susceptible class, such that N = S + E + I + R. The pathogen incubates within exposed individuals for an average latency time of 1/δ, and infectious individuals recover at rate γ. This model structure leads to exponentially distributed sojourn times. As we will describe below, we assume that susceptible and infectious hosts can commute away from their focal population at emigration rate, m, but hosts return after one day. This commuter-style movement allows susceptible individuals to become exposed by infectious individuals in other populations and infectious individuals to spread the pathogen to other populations. An example simulation of the model, in which we impose sinusoidal forcing of the transmission rate, is shown in Figure 1.
The COVID-19 model is described by the following equations (Fig. 2; see also Gel et al. (2020)):
The term λt represents a component of the force of infection, given by:
Definitions of state variables and model parameters are shown in Tables 1 and 2. Note that N represents the total number of hosts in the population. We assume that individuals in the S, Ia, Ip, and Is compartments can move between populations at emigration rate, m. We focus on these host classes as they can influence local and regional transmission dynamics, whereas, for example, commuter-style movement of exposed individuals would have no effect.
In the package manual, we provide default parameter values, but all of the COVID-19 model parameters can be user-specified. The package also conducts parameter validation steps via warnings and errors to ensure specified parameters are within feasible limits. Figure 3 shows a simulated example using the COVID-19 model.
3 Key features of SPARSEMODr disease models
We implement a common coding architecture for the two models provided in SPARSEMODr to highlight particular modeling conventions that are useful to consider in theoretical and applied modeling studies. Here we describe these key features of the spatially-explicit and stochastic disease models. The models in SPARSEMODr are coded in C++ and use the Rcpp package to conveniently wrap the functions into the R computing environment (Eddelbuettel, 2013), integrating the speed of C++ and the user-friendliness of R. The output of each model is a data frame that includes the value of each state variable and the number of new “events” (e.g., new exposures, new hospitalizations) per time step per population per stochastic realization of the model.
3.1 Stochastic dynamics
Demographic stochasticity
The models include demographic stochasticity, which implements the effects of probabilistic events that befall individuals in a population and that can affect epidemic trajectories. We use an event-driven approach in which the differential equations are iterated forward in daily time steps using a Gillespie-style algorithm known as the tau-leaping algorithm (Gillespie, 2001). The tau-leaping algorithm is more flexible and more computationally efficient compared to several other simulation and numerical integration techniques (Ganyani et al., 2021). And, because this form of demographic stochasticity requires us to use integers for the host classes (i.e., numbers of hosts in each class), it allows users to track the number of new events occurring per time step.
Stochastic transmission process
We also implement daily stochastic variation in the transmission rate, a form of environmental stochasticity. We assume that as the number of infectious individuals increases, the variation in transmission rate decreases, emphasizing that stochasticity has larger effects in smaller populations (Keeling & Rohani, 2008). Our specific form of stochasticity implies that there is heterogeneity in the transmission rate among individuals in the host population, such that our methods can account for super-spreader events, for instance, which have disproportionate effects early in the epidemic. The functional form of stochasticity is:
For every time step t we draw a random variate, χ, from a normal distribution with a mean of zero and a standard deviation of σ, such that χ N(0, σ), and σ is a user-controlled model input, stoch_sd. We calculate the total number of infectious individuals in the population, which we abbreviate ∑ I, because the summation will depend on the model and which state variables represent infectious individuals (e.g., there are several infectious states in the COVID-19 model). The vertical bars show that is the absolute value of the right-hand expression.
3.2 Spatial, meta-population dynamics
To iterate the SPARSEMODr models in a spatial context, we embed the equations into sub-populations that make up a meta-population, in which migration connects sub-populations. Migration between populations in the meta-population thus affects local and regional transmission dynamics. Susceptible individuals in a focal population can become exposed to the pathogen by infectious “visitors” from other sub-populations or by infectious visitors from outside of the meta-population (“immigrants”). Similarly, susceptible individuals can visit a different sub-population and become exposed by the resident infectious individuals. We model movement using a commuter-style approach, in which hosts can move to an alternative sub-population during a time step, but then they return to their home sub-population before the start of the next time step. After we move individuals to new populations, transmission between susceptible residents and infectious visitors is determined, and the visitors then leave the population. In other words, in a single day (i.e., time-step), transmission first occurs within a population and then additional transmission can occur after hosts commute.
The user controls the per-capita emigration rate, m, of susceptible and infectious hosts. Our models assume that the probability of moving to any specific population is dictated by a simple, distance-based dispersal kernel:
Here pi, j is the probability of moving from population j to population i, and di, j is the euclidean distance between the two populations. Larger values of the ϕ parameter (controlled as user input dist_phi) makes it more likely for hosts to travel farther distances. To determine the population to which migrants will move during any given time step, we draw from a multinomial probability distribution using the pi, j values.
The models also allow the effects of immigrants, who do not usually reside in the meta-population, to commute to the meta-population each day (e.g., out-of-state tourists). The user can define the model input imm_frac, which is the proportion of the focal population that may constitute visitors on any given day. For example, if for a given focal population the population size is 1000 hosts, and imm_frac = 0.05, an average of 50 immigrants may arrive on a given day. The exact number of these immigrants arriving on a given day is determined by drawing from a Poisson distribution. Then, the number of infectious visitors from this pool of immigrants is assumed to be proportional to the number of infectious residents in the focal population. In other words, we assume that the pathogen is present in “non-resident” populations at similar prevalence as the focal population.
3.3 Time-varying parameters
The SPARSEMODr models allow users to specify certain time-varying parameters, which take on unique values per day. Both models allow the transmission rate, β(t), to vary over time, and we allow the user to control how movement dynamics change over time with time-varying values of m, ϕ, and the fraction of the population that constitutes outside immigrants, imm_frac. In the COVID-19 model, we also allow time-varying fraction hospitalized, ρ2, transition rate of hospitalized individuals, δ4, fraction advancing to the ICU, ρ4, and fraction of ICU patients that die of disease, ρ5. We chose to allow these parameters to vary over time to emphasize that these rates, in reality, very likely changed throughout the COVID-19 pandemic, for instance, due to changes in hospital policies and improvements to patient care. The time-varying parameters can be specified in one of two ways, which we demonstrate in the Work flow section below.
3.4 Frequency- and density-dependent transmission
In the SPARSEMODr models, the transmission process can be described as frequency-dependent, where contact rates are invariable to population density, or density-dependent, where contact rates depend on population density (Hopkins et al., 2020). For frequency-dependent transmission, we divide the user-specified value of transmission rate by the total number of hosts within a given sub-population. For example, in the SEIR model, we have the following expression describing mass-action transmission per sub-population, , where βi,t is the user-specified transmission rate for sub-population i at time step t, and Ni is the total host population size for sub-population i. Therefore, with frequency-dependent transmission, the effect of a single infectious host on the risk of infection is modulated by the fraction of the host population that is still susceptible, irrespective of the host population density (i.e., number of hosts per unit area) within that sub-population.
Alternatively, for density-dependent transmission, the user can specify a non-linear Monod equation that describes the relationship between host population density and the transmission rate. The Monod equation is:
Here, βi,t is the transmission rate supplied by the user, but importantly this functionally becomes the maximum possible transmission rate for sub-population i at time step t. Thus, βDD,i,t is the density-dependent transmission rate for sub-population i at time step t. Ni/Ai is the density of the focal host population (i.e., number of hosts, Ni per unit area, Ai). To use this form of transmission, the area per population, Ai, must therefore be specified by the user as input vector, census_area. Finally, K is a constant that controls the effect of density on the transmission rate. More specifically, K is the half-saturation constant at which point βDD,i,t/βi,t = 0.5. In general, when K = 0, there is no effect of density on transmission rate, but larger values of K mean that low host densities more strongly reduce the transmission rate (i.e., transmission rate more strongly depends on host population density). The value of K is user-controlled by model input, dd_trans_monod_k.
4 Work flow
4.1 Process overview
SPARSEMODr runs spatial, stochastic models across a user-specified grid of populations. In Figures 1 and 3, we simulated populations that are scattered across a spatial lattice; one could imagine these are local communities situated within counties, labelled as “Regions”. In these particular simulations, we imposed a time-varying transmission rate, βt, but we assumed that each local population experiences the same pattern of time-varying transmission. The models track the spread of the disease in each sub-population, such that we can computationally aggregate patterns from local populations to higher spatial scales (e.g., “Regions”) to look at emergent patterns. Below we describe the necessary setup and declarations to run these simulations for the models in SPARSEMODr, focusing on the more complex COVID-19 model.
4.2 Declaring initial conditions and constant parameters
Users must specify the initial conditions of the state variables. To initiate the COVID-19 model, at least one of the following vectors must be supplied with a value greater than one for at least one sub-population:
input_E_pops, input_I_asym_pops, input_I_presym_pops, input_I_sym_pops, input_I_home_pops, input_I_hosp_pops, input_I_icu1_pops, or input_I_icu2_pops. If any initial values of state variables are not supplied, they are assumed to be zero. Moreover, either a vector of input_N_pops or a vector of input_S_pops must be supplied. As an example, we will specify:
The user must also declare some inputs that are shared between the SPARSEMODr models: input_dist_mat is a matrix that specifies the distance between populations; input_realz_seeds is a vector of integer values to seed the random realizations of the model; stoch_sd is the standard deviation of the stochastic transmission rate, σ, as described above; trans_type specifies the type of transmission type (frequency- or density-dependent); input census_area is the spatial area of each population, but this is only required when transmission is declared as density-dependent; dd_trans_monod_k the parameter controlling density’s effect on transmission, again only required for density-dependent transmission; and input_tw a time window object, which we will describe in the next section.
The next step is to populate the model-specific “control” object, which is a special R class defined in our package. Each model has its own control class that includes the model-specific parameters and initial conditions of the state variables.
As a simple example with the COVID-19 model, given the initial conditions above, here we will use default parameter values for all but two parameters:
my_covid19_control <- SPARSEMODr::covid19_control( input_S_pops = S_pops, # susceptible population counts input_E_pops = E_pops, # exposed population counts asym_rate = 0.4, # fraction of exposed that become asymptomatic recov_icu1 = 0.125) # average ICU recovery rate, i.e., 8 days (1/8)Also, because input_N_pops was not provided, the package internally assumes input_N_pops = input_S_pops + input_E_pops. covid19_control() returns a named list of vectors that must be supplied when running the model.
4.3 Declaring time-varying parameters with the time-windows object
A time windows object is required to specify the time-varying parameters (or whether these parameters are assumed constant). There are two ways to specify time windows: (1) specifying values for each time step in the simulation (i.e., “daily” method), or (2) the starting and ending dates of user-defined time windows. For the “daily” method, the parameter vectors must be of size equal to the number of days in the simulation. For the start date/end date method, values for each parameter are assigned at the beginning and end of the time window. Then, the back-end code calculates a linear interpolation to assign daily values of the parameter; in other words, the parameter values change linearly from the starting value to the ending value over the number of days within the time window.
In the following example, we specify the start and end dates of our time windows for time-varying transmission rate, to match the pattern shown in Fig. 3(b). Note that we format dates using the lubridate package (Grolemund & Wickham, 2011), although other methods are acceptable as long as they are objects of class Date in R.
# Function to specify all of the required parameters in a single time-window # In this example, only r0 is variable between the time periods one.window <- function( start_dates, #start of time window (date class) end_dates, #end of time window (date class) beta, #value of time-varying beta dist_phi=150, #controls shape of dispersal kernel m=0.1, #per-capita movement rate, i.e., move every 10 days on avg. imm_frac=0){ #zero outside immigration data.frame(beta, start_dates, end_dates, dist_phi, m, imm_frac) } library(lubridate) # for mdy() time.window.args <- rbind(# Specify the components of 5 time windows one.window(mdy(“1-1-20”), mdy(“1-31-20”), beta=0.40), one.window(mdy(“2-1-20”), mdy(“2-15-20”), beta=0.10), one.window(mdy(“2-16-20”), mdy(“3-10-20”), beta=0.10), one.window(mdy(“3-11-20”), mdy(“3-21-20”), beta=0.15), one.window(mdy(“3-22-20”), mdy(“5-1-20”), beta=0.15) ) # Populate the required object of class time_windows my_tw <- do.call(SPARSEMODr::time_windows, time.window.args)The package documentation and the vignettes give additional examples of how to flexibly structure these time window objects. Notably, the vignettes demonstrate how to specify time-varying βt values for each sub-population individually, to explore effects of spatially heterogeneous transmission rates within the meta-population. In this latter case, users would specify a list of beta vectors.
4.4 Running stochastic realizations in parallel
For efficiency and to reduce run time, we suggest simulating stochastic realizations of the SPARSEMODr models in parallel. We have therefore created a function, model_parallel(), that leverages the future package (Bengtsson, 2020) and compiles the output of the individual model runs into a user-friendly data frame. As an example:
# Specify the number of realizations to run: n_realz <- 75 # Specify unique seeds to run the realizations. ## Note, realizations with the same seeds will produce ## equivalent output my_realz_seeds <- 1:n_realz # Run the model in parallel and store the output: model_output <- SPARSEMODr::model_parallel( input_dist_mat = dist_mat, input_census_area = census_area, input_tw = my_tw, input_realz_seeds = my_realz_seeds, control = my_covid19_control, …universal_model_params…)Here “… universal_model_params … “ represents the universal model parameters, such as trans_type or stoch_sd, as described above. Note that dist_mat is a pairwise distance matrix, and census_area is a vector of population areas (e.g., in km2). The package documentation and vignettes show examples of creating these two inputs. Since model_parallel() produces a data frame, the output can easily be subset and summarized for plotting.
5 Conclusion
The SPARSEMODr package illustrates coding paradigms to specify and simulate complex disease models that are both stochastic and spatially explicit. In pedagogical contexts, SPARSEMODr models can be used to demonstrate the effects of time-varying transmission rates (e.g., as affected by disease interventions), demographic and environmental stochasticity, frequency-versus density-dependent transmission, and spatial heterogeneities. For instance, simulating the SPARSEMODr SEIR model can reinforce foundational theoretical concepts in mathematical epidemiology, such as spreading waves of disease or spatial (a)synchrony of epidemics. The SPARSEMODr COVID-19 model can be used to help students understand the key features of models built early in the pandemic to address public health issues, for example, highlighting the effects of public health interventions, spatiotemporal effects of altering human movement, or the changing landscape of hospitalization dynamics before vaccines were available. Because we develop a standard coding architecture for the two models in SPARSEMODr, we can envision multiple future package developments. For instance, our team or outside contributors can add models that follow the same coding standards, such as vector-borne disease models, or COVID-19 models that include additional layers of realism, such as age-structure, vaccination, and the dynamics of co-circulating variants. For package contributors, a new model would have to follow the current conventions, which can be copied from our GitHub repository (https://github.com/NAU-CCL/SPARSEMODr). Our team would approve all outside contributed model structures by forking and merge requests. We also plan to modularize the SPARSEMODr coding conventions using APIs so that they can be auto-populated for new models, reducing the need for copy-and-pasting code. We hope that the community of disease modelers and educators will help us to refine the package and to make it more accessible to broader audiences.
Data Availability
No data used
6 Conflict of Interest Statement
The authors have no conflicts of interest.
7 Author Contributions
JRM, CH, and ED designed the project. JRM, TDH, SB, SR, KEB, and JEE contributed code and methods. JRM, TDH, and SB designed the R package structure. JRM, SB, and SR wrote the first draft of this manuscript, and all authors contributed to manuscript revisions.
8 Acknowledgements
This material is based upon work supported by the National Science Foundation under Grant No. 2028629.
Footnotes
This is a heavily revised manuscript following the first round of peer review. This corresponds with the release of version 1.2.0 of SPARSEMODr, which encapsulates technical changes requested by reviewers.