ABSTRACT
Random walkers on a two-dimensional square lattice are used to explore the spatio-temporal growth of an epidemic. We have found that a simple random-walk system generates nontrivial dynamics compared with traditional well-mixed models. Phase diagrams characterizing the long-term behaviors of the epidemics are calculated numerically. The phase boundary separating those sets of parameters leading to outbreaks dying out and those leading to indefinite growth is mapped out in detail. The functional dependence of the basic reproductive number R0 on the model’s defining parameters reveals the role of spatial fluctuations and leads to a novel expression for R0. Special attention is given to simulations of inter-regional transmission of the contagion. The attack rate and the (growing) radius of gyration of the affected zones are used as measures of the severity of the outbreaks, in cases where R0 is not sufficiently prescriptive to chart the epidemic dynamics.
Introduction
Classic epidemiological models1,2 often assume panmictic populations: every infected person has an equal chance to affect any other person in the population. While these models have been successful in describing an ideal dynamics of epidemic spread, they do not account for inhomogeneity: an infected person has higher probability to transmit the disease to a member of their household3,4 or to a person in their locale. The overall reduction of long-distance travel due to the COVID-19 pandemic makes the latter cause of inhomogeneity especially relevant: many infected persons (but not all) spread the infection only in their immediate vicinity. Likewise, the local neighborhoods of both susceptible and infected persons are dynamic – they vary in space and time depending on the local state of the epidemic. These spatio-temporal inhomogeneities are studied in this paper.
There are two approaches to geographical inhomogeneity: detailed analyses based on contact-tracing and mobility data,5,6 or analyses based on stylized models.4,7,8 The first approach, while potentially highly accurate, requires a large number of parameters and is not robust with respect to unforeseen changes in mobility patterns. The second approach, in contrast, requires a small number of well-defined parameters and provides a sound intuition about their effects on the outcome. It is especially useful when planning interventions and policy changes, or investigating various “what if” scenarios. It is the latter approach that we have taken here.
In the present paper, we treat a very simple model: a random walker (modeled on an infected person) takes τ steps (a positive integer) on a 2D square lattice. After each step, the visited site can become infected with probability p. In this case, at the site of the new infection, it branches off another random walker, also with lifetime τ, that can also further spread the contagion to further sites, and so on. We study the resulting epidemic in space and time.
This model depends on two parameters: τ, representing the length of time of the infectious period, and p, representing the infectiousness. It also depends on the “walking pattern”, which may include steps to the neighboring sites or jumps to more distant locations.
Panmictic models predict that the spread of epidemics depends only on the basic reproduction number R0, the number of infected persons produced by one walker in a fully susceptible population, which, in turn, can be estimated in the panmictic approximation as We will refer to this estimate as the naïve approximation.
However, the random-walk model exhibits a surprisingly rich behavior, well beyond the “naïve approximation” above. We will show that the spread of the epidemics depends on the interplay between p and τ. Moreover, the detailed geometry of the epidemics depends on the stochasticity of individual walks and the nature of random walks on the plane. In subsequent sections, we describe our model in full, study its behavior using numerical simulations and theory, and draw conclusions for real-world epidemics.
Model
We model the geographic distribution of the population by an infinite 2D square lattice. At any given time, each site can be in one of three states. Although the model proposed here is fundamentally different than an SIR model, the same nomenclature is adopted; namely, each site in the lattice is either susceptible, infected, or removed. (Figure 1(b)). All sites are initialized as susceptible, with the exception of the origin which contains a single infectious agent. An infected site generates a brand new random walker, which starts to move on the lattice (Figure 1(a)). If the walker lands on a susceptible site (sites are represented as square cells in the figure), it becomes infected with probability p. After τ steps an infectious agent becomes removed. Thus, there are two fundamental parameters that control the outbreak: p, the probability of infection, and τ, the number of steps for which an infective agent (walker) is active before being removed.
Another feature of the model is the way the infectious agent chooses its next move. The model is quasilocal: most of the time the walker moves to one of the eight adjacent sites, but sometimes it makes a longer jump. Namely, if the walker is at r0 = (x0, y0), its next location, r, is given by with u and θ drawn from uniform distributions: Here ⌊X⌋ is the integer part of X (the largest integer number not exceeding X), and U (a, b) is the uniform distribution in the interval [a, b].1 This function produces a one-step walk to one of the eight adjacent sites approximately 66.8% of the time; 95.7% of jumps are within a distance of 2 sites. The random-variate generator for the jump distances r in eq. (2) is derived from a probability density function of the form ϕ (r) = A/r4, where the constant A is of order the lattice constant cubed a3 (in this paper, a = 1 is chosen throughout). The exponent of 4 corresponds to interactions of the form ∼ 1/rd+σ when d = 2, σ = 2, a case of special interest in a number of early studies,9,10 because it sits at the margin between long-range and short-range interactions. However, in the context of epidemic models, as noted by Bunde et al. 11, Grassberger 7, and Hallatschek and Fisher 8, it presents no special difficulties.
Results
Initial stages of the epidemics
In the initial stage of the outbreak, the infected person (the index case) is completely surrounded by the susceptible population. The key parameter describing this state is R0: the number of people infected by the index case. The value of R0 depends on p and τ in some form, but the naïve approximation, R0 ≈ pτ, must fail, as can be seen whenever the path of even a single walker self intersects; i.e., the approximation does not account for the recurrence of a two-dimensional random walker.
We ran 100000 realizations of the simulation, collecting how many agents were infected by the index case. Table 1 shows the R0 calculated for p and τ tabulated on log-log axes,2 an idea that will be explored further in the discussion of the phase diagram. The breakdown of the naïve approximation is evident in the lack of constancy of R0 along the diagonals of the table, which is particularly apparent when pτ is large.
The theoretical result (see Methods) is where K and τ0 are constants that depend on how the walker’s next jump is selected. For the jump generator in equation (2) we have As shown in Figure 2, this formula is in excellent agreement with the numerical experiments without any adjustable parameters.
Outbreak progression and phase diagrams
The progression of the outbreak beyond the initial stage is illustrated by Figure 3. Depending on the parameters, the outbreak may die out or spread indefinitely, and it is a natural starting point to describe this dependence by a phase diagram.
Phase diagrams are used in physics and related fields as a visualization of how pressure and temperature (or other thermodynamic variables) affect the bulk thermodynamic state of a substance. Here, phase diagrams are used in an analogous manner to provide insight into how p and τ affect the long-term behavior of an epidemic (i.e., how likely it is to die out after a certain amount of time has passed). In contrast to traditional phase diagrams, it is challenging to define discrete phases for the behavior of an epidemic, so the measures used in this section of our study are continuous variables with finite gradients between phases.
Each point in the phase diagram describes the expected behavior of an epidemic with the given p and τ inputs. The data from running 100 realizations of each of these outbreaks were used to produce a histogram of the growth rate (defined as the number of new infections) at the final time step, as shown in Figure 7 in the Appendix. The final time step is chosen by the following function: This sliding criterion adjusts for outbreaks with longer τ, i.e., when infective agents have longer lifespans. Notably, the histogram exhibits a large spike at zero new infections, which represents the fact that a significant fraction of outbreaks will die out quickly, since, due to the intrinsic stochasticity of our model, not every infection reaches epidemic-level proportions. From this histogram, a cutoff of 10 new infections at the final time step is used as a proxy for determining whether an outbreak has died out. The cutoff includes the spike at zero new infections (realizations which have definitively died out) as well as runs where the number of new infections is very small and the end of the epidemic is imminent. Using this criterion, we define the phase of an infection by this single measure: Conveniently, the phase variable is already normalized on the scale 0 to 1, and represents what fraction of epidemics, with a given p and τ, are expected to die out. The plot of this phase variable is shown in Figure 4. The determination of the phase boundary allows us to quickly predict the behavior of the system in the regions above or below the boundary, even in the absence of extensive simulations.
For each horizontal cross-section of the phase diagram (corresponding to a 1D phase diagram for a fixed value of τ), we have fitted the phase variable as a function of p to a smooth interpolating function in order to find the location of the phase boundary, defined as the point with p = 0.5. The resulting phase boundary is well approximated by a hyperbola in the p, τ plane. In particular, a least-squares fit gives the curve .
It is customary in epidemiology2 to predict the progression of epidemics based on the value of R0. Accordingly, on Figure 4 “iso-R0” lines were overlaid on the phase diagram: each point on an iso-R0 line has p and τ values which produce outbreaks with a common R0. As expected, the phase boundary indeed coincides with an iso-R0 line, having R0 ≈ 1.39 (and not R0 = 1, because of the role of fluctuations inherent in a particle-based, spatial model). As shown in the Methods section, this finding corresponds to a robust theoretical prediction.
Outbreaks across regional boundaries
In order to complement the mapping out of the phase diagram, the random-walk outbreak model was also used to simulate real-world infectious dynamics. In this section, the effects of having multiple regions with separate p values (representing adjacent states or provinces with different distancing or shelter-in-place policies) is explored. The spatial aspect of this scenario is simplified to two infinite half-planes with distinct p values.
Most notably, the random-walk model corroborates the hypothesis that it is possible for an infection to grow very slowly (constantly on the brink of completely dying out) until it eventually reaches a region with a higher value of p, upon which it exhibits rapid growth. This is demonstrated in Figure 5. Moreover, there is also a significant backflow of infectors returning from the high-p region to the original low-p region.
Attack rate and radius of gyration of the affected zone
In this section, we explore two alternative metrics to characterize the incidence and growth of an epidemic. First, the attack rate is generally defined as the proportion of exposed susceptibles that have been infected. Given the set of infected sites, I, the set of removed sites, R, and the set of visited sites (which includes infected and removed ones), V, we may define the attack rate in the context of our lattice-based model. Using |X| to represent the cardinality of set |X|, attack rate is calculated as In our spatial model, we can also study the growth of the affected area. To this end, a simple and intuitive metric is the radius of gyration ℛg of the set of visited sites. If ri is the position of the i-th visited site, then The attack rate and radius of gyration for selected p and τ values along two cross-sections of the phase diagram near the phase boundary were plotted and are shown in Figure 6. These two transects of the phase diagram were chosen to capture the effects of crossing the boundary from different positions. For each transect, 10 different p, τ pairs are sampled and then the attack rate and radius of gyration are calculated for each pair. These computations are averaged from 1000 realizations of the model for 800 time steps. For the fixed-τ interval, τ = 50 and p ranges from 0.01 to 0.10 (in increments of 0.01). For the fixed-p interval, p = 0.4 and τ ranges from 1 to 10 (in increments of 1).
Figures 6(b) and 6(d) show how the attack rate increases with p and τ. Notice that for very small values of τ, below the phase boundary, the attack rate can decrease with time. This is because we normalize by the number of visited sites, which is always one at start but increases with time. The attack rate of eq. (8), hence, is initially equal to p but can decrease for dying epidemics as more sites are visited but not infected.
The primary result from figure 6(c), which is shown in log-log scale, is the linear growth of the radius of gyration with time for developing epidemics. The curves for τ = 1, τ = 2, and τ = 3 are below the phase boundary, and, hence, they plateau, indicating that the infection has died out and the radius is no longer increasing. Figure 6(e) shows largely the same results as figure 6(c). However, this plot has a more punctuated change after t = τ = 50, when several infections begin to plateau and die out. This behavior is explained by the low p values, which cause most outbreaks to stop growing after the lifetime of the initial walker expires at t = 50.
This linear growth of ℛg translates into a quadratic, rather than exponential, growth for the cumulative incidence of the epidemic, as observed in recent works.12–14
Discussion
We see that a simple lattice model provides a number of interesting insights about epidemic spread. One of the most useful features of idealized models like this one is that they provide an understanding of which factors are salient for the outcome, and which are not. This understanding can be used in the construction of more detailed and realistic models, in the same way that a pencil sketch is useful when starting a full-size oil painting.
One of the most important questions when studying epidemics is whether the value of R0 is sufficient to describe the epidemics behavior, or whether a detailed analysis of the contact network is necessary. Our model shows that the answer depends on what aspect of the epidemic is interrogated. The spatial features of the outbreak are not determined by R0 alone, but depend on the the individual infectivity and the number of individual contacts separately. In contrast, the phase boundary between infinite spread and localization of the epidemic is mainly determined by R0, though, as a consequence of having a single index case, not by the canonical value R0 = 1.
A significant idealization in our model is the motion of the infective agents: the random walkers can go anywhere on the lattice and do not have a memory of their origins. In real life, people tend to have permanent residences and return there daily. These periodic movements would work to make the outbreaks even more localized than those in our model. Introducing more realistic walking patterns is, therefore, a compelling direction for future work. Another interesting possibility would be to use the end state of one simulation, with a lattice including both removed and still susceptible sites, as the initial condition to simulate the effect of subsequent epidemic waves. Such a numerical experiment has a direct bearing on the question of thresholds to herd immunity in a fully spatial model, a topic that has received scant attention.
Methods
Effect of fluctuations on phase diagram
The phase boundary introduced above separates the cases where an outbreak dies out from those where it spreads indefinitely. Let P0 be the probability that the outbreak dies out given one index case. We assume that the number of people infected by the index case has a Poisson distribution with mean R0. The meaning of R0 is thus the same as in classical epidemiology: the number of people infected by the index case in a naïve population.2 The probability that the number of secondary infections is zero, and that the outbreak dies out immediately, is exp(-R0). The probability that the index case infects exactly n persons is If the index case infected n persons, the epidemic dies out only if each secondary outbreak caused by each of these n persons dies out. The probability of such an event P1 is, strictly speaking, different from P0 since the secondary infections operate in a different environment than the index case. However, we neglect this and stipulate that P1 ≈ P0. The probability that n secondary outbreaks die out is therefore . Then we can write down At 0 ≤ R0 ≤ 1 the only solution for this equation is P = 1. This means that at R0 ≤ 1 the epidemics is always localized. At R0 > 1 equation (10) gets a solution between 0 and 1. If we draw the phase boundary at the point where exactly 50% of outbreaks survive, then the critical value of R0 is R0 = 1.39. This agrees extremely well with Figure 4.
If we have N index cases, the probability that the epidemic is localized is, to first approximation, . At large N this probability quickly approaches zero, for R0 > 1. It is important to note that the phase boundary, unlike the other features of the epidemic, depends only on R0.
Asymptotics for R0
In this section, we derive a theoretical estimate for R0: the number of infections produced by one random walker in a susceptible population. We use ideas similar to those in15–17.
Let us consider a walker that starts at the point x. Let C(τ, m, x, w, y) be the probability for this walker to end up at the point y after making τ steps, while visiting the point w exactly m times. The probability that the walker does not infect the site w during this trip is (1 − p)m, since the probability to infect at each visit is p. Summing over w, y and m, we get the total number of infected sites as Where This function is (up to a normalization) the generating function introduced in Rudnick and Gaspari [17, Section 3.2]. To calculate it, we introduce a transformation similar to the transition from the canonical ensemble to the grand canonical ensemble in statistical physics. Namely, let us take a complex variable z and write down the series: The inverse transformation is where the complex integral is taken around the point 𝓏 = 0. As we shall see below, the value of this integral is defined by the singularity of H at 𝓏 → 1.
Let P(τ, x, y) be the probability for a random walker starting at the point x to end at the point y. We introduce two functions A remarkable theorem [17, Section 3.3] states that function H can be expressed through the functions G and G1: Integration of G over y and w is easy due to the normalization. The only nontrivial contribution is from the function G1 in the denominator of this expression. In two dimensions, this function can be expressed as [17, Section 3.5] where χ(q) is the structure factor, i.e. the Fourier transform of the jump probability function. More precisely, let p(x) be the probability for the walker to go from the point 0 to the point y in one step. We then define the structure factor χ(q) as At small q, we can expand the structure factor as which gives near the singularity z → 1 Here q0 ≈ 1 is the upper limit in the integral (17) (assuming the lattice constant to be 1). Using these asymptotics, and integrating equations (11), (14), and (16), we get equation (4) with Note that we use a different normalization from the one used in the Rudnick and Gaspari17: they count the number of random walks, whereas we count the probability. This means that the singularities of the functions H, G and G1 are at 𝓏 = 1 instead of 𝓏 = 𝓏c, and that χ(0) = 1.
The coefficients K and τ0 depend only on the structure factor χ. To calculate these two, let us generate a large number N of jumps according to equations (2) and (3). Let us take q along the x axis. If jump number i lands at the point ri = (xi, yi), then it contributes the term to χ. Due to symmetry, the contribution of the second term is zero, and we arrive at In our experiments, we generate N = 106 points, which gives c ≈ 0.458, and the resulting values of K and τ0 in equation (5).
Data availability
Code and data for figures and tables can be found at https://github.com/czbiohub/random-walk-epidemic-model.
Data Availability
Code and data for figures and tables can be found in the linked GitHub repository.
Author contributions statement
G.H. and A.M. designed research; A.C. performed research with guidance from G.H., A.M. and D.Y.; A.M. wrote the simulation code; B.V. carried out analytical computations; all authors analyzed data and contributed to the writing of the paper.
Additional information
Competing interests
The authors declare no competing interests.
Acknowledgments
A.C., G.H., A.M., and D.Y. are supported by the Chan Zuckerberg Biohub; B.V. is supported by the Chan Zuckerberg Initiative.
Appendix
Figure 7 shows an example of the histograms used to draw our phase diagram.