Social heterogeneity drives complex patterns of the COVID-19 pandemic: insights from a novel Stochastic Heterogeneous Epidemic Model (SHEM)

In today’s absence of a vaccine and impactful treatments, the most effective way to combat the virus is to find and implement mitigation strategies. An invaluable resource in this task is numerical modeling that can reveal key factors in COVID-19 pandemic development. On the other hand, it has become evident that regional infection curves of COVID-19 exhibit complex patterns which often differ from curves predicted by forecasting models. The wide variations in attack rate observed among different social strata suggest that this may be due to social heterogeneity not accounted for by regional models. We investigated this hypothesis by developing and using a new Stochastic Heterogeneous Epidemic Model (SHEM) that focuses on vulnerable subpopulations. We found that the isolation or embedding of vulnerable sub-clusters in a major population hub generated complex stochastic infection patterns which included multiple peaks and growth periods, an extended plateau, a prolonged tail, or a delayed second wave of infection. Embedded vulnerable groups became hotspots that drove infection despite efforts of the main population to socially distance, while isolated groups suffered delayed but intense infection. Amplification of infection by these hotspots facilitated transmission from one urban area to another, causing the epidemic to hopscotch in a stochastic manner to places it would not otherwise reach, resembling a microcosm of the situation worldwide as of September 2020. Our results suggest that social heterogeneity is a key factor in the formation of complex infection propagation patterns. Thus, the mitigation of vulnerable groups is essential to control the COVID-19 pandemic worldwide. The design of our new model allows it to be applied in future studies of real-world scenarios on any scale, limited only by memory and the ability to determine the underlying topology and parameters.


Introduction
fatal and has a high transmission rate (R0), almost twice that of the 2017-2018 common influenza [2,3]. The World Health Organization stated that this combination of high health risk and susceptibility is of great global public health concern, and efforts must be directed to prevent further infection while vaccines are still being developed [4]. As of September 2020, there are more than twenty five million confirmed COVID-19 cases and close to confirmed one million deaths. Older adults seem to be at higher risk for developing more serious complications from COVID-19 illness [5,6]. In today's absence of a vaccine and impactful treatments, the most effective way to combat the virus is to find and implement mitigation strategies. An invaluable resource in this difficult task is numerical modeling studies that can reveal key factors in pandemic development.
What models could be useful? Direct study of the available data of COVID-19 is complicated because many cases and deaths are underrepresented. However, a simple model that correctly captures large-scale behaviors, but gets some details wrong, is useful, whereas a complicated model that gets some details correct but mischaracterizes the large-scale behaviors is misleading [7]. Previously, during the H1N1 pandemic, generic (i.e. non-specific) stochastic influenza models were important to understand and quantify the full effects of the virus in simulations of important scenarios [8]. Open source stochastic models such as FluTE (2010) or GLEaM (2011) [9,10] were developed to simulate the spatial interaction and clusterization of millions of people to discover epidemic patterns.
for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07. 10.20150813 doi: medRxiv preprint 5 Now, with respect to COVID-19, the FluTE model has recently been used to offer interventions to mitigate early spread of SARS-CoV-2 in Singapore [11], and GLEaM was adopted by Chinazzi et al. [12] to model the international propagation of COVID-19 to gain insight into the effect of travel restrictions on virus spread. In principle, agent-based models, like the ones mentioned above, track the fate of every individual and ought to be the most precise. However, they require detailed statistical information about the social interactions and grouping of individuals which is well characterized for seasonal influenza, but has been completely disrupted by mitigations for COVID-19 and continues to be in flux.
Despite extensive efforts to understand and predict the COVID-19 spread, the key factors that determine the multimodal rise patterns, the asymmetry of the recovery phase, and the emergence of a distinct second wave remain unclear. Therefore, instead of another data-based forecasting model, we chose to develop a scenario model to study the consequences of a set of hypothesisdriven conditions in a network of populations. One underexplored but important factor of pandemic spread is social heterogeneity which is characterized by various subpopulation clusterization, societal interaction, and disease mitigation strategies. Our hypothesis is that complex infection curves that consist of multiple infection peaks and growth periods are the consequence of asynchronous propagation of infection among groups with widely varying degrees of intra-group interaction and isolation from main hubs (a metapopulation of infections).
To approach this problem, we developed a novel Stochastic Heterogeneous Epidemic Model (dubbed SHEM) which incorporates heterogeneous aspects of society. We also take into account for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07. 10.20150813 doi: medRxiv preprint 6 over-dispersed stochasticity (super-spreading) [13], which is usually not incorporated into compartmental models but can be critical in small or virgin populations. The model design was inspired by our stochastic models of local calcium release dynamics inside heart cells, driven by explosive calcium-induced-calcium-release [14,15]. We examine several key scenarios of heterogeneity where separate communities of various clusterization and transmission capabilities are linked to a large population hub. The basic reproduction number of infection (R0) of the bulk of our population was assigned to R0 = 2.5 which is within the range of SARS-CoV-2 basic reproduction number based on the early phase of COVID-19 outbreak in Italy [16]. Interplay of various degrees of heterogeneity and isolation periods in our model generated various dynamic patterns of infection, including a multi-modal growth periods, an extended plateau, prolonged tail, or a delayed second wave of infection. Most importantly, we found that vulnerable social subgroups play a key role in the propagation and unpredictability of the epidemic, and can defeat efforts at social distancing.

Simulations of infection in isolated clusters driven by an urban cluster
In the first set of simulations we examined the virus spread in simple hypothetical scenarios with equal numbers of individuals in urban and isolated populations (Fig 1A, insert). The large urban cluster was composed of 1 million individuals set to R0=2.5 (open level, but changing throughout the simulation). The isolated population consisted of 250 clusters, each with 4000 +/-500 people and with the same internal R0=2.5 that remained constant throughout all simulation stages. The urban cluster was weakly connected with 0.001% transient contact into the isolated clusters (alphainpop) while isolated clusters had 0.1% contact into the urban cluster (alphaoutpop), see for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

7
Methods for the definition of transient contact. This can be visualized as a collection of small suburban neighborhoods or nursing homes that are attempting to isolate themselves from the city. We investigated 4 scenarios, specified below. In each scenario except #1, the urban cluster closed to R0=1.25 at t=40 days (closed level, e.g. this was New York City under lockdown, based on 21% antibody positive tests at the peak [17]). 2) Premature, partial reopening to R0 = 1.9 at 100 days.
3) Moderate lockdown period with full reopening at 225 days to R0 = 2.5. 4) Long lockdown period with full reopening at 365 days to R0 = 2.5.
A general tendency throughout all 4 scenarios was that as the lockdown period increased, the magnitude of the infection decreased but its duration increased. At the same time the interplay of the urban cluster and the isolated clusters generated a variety of specific patterns in virus spread dynamics.
In the first "no mitigation" scenario ( Fig 1A) the isolated areas generated a strong second peak at the time when infection in the urban cluster had gone through its peak and was decaying. The second peak substantially extended the overall span of the pattern, nearly twice as much.
The infection rise in the "premature reopening" scenario ( Fig 1B) was multi-modal. The initial rise of infection substantially slowed in the main cluster during the closed period, but a contribution from the isolated areas becomes notable closer to the partial reopening at day 100 for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  (Fig 1D, inset). In the reopen period, the infection surged in both the isolated and urban clusters.
The peak of the isolated clusters happened later than the urban cluster, creating an apparent plateau in active infection cases from day 175 to 225.
The infection dynamics in the "moderate lockdown" scenario ( Fig 1C) was more complex.
During the closed stage (of urban center), while infection in the urban cluster declined, the delayed infection in isolated clusters continued to rise forming an additional peak in total infections (Fig 1E, inset). Yet another peak in total infections emerged in the reopen stage that is generated mainly by the urban cluster, but echoed by the isolated subpopulations.
The "late reopening" (Fig 1D), decreased infection during the first wave in both urban and isolated clusters, but resulted in a distinct delayed second wave of infection. This second wave pattern provides a delay during which the respective second wave of deaths could be intercepted and prevented with the timely development of a vaccine or effective treatment (for example a year from the time of infection onset in Fig 1E).
We also performed a control simulation to validate that heterogeneity of isolated clusters is indeed important for the infection pattern. In the most complex scenario of "moderate lockdown" shown in Figure 1C we substitute 250 clusters by one big cluster with the same population of one million people keeping all other parameters the same. The simulations showed a different pattern in which the second big cluster always generated a peak of substantially larger amplitude ( Fig   S1).
for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Simulations of integrated clusters driving infection in an urban cluster
By altering parameters in the same topology as Figure 1A, we found that the outlying clusters, if they are unable to socially distance, can become potential "hotspots" that can drive the infection in the urban population even against efforts of the latter to lock down. In this scenario the large urban cluster was composed of 1 million individuals with R0 = 1.25 throughout all simulation stages while the highly susceptible population consists of 250 clusters each with 1200 +/-500 people and internal R0 = 3.0 that are partially embedded in the urban cluster. This R0 value is based on data from four districts in Germany when essential manufacturing sectors were open -95%-prediction interval: 2.16 -3.73 [18]. The potential hotspot clusters were connected with We performed 10 runs of these simulations which demonstrated that the integrated clusters drove infection in the urban cluster as shown in a typical example in Fig 2A, B, leading the late appearance of the epidemic in places that had seen few cases in a microcosm of the pattern. The stochastic nature of infection in individual hotspots is shown in Figure 2C; many hotspots "explode" as early as between 50 and 100 days (Movie S1). Hotspots substantially increased the for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
In the second "chain" topology multiple small urban areas (population 100K each) are sequentially connected and 30 potential hotspots with R0=2.0 drive infection within in each urban cluster and facilitate propagation from cluster to cluster (Fig 3, Fig S2, and Movie S2 show the stochastic dynamics of individual hotspots). In this model, the first cluster began with R0 = 2.5, then locked down to 1.25 at day 40, while the unsuspecting clusters down the road kept R0 = 1.05 throughout, signifying low population density and efforts at social distancing, which were defeated by the hotspots picking up the small number of arriving infections and amplifying them.
It is important to note, that in this scenario all four urban areas were ultimately infected (see ten consecutive simulation runs superimposed in Fig 3B), except rare cases (about 1 of 20) when the last urban cluster did not spike due to stochasticity of infection propagation. However, in the case where the hotspots were closed, infection generally didn't reach the last urban cluster at all in the absence of amplification along the way.

Reopening urban cluster after hotspots drive first wave of infection
We extended the single urban cluster hotspot scenario to reopen when infection numbers substantially drop. Here, the main cluster was composed of 1 million individuals which starts off closed with R0=1.05 and reopens to R0=2.50 at day 360. The cluster was connected to 30 potential hotspots each with 1200 +/-500 people with R0=3.0 which remained constant throughout all simulation stages. The urban cluster was connected with 0.1% transient contact into the isolated clusters (alphainpop) while isolated clusters had 1% contact into the urban for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . cluster (alphaoutpop). The results show two distinct waves of infection (Fig 4). The hotspots drove the first wave of infection, whereas the second wave was almost entirely composed of infection from the urban area, demonstrating that the hotspots acquired immunity and did not participate at all in the second wave. The ending of the first wave, dominated by the vulnerable groups, created the illusion that the epidemic was nearly over, while a large fraction of the surrounding populations was in fact still susceptible when reopening occurred.

Interpretations and implications
As of August 2020, the infection curves of the COVID-19 pandemic in various locations have been very different from standard smooth bell curves. Here we tested the hypothesis that multiple, asynchronous waves and plateaus are in part due to stochasticity and heterogeneity, as well as due to changing efforts at mitigation. Geographic heterogeneity is included in forecasting models [12,19,20] which use extensive, public databases of population characteristics and travel patterns, but these do not fully account for the stratification of social behaviors that controls the spread of the virus. Therefore, instead of building another data-based forecasting and estimation model, we developed a numerical scenario model that we used to explore mechanisms of infection dynamics with regards to social stratification. The model was built as a network of "populations" which represent social and behavioral strata of geographic populations. Our model can be considered a metapopulation of SARS-CoV2, when a single species is spread among different environments that determine its local survival or extinction.
for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . 12 We examined several scenarios which included one or more large urban populations connected to vulnerable subgroups that are unable/unwilling to socially distance and thus represent potential COVID-19 hotspots. Depending on the degree of interaction, these subgroups were either driven by infection from the main population, or acted as major drivers of the epidemic.
Isolated subpopulations were infection-driven (e.g. nursing homes, prisons, remote suburbs, clustered religious groups) and had a substantially delayed contribution to total infection cases, ultimately forming an infection curve which could include multi-modal growth periods, an extended plateau, a prolonged tail, or a delayed second wave of infection (Fig 1). These communities, due to their isolated nature, had low herd immunity that put them at risk for explosive scenarios when basic mitigation strategies were not implemented. Alternatively, partially integrated subpopulations were driving infection (e.g. employees of factories, warehouses, meat packing plants, church groups, campuses, shelters, and other essential workers) in its connected urban population by picking up infection and amplifying it by (Fig 2, movie S1). We found that these "hotspots" ignite infection even in a locked down population, ultimately propagating and igniting other isolated populations (Fig 3, movie S2). The locked down population however does not acquire herd immunity, as opposed to the hotspots, and thus when lockdown is lifted, a second wave is generated by the main cluster (Fig 4).
There are several implications that arise from our results. We can expect social heterogeneity to form delayed local asynchronous epidemics, creating a variety of infection profiles in various regions over time, prolonging the pandemic time span, and spreading to new areas unpredictably due to the stochasticity of infection in small subgroups. Effective mitigation of the epidemic in the main population requires close attention to vulnerable subgroups in order to prevent the for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020.

Comparison with other studies
While our study is focused on vulnerable subpopulations in pandemic development, there are other important factors regarding social heterogeneity identified by previous studies.
The study by Dolbeault et al. [21], using their multi-group SEIR model, underlined the importance of mitigation measures on single individuals with a high level of social interactions.
Indeed, their study showed that even a small group of individuals with high transmission rate can trigger an outbreak even if the R0 of the majority is below 1. Althouse et al. [13] identified and explored in depth another important factor, explosive super-spreading events originating from long-term care facilities, prisons, meat-packing plants, fish factories, cruise ships, family gatherings, parties and night clubs. This study further demonstrated the urgent need for targeted interventions as routes of effective virus transmission. Taking into account the importance of these super-spreading events and individuals, they were included in the design of our model (see Methods, Super-spreading) to generate more realistic outcomes of scenarios.
With regard to agent-based models, Chinazzi et al. [12] used GLEaM to demonstrate that travel restrictions introduced in Wuhan in January 2020 only delayed epidemic progression by 3 to 5 days within China, and international travel restrictions only helped slow infectious spread until for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 14 mid-February. Our simulations of COVID-19 spread also show that ultimately, when enough time goes by, isolation does not prevent infection of vulnerable subpopulations (Fig 1). Chinazzi et. al. suggests that early detection, hand washing, self-isolation, and household quarantine are more effective than travel restrictions at mitigating this COVID-19 pandemic. Our recommendations are in accord, and we advocate for communities to take extra care of vulnerable subpopulations internally, as so to prevent a possible hotspot formation that may evolve into a regional epidemic.

Model features, limitations, and future studies
An epidemic can be likened to a forest fire, which spreads by diffusion along a front, but can also jump by embers that may or may not start a new blaze. Such spread to virgin areas, with a virus as with a fire, is intrinsically stochastic and such stochasticity, which is not explicitly included in mean-field models, may contribute to the remarkable patchiness of the COVID-19 epidemic.
This has caused the epidemic to appear entirely different to observers in different locations, leading to politicization of the response, which is, itself, a form of social heterogeneity. For rare spread to small, isolated subgroups (embers) this stochasticity is crucial. Patchiness is aggravated by the over-dispersion (super-spreading) of secondary cases of COVID-19, where the majority of infected individuals do not spread the virus, but some can cause up to a hundred secondary infections [13]. Our model is explicitly stochastic, with a mechanism to account for overdispersion, by keeping a partial history of individual infections. Furthermore, the design of our new model allows it to be applied in future studies of real-world scenarios on any scale, limited only by memory and the ability to determine the underlying topology and parameters.
for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . However in our model, we make no attempt to distinguish between symptomatic and asymptomatic cases, despite recent findings by Chao et al. [22] in their agent-based model (dubbed Corvid) that demonstrated that most infections actually originate from pre-symptomatic people. Since the relative infectivity of symptomatic and non-symptomatic is uncertain, there is no direct way to accurately determine the number of asymptomatic infections at present. Such a distinction (included in a number of other models) could easily be added by subdividing the 5 compartments, at the risk of added complexity and more parameters needed in a scenario.
We did not take into account recent suggestions that infectivity is concentrated in a short time window just before and after symptom onset. Instead, we used the standard SEIRD assumption that infections are generated throughout the period of infection, using a mean clinical duration of 7 days. The model does not consider the physical mechanisms of transmission of COVID-19, or the possibility that many recovered patients do not quickly re-enter their normal social circles, delaying herd immunity. An additional compartment, with a pipeline mechanism, could also be added to account for this.

Model purpose
In view of the constantly changing behavioral environment for COVID-19 in the United States and worldwide, data-based predictive modeling of the future of the epidemic is difficult. Our model is specifically intended to examine the effect of heterogeneity, including not only geographic but also social heterogeneity, i.e. the existence of groups within one geographic for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 16 location that have different social interaction patterns and may be partially isolated from neighboring groups, e.g. nursing homes, prisons, campuses. Alternatively, subgroups can be partially embedded in the main population, e.g. meat processing plants or warehouse employees who are unable to socially distance at work, but spend part of their day in the main community where they can acquire and amplify infection. The model is fully stochastic and, unlike most compartmental models, incorporates the effect of over dispersion of secondary infections (super spreading).

Structure of the Model
The general model consists of a number of subpopulations ("villages") whose number is limited only by memory. The simulation is based on a generalization of the SEIRD representation. The This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . generation of exposure by these "visitors" at home and abroad is scaled so that each infected individual, generates (in an otherwise susceptible population) his destined number of secondary cases (see below under super-spreading).
This arrangement allows for the possibility that "visitors" from different villages could crossinfect while visiting a common hub (picture UPS and FEDEX drivers) even if there is no direct link between them. To represent this process, "virtual links" are generated between pairs of physical links that meet in a hub (in graph-theory terms these are links of the adjoint graph of the network). Infection by this indirect process is second order in the alpha's so it makes very little contribution in the case of highly isolated sub-populations (e.g. nursing homes, prisons) but could be important for embedded sub-populations with high contact with the hub.

Simulation Method
The entire collection of populations is simulated as a single, continuous-time Markov chain This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

12) Infected dies inside village (self-link only)
13) Recovered moves from source to target 14) Recovered moves from target to source At each step, rates of these 14 possible events (computed as in classic SEIRD differential equations, except as below under Super-spreading) are summed over all links in the network to give a total transition rate Rtot. A uniformly distributed random number rn is generated and -log(rn)/Rtot is taken to be the waiting time until the next event, considered as a Poisson point process. Time t is advanced by that amount and then a particular transition is selected by a second random number rn2 by partitioning Rtot into segments proportional to their relative rates and finding which segment contains rn2* Rtot. The action associated with the prescribed event (e.g. increment/decrement numbers in states SEIRD of the villages connected by the link) is taken, and then the above is repeated for the next step. This is continued until t reaches tmax specified in the input parameter file or tswitch, the time set for a discrete change in parameters.
Output is produced whenever t crosses tout, which is then incremented to t plus a user-specified interval (usually one day). This time-binning is only for output and plays no role in the continuous-time evolution of the epidemic.
for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Super-spreading
It is known that the distribution of secondary COVID-19 infections generated by a single, infected individual is over-dispersed (i.e. has a long tail compared to the Poisson distribution of infections expected if transmission were random). Although the average R0 is estimated to be 2.5-4 in the absence of social distancing mitigations, contact tracing has shown that single individuals have infected up to a hundred others. This is known as super-spreading events, and can occur by several possible mechanisms, involving either a predilection of an individual (e.g. a celebrity who travels widely and contacts many other people) or a situation in which individuals were placed in unusually close contact (e.g. a church choir in an indoor location). On the other hand, the majority of infected individuals do not appear to spread the infection to anyone. It has been shown [13] that this over-dispersed distribution can be approximated by a negative binomial distribution, with mean R0 (by definition) and dispersion parameter r<<1, for example 3 and 0.16. By iterating this distribution for several generations of viral spread, it is found that the eventual distribution of epidemic size is predicted to be quite different than found for a hypothetical stochastic transmission by Poison-distributed secondary infections with the same R0. A recent model of contact tracing assumed, based on data from the Netherlands, that the distribution of number of personal contacts outside the family is distributed as a negative binomial and used this to generate random changes to infection levels at 1-day intervals [23].
Unfortunately, viral generations do not remain synchronous in time, so it is not straightforward to incorporate super-spreading in a time-dependent epidemic evolution model except by for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 21 following the interactions and infections of each individual in the population, as done for example in the FLuTE simulation for influenza [8]. This is very compute-intensive, but a more significant objection from our point of view is that it depends on knowing (statistically) the social interaction groups and travel behavior of the population at a fine-grained scale, and these have been severely disrupted by mitigation efforts during the current pandemic. Rather than speculate on these variables, we have developed a modified Markov scheme that tries to reproduce the observed distribution of secondary infections by replacing R0 in the event-rate calculations by an infectivity that is itself stochastic.
Supposing that an individual produces, a posteriori, K secondary infections over the lifetime tdur of his infection, the required rate is K/tdur . However, tdur is itself stochastic: in our model, as in simple SIR models, recovery is treated as a Poisson point process with a rate 1/trec per infected individual where trec is the observed mean duration of infections (it is not presently known whether that is true of asymptomatic infections). When there are multiple infected individuals present, if recovery events in the population are random with a rate ki/trec, where ki is the number of infected individuals in the village, a super-spreader is likely to be "recovered" before (or after) doing his "job". To avoid this, we have adopted the following scheme: • In each village j, at each event, an infectivity inf(j) is maintained that takes the place of ki*R0 in the SEIRD rate equations.
• Whenever a new infection is created (by conversion of an exposed individual), a random number K is drawn from a negative binomial distribution of mean R0 and dispersion reff , for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 22 the latter to be determined. Inf is incremented by K and the individual infectivity K is placed on the top of a linked list.
• Whenever a random recovery event is generated at the above-mentioned rate, the oldest individual infectivity is removed from the bottom of the list and subtracted from inf.
The overall rate of infection events, based on inf, will be a superimposition of the rates of the individual infection events generated by each infected individual. The mean number of secondary infections actually realized by one infected individual over the life of his infection will be K*tdur/trec. Since infections recover in the order in which they were created, if there are n infections active, tdur will be the n th waiting time of the Poisson point process of rate n/trec (where n=ki(j)), whose probability distribution is proportional to poisson(n-1,n/trec). The individual infections generated by individual K are a Poisson point process, so the probability that the individual actually generates j secondary infections is poisson(j, K*tdur/trec). Integrating this over the distribution of tdur, multiplying by the probability mass distribution of the negative binomial distribution and summing over K we find: as the distribution of the actual, realized number of secondary infections. This is a long-tailed probability distribution that can be fit, by an appropriate choice of reff, to approximate the empirical negative binomial distribution with r=0.16 over the relevant range. With more than a few active infections present, the distribution converges to: for use under a CC0 license. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 23 We choose reff to give the best least-squares fit on a linear scale of the case n=1, which is the most important stochastic case since it governs the chance that a single infected individual can start an outbreak, giving the chance that an infected individual causes no secondary infections,  This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . https://doi.org/10.1101/2020.07.10.20150813 doi: medRxiv preprint 24 transient visitors in the alpha process is re-scaled to the local value of R0. In the current version of the program, SPREADR is a single variable governing all events, but it could easily be made specific to individual links to distinguish groups that are vulnerable due to high density in their home village (e.g. factory or warehouse) versus groups that are intrinsically super-spreaders due to their individual behavior.

Software Considerations
The model software is written in Fortran 77/95. The main simulation engine, described above, is in the form of a single Fortran module SIMULATOR. It is intended to be driven by a front-end program that sets up the distribution of the populations, topology of the network, and parameter changes at discrete time points, and connects to SIMULATOR by calling subroutine EPISIM (28 variables). Ideally, the front end should use some kind of scripting language based on network concepts. For purpose of these demonstrations, we hand-coded a front end -epichainF, describing a chain of urban clusters (or a single cluster) connected by bidirectional travel, each linked to a large set of small subpopulations whose characteristics differ from the urban cluster.
The single Markov-chain structure of the model is intrinsically serial, and is implemented in a single processor thread. For configuration with a large number of links sharing a common hub, the large number virtual links can make this slow. The main inner loop will be parallelized in future versions.
for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.     This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020.  This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020.  This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. . This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.   Figures S1 -S3 Legends for Movies S1 and S2 Other supplementary materials for this manuscript include the following: Movies S1 and S2 for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  In the most complex scenario of "moderate lockdown" (Fig. 1C in main text) we substituted 250 clusters by one big cluster with the same population of one million people keeping all other parameters the same. The big isolated cluster generated substantial and sharp infection peak (panels A and B) that is absent or very small in case of 250 isolated clusters (panels C and D). Each panel shows 10 simulation runs (overlapped multi-color curves). The lockdown period from day 40 to day 225 is shown by green shade. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this this version posted September 18, 2020. for use under a CC0 license.
This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.