The Effect of Large-Scale Anti-Contagion Policies on the Coronavirus (COVID-19) Pandemic

Solomon Hsiang; Daniel Allen; Sébastien Annan-Phan; Kendon Bell; Ian Bolliger; Trinetta Chong; Hannah Druckenmiller; Luna Yue Huang; Andrew Hultgren; Emma Krasovich; Peiley Lau; Jaecheol Lee; Esther Rolf; Jeanette Tseng; Tiffany Wu

doi:10.1101/2020.03.22.20040642

Abstract

Governments around the world are responding to the novel coronavirus (COVID-19) pandemic¹ with unprecedented policies designed to slow the growth rate of infections. Many actions, such as closing schools and restricting populations to their homes, impose large and visible costs on society. In contrast, the benefits of these policies, in the form of infections that did not occur, cannot be directly observed and are currently understood through process-based simulations.^2–4 Here, we compile new data on 1,659 local, regional, and national anti-contagion policies recently deployed in the ongoing pandemic across localities in China, South Korea, Iran, Italy, France, and the United States (US). We then apply reduced-form econometric methods, commonly used to measure the effect of policies on economic growth, to empirically evaluate the effect that these anti-contagion policies have had on the growth rate of infections. In the absence of any policy actions, we estimate that early infections of COVID-19 exhibit exponential growth rates of roughly 42% per day. We find that anti-contagion policies collectively have had significant effects slowing this growth. Our results suggest that similar policies may have different impacts on different populations, but we obtain consistent evidence that the policy packages now deployed are achieving large, beneficial, and measurable health outcomes. We estimate that, to date, current policies have already prevented or delayed on the order of 62 million infections across these six countries. These findings may help inform whether or when these ongoing policies should be lifted or intensified, and they can support decision-making in the other 180+ countries where COVID-19 has been reported.⁵

Introduction

The 2019 novel coronavirus¹ (COVID-19) pandemic is forcing societies around the world to make consequential policy decisions with limited information. After containment of the initial outbreak failed, attention turned to implementing large-scale social policies designed to slow contagion of the virus,⁶ with the ultimate goal of slowing the rate at which life-threatening cases emerge so as to not exceed the capacity of existing medical systems. In general, these policies aim to decrease opportunities for virus transmission by reducing contact among individuals within or between populations, such as by closing schools, limiting gatherings, and restricting mobility. Such actions are not expected to halt contagion completely, but instead are meant to slow the spread of COVID-19 to a manageable rate. These large-scale policies are informed by epidemiological simulations^{2, 4, 7–17} and a small number of natural experiments in past epidemics.¹⁸ However, the actual effects of these policies on infection rates in the ongoing pandemic are unknown. Because the modern world has never experienced a pandemic from this pathogen, nor deployed anti-contagion policies of such scale and scope, it is crucial that direct measurements of policy impacts be used alongside numerical simulations in current decision-making.

Populations across the globe are currently weighing whether, or when, the health benefits of anti-contagion policies are worth the costs they impose on society. Many of these costs are plainly seen; for example, restrictions imposed on businesses are increasing unemployment,¹⁹ travel bans are bankrupting airlines,²⁰ and school closures may have enduring impacts on affected students.²¹ It is therefore not surprising that some populations hesitate before implementing such dramatic policies, particularly when these costs are visible while their health benefits – infections and deaths that would have occurred but instead were avoided or delayed – are unseen. Our objective is to measure the direct health benefits of these policies; specifically, how much these policies slowed the growth rate of infections. We treat recently implemented policies as hundreds of different natural experiments proceeding in parallel. Our hope is to learn from the recent experience of six countries where early spread of the virus triggered large-scale policy actions, in part so that societies and decision-makers in the remaining 180+ countries can access this information.

Here we directly estimate the effects of 1,659 local, regional, and national policies on the growth rate of infections across localities within China, France, Iran, Italy, South Korea, and the US (see Figure 1 and Supplementary Table 1). We compile publicly available subnational data on daily infection rates, changes in case definitions, and the timing of policy deployments, including (1) travel restrictions, (2) social distancing through cancellations of events and suspensions of educational/commercial/religious activities, (3) quarantines and lockdowns, and (4) additional policies such as emergency declarations and expansions of paid sick leave, from the earliest available dates to April 6, 2020 (see complete descriptions in the Supplementary Information, also Extended Data Fig. 1). During this period, populations in these countries remained almost entirely susceptible to COVID-19, causing the natural spread of infections to exhibit almost perfect exponential growth.^{7, 14, 22} The rate of this exponential growth may change daily and is determined by epidemiological factors, such as disease infectivity and contact networks, as well as policies that induce behavior changes.^{7, 8, 22} We cannot experimentally manipulate policies ourselves, but because they are being deployed while the epidemic unfolds, we can estimate their effects empirically. We examine how the daily growth rate of infections in each locality changes in response to the collection of ongoing policies applied to that locality on that day.

Figure 1: Data on COVID-19 infections and large-scale anti-contagion policies.

The left-hand-side plots show daily cumulative confirmed cases of COVID-19 (solid black line) and deaths (dashed black line) over time (left axis for both). Deployment of anti-contagion policies are indicated by vertical lines, with height corresponding to the number of administrative units that instituted the policy on a given day (right axis). For display purposes only, not all policies are shown for each country (≤ 5 types highlighted per country) and cases are imputed for days with missing data. The right-hand-side maps show the number of cumulative confirmed cases by administrative unit on the last date of each sample (circle area = number of cases).

We employ well-established “reduced-form” econometric techniques^{23, 24} commonly used to measure the effects of policies^{25, 26} or other events (e.g., wars²⁷ or environmental changes²⁸) on economic growth rates. Similarly to early COVID-19 infections, economic output generally increases exponentially with a variable rate that can be affected by policies and other conditions. Unlike process-based epidemiological models,^{7–9, 12, 22, 29, 30} the reduced-form statistical approach to inference that we apply does not require explicit prior information about fundamental epidemiological parameters or mechanisms, many of which remain uncertain in the current pandemic. Rather, the collective influence of these factors is empirically recovered from the data without modeling their individual effects explicitly (see Methods). Prior work on influenza,³¹ for example, has shown that such statistical approaches can provide important complementary information to process-based models.

To construct the dependent variable, we transform location-specific, subnational time-series data on infections into first-differences of their natural logarithm, which is the per-day growth rate of infections (see Methods). We use data from first- or second-level administrative units and data on active or cumulative cases, depending on availability (see Supplementary Information). We then employ widely-used panel regression models^{23, 24} to estimate how the daily growth rate of infections changes over time within a location when different combinations of large-scale social policies are enacted (see Methods). Our econometric approach accounts for differences in the baseline growth rate of infections across sub-national locations, which may be affected by time-invariant characteristics, such as demographics, socio-economic status, culture, and health systems; it accounts for systematic patterns in growth rates within countries unrelated to policy, such as the effect of the work-week; it is robust to systematic under-surveillance specific to each sub-national unit; and it accounts for changes in procedures to diagnose positive cases (see Methods and Supplementary Information). The reduced-form statistical techniques we use are designed to measure the total magnitude of the effect of changes in policy, without attempting to explain the origin of baseline growth rates or the specific epidemiological mechanisms linking policy changes to infection growth rates (see Methods). Thus, this approach does not provide the important mechanistic insights generated by process-based models; however, it does effectively quantify the key policy-relevant relationships of interest using recent real-world data, when fundamental epidemiological parameters are still uncertain.

Results

We estimate that in the absence of policy, early infection rates of COVID-19 grow 42% per day on average (Standard Error [SE] = 7%), implying a doubling time of approximately 2 days. Country-specific estimates range from 24% per day in China (SE = 9%) to 69% per day in Iran (SE = 5%). Growth rates in South Korea, Italy, France, and the US are very near the 42% average value (Figure 2a). These estimated values differ from observed growth rates because the latter are confounded by the effects of policy. These growth rates are not driven by the expansion of testing or increasing rates of case detection (see Methods and Extended Data Fig. 2) and are not dependent on data from any particular region of any country (Extended Data Fig. 3).

Figure 2: Empirical estimates of unmitigated COVID-19 infection growth rates and the effect of anti-contagion policies.

Markers are country-specific estimates, whiskers are 95% CI. Columns report effect sizes as a change in the continuous-time growth rate (95% CI in parentheses) and the day-over-day percentage growth rate. (a) Estimates of daily COVID-19 infection growth rate in the absence of policy (red dashed line = average, excluding Wuhan-specific estimate). (b) Estimated combined effect of all policies on infection growth rates. (c) Estimated effects of individual policies or policy groups on the daily growth rate of infections, jointly estimated.

Some prior analyses of pre-intervention infections in Wuhan suggest slower growth rates (doubling every 5–7 days)^{32, 33} using data collected before national standards for diagnosis and case definitions were first issued by the Chinese government on January 15, 2020.³⁴ However, case data in Wuhan from before this date contains multiple irregularities:³⁴ the cumulative case count decreased on January 9; no new cases were reported between January 9 and January 15; and there were concerns over whether information about the outbreak was actively suppressed³⁵ (see Supplementary Table 2). When we remove these problematic data, utilizing a shorter but more reliable pre-intervention time series from Wuhan (January 16–21, 2020), we recover a growth rate of 43% per day (SE = 3%, doubling every 2 days) consistent with results from all other countries (Figure 2a and Supplementary Table 3), except Iran.

During the early stages of an epidemic, a large proportion of the population remains susceptible to the virus, and if the spread of the virus is left uninhibited by policy or behavioral change, exponential growth will continue until the fraction of the susceptible population declines meaningfully.^{7, 29} This decline results from members of the population leaving the transmission cycle, due to either recovery or death.²⁹ After correcting for estimated rates of case-detection,³⁶ we compute that the minimum susceptible population in any of the administrative units in our sample is approximately 78.0% of the total population (Cremona, Italy: roughly 79,000 total infections in a population of 360,000) and 86% of administrative units across all six countries would likely be in a regime of uninhibited exponential growth (susceptible fraction of population > 95%) if policies were removed on the last date of our sample.

Consistent with predictions from epidemiological models,^{2, 18, 37} we find that the combined effect of policies within each country reduces the growth rate of infections by a substantial and statistically significant amount (Figure 2b, Supplementary Table 3). For example, a locality in France with a baseline growth rate of 0.34 (national average) that fully deployed all policy actions used in France would be expected to lower its daily growth rate by −0.28 to a growth rate of 0.06. In general, the estimated total effects of policy packages are large enough that they can in principle offset a large fraction of, or even eliminate, the baseline growth rate of infections—although in several countries, many localities are not currently deploying the full set of policies used in that country. Overall, the estimated effects of all policies combined are generally insensitive to dropping regional (i.e. state- or province-level) blocks of data from the sample (Extended Data Fig. 3).

In China, only two policies were enacted across 116 cities early in a seven week period, providing us with sufficient data to empirically estimate how the effects of these policies evolve over time without making any assumptions about the timing of these effects (see Methods and Fig. 2b). We estimate that the combined effect of these policies significantly reduced the growth rate of infections by −0.14 (SE = 0.031) in the first week immediately following their deployment (also see Extended Data Fig. 5a), with effects doubling in the second week to −0.30 (SE = 0.040), and stabilizing in the third week at −0.34 (SE = 0.036). In other countries, we lack sufficient data to estimate these temporal dynamics explicitly and only report the average pooled effect of policies across all days following their deployment (see Methods). If other countries were to exhibit transient responses similar to that observed in China, we would expect effects in the first week following deployment to be smaller in magnitude than the average effect for all post-deployment weeks. Below, we explore how our estimates would change if we impose the assumption that policies cannot affect infection growth rates until after a fixed number of days, but we do not find evidence this improves model fit (Extended Data Fig. 5b).

The estimates described above (Figure 2b) capture the superposition of all policies deployed in each country, i.e., they represent the average effect of policies on infection growth rates that we would expect to observe if all policies enacted anywhere in each country were implemented simultaneously in a region of that country. We also estimate the effects of individual policies or clusters of policies that are grouped either based on their similarity in goal (e.g., library closures and museum closures are grouped) or timing (e.g., policies that are generally deployed simultaneously in a certain country). In many cases, our estimates for these effects are statistically noisier than the estimates for all policies combined because we are estimating multiple effects simultaneously. Thus, we are less confident in the individual estimates and in their relative rankings. Estimated effects differ between countries, and policies are neither identical nor perfectly comparable in their implementation across countries or, in many cases, across different localities within the same country. Nonetheless, despite a higher level of variability in these values, 28 out of 34 point estimates indicate that individual policies are likely contributing to reducing the growth rate of infections (Figure 2c). Six policies (one in South Korea, two in Italy, and three in the US) have point estimates that are positive, five of which are small in magnitude (< 0.1) and not statistically different from zero (5% level). Consistent with greater overall uncertainty in these dis-aggregated estimates, some in China, France, Italy, and South Korea are somewhat more sensitive to dropping regional blocks of data (Extended Data Fig. 4). The estimated effects of individual policies are broadly robust to assuming a constant delayed effect of all policies (Extended Data Fig. 5c).

We combine the estimates above with our data on the timing of the 1,659 policy deployments to estimate the total effect of all policies across the dates in our sample. To do this, we use our estimates to predict the growth rate of infections in each locality on each day, given the policies in effect at that location on that date (Figure 3, blue markers). We then use the same model to predict what counterfactual growth rates would be on that date if all policies were removed (Figure 3, red markers), which we refer to as a “no-policy scenario.” The difference between these two predictions is our estimated effect that all anti-contagion policies actually deployed had on the growth rate of infections. We estimate that since the beginning of our sample, on average, all anti-contagion policies combined have slowed the average daily growth rate of infections by −0.156 per day (±0.015, p < 0.001) in China, −0.248 (±0.089, p < 0.001) in South Korea, −0.241 (±0.068, p < 0.001) in Italy, −0.362 (±0.069, p < 0.001) in Iran, −0.139 (±0.038, p < 0.001) in France and −0.092 (±0.033, p < 0.05) in the US. Taken together, these results suggest that anti-contagion policies currently deployed in all six countries are achieving their intended objective of slowing the pandemic, broadly confirming epidemiological simulations. These results are robust to modeling the effects of policies without grouping them (Extended Data Fig. 6a and Supplementary Table 4) or assuming a delayed effect of policy on infection growth rates (Supplementary Table 5). At a particular moment in time, the total number of COVID-19 infections depends on the growth rate of infections on all prior days. Thus, persistent decreases in growth rates have a compounding effect on total infections, at least until a shrinking susceptible population slows growth through a different mechanism. To provide a sense of scale and context for our main results in Figs. 2 and 3, we integrate the growth rate of infections in each locality from Figure 3 to estimate the total number of infections to date, both with actual anti-contagion policies and in the no-policy counterfactual scenario. To account for the declining size of the susceptible population in each administrative unit, we couple our econometric estimates of the effects of policies with a simple Susceptible-Infected-Removed (SIR) model of infectious disease dynamics^{7, 22} that adjusts the susceptible population based on previously estimated case-detection rates³⁶ (see Methods). This allows us to extend our projections beyond the initial exponential growth phase of infections, a threshold which our results suggest would currently be exceeded in several countries in the no-policy scenario.

Figure 3: Estimated infection growth rates based on actual anti-contagion policies and in a no-policy counterfac-tual scenario.

Predicted daily growth rates of active (China and South Korea) or cumulative (all others) COVID-19 infections based on the observed timing of all policy deployments within each country (blue) and in a scenario where no policies were deployed (red). The difference between these two predictions is our estimated effect of actual anti-contagion policies on the growth rate of infections. Small markers are daily estimates for subnational administrative units (vertical lines are 95% CI). Large markers are national averages. Black circles are observed daily changes in log(infections), averaged across administrative units.

Our results suggest that ongoing anti-contagion policies have already substantially reduced the number of COVID-19 infections observed in the world today (Figure 4). Our central estimates suggest that there would be roughly 27 million more cumulative confirmed cases in China, 20 million more in South Korea, 2.7 million more in Italy, 5.4 million more in Iran, 530,000 more in France, and 5.1 million more in the US had these countries never enacted any anti-contagion policies since the start of the pandemic. The relative magnitudes of these impacts partially reflects the timing, intensity, and extent of policy deployment (e.g., how many localities deployed policies), and the duration for which they have been applied. Several of these estimates are subject to large statistical uncertainties (see intervals in Figure 4). Sensitivity tests that assume a range of plausible alternative parameter values and disease dynamics, such as incorporating a Susceptible-Exposed-Infected-Removed (SEIR) model, suggest that policies may have reduced the number of infections by a total of 57–65 million confirmed cases over the dates in our sample (central estimates).

Figure 4: Estimated cumulative confirmed COVID-19 infections with and without anti-contagion policies.

The predicted cumulative number of confirmed COVID-19 infections based on actual policy deployments (blue) and in the no-policy counterfactual scenario (red). Shaded areas show uncertainty based on 1,000 simulations where empirically estimated parameters are resampled from their joint distribution (dark = inner 70% of predictions; light = inner 95%). Black dotted line is observed cumulative infections. Infections are not projected for administrative units that never report infections in the sample, but which might have experienced infections in a no-policy scenario.

Discussion

Overall, our results indicate that large-scale anti-contagion policies are achieving their intended objective of slowing the growth rate of COVID-19 infections. Because infection rates in the countries we study would have initially followed rapid exponential growth had no policies been applied, our results suggest that these ongoing policies are currently providing large health benefits. For example, we estimate that there would be roughly 339× the current number of confirmed infections in China, 22× in Italy, and 15× in the US by the end of our sample if large-scale anti-contagion policies had not already been deployed. Consistent with process-based simulations of COVID-19 infections,^{2, 4, 10–12, 14, 17, 29} our empirical analysis of existing policies indicates that seemingly small delays in policy deployment likely produce dramatically different health outcomes.

While the limited amount of currently available data poses challenges to our analysis, our aim is to use what data exist to estimate the first-order impacts of unprecedented policy actions in an ongoing global crisis. As more data become available, empirical research findings will become more precise and may capture more complex interactions. For example, this analysis does not account for potentially important interactions between populations in nearby localities,^{7, 38} nor the structure of mobility networks.^{3, 4, 10, 12, 17, 39} Nonetheless, we hope the results we are able to obtain at this early stage of the pandemic can support critical decision-making, both in the countries we study and in the other 180+ countries where COVID-19 infections have been reported.

A key advantage of our reduced-form “top down” statistical approach is that it captures the real-world behavior of affected populations without requiring that we explicitly model all underlying mechanisms and processes. This property is useful in the current stage of the pandemic when many process-related parameters remain unknown. However, our results cannot and should not be interpreted as a substitute for process-based epidemiological models specifically designed to provide guidance in public health crises. Rather, our results complement existing models, for example, by helping to calibrate key model parameters. We believe both forward-looking simulations and backward-looking empirical evaluations should be used to inform decision-making.

Our analysis measures changes in local infection growth rates associated with changes in anti-contagion policies, treating each subnational administrative unit as if it were in a natural experiment. Intuitively, each administrative unit observed just prior to a policy deployment serves as the “control” for the same unit in the days after it receives a policy “treatment”. Thus, a necessary condition for our estimates to be interpreted as the plausibly causal effect of these policies is that the timing of policy deployment is independent of infection growth rates.²³ Such an assumption is supported by epidemiological theory, which predicts that infection totals in the absence of policy will be near-perfectly exponential early in the epidemic,⁷ implying that pre-policy infection growth rates in this context should be constant. The policies we analyze are unlikely to have been deployed in reaction to or anticipation of changes in growth rates, since epidemiological guidance to decision-makers explicitly projected constant growth rates in the absence of anti-contagion measures.^{2, 29, 40, 41} In practice, decision-makers have tended to deploy policies in response to the count of total infections in their locality, rather than their growth rate,² in response to outbreaks in other regions or countries,⁴² or based on other arbitrary and exogenous factors, such as closing schools on a Monday or after Spring Break.⁴³

Our analysis accounts for documented changes in the availability of and procedures for testing for COVID-19 as well as differences in case-detection across locations; however, unobserved trends in case-detection could affect our results (see Methods). For example, if growing awareness of COVID-19 caused an increasing fraction of infected individuals to be tested over time, then unadjusted infection growth rates later in our sample would be biased upwards. Because an increasing number of policies are active later in these samples as well, this bias would cause our current findings to understate the overall effectiveness of anti-contagion policies. However, our analysis of estimated case-detection trends³⁶ (Extended Data Fig. 2) suggests that the magnitude of this potential bias is small, elevating our estimated no-policy growth rates by 0.022 (6%) on average.

It is also possible that changing public information during the period of our study has some unknown effect on our results. If individuals alter their behavior in response to new information unrelated to anti-contagion policies, such as news reports about COVID-19, this could alter the growth rate of infections and thus affect our estimates. Because the quantity of new information is increasing over time, if this information reduces infection growth rates, it would cause us to overstate the effectiveness of anti-contagion policies. We note, however, that if public information is increasing in response to policy actions, then it should be considered a pathway through which policies alter infection growth, not a form of bias. Investigating these potential effects is beyond the scope of this analysis, but it is an important topic for future investigations.

While our analysis has focused on changes in the growth rate of infections, other outcomes, such as hospitalizations or deaths, are also of policy interest. Because these outcomes are more context- and state-dependent than infection growth rates, their analysis in future work may require additional modeling approaches. Nonetheless, we experimentally implement our approach on the daily growth rate of hospitalizations in France, the only country in our sample where we were able to obtain hospitalization data at the granularity of this study. We find that the total estimated effect of anti-contagion policies on the growth rate of hospitalizations is similar to our reported effect on the infection growth rate (Extended Data Fig. 6c).

Here we exclusively analyzed large-scale anti-contagion social policies to understand their effects on infection growth rates within a locality. However, contact tracing, international travel restrictions, and medical resource management, along with many other policy decisions, will play key roles in the global response to COVID-19. Our results do not speak to the efficacy of these other policies.

Lastly, the results presented here are not sufficient on their own to determine which anti-contagion policies are ideal for particular populations, nor do they speak to whether the social costs of individual policies are larger or smaller than the social value of their health benefits. Computing a full value of health benefits also requires understanding how different growth rates of infections and total active infections affect mortality rates, as well as determining a social value for all of these impacts. Furthermore, this analysis does not quantify the sizable social costs of anti-contagion policies, a critical topic for future investigations.

Data Availability

All data and code used in this analysis are available at https://github.com/bolliger32/gpl-covid. Updates are posted at http://www.globalpolicy.science/covid19.

Methods

Data Collection and Processing

We provide a brief summary of our data collection processes here (see the Supplementary Notes for more details, including access dates). Epidemiological, case definition/testing regime, and policy data for each of the six countries in our sample were collected from a variety of in-country data sources, including government public health websites, regional newspaper articles, and crowd-sourced information on Wikipedia. The availability of epidemiological and policy data varied across the six countries, and preference was given to collecting data at the most granular administrative unit level. The country-specific panel datasets are at the region level in France, the state level in the US, the province level in South Korea, Italy and Iran, and the city level in China. Due to data availability, the sample dates differ across countries: in China we use data from January 16 - March 5, 2020; in South Korea from February 17 - April 6, 2020; in Italy from February 26 - April 6, 2020; in Iran from February 27 - March 22, 2020; in France from February 29 - March 25, 2020; and in the US from March 3 - April 6, 2020. Below, we describe our data sources.

China

We acquired epidemiological data from an open source GitHub project¹ that scrapes time series data from Ding Xiang Yuan. We extended this dataset back in time to January 10, 2020 by manually collecting official daily statistics from the central and provincial (Hubei, Guangdong, and Zhejiang) Chinese government websites. We compiled policies by collecting data on the start dates of travel bans and lockdowns at the city-level from the “2020 Hubei lockdowns” Wikipedia page², the Wuhan Coronavirus Timeline project on Github,³ and various other news reports. We suspect that most Chinese cities have implemented at least one anti-contagion policy due to their reported trends in infections; as such, we dropped cities where we could not identify a policy deployment date to avoid miscategorizing the policy status of these cities. Thus our results are only representative for the sample of 116 cities for which we obtained policy data.

South Korea

We manually collected and compiled the epidemiological dataset in South Korea, based on provincial government reports, policy briefings, and news articles. We compiled policy actions from news articles and press releases from the Korean Centers for Disease Control and Prevention (KCDC), the Ministry of Foreign Affairs, and local governments’ websites.

Iran

We used epidemiological data from the table “New COVID-19 cases in Iran by province”⁴ in the “2020 coronavirus pandemic in Iran” Wikipedia article, which were compiled from the data provided on the Iranian Ministry of Health website (in Persian). We relied on news media reporting and two timelines of pandemic events in Iran^5,6 to collate policy data.

Italy

We used epidemiological data from the GitHub repository⁷ maintained by the Italian Department of Civil Protection (Dipartimento della Protezione Civile). For policies, we primarily relied on the English version of the COVID-19 dossier “Chronology of main steps and legal acts taken by the Italian Government for the containment of the COVID-19 epidemiological emergency” written by the Dipartimento della Protezione Civile,⁸ and Wikipedia.⁹

France

We used the region-level epidemiological dataset provided by France’s government website¹⁰ and supplemented it with numbers of confirmed cases by region on France’s public health website, which was previously updated daily through March 25.¹¹ We obtained data on France’s policy response to the COVID-19 pandemic from the French government website,¹² press releases from each regional public health site,¹³ and Wikipedia.¹⁴

United States

We used state-level epidemiological data from usafacts.org,¹⁵ which they compile from multiple sources. For policy responses, we relied on a number of sources, including the U.S. Centers for Disease Control (CDC), the National Governors Association, as well as various executive orders from county- and city-level governments, and press releases from media outlets.

Policy Data

Policies in administrative units were coded as binary variables, where the policy was coded as either 1 (after the date that the policy was implemented, and before it was removed) or 0 otherwise, for the affected administrative units. When a policy only affected a fraction of an administrative unit (e.g., half of the counties within a state), policy variables were weighted by the percentage of people within the administrative unit who were treated by the policy. We used the most recent population estimates we could find for countries’ administrative units (see the Population Data section in the Appendix). Additionally, in order to standardize policy types across countries, we mapped each country-specific policy to one of the broader policy category variables in our analysis. In this exercise, we collected 137 policies for China, 59 for South Korea, 215 for Italy, 22 for Iran, 59 for France, and 1167 for the United States (see Supplementary Table 1).

Epidemiological Data

We collected information on cumulative confirmed cases, cumulative recoveries, cumulative deaths, active cases, and any changes to domestic COVID-19 testing regimes, such as case definitions or testing methodology. For our regression analysis (Figure 2), we use active cases when they are available (for China and South Korea) and cumulative confirmed cases otherwise. We document quality control steps in the Appendix. Notably, for China and South Korea we acquired more granular data than the data hosted on the John Hopkins University (JHU) interactive dashboard;¹⁶ we confirm that the number of confirmed cases closely match between the two data sources (see Extended Data Fig. 1). To conduct the econometric analysis, we merge the epidemiological and policy data to form a single data set for each country.

Econometric analysis

Reduced-Form Approach

The reduced-form econometric approach that we apply here is a “top down” approach that describes the behavior of aggregate outcomes y in data (here, infection rates). This approach can identify plausibly causal effects^{23, 24} induced by exogenous changes in independent policy variables z (e.g., school closure) without explicitly describing all underlying mechanisms that link z to y, without observing intermediary variables x (e.g., behavior) that might link z to y, or without other determinants of y unrelated to z (e.g., demographics), denoted w. Let f (·) describe a complex and unobserved process that generates infection rates y:

Process-based epidemiological models aim to capture elements of f (·) explicitly, and then simulate how changes in z, x, or w affect y. This approach is particularly important and useful in forward-looking simulations where future conditions are likely to be different than historical conditions. However, a challenge faced by this approach is that we may not know the full structure of f (·), for example if a pathogen is new and many key biological and societal parameters remain uncertain. Crucially, we may not know the effect that large-scale policy (z) will have on behavior (x(z)) or how this behavior change will affect infection rates (f (·)).

Alternatively, one can differentiate Equation 1 with respect to the k^th policy z_k: which describes how changes in the policy affects infections through all N potential pathways mediated by x₁, …, x_N. Usefully, for a fixed population observed over time, empirically estimating an average value of the local derivative on the left-hand-side in Equation 2 does not depend on explicit knowledge of w. If we can observe y and z directly and estimate changes over time with data, then intermediate variables x also need not be observed nor modeled. The reduced-form econometric approach^{23, 24} thus attempts to measure directly, exploiting exogenous variation in policies z.

Model

Active infections grow exponentially during the initial phase of an epidemic, when the proportion of immune individuals in a population is near zero. Assuming a simple Susceptible-Infected-Recovered (SIR) disease model (e.g., ref. [²²]), the growth in infections during the early period is where I_t is the number of infected individuals at time t, β is the transmission rate (new infections per day per infected individual), γ is the removal rate (proportion of infected individuals recovering or dying each day) and S is the fraction of the population susceptible to the disease. The second equality holds in the limit S→ 1, which describes the current conditions during the beginning of the COVID-19 pandemic. The solution to this ordinary differential equation is the exponential function where is the initial condition. Taking the natural logarithm and rearranging, we have

Anti-contagion policies are designed to alter g, through changes to β, by reducing contact between susceptible and infected individuals. Holding the time-step between observations fixed at one day (t₂− t₁ = 1), we thus model g as a time-varying outcome that is a linear function of a time-varying policy where θ₀ is the average growth rate absent policy, policy_t is a binary variable describing whether a policy is deployed at time t, and θ is the average effect of the policy on growth rate g over all periods subsequent to the policy’s introduction, thereby encompassing any lagged effects of policies. ϵ_t is a mean-zero disturbance term that captures inter-period changes not described by policy_t. Using this approach, infections each day are treated as the initial conditions for integrating Equation 4 through to the following day.

We compute the first differences log(I_t) − log(I_t−1) using active infections where they are available, otherwise we use cumulative infections, noting that they are almost identical during this early period (except in China, where we use active infections). We then match these data to policy variables that we construct using the novel data sets we assemble and apply a reduced-form approach to estimate a version of Equation 6, although the actual expression has additional terms detailed below.

Estimation

To estimate a multi-variable version of Equation 6, we estimate a separate regression for each country c. Observations are for subnational units indexed by i observed for each day t. Because not all localities began testing for COVID-19 on the same date, these samples are unbalanced panels. To ensure data quality, we restrict our analysis to localities after they have reported at least ten cumulative infections.

We estimate a multiple regression version of Equation 6 using ordinary least squares. We include a vector of subnational unit-fixed effects θ₀ (i.e., varying intercepts captured as coefficients to dummy variables) to account for all time-invariant factors that affect the local growth rate of infections, such as differences in demographics, socio-economic status, culture, and health systems.²⁴ We include a vector of day-of-week-fixed effects d to account for weekly patterns in the growth rate of infections that are common across locations within a country, however, in China, we omit day-of-week effects because we find no evidence they are present in the data – perhaps due to the fact that the outbreak of COVID-19 began during a national holiday and workers never returned to work. We also include a separate single-day dummy variable each time there is an abrupt change in the availability of COVID-19 testing or a change in the procedure to diagnose positive cases. Such changes generally manifest as a discontinuous jump in infections and a re-scaling of subsequent infection rates (e.g., See China in Figure 1), effects that are flexibly absorbed by a single-day dummy variable because the dependent variable is the first-difference of the logarithm of infections. We denote the vector of these testing dummies µ.

Lastly, we include a vector of P_c country-specific policy variables for each location and day. These policy variables take on values between zero and one (inclusive) where zero indicates no policy action and one indicates a policy is fully enacted. In cases where a policy variable captures the effects of collections of policies (e.g., museum closures and library closures), a policy variable is computed for each, then they are averaged, so the coefficient on this type of variable is interpreted as the effect if all policies in the collection are fully enacted. There are also instances where multiple policies are deployed on the same date in numerous locations, in which case we group policies that have similar objectives (e.g., suspension of transit and travel ban, or cancelling of events and no gathering) and keep other policies separate (i.e., business closure, school closure). The grouping of policies is useful for reducing the number of estimated parameters in our limited sample of data, allowing us to examine the impact of subsets of policies (e.g. Fig. 2c). However, policy grouping does not have a material impact on the estimated effect of all policies combined nor on the effect of actual policies, which we demonstrate by estimating a regression model where no policies are grouped and these values are recalculated (Supplementary Table 4, Extended Data Fig. 6).

In some cases (for Italy and the US), policy data is available at a more spatially granular level than infection data (e.g., city policies and state-level infections in the US). In these cases, we code binary policy variables at the more granular level and use population-weights to aggregate them to the level of the infection data. Thus, policy variables may take on continuous values between zero and one, with a value of one indicating that the policy is fully enacted for the entire population. Given the limited quantity of data currently available, we use a parsimonious model that assumes the effects of policies on infection growth rates are approximately linear and additively separable. However, future work that possesses more data may be able to identify important nonlinearities or interactions between policies.

For each country, our general multiple regression model is thus where observations are indexed by country c, subnational unit i, and day t. The parameters of interest are the country-by-policy specific coefficients θ_cp. We display the estimated residuals ϵ_cit in Extended Data Fig. 10, which are mean zero but not strictly normal (normality is not a requirement of our modeling and inference strategy), and we estimate uncertainty over all parameters by calculating our standard errors robust to error clustering at the day level.²³ This approach allows the covariance in ϵ_cit across different locations within a country, observed on the same day, to be nonzero. Such clustering is important in this context because idiosyncratic events within a country, such as a holiday or a backlog in testing laboratories, could generate nonuniform country-wide changes in infection growth for individual days not explicitly captured in our model. Thus, this approach non-parametrically accounts for both arbitrary forms of spatial auto-correlation or systematic misreporting in regions of a country on any given day (we note that it generates larger estimates for uncertainty than clustering by i). When we report the effect of all policies combined (e.g., Figure 2b) we are reporting the sum of coefficient estimates for all policies , accounting for the covariance of errors in these estimates when computing the uncertainty of this sum.

Note that our estimates of θ and θ₀ in Equation 7 are robust to systematic under-reporting of infections, a major concern in the ongoing pandemic, due to the construction of our dependent variable. This remains true even if different localities have different rates of under-reporting, so long as the rate of under-reporting is relatively constant. To see this, note that if each locality i has a medical system that reports only a fraction ψ_i of infections such that we observe Ĩ_it = ψ_iI_it rather an actual infections I_it, then the left-hand-side of Equation 7 will be and is therefore unaffected by location-specific and time-invariant under-reporting. Thus systematic under-reporting does not affect our estimates for the effects of policy θ. As discussed above, potential biases associated with non-systematic under-reporting resulting from documented changes in testing regimes over space and time are absorbed by region-day specific dummies µ.

However, if the rate of under-reporting within a locality is changing day-to-day, this could bias infection growth rates. We estimate the magnitude of this bias (see Extended Data Fig. 2), and verify that it is quantitatively small. Specifically, if Ĩ_it = ψ_itI_it where ψ_it changes day-to-day, then where log(ψ_it) − log(ψ_i,t−1) is the day-over-day growth rate of the case-detection probability. Disease surveillance has evolved slowly in some locations as governments gradually expand testing, which would cause ψ_it to change over time, but these changes in testing capacity do not appear to significantly alter our estimates of infection growth rates. In Extended Data Fig. 2, we show one set of epidemiological estimates³⁶ for log(ψ_it) − log(ψ_i,t−1). Despite random day-to-day variations, which do not cause systematic biases in our point estimates, the mean of log(ψ_it) − log(ψ_i,t−1) is consistently small across the different countries: 0.047 in China, 0.066 in Iran, 0.008 in South Korea, 0.053 in France, 0.028 in Italy, and 0.036 in the US. The average of these estimates is 0.022, potentially accounting for 6.2% of our global average estimate for the no-policy infection growth rate (0.35). These estimates of log(ψ_it) − log(ψ_i,t−1) also do not display strong temporal trends, alleviating concerns that time-varying under-reporting generates sizable biases in our estimated effects of anti-contagion policies.

Transient dynamics

In China, we are able to examine the transient response of infection growth rates following policy deployment because only two policies were deployed early in a seven-week sample period during which we observe many cities simultaneously. This provides us with sufficient data to estimate the temporal structure of policy effects without imposing assumptions regarding this structure. To do this, we estimate a distributed-lag model that encodes policy parameters using weekly lags based on the date that each policy is first implemented in locality i. This means the effect of a policy implemented one week ago is allowed to differ arbitrarily from the effect of that same policy in the following week, etc. These effects are then estimated simultaneously and are displayed in Fig. 2 (also Supplementary Table 3). Such a distributed lag approach did not provide statistically meaningful insight in other countries using currently available data because there were fewer administrative units and shorter periods of observation (i.e. smaller samples), and more policies (i.e. more parameters to estimate) in all other countries. Future work may be able to successfully explore these dynamics outside of China.

We also explore the day-by-day response to the first anti-contagion policies in a limited number Chinese cities using an event study approach.⁴⁴ We examine the 36 cities in which five days of infection growth data immediately before and after deployment of the first anti-contagion policy (home isolation) are available (similar samples were unavailable in the other countries we study). Pooling these data, we then estimate average rates of infection growth five days before deployment, four days before, etc., shown in Extended Data Fig. 5a. In this limited sample of cities, we find that infection growth rates separate from the average pre-policy growth rate within the first three days following deployment of the policy.

As a robustness check, we examine whether excluding the transient response from the estimated effects of policy substantially alters our results. We do this by estimating a “fixed lag” model, where we assume that policies cannot influence infection growth rates for L days, recoding a policy variable at time t as zero if a policy was implemented fewer than L days before t. We re-estimate Equation 7 for each value of L and present results in Extended Data Fig. 5 and Supplementary Table 5.

Alternative disease models

Our main empirical specification is motivated with an SIR model of disease contagion, which assumes zero latent period between exposure to COVID-19 and infectiousness. If we relax this assumption to allow for a latent period of infection, as in a Susceptible-Exposed-Infected-Recovered (SEIR) model, the growth of the outbreak is only asymptotically exponential.²² Nonetheless, we demonstrate that SEIR dynamics have only a minor potential impact on the coefficients recovered by using our empirical approach in this context. In Extended Data Figs. 8 and 9 we present results from a simulation exercise which uses Equations 9-11, along with a generalization to the SEIR model²² to generate synthetic outbreaks (see Supplementary Methods Section 2). We use these simulated data to test the ability of our statistical model (Equation 7) to recover both the unimpeded growth rate (Extended Data Fig. 8) as well as the impact of simulated policies on growth rates (Extended Data Fig. 9) when applied to data generated by SIR or SEIR dynamics over a wide range of epidemiological conditions.

Projections

Daily growth rates of infections

To estimate the instantaneous daily growth rate of infections if policies were removed, we obtain fitted values from Equation 7 and compute a predicted value for the dependent variable when all P_c policy variables are set to zero. Thus, these estimated growth rates capture the effect of all locality-specific factors on the growth rate of infections (e.g., demographics), day-of-week-effects, and adjustments based on the way in which infection cases are reported. This counterfactual does not account for changes in information that are triggered by policy deployment, since those should be considered a pathway through which policies affect outcomes, as discussed in the main text. When we report an average no-policy growth rate of infections (Figure 2a), it is the average value of these predictions for all observations in the original sample. Location-and-day specific counterfactual predictions, accounting for the covariance of errors in estimated parameters, are shown as red markers in Figure 3.

Cumulative infections

To provide a sense of scale for the estimated cumulative benefits of effects shown in Figure 3, we link our reduced-form empirical estimates to the key structures in a simple SIR system and simulate this dynamical system over the course of our sample. The system is defined as the following: where S_t is the susceptible population and R_t is the removed population. Here β_t is a time-evolving parameter, determined via our empirical estimates as described below. Accounting for changes in S becomes increasingly important as the size of cumulative infections (I_t + R_t) becomes a substantial fraction of the local subnational population, which occurs in some no-policy scenarios. Our reduced-form analysis provides estimates for the growth rate of active infections (ĝ) for each locality and day, in a regime where S_t ≈ 1. Thus we know but we do not know the values of either of the two right-hand-side terms, which are required to simulate Equations 9-11. To estimate γ, we note that the left-hand-side term of Equation 11 is which we can observe in our data for China and South Korea. Computing first differences in these two variables (to differentiate with respect to time), summing them, and then dividing by active cases gives us estimates of γ (medians: China=0.11, Korea=0.048). These values differ slightly from the classical SIR interpretation of γ because in the public data we are able to obtain, individuals are coded as “recovered” when they no longer test positive for COVID-19, whereas in the classical SIR model this occurs when they are no longer infectious. We adopt the average of these two medians, setting γ = .079. We use medians rather than simple averages because low values for I induce a long right-tail in daily estimates of γ and medians are less vulnerable to this distortion. We then use our empirically-based reduced-form estimates of ĝ (both with and without policy) combined with Equations 9-11 to project total cumulative cases in all countries, shown in Figure 4. We simulate infections and cases for each administrative unit in our sample beginning on the first day for which we observe 10 or more cases (for that unit) using a time-step of 4 hours. Given we observe confirmed cases, rather than true infections, in our data, we seed each simulation by assuming I_t on the first day is equal to the number of observed cases divided by country-specific estimates of the proportion of infections confirmed.³⁶ We assume R_t = 0 on the first day. To maintain consistency with the reported data, we report our output in confirmed cases by multiplying our simulated I_t + R_t values by the aforementioned proportion of infections confirmed. We estimate uncertainty by resampling from the estimated variance-covariance matrix of all parameters. In Extended Data Fig. 7, we show sensitivity of this simulation to the estimated value of γ as well as to the use of a Susceptible-Exposed-Infected-Recovered (SEIR) framework (see Supplementary Methods Section 1).

End Notes

Author Contributions

SH conceived of and led the study. All authors designed analysis, interpreted results, designed figures, and wrote the paper. China: LYH, TW collected health data, LYH, TW, JT collected policy data, LYH cleaned data. South Korea: JL Collected health data, TC, JL collected policy data, TC cleaned data. Italy: DA collected health data, PL collected policy data, DA cleaned data. France: SAP collected health data, SAP, JT, HD collected policy data, SAP cleaned data. Iran: AH collected health data and policy data, AH, DA cleaned data. USA: ER, KB collected health data, EK collected policy data, ER, DA, KB cleaned data. IB Collected geographic and population data for all countries. SH designed the econometric model. SH, SAP, JT conducted econometric analysis for all countries. KB, IB, AH, ER, EK designed and implemented epidemiological models and projections. SAP, KB, IB, JT, AH, EK designed and implemented robustness checks. HD created Fig. 1, TC created Fig. 2, JT created Fig. 3, ER created Fig. 4, DA created SI Table 1, LYH, JL created SI Table 2, JT created SI Table 3, JT created SI Table 4, SAP, JT created SI Table 5, LYH created ED Figs. 1–2, SAP created ED Figs. 3–5, JT created ED Fig. 6, KB created ED Fig. 7, IB created ED Figs. 8–9, JT created ED Fig. 10. DA, IB, PL managed policy data collection and quality control. IB, TC managed the code repository. IB, PL ran project management. EK, TW, JT, PL managed literature review. LYH, EK managed References. PL managed the Extended Data and Appendix. The authors declare no conflicts of interest.

Additional Information

Supplementary Information is available for this paper. Correspondence and requests for materials should be addressed to Solomon Hsiang (shsiang{at}berkeley.edu). All data and code used in this analysis are available at https://github.com/bolliger32/gpl-covid. Updates posted at http://www.globalpolicy.science/covid19.

Extended Data

Extended Data Figure 1. Validating disaggregated epidemiological data against aggregated data from the Johns Hopkins Center for Systems Science and Engineering.

Comparison of cumulative confirmed cases from a subset of regions in our collated epidemiological dataset to the same statistics from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by the Johns Hopkins Center for Systems Science and Engineering (JHU CSSE). ¹ We conduct this comparison for Chinese provinces and South Korea, where the data we collect are from local administrative units that are more spatially granular than the data in the JHU CSSE database. a, In China, we aggregate our city-level data to the province level, and b, in Korea we aggregate province-level data up to the country level. Small discrepancies, especially in later periods of the outbreak, are generally due to imported cases (international or domestic) that are present in national statistics but which we do not assign to particular cities (in China) or provinces (in Korea).

Extended Data Figure 2. Estimated trends in case detection over time within each country.

Systematic trends in case detection may potentially bias estimates of no-policy infection growth rates (see Equation 8). We estimate the potential magnitude of this bias using data from the Centre for Mathematical Modelling of Infectious Diseases.² Markers indicate daily first-differences in the logarithm of the fraction of estimated symptomatic cases reported for each country over time. The average value over time (solid line and value denoted in panel title) is the average growth rate of case detection, equal to the magnitude of the potential bias. For example, in the main text we estimate that the infection growth rate in the United States is 0.30 (Figure 2A), of which growth in case detection might contribute 0.036 (this figure).

Extended Data Figure 3. Robustness of the estimated no-policy growth rate of infections and the combined effect of policies to withholding blocks of data from entire regions.

For each country, we re-estimated Eq. 7 using real data k times, each time withholding one of the k first-level administrative regions (“Adm1,” i.e. state or province) in that country. Each gray circle is either (a) the estimated no-policy growth rate or (b) the total effect of all policies combined, from one of these k regressions. Red and blue circles show estimates from the full sample, identical to results presented in panels A and B of Figure 2, respectively. For each country panel, if a single region is influential, the estimated value when it is withheld from the sample will appear as an outlier. Some regions that appear influential are highlighted with an open pink circle. As in Figure 2B of the main text, we estimate a distributed lag model for China and display each of the estimated weekly lag effects (red circle is the same “without Hubei” sample for lags).

Extended Data Figure 4. Robustness of the estimated effects of individual policies to withholding blocks of data from entire regions.

Same as Extended Data Figure 3, but for individual policies (analogous to Figure 2C in the main text). In cases where two regions are influential, a second region is highlighted with an open green circle.

Extended Data Figure 5. Evidence supporting models where policies affect infection growth rates in the days following deployment.

Existing evidence has not demonstrated whether policies should affect infection growth rates in the days immediately following deployment. It is therefore not clear ex ante whether the policy variables in Eq. 7 should be encoded as “on” immediately following a policy deployment. We estimate “fixed-lag” models in which a fixed delay between a policy’s deployment and its effect is assumed (see Supplementary Methods 3). If a delay model is more consistent with real world infection dynamics, these fixed lag models should recover larger estimates for the impact of policies and exhibit better model fit. a, Because data from China cover a longer period with fewer policies that are each implemented early in the sample, we estimated an explicit distributed lag model in the main article (Figure 2), finding evidence of policy impacts in the first week of deployment and evidence that these effects increase in the following weeks. Using a reduced sample of 36 Chinese cities where at least five days of infection data are available before and after the first policy (home isolation) is deployed, we implement an event study.³ Orange markers show the average infection growth rate in the five days immediately prior to and following the first policy deployment. b, R-squared values associated with fixed-lag lengths varying from zero to fifteen days. In-sample fit generally declines or remains unchanged if policies are assumed to have a delay longer than four days (whiskers are 95% CI computed through resampling). c, Estimated effects for no lag (the model reported in the main text) and for fixed-lags between one and five days. Estimates generally are unchanged or shrink towards zero (e.g. Home isolation in Iran), consistent with mis-coding of post-policy days as no-policy days.

Extended Data Figure 6. Estimated infection or hospitalization growth rates with actual anti-contagion policies and in a “no policy” counterfactual scenario.

a, The estimated daily growth rates of active (China, South Korea) or cumulative (all others) infections based on the observed timing of all policy deployments within each subnational unit (blue) and in a scenario where no policies were deployed (red). Identical to Figure 3 in the main text, but using an alternative disaggregated encoding of policies that does not group any policies into policy packages. b, Same as Figure 3 in the main text, but Eq. 7 is implemented for a single example administrative unit, Wuhan, China. c, Same as Figure 3 in the main text, but using hospitalization data from France rather than cumulative cases, since the French government stopped reporting the latter after March 25, 2020. For all panels, the difference between the with- and no-policy predictions is our estimated effect of actual anti-contagion policies on the growth rate of infections (or hospitalizations). The markers are daily estimates for each subnational administrative unit (vertical lines are 95% confidence intervals). Black circles are observed changes in log(infections) (or diamonds for log(hospitalizations)), averaged across the same administrative unit.

Extended Data Figure 7. Sensitivity of estimated averted/delayed infections to the choice of γ and σ in an SIR/SEIR framework.

This figure displays the sensitivity of total averted/delayed cases presented in Figure 4 of the main text to alternative modeling assumptions. We compute total cases across the respective final days in our samples for the six countries presented in our analysis. The figure displays how these totals vary with eight values of γ (0.05-0.4) and four values of σ (0.2, 0.33, 0.5, ∞), where the final value of σ (∞) corresponds to the SIR model. a, The simulated total number of infections under no policy. b, Same, but using actual policies. c, The difference between (a) and (b), which are the total number of averted/delayed infections. d, Same as (c), but on a logarithmic scale similar to Figure 4 in the main text (a-c are on a linear scale, trimmed to show details). Figure 4 in the main text uses γ = 0.079, which we calculate using empirical recovery/death rates in countries where we observe them (China and South Korea, see Methods). If we assume a 14-day delay between infected individuals becoming non-infectious and being reported as “recovered” in the data, we would calculate γ = 0.18. Figure 4 in the main text assumes σ = ∞.

Extended Data Figure 8. Simulating reduced form estimates for the no-policy growth rate of infections for different population regimes and disease dynamics.

We examine the performance of reduced form econometric estimators through simulations in which different underlying disease dynamics are assumed (see Supplementary Information Section 3). Each histogram shows the distribution of econometrically estimated values across 1,000 simulated outbreaks. Estimates are for the no-policy infection growth rate (analogous to Figure 2A) when three different policies are deployed at random moments in time. The black line shows the correct value imposed on the simulation and the red histogram shows the distribution of estimates using the regression in Eq. 7, applied to data output from the simulation. The grey dashed line shows the mean of this distribution. The 12 subpanels describe the results when various values are assigned to the mean latency period (γ^-1) and mean infectious period (σ^-1) of the disease. “σ = ∞” is equivalent to SIR disease dynamics. In each panel, S_min is the minimum susceptible fraction observed across all 1,000 45-day simulations shown in each panel. For reference, in the real datasets used in the main text, after correcting for country-specific underreporting, S_min across all units analyzed is 0.78 and 95% of the analyzed units finish with S_min > 0.93. “Bias” refers to the distance between the dashed grey and black line as a percentage of the true value. a, Simulations in near-ideal data conditions in which we observe active infections within a large population (such that the susceptible fraction of the population remains high during the sample period). For example, these conditions are similar to those in our real data for Chongqing, China. b, Simulations in a non-ideal data scenario where we are only able to observe cumulative infections in a small population. For example, these conditions are similar to those in our real sample of data for Cremona, Italy.

Extended Data Figure 9. Simulating reduced form estimates for anti-contagion policy effects for different population regimes and assumed disease dynamics.

Same as Extended Data Figure 8, but estimates are for the combined effect of three different policies (analogous to Figure 2B) that are deployed at random moments in time.

Extended Data Figure 10. Regression residuals for the growth rates of COVID-19 by country.

These plots show the estimated residuals from Equation 7 for each country-specific econometric model. Histograms (left) show the estimated unconditional probability density function. Quantile plots (right) show quantiles of the cumulative density function (y-axis) plotted against the same quantiles for a Normal Distribution. For additional details, see the full model under the Methods - Econometric analysis section as well as the results in Figure 3 of the main paper.

SI Guide

Supplementary Notes

Describes the data acquisition and processing for the epidemiological and policy data. The sources for both types of data come from a variety of in-country data sources, which include government public health websites, regional newspaper articles, and Wikipedia crowd-sourced information. We have supplemented this data with international data compilations.

Supplementary Methods

Describes sensitivity analyses and simulations performed to verify the robustness of our model, including: the sensitivity of our regression model and counterfactual projections to varying epidemiological parameters; and the sensitivity of our estimates to alternative lag structures, withholding of data, and differing policy groupings.

Supplementary Tables

Contains tables detailing: 1) the number of anti-contagion policies tabulated by administrative division in each country; 2) the main regression results estimating the effect of policy on growth rates; and 3) epidemiological data in Wuhan prior to policy intervention, and estimates of the initial infection growth rate and case doubling times.

Supplementary Information

The Supplementary Information contains three sections: Supplementary Notes, Supplementary Methods, and Supplementary Tables.

The Supplementary Notes section describes the data acquisition and processing procedure for the epidemiological and policy data used in this paper. The sources for both types of data come from a variety of in-country data sources, which include government public health websites, regional newspaper articles, and Wikipedia crowd-sourced information. We have supplemented this data with international data compilations. A list of the epidemiological and policy data compiled for this analysis can be found here.

The Supplementary Methods section describes sensitivity analyses and simulations performed to verify the robustness of our model, including: the sensitivity of our regression model and counterfactual projections to varying epidemiological parameters; and the sensitivity of our estimates to alternative lag structures, withholding of data, and differing policy groupings.

The Supplementary Tables section contains tables detailing: 1) the number of anti-contagion policies tabulated by administrative division in each country; 2) epidemiological data in Wuhan prior to policy intervention, and estimates of the initial infection growth rate and case doubling times; and 3) the main regression results estimating the effect of policy on growth rates.

Supplementary Notes

Epidemiological Data

The epidemiological datasets and sources used in this paper are described below. The main health variables of interest are:

“cum_confirmed_cases”: The total number of confirmed positive cases in the administrative area since the first confirmed case.
“cum_deaths”: The total number of individuals that have died from COVID-19.
“cum_recoveries: The total number of individuals that have recovered from COVID-19.
“cum_hospitalized”: The total number of hospitalized individuals.
“cum_hospitalized_symptom”: The total number of symptomatic hospitalized individuals.
“cum_intensive_care” : The total number of individuals that have received intensive care.
“cum_home_confinement”: The total number of individuals that have been self-quarantined in their homes as a result of a positive test.
“active_cases”: The number of individuals who currently still test positive on the date of the observation.
“active_cases_new”: The number of new active cases since the previous date.
“cum_tests”: The total number of tests (includes both positive and negative results) conducted in an administrative unit.

Additional metadata accompanying the health outcome variables:

“date”: The date of observation.
“adm0_name”: The ISO3 (country) code to which this observation belongs.
“adm1_name”: The name of the “Adm1” region (typically state or province) to which this observation belongs.
“adm2_name”: If the dataset contains observations at the “Adm2” level, then this is the name of the “Adm2” region to which this observation belongs (e.g. counties in the United States).
“adm[1,2]_id”: Any alphanumeric ID scheme to identify different administrative units (e.g. FIPS code in the United States).
“lat”: The latitude of the centroid of the administrative unit.
“lon”: The longitude of the centroid of the administrative unit.
“policies_enacted”: The number of active policies that are in place for the administrative unit as of that date. This variable is not population weighted.
“testing_regime”: A categorical variable used to identify when an administrative region changed their COVID-19 testing regime. This is zero-indexed, with the ordering only indicating chronological progression (there is no external meaning to Regime 2 vs. Regime 1 vs. Regime 0, and there is no consistency enforced for coding across countries). For example, if China changes their testing regime twice, all observations prior to the first regime change would be coded “testing_regime=0,” all observations in between the two changes would be coded “testing_regime=1,” and all observations after the second change would be coded “testing_regime=2.”
“population”: The population of the administrative unit.
“pop_is_imputed”: A binary variable equal to 1 if the population is imputed, and 0 otherwise. Used for imputing the population of some cities in China.

Data Imputation

In instances where health outcome observations are missing or suffer from data quality issues, we have imputed to fill in the missing values. Imputed health outcome variables are denoted by “[health_outcome]_imputed.” For the majority of our analyses we do not use imputed data; France is the exception where we impute two days of missing data. We do this to ensure we have variation in policy variables for use in the analysis.

We impute by:

Taking the natural log of the non-missing observations pertaining to that health outcome variable.
Linearly interpolating over the missing dates for that health outcome variable.
Exponentiating the interpolated values back into levels and rounding to the nearest integer.