Changes in Reproductive Rate of SARS-CoV-2 Due to Non-pharmaceutical Interventions in 1,417 U.S. Counties

In response to the rapid spread of the novel coronavirus, SARS-CoV-2, the U.S. has largely delegated implementation of non-pharmaceutical interventions (NPIs) to local governments on the state and county level. This staggered implementation combined with the heterogeneity of the U.S. complicates quantification the effect of NPIs on the reproductive rate of SARS-CoV-2. We describe a data-driven approach to quantify the effect of NPIs that relies on county-level similarities to specialize a Bayesian hierarchical inference model based on observed fatalities. Using this approach, we estimate change in reproductive rate, Rt, due to implementation of NPIs in 1,417 U.S. counties. We estimate that as of May 28th, 2020 1,177 out of the considered 1,417 U.S. counties have reduced the reproductive rate of SARS-CoV-2 to below 1.0. The estimated effect of any individual NPI, however, is different across counties. Stay-at-home orders were estimated as the only effective NPI in metropolitan and urban counties, while advisory NPIs were estimated to be effective in more rural counties. The expected level of infection predicted by the model ranges from 0 to 28.7% and is far from herd immunity even in counties with advanced spread. Our results suggest that county characteristics are pertinent to re-opening decisions.


Introduction
As of May 28 th , 2020, the United States has reported more than 1,700,000 cases of novel coronavirus 2019 . This disease, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, has led to more than 100,000 deaths in the U.S. 1 Nonpharmaceutical interventions (NPIs) are a critical component of the public health effort to slow the spread of COVID-19B as pharmaceutical interventions are not currently available. NPIs include guidelines for hand hygiene, cancellations of mass events, school closures, closure of non-essential business, and stay-at-home orders. These measures are designed to reduce transmission, buy time to expand healthcare capacity, develop effective testing and tracing mechanisms, and research pharmaceutical options, such as a vaccine.
Indeed, the implementation of NPIs caused a rapid decline of new cases and deaths, although at a very high economic and social cost. Understanding precisely how NPIs affect disease transmission is desirable in order to safely roll back NPIs and minimize their adverse effects. Prior works have quantified the effects of NPIs on the reproductive rate of SARS-CoV-2 in China, 2 UK, 3 Brazil, 4 and 14 European countries, including Italy, Spain, and Germany. 5,6 Except for some large urban areas, 7 these effects have not yet been quantified for the U.S., which delegated NPI implementation to state and local governments rather than establishing a unified, federal approach. As a result, it is necessary to quantify the effect of NPIs on the county level, but the limited number of documented fatalities in many counties would limit this analysis to urban areas with sufficient data for epidemiological modeling.
We develop a data-driven approach that establishes county similarity based on factors known to affect community transmission of infectious diseases. Within groups of similar counties, we jointly optimize the parameters of a Bayesian hierarchical inference model to the observed deaths in every county under the assumption that the same NPI attains comparable effects across counties that are similar with respect to disease transmission factors. Using this approach, we estimate the change in due to implementing NPIs in 1,417 U.S. counties that exhibit substantially different characteristics regarding population density, economy, demographics, and infrastructure. We fit a model to groups of counties with similar characteristics which gives us a probable trajectory over time and consequently, the probable number of people infected over time.

Conclusions
Based on the model, as of May 28 th , 2020 1,177 out of the considered 1,417 U.S. counties are estimated to have reduced the reproductive rate of novel coronavirus to below 1.0 via the implementation of NPIs. The estimated effect of any individual NPI, however, may differ across counties. We observe that for metropolitan and urban counties, the most substantial reduction is attributed to stay-at-home orders while less restrictive NPIs were estimated to be effective in more rural counties. Further, the expected level of infection predicted by the model, ranging from 0 to 28.7%, is far from herd immunity even in counties with advanced spread.
While the model explains the observed trends in fatalities well, the rapid escalation of the COVID-19 situation and quick succession in the implementation of NPIs in most counties within few days from another complicates the disentanglement of the effects of any individual NPI. Further, even if accurate, the effect attributed to any NPI may not describe the consequences of roll back of the same NPI because social norms and individuals' behavior may have changed, including the wearing of masks in public.
Despite these limitations, our results suggest that strategies for shutdown as well as re-opening require careful consideration of county characteristics in addition to state and national trends.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint

Characterizing Groups of Similar Counties as Clusters
Since our hypothesis is that local characteristics affect the spread of COVID-19, we differentiate among groups of counties using a data-driven approach known as clustering. Each group -or cluster -of counties is characterized by having similar demographic and socioeconomic qualities. For instance, cluster 1 consists of low-population, mostly rural counties with little public transit capacity and the lowest median household income. Cluster 2 and 3 are similar in size, having a mean population of 45,000 and 52,000 respectively, but cluster 2 includes higherincome, suburban areas where-as cluster 3 has lower income areas with a large land area. Cluster 4 consists of the densest urban areas with a high proportion of 18-to 65-year-olds, high household income, a high public transit score, and small land area. Finally, cluster 5 has the highest mean population besides cluster 4, but is less densely populated, has poor public transit, and a lower household income on average. It is important to note that although we refer to these clusters with numbers 1-5, these are merely labels returned by the clustering algorithm without any meaning inherent to the ordering. shows our clustering for all U.S. counties with this data available, including those without any incidence of COVID-19. Figure 1: Cluster labels based on demographic and socioeconomic characteristics are used to aggregate data and specialize epidemiological models. Here, one can see how cluster 1 and 3 primarily cover rural areas, while clusters 5, 2, and 4 consist of increasingly urban counties. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint Once we have identified the clusters, we infer the reproductive rate of SARS-CoV-2 over time. Figure 2 shows an example of this progression for counties ordered by their public transportation use. Figure 2: Relationship between public transit capacity and the time-dependent reproductive rate of SARS-CoV-2 for U.S. counties. Colors correspond to clusters as in Figure 1. In the cluster which relies less on public transport, shown in yellow, the reproductive rate dropped somewhat earlier, possibly indicating a difference in response to various NPIs.
We observe that although the basic reproduction rate of SARS-CoV-2 starts at a similar level for all clusters, the speed at which counties in each cluster responds to the disease, and reduces its differs. The dense, urban cluster, shown in purple, which tends to have higher reliance on public transportation decreases over the entire period. This is especially apparent going from March 15 to March 25. On the other hand, a cluster with little transportation use, such as cluster 5, shown in yellow, was able to quickly reduce transmission rates. This suggests that if people have a higher reliance on public infrastructure, more stringent interventions are necessary to reduce transmission. Comparisons with more features are shown in the methods section below. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

Estimates of Initial and Current Reproductive Rates and Number of Infected
The copyright holder for this preprint this version posted June 1, 2020. From Table 1, we observe that most counties exhibited an initial reproductive rate above 3. As of May 28 th , however, most have successfully reduced the reproductive rate to 1 through NPIs. It is estimated that 18.4 to 42.7% of the population (95% confidence interval) has been infected in New York, NY, which has the most advanced spread. Based on the initial reproductive rates between 2 and 4, herd immunity is reached only after 50 -70% of the population has recovered, 8,9 suggesting that all United States counties are far from resilience. Consequently, easing restrictions is likely to result in subsequent waves of the epidemic. We state these findings for 15 heterogeneous counties in Table 1 and provide the same metrics for all 1,417 counties in supplementary materials (https://github.com/JieYingWu/npi-model).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Learned Effects of NPIs
Using a cluster-specialized model, we quantify the effectiveness of NPIs, as shown below in Table 2.   We note that our model estimates different behavior across counties. While all counties have implemented a similar set of interventions, their estimated effects are substantially different in each respective cluster. For example, metropolitan counties (cluster 4) were estimated to have a strong response to stay-at-home orders, while rural and suburban areas (clusters 1, 2, 3, and 5) responded more to national-level interventions according to our model.
One caveat is that the effects of interventions that were implemented close together are difficult to disentangle. Many local governments implemented formal NPIs in response to these guidelines, leading a quick succession of NPIs coming into effect. Disentanglement of the effects of NPIs is explored more in the methods section.
Another limitation of our model is that the federal level interventions may have come before some counties have seen any cases. Therefore, it is impossible to discern what the would have been without any intervention. This may lead to higher weights in federal guidelines and travel ban since other interventions are not generally implemented before a county has seen cases. Additionally, since our model is mechanistic, it only allows for decreases of the transmission rate due to interventions or decrease in the susceptible pool. Other events, such as high-profile cases and cancellations of prominent festivals, increased awareness of the disease and may also have effects on individual behavior, and therefore the . These effects may falsely increase the weights of interventions that come into effect at around the same time.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint 8 Figure 3 shows an overview of the proposed method. We train a Bayesian hierarchical inference (BHI) model in a cluster-specialized manner and compare with the baseline model trained on all counties. We show the stability of our model in two ways, first by withholding random days during training and second by withholding given counties. We compare the predictions based on incomplete data with those based on complete data. Figure 3: (a) The fitting process for our model. We compare the performance of the baseline model, which is fit to all eligible counties, with cluster-specialized models BHI 1-5. (b) Two validation methods for our model. Validation A tests the stability of the model by withholding 3 random days and comparing the outputs. Validation B withholds a set of counties for comparison rather than days.

Method Overview
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint Quantifying changes in COVID-19's reproductive rate is complicated when considering differences at the county -rather than the national -level. Parametric epidemiological models are optimized to describe fatality counts, the volume and reliability of which decreases outside of highly populated regions. To make up for this scarcity, we leverage a balanced clustering of U.S. counties to aggregate data from similar counties in the same state, treating them as a single "super-county," if they have identical NPI implementation dates. This has the advantage of considering counties that would otherwise be excluded without assuming that the spread of the disease in those counties follows the same trend of more advanced regions in the same state or country. Figure 4 shows how for a given state-in this case Texas, and a given cluster, the counties with 1-49 cumulative deaths from COVID-19 are treated as a single entity. In addition to this data aggregation strategy, we fit a cluster-specialized model to each set of counties, quantifying the possibly disparate effects of the NPIs in each type of county, as detailed below. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020.  To generate this clustering, we partition 3,059 U.S. counties into five clusters based on variables which directly affect disease spread, using a Gaussian mixture model. 10 Table 4 summarizes these variables, which include demographic, economic, and public transit capacities gathered in a publicly available dataset. 11 Sources include the United States Census Bureau, the United States Department of Agriculture Economic Research Service and the Center for Neighborhood Technology. A full list of sources can be found at the corresponding website. 12 To incorporate potential exposure, we consider county population. 11 To incorporate potential exposure, we consider county population density, housing density, and land area. Additionally, we consider portions of the population for age-and gender-based demographic categories, due to COVID-19's disparate effects on these groups. [13][14][15][16][17] Our clustering is also based on socioeconomic variables, which may indicate behavioral traits relevant to the spread of COVID-19. For instance, workers with tertiary education are more likely to hold office-type jobs which can be done from home. 18 At the same time, many secondary-education jobs have been deemed essential, requiring a high contact rate, which in turn increases the likelihood of infection. Thus, our clustering considers college education, poverty, unemployment, and median household income for each county as a reflection of the overall job composition in the local area. Finally, we include a population-weighted transit score due to the likelihood of transmission in the enclosed, possibly crowded space that public transport entails. Figure 5 and Figure 6 show the plotted over median income and density for counties and supercounties in the different clusters. Super-counties are visualized as a single point using their population-weighted average for that feature. The plots show the distribution of the cluster over the features and its correlation with how changes.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint Notably, we exclude racial demographics from the variables considered during clustering. This is despite the possibly strong relationship between these data and the transmission or mortality rates of COVID-19. Minority communities are more likely to make up essential workers, who have greater exposure to the virus. African American communities in particular suffer from greater incidence of HIV as well as higher infant mortality rates. 19 Thus, we do not include racial demographics because we believe that no relationship exists between race and incidence of COVID-19 directly. Rather, such a relationship would be a byproduct of existing socioeconomic inequality resulting from systemic racism. Racial demographic makeup may still influence our clustering, but only because it affects already included factors.

Modelling the Effects of NPIs Data Processing
We use cumulative fatality and infection counts from the JHU CSSE COVID-19 Dashboard, which has been tracking COVID-19 since January. 1 When fitting our model, we use measured fatality rates, which are generally considered more reliable than confirmed infections because of limited testing and the prevalence of asymptomatic cases. Thus, we use population-weighted fatality rates to estimate the true cases count. Obtaining a reasonable estimate for this ratio is crucial to realistically model the numbers of total infections. However, due to asymptomatic cases, undertesting and biased reporting, this parameter cannot be measured directly, but has to be inferred from observable data. [20][21][22] These studies all report values with substantial uncertainty but agree on the fact that fatality for COVID-19 depends strongly on the age of the infected person. Therefore, we adapt the fatality rates per age group presented in Verity et al. 20 for each county with respect to its demographic age distribution. Based on U.S. Census data, a per-county weighted fatality rate is computed using the share of each age group in the overall population.

Model
We estimate the effective reproductive rate using a semi-mechanistic Bayesian hierarchical inference model proposed in Flaxman et al. 5,23 , that infers the impact of a predefined set of interventions and estimates the number of infections over time. The model estimates a countyspecific initial reproductive rate , and intervention weights , the effect of which is assumed constant for all counties included in joint optimization. These effects are assumed multiplicative, modeled as where is the county index and , is a binary indicator for intervention being in place at time . The interventions we take into account here are summarized in Table 1.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint The model assumes a normal distribution as the prior for the . We set the prior on ~ 3.28, | | where ~ 0, 0.5 . The value of 3.28 is in accordance with the analysis presented in Liu et al. 24 Starting from the time-varying , a latent function of daily infections is modeled depending on a number of factors: a generation distribution g with density g(τ) that models the time between spread of infection from an individual to the next (approximated as the serial interval distribution), the number of susceptible individuals left in the population, an infection-to-death distribution, and the county-specific time-varying reproduction number that models the average number of secondary infections at any given time. The parameters for the distributions are chosen in accordance to Flaxman et al. 5 Model fitting is driven by the timeseries of observed daily deaths. These are linked to the modelled number of infections by the county specific weighted fatality rate. The sum of past infections, along with the weighted probability of death gives the number of deaths on a day for a given county. To ensure that the deaths accounted for are from locally acquired infections, we include observed deaths in a county only after the cumulative count has exceeded 10. The seeding of new infections is assumed to be a month prior to that.
All parameters are estimated jointly using an adaptive Hamiltonian Monte Carlo (HMC) sampler in the probabilistic programming language Stan. 25

Validation
We propose three validation schemes to validate our approach. First, we show that our model is stable, so a small change in the input produces a small change in the output. Second, we separate our counties and super-counties into train and test sets. We fit models for each cluster as well as for all counties and super-counties on either, both the train and test set or with only the train set. We evaluate using the parameters of the "correct" cluster model to predict fatalities for the heldout regions. Comparing these predictions to those made when the regions are included during training confirms that the model is not over-reliant on each data point. Third, we evaluate how well our model discerns the effects of individual NPIs and discuss biases it may have to attribute more weight to certain NPIs.

Validating the Stability of the Model
To show stability in our model, we train the model, withholding three non-consecutive days chosen at random, and compare the predictions for these days to the baseline model fit on full data. Only days with non-zero death counts are selected as potential leave-out candidates. We set the threshold for data selection at counties with 50 cumulative deaths by May 28 th , which results in 211 counties. (For model validation, we do not use super-counties.) We fit the model over 300 total iterations, with 150 of those as warmup iterations. We compare the predictions of the two models for the held-out days and report our observations in Figure 7a. The clustering of points around the optimal fit visually confirms that the predicted values from the model with held-out days. In Figure 5b, we average the value of the withheld days and compare again to our baseline . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint model fitted on full data and observe a more concentrated distribution around the line of optimal fit. Using the Pearson-correlation coefficient, we find high similarity of 1.00 between our models for the held-out days and the average held-out days which suggests that the model is robust against small input perturbations.

Validating the Advantages of Clustering
We show the advantage of cluster-specialized models for quantifying the effects of NPIs by comparing their performance with models trained on different clusters or on the national level. While a clustering of U.S. counties may or may not be interesting, its value related to COVID-19 comes from its ability to identify epidemiologically meaningful differences among various regions in the country. When fitting each model, we withhold a validation region from each cluster and use the remaining counties to fit Equation 1, for both cluster-specialized and baseline models. We then use the learned weights to initialize a fixed-BHI model for each withheld region.
With this scheme, we show the advantage of clustering U.S. counties in three ways. First, we show that within the same cluster, we obtain comparable predictions for a county whether or not it is included in training. Second, we observe that values learned from a different cluster produce substantially different predictions. Finally, we apply the values learned at the national level to similar effect. This demonstrates how cluster-specialized models can reveal trends in the spread of COVID-19 that are not apparent under the assumption that NPIs have universal effect at the national or state level.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint (c) On the other hand, using values from a different cluster makes it appear as though the curve has flattened, even if it hasn't. This occurs because the same NPIs had a stronger effect in cluster 4 than they did in cluster 3. Figure 8 shows the first and second validation strategy for cluster 3, which consists of lesspopulous counties with large land area. For such regions, which necessarily have fewer cases to support local models, it is vital to gain accurate insight from the entire training pool and verify these insights through the aforementioned validation process. Thus, in this illustrative example, we use data up to May 18, at which point the continuing spread of cases was not immediately clear. Indeed, Figure 8a shows the predictions obtained for seven counties in Arizona (treated as a single super-county). One can readily observe the model takes into account random outliers while following a general upward trend. Figure 8b exhibits less awareness of outliers but still follows the same trend, despite using α values obtained from training on cluster 3 with these same counties withheld. On the other hand, Figure 8c shows the predictions made when using the wrong α values, from cluster 4, which consists of densely populated city centers. As can be seen, this results in a markedly different prediction for the trend of the disease. This is a critical difference, which could result in very different decisions for implementing or removing NPIs.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint Although this difference is more pronounced for the District of Columbia than for most regions, it underscores the potential for misleading predictions when the United States' heterogeneity is not taken into account.
Finally, we compare the performance of our cluster-specialized model with the baseline model, which uses all eligible counties for training. In many cases, this difference is slight, and both the baseline and cluster-specialized models exhibit similar trends for a given region. However, for certain areas the difference can be profound. Take, for example, the District of Columbia, which has become a significant hot-spot for COVID-19. The baseline model, shown in Figure 9a, has only recently flattened, whereas the cluster-specialized model in Figure 9b reflects the reality that efforts to combat the disease have had much greater effect, significantly reducing the number of deaths. This is because when forced to accommodate counties across the U.S., the baseline model emphasizes federal guidelines with a mean α value of 0.804 (see Table 3). On the other hand, our cluster-specialized model found that stay-at-home orders were much more effective for counties in cluster 4, with a mean α value of 0.963. This difference illustrates the advantage of specializing a BHI model based county-level characteristics; it allows the model to include a greater number of counties within separate clusters while not being forced to overgeneralize and, in doing so, compromise some certainty. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint One drawback of the model is that it cannot disentangle implementations that came into effect at the same time. For example, states often closed public schools at the same time as federal guidelines were issued so it is difficult to attribute which of them had the real effect on reducing . Additionally, in Table 5, we observe that banning gatherings of 50 or more people often occurs at the same time as 500 or more, and restaurants and entertainment venues are often closed together. This makes these pairs of interventions almost impossible to disentangle from each other.
To further investigate the model's ability to disentangle intervention weights, we create simulated trajectories of counties' deaths and cases counts based on their and the dates on which the interventions came into effect. Using all counties that have more than 50 cumulative deaths on May 28 th without super-counties, we seed each county with 200 cases in each of the first 6 days. To simulate county-specific trajectories, we construct two sets of generated timeseries. In the first set, we assign intervention weights to be randomly generated from a gamma distribution, the same distribution as our prior on the Bayesian hierarchical model adjusted to be in the range of our learned weights. In the second experiment, we set all of them to a constant value of 0.16, which was chosen so the sum of intervention is in the range of the sum of the learned weights. We then calculate what the on each day must have been based on the and the interventions in place. Once we have the seeded infection and the trajectory for each county, we can calculate daily infections and thus expected fatalities. Using the simulated trajectories, we fit the model. Table 6 compares the weights used for generation with the weights that the model learned.  We observe that the effects of individual NPIs are not well disentangled in general. The model tends to attribute more weight to few NPIs rather than spread out the weight evenly. Specifically, the model tends to put more weight on shelter-in-place. This may be because interventions I2-I8 are often implemented close together (see Table 5) and it is difficult to attribute effect to any single one of them on a national scale. While we can conclude that the trajectory the model . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 1, 2020. . https://doi.org/10.1101/2020.05.31.20118687 doi: medRxiv preprint predicts are reliable, due to their match to measured death, and therefore the overall decrease in is reliable, attributing decreases to an individual NPI is challenging.