## Abstract

In response to the rapid spread of the novel coronavirus, SARS-CoV-2, the U.S. has largely delegated implementation of non-pharmaceutical interventions (NPIs) to local governments on the state and county level. This staggered implementation combined with the heterogeneity of the U.S. complicates quantification the effect of NPIs on the reproductive rate of SARS-CoV-2.

We describe a data-driven approach to quantify the effect of NPIs that relies on county-level similarities to specialize a Bayesian mechanistic model based on observed fatalities. Using this approach, we estimate change in reproductive rate, *R _{t}*, due to implementation of NPIs in 1,417 U.S. counties.

We estimate that as of May 28^{th}, 2020 1,177 out of the considered 1,417 U.S. counties have reduced the reproductive rate of SARS-CoV-2 to below 1.0. The estimated effect of any individual NPI, however, is different across counties. Stay-at-home orders were estimated as the only effective NPI in metropolitan and urban counties, while advisory NPIs were estimated to be effective in more rural counties. The expected level of infection predicted by the model ranges from 0 to 28.7% and is far from herd immunity even in counties with advanced spread.

Our results suggest that local conditions are pertinent to containment and re-opening decisions.

## 1 Introduction

As of May 28^{th}, 2020, the United States has reported more than 1,700,000 cases of novel coronavirus 2019 (COVID-19). This disease, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, has led to more than 100,000 deaths in the U.S.^{1} Non-pharmaceutical interventions (NPIs) are a critical component of the public health effort to slow the spread of COVID-19 as pharmaceutical interventions are not currently available. NPIs include guidelines for hand hygiene, cancellations of mass events, school closures, closure of non-essential business, and stay-at-home orders. These measures are designed to reduce transmission, buy time to expand healthcare capacity, develop effective testing and tracing mechanisms, and research pharmaceutical options, such as a vaccine.

Indeed, the implementation of NPIs caused a rapid decline of new cases and deaths, although at a very high economic and social cost. Understanding precisely how NPIs affect disease transmission is desirable in order to safely roll back NPIs and minimize adverse effects. Prior works have quantified the effects of NPIs on the reproductive rate *R _{t}* of SARS-CoV-2 in China,

^{2}UK,

^{3}Brazil,

^{4}and 14 European countries, including Italy, Spain, and Germany.

^{5,6}Except for some large urban areas,

^{7}these effects have not yet been quantified for the U.S., which delegated NPI implementation to state and local governments rather than establishing a unified, federal approach. As a result, it is necessary to quantify the effect of NPIs on the county level, but the limited number of documented fatalities in many counties would limit this analysis to urban areas with sufficient data for epidemiological modeling.

We develop a data-driven approach that establishes county similarity based on factors known to affect community transmission of infectious diseases. Within groups of similar counties, we jointly optimize the parameters of a Bayesian mechanistic model to the observed deaths in every county under the assumption that the same NPI attains comparable effects across counties that are similar with respect to disease transmission factors. Using this approach, we estimate the change in *R _{t}* due to implementing NPIs in 1,417 U.S. counties that exhibit substantially different characteristics regarding population density, economy, demographics, and infrastructure. We fit a model to groups of counties with similar conditions which gives us a probable

*R*trajectory over time and consequently, the probable number of people infected over time.

_{t}## 2 Conclusions

Based on the model, as of May 28^{th}, 2020 1,177 out of the considered 1,417 U.S. counties are estimated to have reduced the reproductive rate of novel coronavirus to below 1.0 via the implementation of NPIs. The estimated effect of any individual NPI, however, may differ across counties. We observe that for metropolitan and urban counties, the most substantial reduction is attributed to stay-at-home orders while less restrictive NPIs were estimated to be effective in more rural counties. Further, the expected level of infection predicted by the model, ranging from 0 to 28.7%, is far from herd immunity even in counties with advanced spread.

While the model explains the observed trends in fatalities well, the rapid escalation of the COVID-19 situation and quick succession in the implementation of NPIs in most counties within few days from another complicates the disentanglement of the effects of any individual NPI. Further, even if accurate, the effect attributed to any NPI may not describe the consequences of roll back of the same NPI because social norms and individuals’ behavior may have changed, including the wearing of masks in public.

Despite these limitations, our results suggest that strategies for shutdown as well as re-opening require careful consideration of county conditions in addition to state and national trends.

## 3 Results

### Characterizing Groups of Similar Counties as Clusters

Since our hypothesis is that local conditions affect the spread of COVID-19, we differentiate among groups of counties using a data-driven approach known as clustering. Each group – or cluster – of counties is characterized by having similar demographic and socioeconomic qualities. For instance, cluster 1 consists of low-population, mostly rural counties with little public transit capacity and the lowest median household income. Cluster 2 and 3 are similar in size, having a mean population of 45,000 and 52,000 respectively, but cluster 2 includes higher-income, suburban areas where-as cluster 3 has lower income areas with a large land area. Cluster 4 consists of the densest metropolitan areas with a high proportion of 18- to 65-year-olds, high household income, a high public transit score, and small land area. Finally, cluster 5 has the highest mean population besides cluster 4, but is less densely populated, has poor public transit, and a lower household income on average. It is important to note that although we refer to these clusters with numbers 1–5, these are merely labels returned by the clustering algorithm without any meaning inherent to the ordering Figure 1. shows our clustering for all U.S. counties with this data available, including those without any incidence of COVID-19.

Once we have identified the clusters, we infer the reproductive rate of SARS-CoV-2 over time. by jointly optimizing the parameters of a Bayesian mechanistic model on each cluster. This process is described in greater detail in Section 4. Figure 2 shows an example of this progression for counties ordered by their public transportation use.

We observe that although the basic reproductive rate of SARS-CoV-2 starts at a similar level for all clusters, the speed at which counties in each cluster respond to the disease, and reduce its *R _{t}* differs. The reproductive rate in metropolitan counties (cluster 4), which tends to have higher reliance on public transportation decreases over the entire period. This is especially apparent going from March 15 to March 25. On the other hand, cluster with little transportation use, such as cluster 5, was estimated to have quickly reduced transmission rates. This suggests that if people have a higher reliance on public infrastructure, more stringent interventions are necessary to reduce transmission. The staggered drops in reproductive rates are attributed to both varied efficacy of NPIs (to be discussed in the next section) and counties implementing NPIs at different times. Comparisons with more features are shown in Section 4.

### Estimates of Initial and Current Reproductive Rates and Number of Infected

From Table 1, we observe that most counties exhibited an initial reproductive rate *R*_{0} above 3. As of May 28^{th}, however, most have successfully reduced the reproductive rate to *R _{t}* ≈ 1 after implementing NPIs. It is estimated that 18.4 to 42.7% of the population (95% confidence interval) has been infected in New York, NY, which has the highest number of cases. Based on the initial reproductive rates between 2 and 4, herd immunity is reached only after 50 – 70% of the population has recovered,

^{8,9}suggesting that all U. S. counties are far from resilience. Consequently, easing restrictions is likely to result in subsequent waves of the epidemic. We state these findings for 15 heterogeneous counties in Table 1 and provide the same metrics for all 1,417 counties online at https://github.com/JieYingWu/npi-model, where we also provide code and data required for reproduction.

### Learned Effects of NPIs

Using a cluster-specialized model, we quantify the effectiveness of NPIs, as shown below in *Table 2* and Table 3.

We note that our model estimates different behavior across counties. While all counties have implemented a similar set of interventions, their estimated effects are substantially different in each respective cluster. For example, metropolitan counties (cluster 4) were estimated to have a strong response to stay-at-home orders, while rural and suburban areas (clusters 1, 2, 3, and 5) responded more to national-level interventions according to our model.

One caveat is that the effects of interventions that were implemented close together are difficult to disentangle. Many local governments implemented formal NPIs in immediate response to the federal guidelines, leading a quick succession of NPIs coming into effect. Disentanglement of the effects of NPIs is explored more in Section 4.

Another limitation of our model is that the federal level interventions may have come before some counties have seen any cases. This introduces ambiguity between *R*_{0} estimates and the weight attributed to federal NPIs, i.e., federal guidelines and travel ban. Similar ambiguity is less prominent for other interventions, since they are not generally implemented before a county has seen cases. Additionally, since the epidemiological model used here is mechanistic, it only allows for changes of the transmission rate at the time of implementation of interventions. Other events, such as high-profile cases and cancellations of prominent festivals, likely contributed to increasing awareness of the disease and may also have effects on individual behavior, and therefore the *R _{t}*. These effects cannot be attributed to a specific date and may thus affect the weights of interventions that come into effect at around the same time.

## 4 Methods

### Method Overview

Figure 3 shows an overview of the proposed method. We train a Bayesian mechanistic model (BMM) model in a cluster-specialized manner and compare with the baseline model trained on all counties. We show the stability of our model in two ways, first by withholding random days during training and second by withholding given counties. We compare the predictions based on incomplete data with those based on complete data.

### County-level Clustering

Quantifying changes in COVID-19’s reproductive rate is complicated when considering differences at the county – rather than the national – level. Parametric epidemiological models are optimized to describe fatality counts, the volume and reliability of which decreases outside of highly populated regions. To make up for this scarcity, we leverage a balanced clustering of U.S. counties to aggregate data from similar counties in the same state, treating them as a single “super-county,” if they have identical NPI implementation dates. This has the advantage of considering counties that would otherwise be excluded without assuming that the spread of the disease in those counties follows the same trend of more advanced regions in the same state or country. Figure *4* shows how for a given state-in this case Texas, and a given cluster, the counties with 1–49 cumulative deaths from COVID-19 are treated as a single entity. In addition to this data aggregation strategy, we fit a cluster-specialized model to each set of counties, quantifying the possibly disparate effects of the NPIs in each type of county, as detailed below.

To generate this clustering, we partition 3,059 U.S. counties into five clusters based on variables which directly affect disease spread, using a Gaussian mixture model.^{10} *Table 4* summarizes these variables, which include demographic, economic, and public transit capacities that we have gathered, processed for machine readability, and released in a publicly available dataset.^{11} Sources include the United States Census Bureau, the United States Department of Agriculture Economic Research Service and the Center for Neighborhood Technology. A full list of sources can be found on the corresponding website.^{11} To incorporate potential exposure, we consider county population density, housing density, and land area. Additionally, we consider portions of the population for age- and gender-based demographic categories, due to COVID-19’s disparate effects on these groups.^{12–16}

Our clustering is also based on socioeconomic variables, which may indicate behavioral traits relevant to the spread of COVID-19. For instance, workers with tertiary education are more likely to hold office-type jobs which can be done from home.^{17} At the same time, many secondary-education jobs have been deemed essential, resulting in a high contact rate, which in turn increases the likelihood of infection. Thus, our clustering considers college education, poverty, unemployment, and median household income for each county as a reflection of the overall job composition in the local area. Finally, we include a population-weighted transit score due to the likelihood of transmission in the enclosed, possibly crowded space that public transport entails.

Figure *5* and Figure *6* show the plotted over median income and density for counties and super-counties in the different clusters. Super-counties are visualized as a single point using their population-weighted average for that feature. The plots show the distribution of the cluster over the features and its correlation with how *R _{t}* changes.

Notably, we exclude ethnic demographics from the variables considered during clustering. because we assume that no direct relationship exists between race and incidence of COVID-19. Rather, such a relationship would be mediated by socioeconomic factors that are already included for similarity assessment.

### Modelling the Effects of NPIs

#### Data Processing

We use cumulative fatality and infection counts from the JHU CSSE COVID-19 Dashboard, which has been tracking COVID-19 since January.^{1} When fitting our model, we use measured fatality rates, which are generally considered more reliable than confirmed infections because of limited testing and the prevalence of asymptomatic cases. Thus, we use population-weighted fatality rates to estimate the true cases count. Obtaining a reasonable estimate for this ratio is crucial to realistically model the numbers of total infections. However, due to asymptomatic cases, undertesting and biased reporting, this parameter cannot be measured directly, but has to be inferred from observable data.^{19–21} Previous studies all report fatality rates with substantial uncertainty but agree on the fact that fatality for COVID-19 depends strongly on the age of the infected person. Therefore, we adapt the fatality rates per age group presented in Verity et al.^{19} for each county with respect to its demographic age distribution. Based on U.S. Census data, a per-county weighted fatality rate is computed using the share of each age group in the overall population.

#### Model

We estimate the effective reproductive rate using a semi-mechanistic Bayesian mechanistic model proposed in Flaxman et al.^{5,22}, that infers the impact of a predefined set of interventions and estimates the number of infections over time. The model estimates a county-specific initial reproductive rate *R*_{0, m} for the *m*-th county and intervention weights *α _{i}*, the effect of which is assumed constant for all counties included in joint optimization. These effects are assumed multiplicative, modeled as
where

*m*is the county index and is a binary indicator for intervention

*i*being in place at time

*t*. The interventions we take into account here are summarized in Table 1.

The model assumes a normal distribution as the prior for the *R*_{0}. We set the prior on *R*_{0}∽*N*(3.28, ∣*κ*∣) where *κ* ∽ *N*^{+}(0, 0.5), The value of 3.28 is in accordance with the analysis presented in Liu et al.^{23}

Starting from the time-varying *R _{t}*, a latent function of daily infections is modeled depending on a number of factors: a generation distribution

*g*with density

*g(τ)*that models the time between spread of infection from an individual to the next (approximated as the serial interval distribution), the number of susceptible individuals left in the population, an infection-to-death distribution, and the county-specific time-varying reproduction number that models the average number of secondary infections at any given time. The parameters for the distributions are chosen in accordance to Flaxman et al.

^{5}

Model fitting is driven by the timeseries of observed daily deaths. These are linked to the modelled number of infections by the county specific weighted fatality rate. The sum of past infections, along with the weighted probability of death gives the number of deaths on a day for a given county. To ensure that the deaths accounted for are from locally acquired infections, we include observed deaths in a county only after the cumulative count has exceeded 10. The seeding of new infections is assumed to be a month prior to that.

All parameters are estimated jointly using an adaptive Hamiltonian Monte Carlo (HMC) sampler in the probabilistic programming language Stan.^{24}

### Validation

We propose three validation schemes to validate our approach. First, we show that our model is stable, so a small change in the input produces a small change in the output. Second, we separate our counties and super-counties into train and test sets. We fit models for each cluster as well as for all counties and super-counties on either, both the train and test set or with only the train set. We evaluate using the parameters of the “correct” cluster model to predict fatalities for the held-out regions. Comparing these predictions to those made when the regions are included during training confirms that the model is not over-reliant on each data point. Third, we evaluate how well our model discerns the effects of individual NPIs and discuss biases it may have to attribute more weight to certain NPIs.

#### Validating the Stability of the Model

To show stability in our model, we train the model, withholding three non-consecutive days chosen at random, and compare the predictions for these days to the baseline model fit on full data. Only days with non-zero death counts are selected as potential leave-out candidates. We set the threshold for data selection at counties with 50 cumulative deaths by May 28^{th}, which results in 211 counties. (For model validation, we do not use super-counties.) We fit the model over 300 total iterations, with 150 of those as warmup iterations. From the uncertainty distributions obtained as results from the models, we use the expected values of the deaths of the held-out days to compare the two models and report our observations in Figure 7a. The clustering of points around the optimal fit visually confirms that the expected values of the baseline model fit those from the model with held-out days. In Figure 7b, we compare the average county-wise expected value of deaths of the withheld days and observe a more concentrated distribution around the line of optimal fit.

Using the Pearson-correlation coefficient to quantify the similarity between the expected values, we find r-values greater than 0.99 for the held-out days and the average held-out days which suggest that the model is robust against small input perturbations.

#### Validating the Advantages of Clustering

We show the advantage of cluster-specialized models for quantifying the effects of NPIs by comparing their performance with models trained on different clusters or on the national level. While a clustering of U.S. counties may or may not be interesting, its value related to COVID-19 comes from its ability to identify epidemiologically meaningful differences among various regions in the country. When fitting each model, we withhold a validation region from each cluster and use the remaining counties to fit Equation(1), for both cluster-specialized and baseline models. We then use the learned weights *α _{i}* to initialize a fixed-

*α*BHI model for each withheld region.

With this scheme, we show the advantage of clustering U.S. counties in three ways. First, we show that within the same cluster, we obtain comparable predictions for a county whether or not it is included in training. Second, we observe that *α* values learned from a different cluster produce substantially different predictions. Finally, we apply the *α* values learned at the national level to similar effect. This demonstrates how cluster-specialized models can reveal trends in the spread of COVID-19 that are not apparent under the assumption that NPIs have universal effect at the national or state level.

Figure 8 shows the first and second validation strategy for cluster 3, which consists of less-populous counties with large land area. For such regions, which necessarily have fewer cases to support local models, it is vital to gain accurate insight from the entire training pool and verify these insights through the aforementioned validation process. Thus, in this illustrative example, we use data up to May 18^{th}, at which point the continuing spread of cases was not immediately clear. Indeed, *Figure 8a* shows the predictions obtained for seven counties in Arizona (treated as a single super-county). One can readily observe the model takes into account random outliers while following a general upward trend. Figure 8b exhibits less awareness of outliers but still follows the same trend, despite using α values obtained from training on cluster 3 with these same counties withheld. On the other hand, Figure 8c shows the predictions made when using the wrong α values, from cluster 4, which consists of densely populated city centers. As can be seen, this results in a markedly different prediction for the trend of the disease than actually occurred up to May 28^{th} in Figure 8b. This is a critical difference, which could result in very different decisions for implementing or removing NPIs.

Finally, we compare the performance of our cluster-specialized model with the baseline model, which uses all eligible counties for training. In many cases, this difference is slight, and both the baseline and cluster-specialized models exhibit similar trends for a given region. However, for certain areas the difference can be profound. Take, for example, the District of Columbia, which has become a significant hot-spot for COVID-19. The baseline model, shown in *Figure 9a*, has only recently flattened, whereas the cluster-specialized model in *Figure 9b* reflects the reality that efforts to combat the disease have had much greater effect, significantly reducing the number of deaths. This is because when forced to accommodate counties across the U.S., the baseline model emphasizes federal guidelines with a mean α value of 0.804 (see *Table 3*). On the other hand, our cluster-specialized model found that stay-at-home orders were much more effective for counties in cluster 4, with a mean α value of 0.963. This difference illustrates the advantage of specializing a BHI model based county-level characteristics; it allows the model to include a greater number of counties within separate clusters while not being forced to overgeneralize and, in doing so, compromise some certainty.

### Disentanglement

One drawback of thes model is that it cannot disentangle implementations that came into effect at the same time. For example, states often closed public schools at the same time as federal guidelines were issued so it is difficult to discern the individual effect on reducing *R _{t}*. Additionally, in

*Table 5*, we observe that banning gatherings of 50 or more people often occurs at the same time as banning gatherings 500 or more, and restaurants and entertainment venues are often closed together. This suggests that these pairs of interventions may not or be poorly disentangled.

To further investigate the model’s ability to disentangle intervention weights, we create simulated trajectories of counties’ deaths and cases counts based on their *R _{0}* and the dates on which the interventions came into effect. Using all counties that have more than 50 cumulative deaths on May 28

^{th}without super-counties, we seed each county with 200 cases in each of the first 6 days. To simulate county-specific trajectories, we construct a sets of generated timeseries. In the first set, we assign intervention weights

*α*to be randomly generated from a Gamma distribution, the same distribution as our prior on the Bayesian mechanistic model adjusted to be in the range of our learned weights. We then calculate what the

_{i}*R*on each day must have been based on the

_{t}*R*

_{0}and the interventions in place. Once we have the seeded infection and the

*R*trajectory for each county, we can calculate daily infections and thus expected fatalities. Using the simulated trajectories, we fit the model.

_{t}*Table 6*compares the weights used for generation with the weights that the model learned.

We observe that the effects of individual NPIs are not well disentangled in general. The model tends to attribute more weight to few NPIs rather than spread out the weight evenly. Specifically, the model tends to put more weight on stay-at-home orders. This may be because interventions I2-I8 are often implemented close together (see *Table 5*) and it is difficult to attribute effect to any single one of them on a national scale. While we can conclude that the trajectories the model predicts are reliable, due to their match to measured death, and therefore the overall decrease in *R _{t}* is reliable, attributing decreases to any individual NPI is challenging whenever the difference in implementation date is small. This observation seems to be in line with the similarly large confidence intervals reported in previous work on varied models and regions.

^{2–7}

## Data Availability

All data used in the manuscript is available from public sources.

https://github.com/JieYingWu/COVID-19_US_County-level_Summaries

## Authors’ Contributions

Jie Ying Wu worked on the modeling, disentanglement, and coordinating efforts. Benjamin D Killeen worked on clustering and validating the epidemiological advantage of clustering. Philipp Nikutta worked on processing the data and validating the stability of the model. Mareike Thies did the background research and calculated the weighted fatalities. Anna Zapaishchykova created the methods overview diagram as well as the plots over the features. Shreya Chakraborty worked on processing the data and disentanglement. Mathias Unberath developed the initial concept and direction for this work. Jie Ying Wu, Benjamin D Killeen, and Mathias Unberath wrote the manuscript.

## Competing Interests

The authors have no competing interests.

## Acknowledgements

This article was supported in part by institutional funds provided by the Malone Center for Engineering in Healthcare.

## Footnotes

jieying{at}jhu.edu

killeen{at}jhu.edu

pnikutta1{at}jhu.edu

mthies1{at}jhu.edu

azapais1{at}jhu.edu

schakr20{at}jhu.edu

mathias{at}jhu.edu