## Abstract

The current outbreak of novel coronavirus disease 2019 (COVID-19) poses an unprecedented global health and economic threat to interconnected human societies. Until a vaccine is developed, strategies for controlling the outbreak rely on aggressive social distancing. These measures largely disconnect the social network fabric of human societies, especially in urban areas. Here, we estimate the growth rates and reproductive numbers of COVID-19 in US cities from March 14th through March 19th to reveal a power-law scaling relationship to city population size. This means that COVID-19 is spreading faster on average in larger cities with the additional implication that, in an uncontrolled outbreak, larger fractions of the population are expected to become infected in more populous urban areas. We discuss the implications of these observations for controlling the COVID-19 outbreak, emphasizing the need to implement more aggressive distancing policies in larger cities while also preserving socioeconomic activity.

The coronavirus pandemic of 2019-20 (COVID-19) is an unprecedented worldwide event. Its speed of propagation, its international reach, and the coordinated measures taken to mitigate it are only possible in a world that is more connected and more urbanized than at any other time in history.

As a novel infectious disease in human populations, COVID-19 has a number of quantitative signatures to its pattern of spread. These signatures make its dynamics more difficult to contain but also easier to understand.

First, because there is no history of previous exposure, all human populations in contact with the virus are (presumably) susceptible. This means that the susceptible population is the world’s total population writ large. Second, because COVID-19 is a respiratory disease, it is easily transmissible, resulting in high reproductive numbers, *R* = 2.2–6.5 (*1–3*), though considerable uncertainty remains about these estimates. Third, COVID-19 appears to be characterized by reproductive numbers above the epidemic threshold (*R* > 1) everywhere around the world, regardless of environmental conditions such as humidity or temperature. These reproductive numbers are considerably higher than those of seasonal influenza (*4*). Bringing the disease reproductive number below the epidemic threshold (*R <* 1) is the main goal of all public health interventions; once this is achieved, the disease’s transmission chain reaction will shut down.

The reproductive number is the product of two factors, *R* = *β/γ* = *β* × (1*/γ*): the infectious period 1*/γ* (a physiological property) and the contact rate *β*, a property of the population that essentially measures the number of social contacts that can transmit the disease per unit time. Of these, only the contact rate can be changed via public health interventions.

In the absence of a vaccine, social distancing remains the only option to slow down the spread of the disease and arrest potential mortality. Governments around the world are now enacting aggressive policies, including “shelter in place” and emergency closures of all non-essential services, which carry severe economic and social consequences. For example, in the last week, the US Centers for Disease Control and Prevention and the White House have recommended extreme social distancing in order to slow down the current COVID-19 outbreak (*5*). However, these measures are less aggressive than those put in place elsewhere (*6*). There is still a great deal of uncertainty as to how strong social distancing recommendations must be or how long they must last. Importantly, national and regional social distancing policies are likely to impact individual cities differently.

Cities are predicated on extensive and intense socioeconomic interactions. Many of their measurable properties, from the size of their economies to their crime rates and the prevalence of certain infectious diseases, are mediated by socioeconomic interactions. These interactions are subject to well-known scaling effects, which are magnified by city population size (*7*). All of these relationships are tied to socioeconomic networks whose average degree (number of social connections per capita) increases approximately as a power law of city size, *k*(*N*) = *k*_{0}*N*^{δ}, with *δ* ≃ 1*/*6 (*7, 8*).

This is a large effect. Based on data of mobile phone social networks (*8*), people living in a city of 500,000 have, on average, 11 people in their mobile phone social network, while people living in a city of 5,000,000 have approximately 15, about 36% more. This is relevant to disease transmission because the average contact rate is proportional to degree, *β ∝ k*(*N*) (see Materials and Methods). Therefore, we expect initial growth rates of COVID-19 cases to be higher in larger cities (see Materials and Methods). This is what is found empirically (see Figure 1A).
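The size of the effect can be checked with a little arithmetic. The sketch below assumes, as the text does, that average degree follows *k*(*N*) = *k*_{0}*N*^{δ}, and backs out the exponent implied by the two mobile-phone figures quoted above (11 contacts at 500,000 residents, 15 at 5,000,000). The function names are illustrative only.

```python
import math


def implied_exponent(n_small, k_small, n_large, k_large):
    """Exponent delta of k(N) = k0 * N**delta passing through two (N, k) points."""
    return math.log(k_large / k_small) / math.log(n_large / n_small)


def predicted_degree(n, k0, delta):
    """Average degree predicted by the power law k(N) = k0 * N**delta."""
    return k0 * n ** delta


# Figures quoted in the text (mobile phone social networks).
delta = implied_exponent(500_000, 11, 5_000_000, 15)
k0 = 11 / 500_000 ** delta  # calibrate the prefactor on the smaller city

print(f"implied delta = {delta:.3f}")
# A tenfold increase in population multiplies average degree by 10**delta.
print(f"degree ratio for 10x population = {10 ** delta:.3f}")
print(f"predicted degree at 5,000,000 = {predicted_degree(5_000_000, k0, delta):.1f}")
```

The implied exponent is roughly 0.13, i.e. a tenfold larger city gives each resident about 36% more contacts.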

A larger reproductive number for spreading processes in larger cities has two important consequences (*9–12*). First, the reproductive number, *R*, sets a threshold for how an epidemic outbreak propagates in a population, just like the branching rate in a chain reaction (*13, 14*). For *R <* 1, an introduced disease will die out because transmission is dampened, while for *R* > 1, transmission is amplified and results in an epidemic in which the disease spreads quickly to a large fraction of the population (see Figure 1B). Because we expect *R ∼ N*^{δ} to increase with city size, we expect larger cities to be more susceptible both to contagious diseases and to the spread of information (see below).
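The chain-reaction analogy can be made concrete with a branching process. The sketch below assumes (this is not a model used in the paper) that each case independently produces a Poisson-distributed number of secondary cases with mean *R*. The probability *q* that the chain of transmission eventually dies out is then the smallest solution of *q* = exp(*R*(*q* − 1)), found here by fixed-point iteration from *q* = 0.

```python
import math


def extinction_probability(R, iterations=10_000):
    """Probability that a Poisson(R) branching process started by one case dies out.

    Iterating q <- G(q) from q = 0, where G(q) = exp(R * (q - 1)) is the
    probability generating function of the Poisson offspring distribution,
    converges to the smallest fixed point in [0, 1].
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(R * (q - 1.0))
    return q


for R in (0.5, 0.9, 1.5, 2.0, 3.0):
    print(f"R = {R:.1f}: P(outbreak dies out) = {extinction_probability(R):.3f}")
```

For *R* ≤ 1 the chain dies out with certainty, while for *R* > 1 there is a finite probability, 1 − *q*, of a major outbreak, and that probability grows with *R*.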

Second, the size of an epidemic outbreak, as measured by the percent of the population that becomes infected, is also related to the reproductive number. In complex epidemic models, this needs to be computed numerically, but for a simple Susceptible-Infected-Recovered (SIR) model (*13*) we can integrate the dynamics and write the explicit expression (see Methods)

$$
\ln \frac{S_{\infty}}{S_{0}} = -R \left( 1 - \frac{S_{\infty}}{N} \right),
$$

where *S*_{0} is the initial susceptible population size (before the outbreak) and *S*_{∞} is the (smaller) final population of susceptible people. A larger *R ∼ N*^{δ} leads to more extensive epidemics. The percent of people infected at the end of the outbreak is 1 − *S*_{∞}*/N*, which is larger in populations with larger *R*, such as in larger cities. A final point centers on the fraction of the population that must be removed from the susceptible class when *R* > 1 to stop the outbreak. This is often called the vaccination rate, *p*_{R}, which is equally relevant in the context of social distancing (as only the means of the intervention and its duration differ). In the SIR model, this is simply *p*_{R} = 1 − 1*/R*, which shows that as cities get larger the distancing rate must also increase (see Supplementary Figure 1).
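Both quantities in this paragraph are easy to evaluate numerically. The sketch below assumes *S*_{0} ≃ *N*, so that the final-size relation becomes ln *x* + *R*(1 − *x*) = 0 with *x* = *S*_{∞}/*N*. It finds the nontrivial root in (0, 1) by bisection (*x* = 1 is always a trivial root) and reports the attack rate 1 − *x* alongside the distancing fraction *p*_{R} = 1 − 1/*R*. Function names are illustrative.

```python
import math


def final_susceptible_fraction(R, tol=1e-12):
    """Nontrivial root x in (0, 1) of ln(x) + R * (1 - x) = 0, with x = S_inf / N.

    For R <= 1 the only root in (0, 1] is x = 1 (no epidemic), so 1.0 is returned.
    """
    if R <= 1.0:
        return 1.0

    def f(x):
        return math.log(x) + R * (1.0 - x)

    # f increases on (0, 1/R): f(lo) < 0 and f(1/R) > f(1) = 0, so the root is bracketed.
    lo, hi = 1e-300, 1.0 / R
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def attack_rate(R):
    """Fraction of the population infected by the end of the outbreak, 1 - S_inf / N."""
    return 1.0 - final_susceptible_fraction(R)


def distancing_fraction(R):
    """Fraction p_R = 1 - 1/R that must leave the susceptible class to stop spread."""
    return max(0.0, 1.0 - 1.0 / R)


for R in (0.8, 1.5, 2.2, 3.0, 6.5):
    print(f"R = {R:.1f}: attack rate = {attack_rate(R):.2f}, p_R = {distancing_fraction(R):.2f}")
```

Because both the attack rate and *p*_{R} increase monotonically with *R*, any systematic increase of *R* with city size carries over to both.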

These observations have a number of implications that can inform evolving national, regional, and local responses to the outbreak of COVID-19. First, it is particularly important for larger cities to act quickly to contain this outbreak. Second, social distancing will impact cities differently based on city size. From the perspective of containing the outbreak, larger cities require more aggressive social distancing policies, corresponding to a larger *p*_{R}. At the same time, once the outbreak is contained it might be possible to relax social distancing policies in smaller cities first, allowing a faster return to normal life and economic activity compared to more populous urban areas.

These distinctions may help to bring more nuance to ongoing strategies for suppression and control of COVID-19, including gradually restoring socioeconomic activity in context appropriate and safe ways.

Because of their higher network density, insufficient social distancing in larger or more connected cities may lead to bigger outbreaks and to the creation of reservoirs for the disease, which can continue to seed introductions elsewhere. These dynamics may also play out within cities, as communities in which people interact more densely from the perspective of disease transmission (e.g., downtowns) may similarly act as contagion reservoirs that prolong the present outbreak and potentially create secondary reinfection waves.

Finally, as strategies for controlling this outbreak continue to evolve, it is critical to keep in mind that almost everything that we appreciate about urban environments, including their economic prowess, their ability to innovate, and their role in their inhabitants’ social and mental health, is predicated on network effects mediated by socioeconomic interactions. The ability to succeed against a fast-emerging epidemic like COVID-19 depends on preserving as much person-to-person connectivity as possible (e.g., through technology) while stopping disease transmission. Research on safe types of socioeconomic contact and exchange is therefore paramount so that we can succeed in controlling this outbreak while maintaining livelihoods, socioeconomic capacity, and mental health. This can in principle be done through approaches that make the most of emerging, real-time data to create context-appropriate suppression strategies at local, regional, national, and global levels.

The higher socioeconomic connectivity of larger cities in a fast urbanizing world makes containing emergent epidemics harder. But the density of socioeconomic connections in cities can also facilitate the spread of information, social coordination, and innovations necessary to stop the spread of COVID-19. This information and associated actions can easily spread much faster than the biological viral contagion. To fight an exponential, we need to create an even faster exponential!

## Data Availability

COVID-19 case data by county is available at https://github.com/tomquisel/covid19-data. US Census population data is available at https://www.census.gov/data/tables/time-series/demo/popest/2010s-counties-total.html. MSA delineation files are available at https://www.census.gov/programs-surveys/metro-micro/about/delineation-files.html.


## Supplementary materials

### Materials and Methods

#### Data and Urban Units of Analysis

Here we briefly describe the mathematical analysis steps. County-level daily data from March 13-24 were aggregated to the city level (Metropolitan Statistical Areas, which are integrated labor markets) using delineation files from the US Office of Management and Budget (*15*). The data include counties with at least one reported case of COVID-19. We next excluded cities that had no COVID-19 cases on March 13. This removed cities with low case counts, likely due to introductions from outside the MSA, for which accurate estimates of the local case growth rate are unlikely. Results were similar for all contiguous subsets of the data of at least 7 days (see Supplementary Figure 4a). This left 163 cities for further analysis. We subtracted deaths from cases in each city and found the slope of the line ln(*cases*) ∼ ln(*a*) + *r* · *t* for each resulting time series of active COVID-19 cases. Finally, we plotted the natural logarithm of *r* against the natural logarithm of city population from 2018 census estimates (*16*) and performed an ordinary least squares linear regression to determine the slope of the scaling line. Regression residuals were not related to city population (Supplementary Figure 2), and a q-q plot of the residuals indicated that they were well described by a normal distribution (Supplementary Figure 3).
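The two regression steps can be sketched with NumPy. This is an illustration on hypothetical, noise-free inputs rather than the authors' code: for each city a line is fit to ln(active cases) against time, and the log of the resulting growth rates is then regressed on log population by ordinary least squares. `numpy.polyfit` with degree 1 performs the least-squares line fit; the populations and case counts are made up.

```python
import numpy as np


def growth_rate(cases, deaths):
    """Slope r of ln(active cases) ~ ln(a) + r * t, where active = cases - deaths."""
    active = np.asarray(cases, dtype=float) - np.asarray(deaths, dtype=float)
    t = np.arange(active.size)  # days since the first observation
    r, _ln_a = np.polyfit(t, np.log(active), 1)
    return r


def scaling_exponent(populations, rates):
    """OLS slope of ln(r) against ln(N), i.e. the exponent in r ~ N**delta."""
    slope, _intercept = np.polyfit(np.log(populations), np.log(rates), 1)
    return slope


# Hypothetical cities whose growth rates follow r = 0.02 * N**0.15 exactly.
populations = np.array([2.0e5, 1.0e6, 4.0e6, 1.9e7])
true_rates = 0.02 * populations ** 0.15
days = np.arange(12)

rates = []
for r_true in true_rates:
    cases = 5.0 * np.exp(r_true * days) + 1.0  # cumulative cases (hypothetical)
    deaths = np.ones_like(cases)               # one death per city, for illustration
    rates.append(growth_rate(cases, deaths))

print("fitted growth rates:", np.round(rates, 3))
print("scaling exponent:", round(scaling_exponent(populations, rates), 3))
```

On real data the residuals of the second fit are what Supplementary Figures 2 and 3 inspect.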

In order to estimate the reproductive number *R*, we multiplied the growth rate of each city, *r*, by an average infectious period of 1*/γ* = 4.5 days and added one, *R* = 1 + *r/γ* (see below). The size of the epidemic was then estimated by finding the nontrivial root *x* ∈ (0, 1) of *f*(*x*) = ln(*x*) + *R ·* (1 − *x*) = 0, where *x* = *S*_{∞}*/N* (*x* = 1 is always a trivial root).
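As a worked example of this conversion (the growth rates below are illustrative, not values from the data), a city whose active cases grow at *r* = 0.2 per day has *R* = 1 + 0.2 × 4.5 = 1.9. A minimal sketch of the same arithmetic, vectorised over cities:

```python
import numpy as np

INFECTIOUS_PERIOD_DAYS = 4.5  # 1/gamma, as stated in the Methods


def reproductive_number(r, infectious_period=INFECTIOUS_PERIOD_DAYS):
    """R = 1 + r / gamma = 1 + r * (1/gamma), from the early SIR growth rate r."""
    return 1.0 + np.asarray(r, dtype=float) * infectious_period


# Illustrative daily growth rates (hypothetical, not from the data set).
rates = np.array([0.10, 0.20, 0.30])
print(reproductive_number(rates))
```

Because the map from *r* to *R* is linear with a fixed infectious period, the ranking of cities by *R* is the same as their ranking by growth rate.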

Accurately estimating the growth rate of epidemics is often difficult (*17*); however, here we are concerned with the pattern of growth rates among cities rather than the precision of our growth rate estimates. To that end, we additionally estimated the growth rate of COVID-19 cases by *r* = ln(*cases*_{T}/*cases*_{0})/*T*, which estimates the slope of the line ln(*cases*) ∼ ln(*a*) + *r* · *t* from the first and last points of the time series (Supplementary Figure 4b). These growth rate estimates showed a scaling relationship with city size that is consistent with Figure 1 of the main text, despite variations in the growth rate estimates between the two methods.
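The difference between the two estimators can be seen in a small sketch. On a series that is exactly exponential both return the same *r*; when reporting is irregular, the endpoint estimate depends only on the first and last observations, while the least-squares slope uses every day. The case counts, including the reporting backlog, are hypothetical.

```python
import numpy as np


def endpoint_rate(active):
    """r = ln(cases_T / cases_0) / T, using only the first and last points."""
    active = np.asarray(active, dtype=float)
    T = active.size - 1  # days elapsed between first and last observation
    return np.log(active[-1] / active[0]) / T


def ols_rate(active):
    """Least-squares slope of ln(cases) against t over the whole series."""
    active = np.asarray(active, dtype=float)
    t = np.arange(active.size)
    slope, _intercept = np.polyfit(t, np.log(active), 1)
    return slope


days = np.arange(12)
smooth = 3.0 * np.exp(0.25 * days)
# Same trend with a hypothetical reporting backlog cleared on the last day.
bumpy = smooth.copy()
bumpy[-1] *= 1.5

for name, series in (("smooth", smooth), ("bumpy", bumpy)):
    print(f"{name}: endpoint r = {endpoint_rate(series):.3f}, OLS r = {ols_rate(series):.3f}")
```

Both estimators react to the late jump, but the endpoint estimate reacts more strongly, which is one source of the between-method variation mentioned above.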

#### Epidemic Models and the Reproductive Rate

Even though they are well known, we include here, for the sake of completeness, the basic derivation of the reproductive rate and the final size of the epidemic in simple epidemic models.

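The SIR model (Figure 2A) is the simplest relevant description of an epidemic in a population. The model is typically written in terms of differential equations as

$$
\frac{dS}{dt} = -\beta \frac{S I}{N}, \qquad
\frac{dI}{dt} = \beta \frac{S I}{N} - \gamma I, \qquad
\frac{d\mathcal{R}}{dt} = (1 - m)\, \gamma I, \qquad
\frac{dD}{dt} = m\, \gamma I,
$$

where $\mathcal{R}$ (written this way to distinguish it from the reproductive number *R*) and $D$ are the recovered and deceased classes. Its SEIR extension adds an exposed, not yet infectious, class $E$:

$$
\frac{dS}{dt} = -\beta \frac{S I}{N}, \qquad
\frac{dE}{dt} = \beta \frac{S I}{N} - \sigma E, \qquad
\frac{dI}{dt} = \sigma E - \gamma I.
$$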

Here, *β* is known as the contact rate, 1*/γ* is the infectious period, 1*/σ* is the non-infectious incubation time, and *m* is the probability that an infected individual dies (mortality rate).

The reproductive rate *R* can be easily deduced from the initial growth of *I* when *S/N* ≃ 1, which is

$$
\frac{dI}{dt} \simeq (\beta - \gamma)\, I = \gamma (R - 1)\, I ,
$$

with *R* = *β/γ*. The temporal growth rate is therefore *r* = *γ*(*R* − 1), or equivalently *R* = 1 + *r/γ*. We see that the number of cases will grow exponentially if *R* > 1 and, conversely, decay exponentially when *R <* 1.
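As a worked example of this relation, taking the infectious period 1*/γ* = 4.5 days used above and the range of reproductive numbers quoted in the main text, *R* = 2.2–6.5, gives early growth rates of

$$
r = \gamma (R - 1) = \frac{R - 1}{4.5\ \text{days}} \in \left[ \frac{1.2}{4.5},\ \frac{5.5}{4.5} \right] \text{day}^{-1} \approx [0.27,\ 1.22]\ \text{day}^{-1},
$$

corresponding to case doubling times ln 2*/r* between roughly 0.6 and 2.6 days.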

At a later time, as the susceptible population becomes depleted, the effective *R* decreases. Distancing or vaccination work by removing people from the susceptible class, which can be modelled by reducing *S/N* or, equivalently as far as *R* is concerned, *β*, since these two factors always appear multiplied together, i.e., the effective reproductive number is *R*_{eff} = *R S/N*.
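Written out, removing a fraction *p* of the population from the susceptible class at the start of the outbreak (so that *S/N* ≃ 1 − *p*) gives the condition for the outbreak to stop growing:

$$
R_{\text{eff}} = R\,(1 - p) < 1 \quad \Longleftrightarrow \quad p > p_R = 1 - \frac{1}{R}.
$$

For example, *R* = 2 requires *p* > 0.5, while *R* = 4 requires *p* > 0.75.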

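To obtain the expression for the final size of the epidemic, we note that adding the equations for *S* and *I* gives

$$
\frac{d(S + I)}{dt} = -\gamma I
\quad \Longrightarrow \quad
\gamma \int_{0}^{\infty} I \, dt = S_{0} + I_{0} - S_{\infty},
$$

since the infected class empties at the end of the outbreak, $I(t \to \infty) = 0$. Here *S*_{∞} is the population of susceptibles left uninfected at the end of the outbreak, and *S*_{0}, *I*_{0} are the sizes of the susceptible and infected populations at the initial time.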

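The equation for *S* in time can be written as

$$
\frac{d \ln S}{dt} = -\frac{\beta}{N} I = -\frac{R}{N} \, \gamma I ,
$$

which now can be integrated, using $\gamma \int_{0}^{\infty} I \, dt = S_{0} + I_{0} - S_{\infty}$ and $S_{0} + I_{0} = N$ (no one initially recovered), to give

$$
\ln \frac{S_{\infty}}{S_{0}} = -\frac{R}{N} \left( S_{0} + I_{0} - S_{\infty} \right) = -R \left( 1 - \frac{S_{\infty}}{N} \right),
$$

which is the desired result, used in the main text to create Figure 1B.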
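The final-size relation can be checked against a direct numerical integration of the SIR equations. The sketch below is an illustrative check, not part of the original analysis: it works with population fractions, uses a fixed-step fourth-order Runge–Kutta integrator (an assumption chosen for simplicity), ignores mortality (*m* = 0, which does not affect *S*), and compares the simulated *S*_{∞}/*N* with the root of ln(*S*_{∞}/*S*_{0}) = −*R*(1 − *S*_{∞}/*N*).

```python
import math


def sir_final_susceptible(R, gamma=1 / 4.5, i0=1e-4, t_end=600.0, dt=0.05):
    """Integrate the SIR equations (fractions of N) with RK4; return S(t_end)."""
    beta = R * gamma

    def deriv(s, i):
        new_infections = beta * s * i
        return -new_infections, new_infections - gamma * i

    s, i = 1.0 - i0, i0
    for _ in range(int(round(t_end / dt))):
        k1 = deriv(s, i)
        k2 = deriv(s + 0.5 * dt * k1[0], i + 0.5 * dt * k1[1])
        k3 = deriv(s + 0.5 * dt * k2[0], i + 0.5 * dt * k2[1])
        k4 = deriv(s + dt * k3[0], i + dt * k3[1])
        s += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        i += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return s


def final_size_root(R, s0, tol=1e-12):
    """Solve ln(x / s0) = -R * (1 - x) for x = S_inf / N in (0, 1/R) by bisection."""

    def f(x):
        return math.log(x / s0) + R * (1.0 - x)

    lo, hi = 1e-300, 1.0 / R  # f(lo) < 0 < f(1/R) for R > 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for R in (1.5, 2.0, 3.0):
    simulated = sir_final_susceptible(R)
    predicted = final_size_root(R, s0=1.0 - 1e-4)
    print(f"R = {R}: simulated S_inf/N = {simulated:.4f}, final-size relation = {predicted:.4f}")
```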

Finally, we note that even though a model with a non-infectious incubation period, such as the SEIR model, looks more complex, it has the same value of *R*, provided we can neglect mortality in the *S* and *E* classes that is not due to the disease outbreak.

#### Derivation of the City Size Dependence of the Reproductive Number

The most important quantity characterizing epidemic processes is the reproductive number, *R*, which measures the average number of secondary cases induced by an infectious individual in a fully susceptible population. Recall that human populations are thought to be wholly susceptible to COVID-19.

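For a contagion network, the reproductive number is related to the statistics of degree, *k*, as

$$
R = p_{I} \frac{\langle k^{2} \rangle}{\langle k \rangle} = p_{I} \left( \langle k \rangle + \frac{\sigma_{k}^{2}}{\langle k \rangle} \right),
$$

where *p*_{I} is the infection probability per contact, ⟨…⟩ denotes expectation values over the population, and $\sigma_{k}^{2} = \langle k^{2} \rangle - \langle k \rangle^{2}$ is the degree variance.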

For a lognormal degree distribution, which is typical of social networks in cities, the degree average and variance are given by

$$
\langle k \rangle = e^{\mu + \sigma^{2}/2}, \qquad
\sigma_{k}^{2} = \left( e^{\sigma^{2}} - 1 \right) e^{2\mu + \sigma^{2}},
$$

with the parameters *µ* = ⟨ln *k*⟩ and *σ*^{2} = ⟨(ln *k* − ⟨ln *k*⟩)^{2}⟩. This results in a simple and elegant expression for the reproductive number

$$
R = p_{I} \langle k \rangle \, e^{\sigma^{2}} = p_{I} \, e^{\sigma^{2}} k_{0} N^{\delta},
$$

where, in the last equality, we introduced the scaling relation for the average degree with city population size, ⟨*k*(*N*)⟩ = *k*_{0}*N*^{δ}. We see therefore that in general the reproductive number is expected to be a function of city size *N*, and to be larger in bigger cities. How much bigger depends on the behavior of the log-variance, *σ*^{2}, and on whether this parameter is city-size dependent, an issue that can generate statistical corrections beyond “mean-field” to the simplest expectations from urban scaling theory with *δ* ≃ 1*/*6.
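The lognormal identity can be checked numerically. Assuming the degree-based form *R* = *p*_{I}⟨*k*^{2}⟩/⟨*k*⟩, the sketch below uses the closed-form lognormal moments ⟨*k*^{n}⟩ = exp(*nµ* + *n*^{2}*σ*^{2}/2) and a Monte Carlo sample to confirm that ⟨*k*^{2}⟩/⟨*k*⟩ = ⟨*k*⟩*e*^{σ²}, and then evaluates how *R* changes between two cities. The values of *p*_{I}, *k*_{0}, *µ*, *σ* and *δ* are hypothetical placeholders, not estimates from the paper.

```python
import numpy as np


def lognormal_moment(n, mu, sigma):
    """<k**n> for ln k ~ Normal(mu, sigma**2)."""
    return np.exp(n * mu + 0.5 * n**2 * sigma**2)


def reproductive_number(p_infect, mean_degree, log_sigma):
    """R = p_I * <k> * exp(sigma**2), the lognormal form of p_I * <k^2> / <k>."""
    return p_infect * mean_degree * np.exp(log_sigma**2)


mu, sigma = 2.0, 0.5  # hypothetical log-mean and log-standard-deviation of degree

# Closed-form check: <k^2>/<k> equals <k> * exp(sigma^2).
mean_k = lognormal_moment(1, mu, sigma)
ratio = lognormal_moment(2, mu, sigma) / mean_k
print(ratio, mean_k * np.exp(sigma**2))

# Monte Carlo check on a large random sample of degrees.
rng = np.random.default_rng(0)
k = rng.lognormal(mean=mu, sigma=sigma, size=1_000_000)
print((k**2).mean() / k.mean())

# How R scales between two cities when <k> = k0 * N**delta (hypothetical values).
p_infect, k0, delta = 0.05, 2.0, 1 / 6
for n in (5e5, 5e6):
    print(f"N = {n:.0e}: R = {reproductive_number(p_infect, k0 * n**delta, sigma):.2f}")
```

With *σ* held fixed across cities, the ratio of *R* between a city and one ten times smaller is exactly 10^{δ}; a city-size-dependent *σ* would modify this, as noted above.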

## Supplementary Figures

## Supplementary Tables

## Acknowledgments

This research is partially supported by the Mansueto Institute for Urban Innovation and a Social Science Research grant from the University of Chicago.