Summary
Background Human mobility is expected to be a critical factor in the geographic diffusion of infectious diseases, and this assumption led to the implementation of social distancing policies during the early fight against the COVID-19 emergency in the United States. Yet, because of substantial gaps in past data, the following questions remain unanswered: 1) How does mobility contribute to the spread of infection within the United States at local, regional, and national scales? 2) How do seasonality and shifts in behavior affect mobility over time? 3) At what geographic level is mobility homogeneous across the United States? Addressing these questions is critical to developing accurate transmission models, predicting the spatial propagation of disease across scales, and identifying the optimal geographical and temporal scale at which to implement control policies.
Methods We address this problem using high-resolution human mobility data measured via mobile app usage. We compute the daily coupling network between US counties, and we integrate our mobility data into a spatially explicit transmission model to reproduce the national invasion of the first wave of SARS-CoV-2 in the US.
Findings Temporally, we observe that intercounty connectivity is largely seasonal and was perturbed by mobility restrictions only briefly, during the April 2020 lockdown, in the early phase of the COVID-19 pandemic. Spatially, we identify 104 geographic clusters of US counties that are densely connected by mobility within each cluster and more sparsely connected to counties outside it. These clusters are stable over time and largely overlap with US state boundaries. Together, these results suggest that intercounty connectivity in the US is relatively static over time and homogeneous at the sub-state level. We also find that while county-level, daily mobility data best capture the spatial invasion of disease, static mobility data aggregated to the scale of our mobility-based clusters also perform well in capturing the spatial diffusion of infection.
Interpretation Our work demonstrates that intercounty mobility was negligibly affected outside the lockdown period of Spring 2020, explaining the broad spatial distribution of COVID-19 outbreaks in the US during the early phase of the pandemic. Such geographically dispersed outbreaks place a significant strain on national public health resources and necessitate complex metapopulation modeling approaches for predicting disease dynamics and control design. We thus inform the design of such metapopulation models to balance high disease predictability with low data requirements.
Introduction
Human mobility plays a crucial role in the spread of respiratory diseases [1, 2]. The combination of regional travel and local commuting determines the spatial connectivity between locations, which is the main driver of the geographic diffusion of infectious diseases. Characterizing the spatial dynamics of pathogen transmission is therefore intricately tied to unraveling human mobility patterns. This task has proven challenging because of the inherent complexity of mobility and the privacy-related limitations on collecting mobility data [3]. Over the past few decades, researchers have relied extensively on mobility data obtained from census records, surveys, transportation statistics, commuting data, and international air traffic data. Such datasets have contributed substantially to a better understanding of human mobility patterns and their impact on epidemic spread [4–9], but they can be limited in resolution or scale. More recently, this gap has been filled by mobile phone data [10, 11], primarily based on phone records, but no such data had been available in the United States.
The global health crisis triggered by COVID-19 underscored the critical need for swift access to mobility data to help mitigate the spread of the virus. The urgency of the situation prompted an unprecedented sharing of data by private companies worldwide, through legally and ethically compliant agreements. These data, based on location-based mobile app usage, provided unparalleled access to high-resolution, large-scale, near-real-time mobility information and have advanced both human mobility science [12] and computational epidemiology [13, 14]. Their availability represented a particular shift in US public health, where they have been used to inform epidemic models and to reveal the impact of mitigation strategies on behavior [15–22]. While the association between mobility patterns and COVID-19 transmission in the USA has been studied extensively, no study has assessed when the underlying mobility network needs to be embedded into models to characterize epidemic spread.
Moreover, the effects of control measures on human mobility at the meso-scale (i.e., an intermediate or regional level of geographical granularity) and at long-range levels (i.e., entire countries or continents), as well as the most suitable geographical and temporal granularity for implementing these measures, remain unclear. This gap in understanding the characteristic spatio-temporal scale of mobility limits not only the targeting of control policies but also our ability to model the transmission dynamics of the disease effectively. To date, mobility data have been integrated into epidemic models without due consideration of the optimal geographical (e.g., municipalities, regions, states) and temporal (e.g., day, week, month) resolution required to accurately capture epidemic spread. Instead, the level of granularity used in these models has consistently been dictated a priori by data providers [23, 24].
To address these gaps, we use mobile phone data to identify the characteristic spatiotemporal scale of human mobility in the United States for the periods before and after the onset of the pandemic emergency. Furthermore, to assess whether mobility was relevant to the spread of COVID-19 and which mobility scale drove the invasion, we integrate mobility data into spatially explicit transmission models to reproduce the national invasion of the first wave of SARS-CoV-2 in the US. More specifically, we compute the daily coupling network between US counties and track intercounty connectivity in the US over time and space. To assess the role of mobility, we then evaluate how the predictive power of the model depends on the underlying mobility aggregation scheme and how it is affected when mobility is not accounted for.
Methods
Our study has two objectives: i) to characterize the temporal and spatial scale of variation in the connectivity between U.S. counties influenced by human mobility and ii) to inform metapopulation models with mobility data at different scales, thereby informing the scale of data required for epidemic predictability. To achieve this, we analyze human mobility data measured via mobile app usage across spatial and temporal scales. We then integrate these mobility data into an epidemic model parameterized with national COVID-19 public health data to evaluate the performance of the model at different scales of mobility to predict epidemic spatial invasion.
Characterizing intercounty connectivity with mobility data
We access data from SafeGraph [25] (now called Advan Patterns), which collects and shares mobility data based on location-based mobile app usage. In particular, we use the daily Social Distancing dataset provided by SafeGraph (see Supplementary Information for dataset selection), and use information on the number of mobile devices with a home in an origin census block group that visit a given destination census block group for at least one minute. Data span the period from January 2019 to April 2021 at a daily resolution. We aggregate the data to the US county level, a common geographic scale for disease surveillance and public health decision-making. To address the spatial and temporal heterogeneity in the number of observed devices $\mathrm{obs}_i$ within each county $i$ (Figure S1), we developed a correction factor $c_i = \mathrm{pop}_i / \mathrm{obs}_i$, where $\mathrm{pop}_i$ is the population of county $i$. To reduce sampling biases, we exclude the bottom 25% of counties by population size and run all analyses on the 2327 counties within the continental US with a population greater than 11,000.
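A minimal sketch of the rescaling and population filter follows, in Python to match the tools named in these Methods. It assumes that the correction factor is the ratio pop_i/obs_i, that it is applied to device counts by their origin (home) county, and that the 25% cutoff is taken over county populations; the function names and the four-county example are hypothetical.

```python
def correction_factors(pop, obs):
    """Assumed correction factor c_i = pop_i / obs_i for each county."""
    return {c: pop[c] / obs[c] for c in pop if obs.get(c, 0) > 0}


def keep_upper_counties(pop, drop_fraction=0.25):
    """Exclude the smallest `drop_fraction` of counties by population."""
    ranked = sorted(pop, key=pop.get)          # smallest population first
    n_drop = int(len(ranked) * drop_fraction)  # counties to exclude
    return set(ranked[n_drop:])


def rescale_visits(visits, factors):
    """Scale origin -> destination device counts by the origin's factor."""
    return {(o, d): n * factors[o] for (o, d), n in visits.items() if o in factors}


# Hypothetical four-county example.
pop = {"A": 100_000, "B": 50_000, "C": 20_000, "D": 5_000}
obs = {"A": 4_000, "B": 1_000, "C": 800, "D": 250}
kept = keep_upper_counties(pop)                          # drops "D"
factors = correction_factors({c: pop[c] for c in kept}, obs)
visits = {("A", "B"): 40, ("B", "A"): 10, ("C", "A"): 8}
print(sorted(kept))                      # ['A', 'B', 'C']
print(rescale_visits(visits, factors))   # {('A', 'B'): 1000.0, ('B', 'A'): 500.0, ('C', 'A'): 200.0}
```

Applying the factor by origin keeps each county's devices on the scale of its own resident population, which is what the later normalization by origin totals expects.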
After applying the correction factor, we quantify monthly intercounty connectivity by normalizing the number of visits between an origin and a destination county by the daily total number of visits originating in the origin county. We then compute the average daily mobility in each month from January 2019 to March 2021 for all pairs of counties. For illustration, Figure 1A shows the spatial connectivity network computed for March 2020. We therefore obtain a time-evolving connectivity network between US counties, in which the links represent the daily coupling probabilities $p_{ij}$ between any pair $i$ and $j$ of US counties. A comparison between the monthly and daily datasets provided by SafeGraph is reported in Figure S2.
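The normalization into coupling probabilities and the monthly averaging can be sketched as follows. This is an illustration under assumptions: visits are already rescaled, within-county visits count toward the origin's daily total, and a link absent on a given day contributes zero to that month's average; the names and numbers are hypothetical.

```python
from collections import defaultdict


def daily_coupling(visits):
    """Turn one day's visits {(i, j): count} into probabilities p_ij.

    Each count is divided by the total visits originating in i that day
    (including within-county visits, i == j), so sum_j p_ij = 1.
    """
    totals = defaultdict(float)
    for (i, _), n in visits.items():
        totals[i] += n
    return {(i, j): n / totals[i] for (i, j), n in visits.items() if totals[i] > 0}


def monthly_average(daily_networks):
    """Average daily p_ij over a month; a link missing on a day counts as 0."""
    acc = defaultdict(float)
    for net in daily_networks:
        for link, p in net.items():
            acc[link] += p
    return {link: s / len(daily_networks) for link, s in acc.items()}


# Hypothetical two-day month for counties A and B.
day1 = daily_coupling({("A", "A"): 30, ("A", "B"): 10, ("B", "B"): 5})
day2 = daily_coupling({("A", "A"): 20, ("A", "B"): 20, ("B", "B"): 5})
month = monthly_average([day1, day2])
print(month)  # {('A', 'A'): 0.625, ('A', 'B'): 0.375, ('B', 'B'): 1.0}
```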
Early phase of COVID-19 in the US
The first confirmed case of COVID-19 in the United States was reported in Washington State on January 21, 2020, and within a few weeks local transmission was established. Guidelines advocating social distancing and the avoidance of gatherings were released on March 16, 2020. During the early stages of the pandemic, lockdowns and stay-at-home orders in the United States varied between states and were implemented at different times. Lockdowns peaked in April 2020, when more than 40 states had issued some form of stay-at-home or shelter-in-place order [26]. However, the spatial spread of COVID-19 was not contained, and by the end of June 2020 most US counties had reported COVID-19 cases. Facing a rebound of cases in the fall of 2020, social distancing was recommended (but not mandated) to keep epidemic activity at low levels.
COVID-19 incidence data were derived from government data accessible via the USAFacts website [27]. In this work, we use daily new reported COVID-19 cases and the time of arrival in each county, i.e., the day on which at least 10 cases have been reported in that county (Figure 1B-C).
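As an illustration of the arrival-time definition, the sketch below returns the first day on which a county's cumulative reported cases reach 10. It assumes that cases are accumulated from the daily new reports; the function name and the example series are hypothetical.

```python
from datetime import date, timedelta


def arrival_day(daily_new_cases, start, threshold=10):
    """First date on which cumulative reported cases reach `threshold`.

    daily_new_cases: daily new reported cases, the first entry being `start`.
    Returns None if the threshold is never reached in the series.
    """
    cumulative = 0
    for offset, new_cases in enumerate(daily_new_cases):
        cumulative += new_cases
        if cumulative >= threshold:
            return start + timedelta(days=offset)
    return None


# Hypothetical county series starting on 1 March 2020.
print(arrival_day([0, 2, 3, 0, 6, 4], date(2020, 3, 1)))  # 2020-03-05
```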
Describing temporal and spatial variability in the mobility network
We examine the monthly network structure to evaluate the temporal dynamics of mobility patterns. We quantify the degree of each county, i.e., the number of connections (edges) it has to other counties in the network. We also define link persistence as the probability that links between counties with non-zero mobility in a given month of 2019 remain present in the same month of 2020 and 2021.
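The two network statistics can be sketched as below. This is a hypothetical illustration: degree is computed here by ignoring link direction and self-loops, and link persistence is the share of between-county links in a 2019 month that reappear in the same month of a later year.

```python
def degrees(links):
    """Number of distinct other counties each county is linked to.

    links: (i, j) pairs with non-zero mobility; direction is ignored and
    self-loops (i == j) are skipped.
    """
    neighbors = {}
    for i, j in links:
        if i == j:
            continue
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    return {county: len(nbrs) for county, nbrs in neighbors.items()}


def link_persistence(links_2019, links_later):
    """Share of between-county links in a 2019 month still present later."""
    base = {link for link in links_2019 if link[0] != link[1]}
    if not base:
        return float("nan")
    return len(base & set(links_later)) / len(base)


# Hypothetical March networks.
march_2019 = [("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")]
march_2020 = [("A", "B"), ("A", "C"), ("D", "A")]
print(sorted(degrees(march_2019).items()))       # [('A', 2), ('B', 1), ('C', 2), ('D', 1)]
print(link_persistence(march_2019, march_2020))  # 0.5
```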
We also fitted a gravity model to the intercounty connectivity network for each month. For technical details and model performance, see Supplementary Information (SI).
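The gravity formulation used is detailed in the SI rather than here. As a generic illustration only, and not the scikit-mobility implementation, the sketch below fits the classic form T_ij = K m_i^a m_j^b / d_ij^g by ordinary least squares on log-transformed flows, where m is a county mass such as population and d a distance; all names and the synthetic data are assumptions.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[k]] for k, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_gravity(flows, mass, dist):
    """OLS fit of log T_ij = log K + a log m_i + b log m_j - g log d_ij.

    flows: {(i, j): T_ij > 0}; mass: {county: m > 0}; dist: {(i, j): d_ij > 0}.
    Returns (K, a, b, g).
    """
    rows, y = [], []
    for (i, j), t in flows.items():
        rows.append([1.0, math.log(mass[i]), math.log(mass[j]), -math.log(dist[(i, j)])])
        y.append(math.log(t))
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(4)] for p in range(4)]
    xty = [sum(r[p] * v for r, v in zip(rows, y)) for p in range(4)]
    log_k, a, b, g = solve(xtx, xty)
    return math.exp(log_k), a, b, g


# Synthetic check: flows generated exactly from K=0.01, a=0.8, b=0.6, g=1.5.
mass = {"A": 100_000, "B": 50_000, "C": 20_000, "D": 80_000}
half = {("A", "B"): 10, ("A", "C"): 30, ("A", "D"): 20,
        ("B", "C"): 15, ("B", "D"): 40, ("C", "D"): 25}
dist = {**half, **{(j, i): d for (i, j), d in half.items()}}
flows = {(i, j): 0.01 * mass[i] ** 0.8 * mass[j] ** 0.6 / d ** 1.5
         for (i, j), d in dist.items()}
K, a, b, g = fit_gravity(flows, mass, dist)
print(round(K, 4), round(a, 3), round(b, 3), round(g, 3))  # 0.01 0.8 0.6 1.5
```

Because the synthetic flows follow the gravity law exactly, the fit recovers the generating parameters; on real data the residuals are what a rank correlation between observed and modeled links then summarizes.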
To detect the geographical communities generated by human mobility patterns, we performed a community detection analysis using the stochastic InfoMap algorithm [28]. Our aim is to identify regions within which movements occur more frequently than movements to other regions. To account for the stochasticity of the algorithm, we use a bootstrap resampling method (see SI for details).
The classification of urban and rural areas was determined based on the NCHS Urban-Rural Classification Scheme for counties [29].
All network analyses were performed using the Python NetworkX library, and the gravity model was fitted using the Python scikit-mobility library.
Incorporating human mobility into infectious disease models
We used a stochastic non-Markovian transmission model with a metapopulation structure at the US county level [30]. In each county, the model accounts for disease transmission proportional to (i) infected residents who do not move, (ii) infected visitors coming from other counties, and (iii) returning residents previously infected in other counties. The resulting force of infection in county $i$ depends on the coupling probabilities $p_{ij}$ between patches $i$ and $j$, extracted from the intercounty connectivity network, through the effective population and the effective number of infections in each county (see Supplement for the full definitions). We considered SEIR (Susceptible-Exposed-Infectious-Recovered) epidemic dynamics specific to COVID-19, with epidemic parameters informed by [31].
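The force-of-infection equations are not reproduced in this section. The sketch below assumes a standard commuting-based formulation that matches the three transmission routes listed above: the effective population and effective infections present in county $j$ are $N_j^{\mathrm{eff}} = \sum_k p_{kj} N_k$ and $I_j^{\mathrm{eff}} = \sum_k p_{kj} I_k$, and $\lambda_i = \beta \sum_j p_{ij} I_j^{\mathrm{eff}} / N_j^{\mathrm{eff}}$. This is an assumption, not necessarily the exact model of [30]; $\beta$ and the example values are hypothetical.

```python
def force_of_infection(p, N, I, beta):
    """Per-county force of infection under an assumed commuting formulation.

    p[i][j]: probability that a resident of i is in j (rows sum to 1 and
    p[i][i] is the share staying home); N[i], I[i]: residents and infectious
    residents of county i; beta: hypothetical transmission rate.
    """
    n = len(N)
    # Effective population and effective infections present in each county j.
    n_eff = [sum(p[k][j] * N[k] for k in range(n)) for j in range(n)]
    i_eff = [sum(p[k][j] * I[k] for k in range(n)) for j in range(n)]
    # Residents of i are exposed wherever they spend time, home (j == i) included.
    return [
        beta * sum(p[i][j] * i_eff[j] / n_eff[j] for j in range(n) if n_eff[j] > 0)
        for i in range(n)
    ]


# Hypothetical two-county example: only county 0 has infectious residents.
lam = force_of_infection(p=[[0.9, 0.1], [0.2, 0.8]], N=[1000, 500], I=[10, 0], beta=0.5)
print([round(x, 5) for x in lam])  # [0.00415, 0.0017]
```

In the example, county 1 has no infectious residents yet a positive force of infection, through infectious visitors from county 0 (route ii) and through its own residents who spend time in county 0 (route iii).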
In designing the metapopulation structure across spatial scales, we keep the model itself at the county level, which enables meaningful comparisons of results across scales. To build a county-level model with homogeneous mixing within a given spatial scale, we set the coupling probabilities of all connected counties within each patch (i.e., region) equal to the average coupling probability of the links within that patch. The detailed mathematical framework, model calibration, and implementation details can be found in the Supplement.
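The homogenization step can be sketched as follows, under stated assumptions: only existing between-county links inside a patch are averaged, while self-loops and links between patches are left unchanged. Averaging does not in general keep each origin's probabilities summing to one, so a renormalization may be needed; whether and how that is done is not specified in this section, and the names below are hypothetical.

```python
from collections import defaultdict


def homogenize_within_patches(p, patch_of):
    """Set every within-patch link to the patch's mean coupling probability.

    p: {(i, j): p_ij} for existing county links; patch_of: {county: patch}.
    Self-loops and links between different patches are kept unchanged
    (an assumption of this sketch).
    """
    def within(i, j):
        return i != j and patch_of[i] == patch_of[j]

    sums, counts = defaultdict(float), defaultdict(int)
    for (i, j), w in p.items():
        if within(i, j):
            sums[patch_of[i]] += w
            counts[patch_of[i]] += 1
    return {
        (i, j): sums[patch_of[i]] / counts[patch_of[i]] if within(i, j) else w
        for (i, j), w in p.items()
    }


# Hypothetical example: A, B, C form patch "X"; D forms patch "Y".
patch_of = {"A": "X", "B": "X", "C": "X", "D": "Y"}
p = {("A", "A"): 0.5, ("A", "B"): 0.2, ("B", "A"): 0.1,
     ("A", "C"): 0.3, ("C", "D"): 0.05}
homogenized = homogenize_within_patches(p, patch_of)
print({k: round(v, 3) for k, v in homogenized.items()})
# {('A', 'A'): 0.5, ('A', 'B'): 0.2, ('B', 'A'): 0.2, ('A', 'C'): 0.2, ('C', 'D'): 0.05}
```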
Goodness of fit
To assess model performance across spatial scales, we computed the goodness of fit between the estimated invasion probability $p_{i,\mathrm{inv}}(t)$ and the observed early-phase spatial invasion of COVID-19. Here, $p_{i,\mathrm{inv}}(t)$ denotes the probability that county $i$ has reported at least 10 infected cases by day $t$ in the simulations [32]. The goodness of fit compares $p_{i,\mathrm{inv}}(t)$ with the observed indicator $I_i$, where $I_i = 1$ if county $i$ had reported at least 10 cases by day $t$, and $I_i = 0$ otherwise.
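A simple way to estimate the invasion probability from stochastic runs is the share of simulations in which county i has reached 10 cumulative cases by day t, sketched below together with the observed indicator I_i. The specific goodness-of-fit formula is not reproduced here, so the sketch stops at these two ingredients; the data layout and names are hypothetical.

```python
def invasion_probability(runs, day, threshold=10):
    """Estimate p_i,inv(day): share of runs where county i reached `threshold`.

    runs: list of simulations, each {county: cumulative cases indexed by day}.
    """
    counties = runs[0].keys()
    return {c: sum(run[c][day] >= threshold for run in runs) / len(runs) for c in counties}


def observed_indicator(cumulative_cases, day, threshold=10):
    """I_i = 1 if the county had reported at least `threshold` cases by `day`."""
    return 1 if cumulative_cases[day] >= threshold else 0


# Hypothetical output of three stochastic runs for counties A and B.
runs = [
    {"A": [0, 5, 12], "B": [0, 0, 3]},
    {"A": [0, 11, 20], "B": [0, 2, 10]},
    {"A": [0, 1, 4], "B": [0, 0, 0]},
]
p_inv = invasion_probability(runs, day=2)
print({c: round(v, 3) for c, v in p_inv.items()})  # {'A': 0.667, 'B': 0.333}
print(observed_indicator([0, 4, 9, 15], day=3))     # 1
```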
Results
Human mobility is expected to be a pivotal driver of infectious disease transmission. Understanding the impact of mobility on the spread of infection at local, regional, and national scales is therefore essential for building precise transmission models, predicting disease spread, and optimizing targeted control strategies. By analyzing US county-level spatial connectivity with mobile phone data, we assess the temporal and geographical variability of human mobility and identify the geographical scale that drove the early phase of the spatial invasion of COVID-19. In addressing these public health questions, we identify the characteristic scale at which metapopulation models should be designed.
Temporal stability of the intercounty connectivity network
Mobility changed little from January 2019 to March 2021, except for a substantial drop localized to April 2020 (Figure 2A). This transition coincided with the implementation of lockdown measures, which caused a nationwide decline in mobility from roughly 45 million daily visits to about 25 million daily visits. The mobility shock extended throughout the month, including a transitional period (Figure 2A-B). Analyzing the temporal evolution of the intercounty connectivity network, we found a consistent seasonal pattern in the degree distribution and in the persistence of mobility connections. Deviations from this pattern were observed only in April 2020, with a 23% reduction in degree and a 26% reduction in link persistence. Surprisingly, no variation was observed in November 2020, despite strong recommendations for social distancing ahead of the winter surge of SARS-CoV-2. The reduction in rural-urban connections was particularly pronounced, with a 25% decrease relative to the pre-lockdown value in February 2020. This decrease stabilized in May and beyond. Notably, urban-urban connections were more resilient over time than connections involving rural areas. Moreover, while link weights remained consistent over the study period (Figure 1C), the likelihood of staying in the home location exhibited larger variability (see SI).
Despite sporadic extreme events leading to local variability, the intercounty connectivity network was temporally stable and was consistently reproduced by a fitted gravity model (Figures S7-S9). Indeed, the Spearman correlation coefficient between the observed and modeled intercounty connectivity networks remains roughly constant over time, averaging 0.55.
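The Spearman comparison can be sketched in plain Python as the Pearson correlation of rank vectors, with tied values given their average rank. The sketch assumes the observed and gravity-modeled weights are listed for the same links in the same order; the example values are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in order[start:end + 1]:
            r[k] = (start + end) / 2 + 1
        start = end + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical weights of the same four links: observed vs gravity-modeled.
observed = [0.30, 0.10, 0.05, 0.20]
modeled = [0.25, 0.12, 0.02, 0.08]
print(round(spearman(observed, modeled), 3))  # 0.8
```

A rank correlation rewards getting the ordering of links right even when the gravity model misses their absolute magnitudes.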
Spatial stability of the intercounty connectivity network
To identify the geographic scale at which mobility is highly connected, we use a network community detection algorithm to detect clusters of counties that are more connected via mobility within the cluster than outside it. We hypothesized that this partitioning of the US would be at a geographic scale coarser than the 3143 US counties but finer than the 50 US states or the 10 HHS regions. Indeed, we find that, based on human mobility, the US can be partitioned into around 100 regions that split most US states into multiple clusters (Figure 3). We also find that these clusters are highly spatially contiguous and respect state boundaries (normalized mutual information (NMI) with the state partition of 0.82). Furthermore, these regions are stable over time (NMI = 0.95) despite the perturbations of the early phase of the COVID pandemic (Figure 3B; Figures S9-S10). Thus, we identify a persistent geographic partitioning of the US in which clusters are more connected internally than with each other, and we hypothesize that mobility shapes the spatial diffusion of infectious diseases primarily at a mesoscale.
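The NMI values reported above compare two partitions of the same counties. The sketch below uses the arithmetic-mean normalization, NMI = I(A;B) / ((H(A) + H(B)) / 2); other normalizations exist, and which one was used for the reported values is an assumption here. The partitions in the example are hypothetical.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / ((H(A) + H(B)) / 2).

    labels_a, labels_b: cluster labels for the same counties in the same order.
    """
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(v / n * math.log(v / n) for v in ca.values())
    h_b = -sum(v / n * math.log(v / n) for v in cb.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every county in a single cluster
    mi = sum(v / n * math.log(v * n / (ca[a] * cb[b])) for (a, b), v in joint.items())
    return 2 * mi / (h_a + h_b)


# Hypothetical partitions of six counties: mobility clusters vs states.
clusters = ["c1", "c1", "c2", "c2", "c3", "c3"]
states = ["NY", "NY", "NY", "NY", "NJ", "NJ"]
print(round(nmi(clusters, states), 3))
```

Here every cluster lies inside a single state, yet the NMI stays below 1 because the clusters split one state into two; this mirrors the pattern of clusters that respect state boundaries while subdividing states.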
Implications for metapopulation disease models
Having analyzed the stability of mobility patterns in both space and time, we evaluated how the spatiotemporal scale of human mobility affects our ability to model the metapopulation dynamics of disease. To this end, we integrate the connectivity network into a spatially explicit metapopulation model, with the goal of simulating the national spread of the initial wave of SARS-CoV-2 in the US. To understand the role of the geographic scale of mobility on disease dynamics, we feed the disease model networks that are always defined at the US county level but are homogenized at different spatial scales, to represent information missing at those scales. To investigate the influence of temporal scale on model performance, we inform the model with either a time-evolving connectivity network or a static connectivity network representing mobility in March 2020 (without loss of generality, given the temporal stability of the network discussed above). In all cases, we measured goodness of fit by comparing the model-predicted time of arrival of disease in each county with the observed time of arrival.
In Figure 4A, we show that a metapopulation disease model informed by a county-level intercounty connectivity network is highly predictive of the observed early spatial diffusion of COVID-19. Compared with a randomized intercounty connectivity network (preserving the empirical degree distribution and edge weights), the empirical mobility network yields a better goodness of fit throughout the early phase of the pandemic. This emphasizes the crucial role of human mobility in the spatial spread of the initial SARS-CoV-2 wave and underscores the need for mobility data to build more reliable models. Additionally, we find that the spatial invasion predicted with a static mobility network is highly consistent with that predicted with a time-varying mobility network, suggesting that static mobility data are sufficient to accurately reproduce epidemic spatial heterogeneity.
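One common way to randomize a directed weighted network while keeping every county's in- and out-degree and the set of edge weights is repeated edge swapping, sketched below. Whether this matches the randomization used for Figure 4A is an assumption. Here each edge keeps its weight on its origin side, which also preserves every origin's total outgoing weight, and self-loops (staying home) are expected to be excluded from the input; the example network is hypothetical.

```python
import random


def rewire(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a directed weighted network.

    edges: {(i, j): w} between distinct counties. Each accepted swap turns
    (a, b), (c, d) into (a, d), (c, b), and every edge keeps its weight on the
    origin side, so in/out-degrees, origin totals and the weights are preserved.
    """
    rng = random.Random(seed)
    current = dict(edges)
    for _ in range(n_swaps):
        (a, b), (c, d) = rng.sample(list(current), 2)
        # Reject swaps that would create self-loops or duplicate links.
        if a == d or c == b or (a, d) in current or (c, b) in current:
            continue
        w_ab, w_cd = current.pop((a, b)), current.pop((c, d))
        current[(a, d)], current[(c, b)] = w_ab, w_cd
    return current


# Hypothetical network of four counties.
edges = {("A", "B"): 0.2, ("B", "C"): 0.3, ("C", "D"): 0.1,
         ("D", "A"): 0.4, ("A", "C"): 0.05, ("B", "D"): 0.15}
randomized = rewire(edges, n_swaps=200, seed=1)
print(sorted(randomized.items()))
```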
In Figure 4B, we show the impact of defining metapopulation structures at different spatial scales on predictions of spatial diffusion. We compared three spatial scales: US HHS regions, US states, and mobility data-based clusters (as defined in Figure 3). Among these, our mobility-based clusters most accurately capture the spatial spread of disease in the US, although they do not perform as well as the model informed by county-level mobility data. Consistent with our finding that mobility-based clusters largely overlap with state boundaries, the state-level metapopulation structure also performs strongly in predicting spatial invasion. Conversely, HHS region-level mobility data lack the necessary granularity and, once incorporated into the model, fail to capture the diversity of mobility patterns.
To further consider the impact of low data availability, we explore two additional scenarios. For both the state and cluster scales, we informed the metapopulation disease model with an intercounty network in which links between regions (states and clusters, respectively) are randomized, so that only the total number of links and the total mobility of each region need to be known. In Figure 4B, we show that this low-data scenario predicts an epidemic invasion path comparable to that of the high-data scenario, indicating that the same spatial spread can be modeled with reduced information. At the cluster level, however, the high-data scenario outperforms the low-data scenario, highlighting the value of incorporating the characteristic mobility scale into models to enhance predictive accuracy.
Discussion
Since the onset of the COVID-19 pandemic, mobile phone data have played a crucial role in addressing the public health crisis [13–20, 24, 31, 33–36]. During this period, numerous network operators and private enterprises made considerable efforts to swiftly share their data within the confines of legal regulations. Consequently, researchers worldwide have used these data to monitor changes in human behavior driven by containment measures and by adaptive responses to the epidemic, and to improve the reliability of epidemic models [19, 20, 31, 33–36].
While static mobility data have predominantly been analyzed and integrated into models over the past decades [5, 37, 38], the current availability of real-time data on human behavior calls for an investigation into when this dynamic information should be used instead of static representations of reality [39]. Equally important is identifying the characteristic mobility scale needed to capture the coupling between different locations, with potential implications for targeted control policies to reduce epidemic activity and for improving epidemic forecasting. Furthermore, numerous researchers have emphasized the pressing need for standardized strategies that facilitate rapid data access while upholding stringent data privacy measures [40].
To address this gap in the literature, in this study we investigate the spatial connectivity of US counties during the early phase of the COVID-19 pandemic using high-resolution, real-time human mobility data obtained from mobile phone usage. Our findings offer insights into the dynamics of human mobility and their implications for infectious disease modeling. We observe that, despite the implementation of local social distancing measures and lockdowns, intercounty connectivity remained largely unperturbed, leading to rapid geographic diffusion of SARS-CoV-2. Mobility patterns changed only marginally between the periods before and after the onset of the COVID-19 pandemic. The most notable disruption occurred during the first lockdown period in April 2020, when mobility sharply declined. However, this reduction was short-lived, and mobility patterns quickly rebounded. Notably, even during periods of social distancing recommendations, the mobility network remained relatively stable. Assuming that the lockdown represents the most extreme form of mobility disruption, these temporal stability findings suggest that human mobility in the US is resilient to short-term perturbations.
We also assessed the spatial stability of the intercounty connectivity network by detecting spatial communities based on mobility patterns. Our results indicate that mobility-driven clusters align closely with state boundaries, reflecting the influence of administrative and geographical factors on human movement. These clusters were remarkably stable over time, reinforcing the idea that spatial mobility patterns are deeply ingrained and relatively resistant to abrupt changes. The close correspondence between mobility patterns and state boundaries suggests that state-level structures could be effective for designing targeted public health interventions based on travel reductions. Our findings underscore the importance of considering mobility patterns when designing interventions, allocating resources, and planning disease control strategies.
We also demonstrated that incorporating high-resolution human mobility data is crucial for accurately capturing the spatial spread of infectious diseases. Our findings indicate that county-level, daily mobility data offer the most accurate representation of the spatial spread of disease in the US. Notably, static county-level mobility data achieve model performance similar to that of real-time data, suggesting that a static snapshot of mobility is adequate for reproducing spatial spread. Moreover, our exploration of various spatial scales for metapopulation models underscores the importance of aligning the model structure with the inherent spatial scale of human mobility. While county-level mobility data yield the most accurate depiction, mobility data-based clusters also perform well, highlighting their potential utility where county-level data are inaccessible or difficult to acquire. In seeking a balance between high disease predictability and low data requirements, we found that, at the state level, the number of connections in the observed connectivity network and the total volume of connections are enough to define a static connectivity network that performs comparably to the fully detailed observed network.
While our study provides valuable insights, it has limitations. Our work focused on the early phase of the pandemic; future research could explore how the interplay between mobility patterns and disease dynamics has changed over time, taking into account factors such as vaccination campaigns and behavioral changes like mask-wearing driven by increased social awareness. Furthermore, we assume homogeneity within US counties. Including other sources of behavioral data, such as mask-wearing or age-specific compliance with social distancing, could provide a more comprehensive understanding of the spatial heterogeneity underlying epidemic spread. Lastly, because of limited data on surveillance resources, we assume a homogeneous delay in the reporting of confirmed COVID-19 cases across counties.
By characterizing the key role of mobility in the spatial invasion of the COVID-19 pandemic in the US, our study sheds light on the overall stability of human mobility patterns and on the information needed to design a reliable predictive model. Metapopulation models that incorporate accurate mobility data can provide valuable insights into disease dynamics and enhance our ability to predict and control the spread of future infectious disease outbreaks. Moreover, the standardized data extraction and sharing approach we introduced might help shorten the timelines associated with legal agreements for data sharing, which do not always align with the rapid spread of epidemics and can thus reduce the feasibility of timely responses to outbreaks.
Data Availability
Mobility data were openly available to the public before the initiation of the study at https://www.safegraph.com/. Public health data are openly available at https://covid.cdc.gov/covid-data-tracker/#datatracker-home. All data produced by the data analysis and model simulations in the present work will soon be available on GitHub at https://github.com/GiuliaPullano/USA_first_wave_COVID_mobility.
Acknowledgments
The authors thank the teams at SafeGraph/Advan Patterns for sharing mobility data.