Supporting COVID-19 Policy Response with Large-scale Mobility-based Modeling

Mobility restrictions have been a primary intervention for controlling the spread of COVID-19, but they also place a significant economic burden on individuals and businesses. To balance these competing demands, policymakers need analytical tools to assess the costs and benefits of different mobility reduction measures. In this paper, we present our work motivated by our interactions with the Virginia Department of Health on a decision-support tool that utilizes large-scale data and epidemiological modeling to quantify the impact of changes in mobility on infection rates. Our model captures the spread of COVID-19 by using a fine-grained, dynamic mobility network that encodes the hourly movements of people from neighborhoods to individual places, with over 3 billion hourly edges. By perturbing the mobility network, we can simulate a wide variety of reopening plans and forecast their impact in terms of new infections and the loss in visits per sector. To deploy this model in practice, we built a robust computational infrastructure to support running millions of model realizations, and we worked with policymakers to develop an interactive dashboard that communicates our model's predictions for thousands of potential policies.


INTRODUCTION
The COVID-19 pandemic has wreaked havoc on lives and livelihoods across the globe. In an eort to contain the virus, policymakers have turned to non-pharmaceutical interventions, such as restricting mobility, in order to limit contact and reduce disease transmission between individuals. To this end, many US states and local governments have closed or required reduced occupancy at places such as restaurants and gyms [7]. However, these measures come at a heavy cost to individuals and businesses: for example, over 160,000 US businesses closed due to COVID-19 between March and August 2020 [36]. The next few months will continue to pose challenges to public health and economic activity. It is imperative during this time to provide policymakers with analytical tools that can help them develop optimal solutions to minimize new COVID-19 infections while also minimizing damage to businesses. They need a tool that can quantitatively assess, in near real-time, the tradeos between mobility and new infections. Furthermore, this tool should be ne-grained, able to test out heterogeneous plans-for example, allowing one level of mobility at essential retail, another level at retail, and yet another at restaurants-so that policymakers can tailor restrictions to the specic risks and needs of each sector. Despite this granularity, the tool also needs to be scalable, supporting analyses for an exponential number of potential policies so that policymakers can select the best option among them for their jurisdiction.
To fulll these needs, we present a decision-support tool, which we built based on interactions with the Virginia Department of Health (VDH) to support their decision-making on mobility reduction policies. Our approach begins with our state-of-the-art epidemiological model [8], which integrates large-scale mobility and mask-wearing data to accurately capture the spread of COVID-19. Our model overlays transmission dynamics on a time-varying mobility network which encodes the hourly movements of individuals from neighborhoods to specic points of interest (POIs), such as restaurants or grocery stores. Since we model infections in tandem with mobility, our model can provide the multifaceted analyses necessary to understand the costs and benets of a policy; for example, by quantifying predicted infections and the number of POI visits lost per sector, which can serve as a proxy for economic impact. By design, our model is ne-grained, as it simulates who is getting infected where and when down to the individual POI and hour. Our model is also exible, since we can modify any one of its inputs-for example, modifying mobility for a subset of POIs to reect a change in policy, or modifying transmission rates per neighborhood to indicate vaccination eects-and straightforwardly run the model with the new inputs to observe the eects of the hypothetical change.
Finally, to scale our model, we build a robust infrastructure to handle computational challenges. The mobility networks that we model are large, with billions of hourly edges between POIs and neighborhoods. Furthermore, the exibility of our approach-for JavaScript framework amCharts NGINX server Figure 1: Illustration of our approach. We integrate data from many sources to run, evaluate, and analyze our model. We build a computational infrastructure to support millions of model realizations during the model tting and experiment steps of our pipeline. Our UI design enables a smooth experience, so that policymakers can easily compare dierent reopening policies.
example, being able to simulate dierent POI categories at dierent levels of mobility-results in an exponential number of hypothetical scenarios to test. In order to support predictions for thousands of scenarios, we design a lightweight, vectorized implementation of our model, and build a large-scale system to run hundreds of models in parallel. 1 By leveraging this system, we are able to compress 2 years of compute time into the span of a few days.
Advances in the current work. In building this tool, we have greatly extended our epidemiological model since its original implementation [8]. We have introduced new features including (1) variation in mask-wearing over time; (2) a time-varying base transmission rate; (3) a time-varying death detection rate; and (4) model initialization based on historical reported deaths. These additions allow us to accurately t daily deaths in Virginia, and we also show that the inclusion of our new features contribute substantially toward reducing model loss (Section 3.1). Furthermore, whereas our original work focused on the rst two months of the pandemic (March and April 2020), in this work we focus on recent months (November 2020 to January 2021), which are more relevant to current policy-making. We have also tted the model on new, smaller metropolitan areas in Virginia. Importantly, the experimental results in this work are consistent with and extend our original analyses, showing that the high-level ndings from our rst work generalize to new time periods and smaller regions. For example, we continue to nd that mobility patterns are predictive of disparities in infection rates between lower-and higher-income neighborhoods, and that certain POI categories like restaurants are far more dangerous to fully reopen than others (Section 3.2). Finally, to create a nished product that policymakers can directly use, we developed a new dashboard that can communicate thousands of results from our model. Our resulting interface includes multiple interactive panels, where policymakers can select various proposed changes in mobility, and observe how these changes would aect predicted infections over time and losses in POI visits (Section 3.3). 1 Our code is available at https://github.com/snap-stanford/covid-mobility-tool.
Supporting public health decision-making. Our group has been supporting various federal, state and local public health authorities for over a year now as they respond to the pandemic. The need for such a tool became clear to us during the course of sustained response eorts. This tool was designed to fulll public health ocials' desire to have a quantitative and comprehensive analysis of a range of reopening policies. VDH reviewed a prototype of the tool and provided valuable feedback on how best to present the data to maximize clarity and applicability from a public health perspective. This guidance was integral to the nal design of the dashboard presented in this paper. While we focus on the state of Virginia for illustrative purposes, the tool can be generalized and used in other states as well.

OUR APPROACH
In this section, we break down our approach: the datasets that we use (Section 2.1), our epidemiological model (Section 2.2), and the computational infrastructure that we developed to produce predictions at scale (Section 2.3). Figure 1 also provides a summary of our process, illustrating the main steps of our approach and where dierent data sources are integrated along the way.

Large-scale data
Fine-grained mobility data (SafeGraph). Mobility data capture important changes in population behavior over time: for example, in Figure 2, we see that mobility fell dramatically in March 2020, then slowly climbed back up during the following months until receding again near the end of the year. These patterns reect where and when individuals may have been coming into contact with one another, and thus inform our understanding of transmission risks and how to mitigate them. We use data from SafeGraph, a company that anonymizes and aggregates location data from mobile apps. SafeGraph's Places 2 and Weekly Patterns 3 datasets provide Figure 2: Three time-varying data sources -mobility, mask use, and daily COVID-19 deaths -shown for the Washington DC MSA. The highlighted regions indicate the periods that we test for model validation, one from the rst wave and one from the second wave of infections (Section 3.1). There is an 18-day oset between the highlighted regions for input data (mobility, mask use) and deaths, due to the modeled delay between becoming infectious and date of death (Section 2.2).
detailed information about millions of points of interest (POIs), which are non-residential locations that people can visit. For each POI, SafeGraph provides estimates of its hourly visit counts, as well as weekly estimates of which census block groups the visitors are coming from. In addition, SafeGraph provides each POI's North American industry classication systems (NAICS) category, its physical area in square feet, and its median visit duration in minutes (the "dwell time"). We also use SafeGraph's Social Distancing Metrics 4 dataset, which contains daily estimates of the proportion of people staying at home in each CBG.
In this work, we focus on three of the largest metropolitan statistical areas (MSAs) in Virginia: Washington-Arlington-Alexandria, DC-VA-MD-WV (hereby referred to as the "Washington DC" MSA), Virginia Beach-Norfolk-Newport News, VA-NC ("Eastern"), and Richmond, VA ("Richmond"). Across these MSAs, we model 63,744 POIs and 7,609 census block groups (CBGs) in total, with over 3 billion hourly edges between them (Table 1).
Mask-wearing data (IHME). We use mask-wearing data from the Institute for Health Metrics and Evaluation (IHME) website, 5 which provides daily estimates at the state level of the percentage of the population wearing masks. In Virginia, we see the most dramatic change in mask-wearing near the beginning of the pandemic, from 4 https://docs.safegraph.com/v4.0/docs/social-distancing-metrics. 5  Hourly edges counts the number of non-zero edge weights in the mobility network from 12am November 1 to 11pm December 31, 2020 (the second wave period that we t).
0% of the population wearing masks in mid-March to 60% by the end of May 2020 ( Figure 2).

COVID-19 deaths (New York Times)
. To calibrate our model, we compare its predicted death counts to data on reported deaths. We use The New York Times' COVID-19 dataset, 6 which contains daily reported deaths per US county. For each MSA that we model, we sum over the county-level counts to produce overall counts for the entire MSA. As shown in Figure 2, Washington DC-along with much of the US-experienced two major waves of infections, one in the spring of 2020 and the second near the end of the year.
Demographic data (US Census). We utilize data about each CBG from the American Community Survey (ACS) of the US Census Bureau. We use the 5-year ACS data (2013-2017) to extract the median household income, the proportion of white residents, and the proportion of Black residents of each CBG. For the total population of each CBG, we use the most-recent one-year estimates (2018); one-year estimates are noisier, but we wanted to minimize systematic downward bias (due to population growth) by making them as recent as possible. The model uses CBG populations as input, but it does not use income or race during the simulation. Instead, we use income and race data to analyze the model's output; for example, to compare the predicted infection rates of lower-income and higher-income CBGs (Section 3.2).

Epidemiological model
Mobility network. We overlay a disease transmission model on a dynamic mobility network, which is represented as a complete undirected bipartite graph G = (V, E) with time-varying edges. The nodes V are partitioned into two disjoint sets C = {2 1 , · · · , 2 < }, representing < CBGs, and P = {? 1 , · · · , ? = }, representing = POIs. The weight F (C ) 8 9 on an edge (2 8 , ? 9 ) indicates the number of people from CBG 2 8 who visited POI ? 9 in hour C; we refer the reader to Chang et al. [8] for the details of how we derive the hourly edge weights from SafeGraph data. From US Census data, each CBG 2 8 is labeled with its population # 2 8 , and from SafeGraph data, each POI ? 9 is labeled with its area 0 ? 9 and median dwell time 3 ? 9 .
Model dynamics. We assume that every individual is in one of four disease states at any given time: susceptible ((), exposed (⇢), infectious ( ), or removed ('). Susceptible individuals can acquire the virus through contact with infectious individuals. They then enter the exposed state, during which they have been infected but 6 https://github.com/nytimes/covid-19-data.

3
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. are not yet infectious. Individuals transition from exposed to infectious at a rate inversely proportional to the mean latency period. Finally, they transition from the infectious state to the removed state at a rate inversely proportional to the mean infectious period. In the removed state, they can no longer infect others or become infected again (e.g., because they have recovered or died).
In our model, each CBG maintains its own SEIR states, with ( At each hour C, we sample the transitions between states as follows: where _

(C )
? 9 is the rate of infection at POI ? 9 at hour C; _ (C ) 2 8 is the base rate of infection that is independent of visiting POIs; X ⇢ is the mean latency period; and X is the mean infectious period.

The number of new exposures, #
. We assume that any susceptible visitor to POI ? 9 at hour C has the same independent probability _ (C ) ? 9 of being infected. Since there are F (C ) 8 9 visitors from CBG 2 8 to POI ? 9 at hour C, and we assume that a ( The number of new infections among all outgoing visitors from CBG 2 8 is therefore distributed as the sum of the above expression over all POIs, Pois (( We dene the infection rate _ The transmission rate is V (C ) where k is a transmission constant (shared across all POIs) that we t to data, the dwell time 3 ? 9 2 [0, 1] is the average fraction of an hour a visitor spends at ? 9 , and 0 ? 9 is the physical area of ? 9 .
In addition to new infections from POIs, we model a CBG-specic base rate of new infections that is independent of POI visit activity. This captures other sources of infections, e.g., household infections or infections at POIs that are absent from the SafeGraph data. At each hour C, every susceptible individual in CBG 2 8 has a base probability _ (C ) is the product of the mask-wearing scaling factor, the base transmission rate V (C ) base , and the proportion of infectious individuals in 2 8 . We parameterize V base by dening a starting point V base and A V are free parameters, shared across all CBGs, which we t to data. We allow V base to vary over time to capture changes in behavior outside of POIs (e.g., if home gatherings increase). However, we restrict the amount that it can vary by using a conservative range for A V , allowing V base to change up to 30% (in either direction) over the 2-month periods that we simulate.
The number of reported deaths. We assume that a time-varying proportion of infections A (C ) deaths will result in reported deaths, and that they will be reported exactly X deaths = 432 hours (18 days) after the individual became infectious. 7 From these assumptions, the predicted number of newly reported deaths from CBG 2 8 on day 3 is# where we dene # to be 0 when g < 1. This conversion from (⇢ ' states to reported deaths allows us to t our models on daily reported deaths, which we describe in the following section, as well as to initialize the (⇢ ' states at the beginning of the simulation based on historical reported deaths (Section A.1.3).
The time-varying reported death rate A IFR , which, based on hospital fatality rates, estimate that the infection fatality rate was nearly halved by the summer of 2020 [15]. To estimate A (C ) detect , we compute the weekly ratio of reported COVID-19 deaths in the US, from The New York Times, over weekly select-cause excess deaths, as estimated by the National Center for Health Statistics (NCHS). 8 The NCHS provides, for select causes of deaths determined to be related to COVID-19 (e.g., pneumonia, heart failure), the expected (based on 2015-2019) and actual numbers of deaths due to these causes for each week since the pandemic began; the weekly excess select-cause deaths are the dierence between the expected and actual (or 0, if the expected exceeds the actual). If we assume that all select-cause excess deaths can be attributed to COVID-19, then we nd that the death detection rate A (C ) detect increased from around 20% at the beginning of the pandemic to 60%-80% after June 2020.
Model tting. Most of our model parameters can either be estimated from data or taken from prior work (see Table A1 for a 7 It is simplifying to assume that the delay is xed, but we performed sensitivity analyses in the original work that showed that model predictions would remain essentially identical if delays were sampled stochastically from a distribution. 8 https://www.cdc.gov/nchs/nvss/vsrr/covid19/excess_deaths.htm.

4
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 26, 2021. ; https://doi.org/10.1101/2021.03.20.21254022 doi: medRxiv preprint summary). This leaves 3 model parameters that we need to calibrate with data: the POI transmission constant, k , and the starting point and ratio for the base transmission rate, V (0) base and A V . We calibrate these parameters per MSA by tting to published numbers of conrmed deaths, as reported by The New York Times (NYT). NYT data provides the daily cumulative number of COVID-19 deaths per county, so rst we sum over county-level counts to produce cumulative death counts for the MSA. Then, we convert this to the daily number of new deaths and apply two-week averaging to the raw daily counts to smooth over weekly eects (e.g., not reporting on weekends) and other sources of noise. We measure model t based on new deaths per day (i.e., incident deaths), since model error and uncertainty can be severely underestimated when the t is evaluated on cumulative curves [19].
Let # deaths represent the (smoothed) number of reported deaths on day 3 in the MSA, and let# (3) deaths represent the model's prediction for this quantity, which we compute by summing over the model's CBG-level predictions (Equation 6). We compare our model predictions to the actual counts by computing the root-meansquared-error (RMSE) between each of the ⇡ days of our simulation: For each parameter set, we quantify model t by running 30 stochastic realizations and averaging their RMSE. Throughout this paper, we report aggregate predictions from the best-tting parameter sets. For each MSA, we: (1) Grid-search over 1,050 combinations of k , V (0) base , and A V .
(2) Find the best-t parameter set with the lowest average RMSE.
(3) Select all parameter sets that achieve an RMSE within 20% of the RMSE of the best-t parameter set. (4) Pool together all predictions across those parameter sets and all of their stochastic realizations, and report their mean and 2.5th/97.5th percentiles.
This procedure captures uncertainty from two sources: (1) stochastic variability between model runs with the same parameters, and (2) uncertainty in the model parameters. In Table 2, we describe the selected parameters for each of the MSAs that we model.

Large-scale implementation
In this section, we describe computational challenges that arise from implementing this system at scale, and discuss how we addressed them with engineering solutions.
Handling large mobility networks. The hourly POI-CBG networks store large amounts of data; for example, across the three Virginia MSAs we focus on in this work, their networks contain 3.4 billion hourly edges from November to December 2020 (Table 1). In order to reduce computation time, we run our network inference algorithm ahead of any disease modeling, and save the inferred edge weights so that they can be loaded later on. For each network, we save the weights separately per hour (as sparse matrices), so that the model only needs to load as many hours of data as necessary for the current simulation. The model dynamics bring their own challenges: in every hour, we need to estimate the infection rate per POI (Equation 4), and the base infection rate per CBG (Equation 5). However, because each POI's hourly infection rate is conditionally independent of the other POIs' infection rates (given the current (⇢ ' states for each CBG), we can parallelize the hourly computations across POIs; for similar reasons, we can parallelize across CBGs and random seeds (i.e., stochastic realizations). These strategies greatly reduce simulation time; for instance, bringing the average runtime for a 2-month multi-seed simulation with Washington DC down to 5.5 minutes, even as each simulation requires computing 1.78 billion hourly, seed-specic POI infection rates and 215 million hourly, seed-specic CBG infection rates.
Scaling model experiments. One of the strengths of our approach is exibility: for example, our dashboard allows policymakers to test any combination of opening 5 dierent POI categories to 4 dierent levels of mobility (Section 3.3). However, this exibility also creates an exponential number of scenarios to simulate (4 5 = 1, 024). Furthermore, for each scenario, we run 30 stochastic realizations for every parameter set; thus, with 9 parameter sets ( Table 2), we need to run nearly 300,000 model realizations. As described above, part of the solution lies in our model implementation, which runs the stochastic realizations per parameter set in parallel. However, the key to completing these experiments is that we distribute the work across multiple computers, with collectively 288 cores. This allows us to run hundreds of simulations in parallel, compressing 2 years of compute time into a few days.

Model validation
First, we calibrated models for each of the three Virginia MSAs using input data from November 1 to December 31, 2020 and reported deaths from November 19 to January 18, 2021 (there is a 18-day oset between these ranges due to the lag from becoming infectious to date of death, X 2 ). In Figure 3b-d, we show that our models are able to accurately t daily deaths for these MSAs during this time period. The t is especially good for Washington DC, with a normalized RMSE of 7.2% ( Table 2). The normalized RMSEs for the Richmond and Eastern MSAs are slightly higher at 11.5% and 17.3%, respectively; these regions were more challenging to t during this time period due to the large amount of noise relative to their single-digit reported daily deaths (Figure 3d).
To further test our model, we conduct a series of extended analyses with Washington DC as our case study. We focus on Washington DC because its daily death counts are less noisy than the counts for the other two MSAs, due to its substantially larger size (it is 4⇥ their sizes, and in fact the 6th largest MSA in the US). In addition to the November to January period that we tted above, we t our model using input data from March 15 to May 14, 2020 and reported deaths from April 2 to June 1, 2020. 9 We choose these two periods since they overlap with the rst and second waves of infection, and because they reect vastly dierent points in the pandemic, with dierent distancing behaviors, proportions of people infected, weather conditions, and so on. For each time period, 9 We choose two contrasting periods to t, instead of tting the entire time period from March 2020 to January 2021, because tting the entire time period would have required loading over 8,000 hourly mobility network weights, which would be around 100GB.

5
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.   Table 2: Quantitative results from model tting. For each model, we keep all parameter sets that achieve an RMSE within 20% of the RMSE of the best-t parameter set. Here, we report the number of parameter sets kept per model, and the mean and range over the values for each parameter. nRMSE indicates the normalized RMSE, i.e., the RMSE divided by the MSA's mean daily reported deaths over this period; we normalize in order to facilitate comparison across periods and MSAs. Note that the four rows in this table correspond to the four visualizations in the gure above.
we also conduct ablation studies to test the importance of several model features. In the rst ablation, we remove mobility data by xing k , the POI transmission constant, to 0, and allow the model to search over a wider and ner grid for the base transmission rate, V (0) base . In the second ablation, we remove mask-wearing data by xing the mask-wearing proportion c to 0, but we search over the same grids as in the original model since c was not a free parameter. The third ablation keeps V base constant for the entire time period, instead of allowing it to vary slightly over time. To do this, we x A V to 1, and search over a wider and ner grid for V (0) base . First, we nd that our model can also accurately t the non-linear daily deaths curve from the rst wave (Figure 3a), with a normalized RMSE of 5% (Table 2). Furthermore, the model outperforms its ablations in both periods, although the impact of removing any feature is substantially larger in the rst wave than the second, due to the larger changes in behavior early on. For example, removing mask-wearing data results in a severe increase in the model's RMSE (+591%) during the rst wave, where the increase in mask-wearing allowed the model to capture the downward trend in deaths in May 2020, even as mobility began to slightly climb during this period ( Figure 2). Removing mobility during this time period also has a large eect, nearly doubling the model's RMSE, and xing V base over time results in a 74% increase. In the second wave, the impacts are more subtle: removing mobility data, removing mask-wearing data, and xing V base over time result in 14%, 13%, and 5% increases in the model's RMSE, respectively (Table A2).

Use cases
Our tted model can be applied to a wide variety of use cases, including retrospective analyses investigating who was infected where and when, and forward-facing experiments that modify the model's inputs to test hypothetical changes in policy or behavior in the near future. In this section, we provide a few examples that demonstrate the retrospective and forward-facing capabilities of our model, and highlight the usefulness of our ne-grained approach in capturing heterogeneity in risk across POIs and CBGs.
Analyzing disparities in infection rates. Our rst use case is an example of retrospective analysis. After tting the model, we might be interested in studying what the model learned about the infection rates for lower-income versus higher-income CBGs, since it is wellreported in the real world that lower-income neighborhoods have been impacted more severely by COVID-19 [34]. To analyze this, we stratify CBGs by median household income and compare the cumulative infection rates over time (anyone in ⇢, , or ') of the CBGs in the bottom income decile versus top income decile. We nd that the model correctly predicts a large disparity between the bottom and top income deciles in Washington DC [30] (Figure 4). This gap is especially striking in the rst wave period: from March 15 to May 14, cumulative predicted infections per 100,000 increased around 60% more for the bottom income decile than the top income decile (9,900 versus 6000). This dierence can only be attributed to diering mobility patterns, since the CBGs were initialized to very similar levels of infection at the beginning of the simulation 6 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.  and all other exogenous model parameters are shared across CBGs. This matches our rst wave ndings from the original work [8], where we found across 10 of the largest MSAs in the US (including Washington DC), CBGs in the bottom income decile always had a higher infection rate by the end of the simulation than the top income decile. What we found in the mobility data that explained this dierence was that lower-income CBGs could not reduce their mobility as much and, even within the same POI category, the POIs that they visited tended to be more crowded and thus higher-risk.
In the second period, the bottom income decile begins at a disadvantage, reecting actual cumulative deaths by the beginning of November 2020 (see Section A.1.3 for how we initialized (⇢ ' states using historical reported deaths). However, the disparity continues to grow: by the end of December 2020, predicted cumulative infections per 100,000 increased 30% more for the bottom income decile than the top income decile (8,900 versus 6,800). This dierence can be partially attributed to mobility patterns, but also to the self-compounding nature of the virus. Since lower-income CBGs begin with a larger number of infectious individuals, those individuals will generate more infectious people, exacerbating the disparity. Our model can capture these self-compounding dynamics, as well as demonstrate how mobility contributes to-but could also be used to mitigate-these health disparities.
Opening POI categories to dierent degrees. Our second use case is an example of a forward-facing experiment. One common strategy by policymakers has been to implement varying restrictions on dierent business sectors [7], so here we explore the eects of opening POI categories to diering levels of mobility. To test this, we run a simulation starting from November 1, 2020, using the models that we already calibrated (Section 3.1). Then, we modify the mobility networks starting on January 1, 2021, and run the model forward with the modied mobility network for four weeks. During those four weeks, each POI category either maintains its current levels of mobility, or we specify it to have a certain fraction of its "normal" mobility levels, based on SafeGraph data from January 2019. We perform this experiment with 5 POI categories: (1) Restaurants; (2) Essential Retail (grocery stores, convenience stores, drug stores); (3) Gyms; (4) Religious Organizations; and (5) Retail (clothing stores, hardware stores, book stores, pet stores, etc.). For each category, we consider 4 possible mobility settings: maintaining the current mobility level, or keeping 0%, 50%, or 100% of 2019 mobility. In order to provide a wide array of options to policymakers, we test every combination of POI category and mobility setting (1, 024 options per MSA).
These experiments allow us to quantify the trade-o between visits and infections. For example, if every POI continued at current levels of mobility, our model predicts that the Washington DC MSA would experience around 267,000 new infections (2.9% of the population) in January 2021. If, instead, the Restaurant POIs returned to 2019 levels of mobility starting on January 1, we would see a 34% increase in POI visits, but also a 179% increase in predicted new infections. In contrast, if the Essential Retail POIs returned to 2019 levels of mobility, we would only see a 4% increase in POI visits and a 8% increase in predicted new infections. These dierences are partially because there are far more Restaurant than Essential Retail POIs in the Washington DC MSA (10,545 versus 1,606), but also because Restaurant POIs saw a larger drop in mobility during the pandemic, so returning them to "normal" levels of mobility would have a larger impact. Testing every combination of category and mobility also allows us to inspect interactions between categories. For example, if we returned Restaurant and Essential Retail POIs to 2019 mobility levels, we see a 198% increase in predicted new infections; this is higher than the sum of the predicted infections from opening each one on its own (which would be 187% = 179% + 8%). This result highlights why we simulate each combination of mobility levels instead of adding individual impacts, since the whole impact of a policy can be greater than the sum of its parts.

Dashboard
Our dashboard provides an interface to our model results which public health ocials can use to assess the impact of mobility on COVID-19 transmission. We designed our dashboard through iterative meetings with VDH, where they provided valuable feedback on how the interface could be made more intuitive and which visualizations would be most eective in conveying the impact of changing mobility levels. The resulting layout is divided into ve parts, as shown in Figure 5; we discuss each part in detail below.
Visits to Points of Interest Navigation Bar. The POI Navigation Bar is the control center of the tool. From here, users can either view current mobility levels by POI category or use sliders to set "target" mobility levels to 0%, 50% or 100%, indicating the fraction of 2019 mobility levels to use. For each category, we have a current mobility ratio associated with each POI (i.e., how their current mobility compares to their mobility levels from 2019); to communicate heterogeneity in current mobility ratios across POIs within each category, we show each category's median and interquartile range of current mobility ratios as yellow markers on the slider axes. When the application loads, the selected region defaults to Virginia, and the current mobility levels displayed are the average of the mobility levels for the three MSAs.
Map Panel. As the user changes target mobility on the POI Navigation Bar, the colors of each MSA region on the map also change to 7 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 26, 2021. ; https://doi.org/10.1101/2021.03.20.21254022 doi: medRxiv preprint reect increases (more red) or decreases (more blue) in the number of predicted infections. This provides a visual indicator of where a mobility change is likely to have the greatest impact; to get more precise measurements, the user can hover the mouse over each MSA to display predicted infections at current and target mobility levels, as well as the percent dierence between them. Model predictions can be viewed at 1, 2, 3, or 4 weeks following the "intervention date" (in this case, January 1, 2021). We provide this option because the impact of the change in mobility on predicted infections can evolve over time, typically accumulating in magnitude. For example, relative to maintaining current levels of mobility, setting Restaurant POIs in Washington DC to 100% of 2019 mobility levels results in a 76% increase in predicted infections after 2 weeks, a 136% increase after 3 weeks, and, as mentioned in the previous section, a 179% increase after 4 weeks. The user can also click on a specic MSA to select it; this updates the POI Navigation Bar with current mobility levels for the selected MSA, and the Chart Panel with predictions associated with the selection. The user can click on the "Reset to Virginia" button to return to the three MSAs as a collective unit.
Chart Panel. While the Map Panel displays the cumulative impact of the target mobility change for one weekly period at a time, the Chart Panel displays the time series for cumulative predicted infections at current and target mobility levels across the full four-week period along with their 95% condence intervals over model parameters and stochastic realizations. This makes it easier to visualize how much predicted infections at target mobility are expected to deviate from predictions at the current mobility level.
Data Table Panel. The Data Table shows the dierence between predicted infections given current and target mobility levels. It is responsive to the selection of dierent target mobility levels on the POI Navigation Bar, as well as the selected week on the Map Panel. This feature allows the user to conveniently assess the quantitative impact of changing mobility levels, both at the MSA and overall Virginia levels.
Mobility History. The Mobility History Panel is revealed upon selection of the "Mobility History" button on the POI Navigation Bar. This report provides a history from January 2019 to present day of weekly POI visits, aggregated by MSA and POI categories, which helps policy health experts better contextualize the target mobility levels in the POI Navigation Bar. We provide a screenshot from and additional details about this panel in Section A.2.

RELATED WORK
Mobility and COVID-19 modeling. The COVID-19 pandemic and its corresponding social distancing measures have drastically aected human mobility patterns [13,16]. To accurately capture the dynamics of COVID-19 transmission and infection, epidemiological models must account for these changes in mobility. Many such models have been proposed in the last year: for example, it is common for models to use some aggregate measure of real-time mobility to modulate transmission rates [9,17,20]. Others have focused on using historical data to model the initial spread of the disease before social distancing measures were put into place [21,29], or 8 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 26, 2021. ; https://doi.org/10.1101/2021.03.20.21254022 doi: medRxiv preprint on using synthetic data for analytical purposes [5,18]. In this work, we extend the model from Chang et al. [8], as it uses mobility data that is both ne-grained (e.g., at the individual POI and CBG level) and up-to-date, which enables us to predict the eects of dierent mobility restriction measures at the level of granularity required for policy-making [4,6].
COVID-19 tools for policymakers. Recently, there has been an intense interest in developing easy-to-use interfaces to support computational infectious diseases epidemiology. Much of the eort is focused on providing surveillance information [11,28]; for example, several apps have been built to monitor trends in cases over time [35], tracking their growth rate [1] or aiming to detect clusters [14]. Other modeling tools that are being used for the COVID-19 pandemic include EpiC and Gleamviz [32], for global mobility and epidemic simulations; DiCon, for optimization and control problems related to epidemic dynamics [25]; and, most related to our tool, FRED [26] and GAIA [27], which are open source systems that support research in networked epidemiology. Our team has also been developing epidemiological applications in support of policymakers for over 15 years, including SIBEL, an epidemic modeling tool that allows users to experiment with ne-grained interventions [2,3,10,22,23], and EpiViewer, a tool for visualizing epidemic time series [31]. What sets the tool described in this paper apart from the others is the way it incorporates ne-grained mobility data with the SEIR model, enabling a more detailed ability to create and test interventions. Our tool also focuses on near real-time response; this marks a crucial evolution from earlier eorts that were largely used for planning studies.

CONCLUSION
We have introduced a decision-support tool that allows policymakers to inspect the predicted impacts of thousands of dierent policies, specic to their jurisdictions. Our tool utilizes large-scale data and epidemiological modeling to simulate the eects of negrained changes in mobility on infection rates, and leverages a robust computational infrastructure to run model experiments at scale. As policymakers face dicult challenges ahead, our tool will provide them with much-needed analytical machinery to assess tradeos between future infections and mobility restrictions.
Our approach is not without its limitations, which we have discussed with policymakers. For instance, our mobility data from SafeGraph does not cover all POIs or populations (e.g., children), and our model makes necessary but simplifying assumptions about the dynamics of disease transmission. Furthermore, we specialize in modeling the eects of changes in mobility on infection rates, but not all of the mobility policies that we analyze are directly actionable; for example, changing current mobility levels to 50% of 2019 mobility. Further work is required to analyze how to lever tools in policymakers' toolbox to actually reach target levels of mobility. Despite these limitations, our approach captures a valuable piece of the puzzle, as we provide policymakers with a quantitative and comprehensive near real-time analysis of the eects of mobility on transmission. As we move forward, we will build dashboards for other US states and regions, and continue developing new use cases and technical advances for our model so that we can best support the needs of policymakers around the country.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 26, 2021. ; https://doi.org/10.1101/2021.03.20.21254022 doi: medRxiv preprint A APPENDIX A.1 Details to our approach A.1.1 POI and CBG inclusion criteria. Since we model one MSA at a time, we need to dene ltering criteria to determine which POIs and CBGs to include in the model. First, we include all POIs that meet the following requirements: (1) the POI is located in the MSA; (2) SafeGraph has visit data for this POI for every hour from 12am on September 1 2020 to 11pm on November 30 2020 10 ; (3) SafeGraph has recorded this POI's median dwell time and visitors' home CBGs for at least one week from January to October 2020; (4) SafeGraph provides this POI's area in square feet; (5) this POI is not a "parent" 11 POI. After determining the set of included POIs, we keep the union of CBGs that are located in the MSA and those that had at least one recorded visit to at least 100 of the kept POIs; this means that CBGs from outside the MSA may be included if they visit this MSA frequently enough.
A.1.2 Derivation for mask-wearing. Here, we explain how we incorporated mask-wearing into our model dynamics. First, note that without mask-wearing, both infection rate equations (4 and 5) can be written in the generalized form V /# , where V is the transmission rate, is the number of infectious individuals present, and # is the total number of individuals present. Furthermore, the expected number of new infections is ( (V /# ), where ( is the number of 10 We use this range because we want to ensure that the POI was recently active, and we x this range even as we model periods from both the rst and second wave of infections because we want to keep the set of POIs xed across experiments. 11 Parent POIs consist of a small fraction of POIs that overlap with and include visits from their "children" POIs (for example, malls). To avoid double-counting visits, we remove all parent POIs from the dataset. susceptible individuals present. Following Eikenberry et al. [12], let c 2 [0, 1] represent the fraction of the population wearing a mask. We can divide the infectious and susceptible populations into the infectious-unmasked group, * := (1 c) ; the infectious-masked group, " := c ; the susceptible-unmasked group, ( * := (1 c)(; and the susceptible-masked group, ( " := c(. Let the random variable -* represent the number of unmasked susceptible visitors who become infected, and let -" represent the number of masked susceptible visitors who become infected. Then, we can derive the expected values of these variables by separating the cases where they become infected by an unmasked infectious person versus by a masked infectious person: where n > 2 [0, 1] is the "outward" eciency of the mask (how much it prevents a masked infectious person from transmitting) and n 8 2 [0, 1] is the "inward" eciency (how much the mask protects a masked susceptible person from catching the disease).
If we assume n > = n 8 = n, as [12] do in their experiments, and substitute back in the denitions of * , " , ( * , and ( " , we nd that the total expected number of infectionssimplies to In other words, we simply scale the original expected number of new infections by a factor of (1 nc) 2 . Plugging this general form back into the POI and base infection rates yields equations 4 and 5.
A.1.3 Model initialization. We use the county-level death counts from The New York Times to initialize the (⇢ ' states at the beginning of our simulations. For each county in the MSA, rst we convert its cumulative death counts to daily new deaths, and apply 2-week smoothing to the raw counts (as we did to the MSA-level counts). For a county . , let # deaths,. represent its smoothed actual number of new deaths on day 3. Our goal is to use this timeseries to estimate ( . , (B) . , and ' (B) . , the number of people in the county in each disease state for some simulation start hour B.
First, recall that our model assumes deaths are reported exactly X deaths /24 = 18 days after the person becomes infectious, and that on day 3 (B) = bB/24c, a fraction A (B) deaths of the newly infectious cases will eventually result in reported deaths. If we assume 1/24 of the infections on day 3 (B) occurred at hour B, then the number of individuals in county . who became newly infectious at hour B must be # deaths ). Furthermore, our model assumes that exposed individuals always have a 1/X ⇢ probability of transitioning into the infectious state, so the maximum likelihood estimate of ⇢ (B) . Since it takes on average X ⇢ hours for individuals to transition from exposed to infectious, then we estimate . Similarly, since it takes on average X hours for people to transition from infectious to removed, we set ' , where C = 0 represents the start of the pandemic. In other words, by hour B, we assume everyone who transitioned into before hour B X has reached '. Finally, we set ( . . We note that these are rough 10 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) area of POI ? 9 in square feet Variable (SafeGraph) F (C ) 8 9 # visitors from CBG 2 8 to POI ? 9 at time C Variable (SafeGraph) c (C ) fraction of population wearing a mask at time C Variable (IHME) n mask eciency 0.5 [12] Table A1: Model parameters. If the parameter has a xed value, we specify it under Value; otherwise, we write "Variable" to indicate that it varies across CBG / POI / MSA. Eikenberry et al. [12] estimate that inward mask eciency could range from 20-80% for cloth masks, and outward eciency could range from 0-80%, with 50% perhaps typical; thus, we set n to 0.5.  Table A2: Results from ablation studies. We tested the Washington DC model in two time periods, comparing the full model versus its ablations (Section 3). We nd that the full model achieves lower RMSEs than each of the ablations in both time periods, and the impact of removing any feature is substantially larger when tting the rst wave.

Model
estimates, but the uncertainty captured by our parameter selection and stochastic realizations should more than cover the uncertainty carried in our initialization procedure. Furthermore, the estimates produced by this method align with what we would expect: for example, it predicts that by November 1, 2020, across all counties in the US, the median county-level proportion of the population in ' was 19%, with an interquartile range of 9%-34%.
For a CBG 2 8 in county . , we set its initial states to match the proportions of the county's states; for example,' . . However, due to uncertainty in reported deaths and in our estimation method, after setting initial estimates for all CBGs in the MSA, we shrink each CBG's estimate for every disease state toward the mean over all CBGs. For example, ultimately we set where < is the total number of CBGs in the MSA and U 2 [0, 1] is a shrinkage parameter that controls how much we shrink toward the mean. Shrinkage allows us to be more conservative about our estimates, especially for CBGs with unusually high or low estimates for any of the disease states. Since we have greater uncertainty in reported deaths early in the pandemic, for the rst wave period that we model (March to May 2020), we set U = 0.5, and for second wave period (November 2020 to January 2021), we set U = 0.1.

A.2 Dashboard's Mobility History Panel
The Mobility History report is available upon selection of the "Mobility History" button on the POI Navigation Bar. It provides aggregated POI visits per MSA and POI category, based on SafeGraph data, for every week from 2019 through the present time. In addition to providing raw data in tabular format, the report includes dierent visualizations of that data, including graphs of mobility counts in 2019 and 2020; the proportion of 2020-2021 foot trac vs. 2019-2020 foot trac by POI category, including bar graphs for the most recent week in the set for easier visualization; and the percent dierence by POI category ( Figure A1). This allows public health experts to review mobility trends and compare them to other COVID-19 indicators to make correlations and help inform them as they plan their guidance. 11 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 26, 2021. ; https://doi.org/10.1101/2021.03.20.21254022 doi: medRxiv preprint