Impact of close interpersonal contact on COVID-19 incidence: evidence from one year of mobile device data

Close contact between people is the primary route for transmission of SARS-CoV-2, the virus that causes coronavirus disease 2019 (COVID-19). We sought to quantify interpersonal contact at the population-level by using anonymized mobile device geolocation data. We computed the frequency of contact (within six feet) between people in Connecticut during February 2020 – January 2021. Then we aggregated counts of contact events by area of residence to obtain an estimate of the total intensity of interpersonal contact experienced by residents of each town for each day. When incorporated into a susceptible-exposed-infective-removed (SEIR) model of COVID-19 transmission, the contact rate accurately predicted COVID-19 cases in Connecticut towns during the timespan. The pattern of contact rate in Connecticut explains the large initial wave of infections during March–April, the subsequent drop in cases during June–August, local outbreaks during August–September, broad statewide resurgence during September–December, and decline in January 2021. Contact rate data can help guide public health messaging campaigns to encourage social distancing and in the allocation of testing resources to detect or prevent emerging local outbreaks more quickly than traditional case investigation.

the home location of the device user. We do not use time of day to determine primary dwell location, as this could lead to mis-characterization of primary dwell location for people who work at night instead of during the daytime, for example.
For each device, we identify clusters of records as locations where other devices were stationary and nearby.
When two devices were stationary and in proximity to one another at the same time, the device locations and their corresponding GPS location errors are used to calculate a probability that the devices were in close contact, within six feet (approximately two meters). The resulting information is used to generate a contact event record including the date, time, location and horizontal accuracy of each device involved in the contact, both device IDs, and the computed probability of contact between the devices.
To avoid measuring spurious contact between people who are not actually close to one another, or contact between people who live together, we do not record contacts that occur in some places. A buffered polygon derived from roadway center lines is used to determine if a given contact event occurred on a roadway. If so, then the contact record excluded from the calculation of the contact rate within that region. This could result in missing close contact that occurs in vehicles, such as on buses, trains, or carpooling. Similarly, all contact events for devices at their estimated primary dwell location are tagged and excluded when computing contact rates.

Measuring close contact
We consider a simplified version of geospatial location information on Euclidean plane R 2 . We modify this setting below to account for the curvature of the earth. Suppose that for device location point i we observe the triple (X i , Y i , R i ) where (X i , Y i ) is the reported location (in longitude and latitude) of the mobile device and R i is the radius of horizontal uncertainty associated with the device location. We assume that the horizontal uncertainty radius R i is the (1 − α) × 100% quantile of the radial density of the device location. Specify this distribution as a symmetric bivariate Gaussian centered at the true device location (µ x , µ y ) with covariance matrix σ 2 i I, where I is the 2 × 2 identity matrix. Then (X i , Y i ) has density (1) In this paper, we use α = 0.05.
Define the Euclidean distance between the reported location of two points (X i , Y i , R i ) and (X j , Y j , R j ) as and fix a distance > 0. In this paper, we use equal to two meters. We want to evaluate the probability that points i and j are within meters of one another. This probability can be expressed as Now under the assumption that (X i , Y i ) and (X j , Y j ) have independent bivariate Gaussian distribution, the variance-scaled quantity follows the non-central chi-square distribution with 2 degrees of freedom and non-centrality parameter Since the true device locations and variances in (4) are not observed, we substitute the observed device locations X i , Y i , X i , and Y j , as well as the estimated variancesσ 2 i andσ 2 j computed from (1). Because the variance-scaled squared distance (3) follows the non-central Chi-square distribution, the probability that the two devices are within two meters, D ij ≤ 2, can be computed using standard statistical software.
In reality the Earth is not a plane and the Euclidean distance D ij is shorter than the true distance between i and j on the surface of the Earth. But for distant points or those whose uncertainty radius is large, it is necessary to evaluate longer distances on the surface of the Earth. We therefore substitute the Haversine distance [1] for the Euclidean distance D ij in the calculation above. The resulting Gaussian approximation is useful for small geodesic distances because we are interested in points that are less than two meters apart.
3 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03. 10.21253282 doi: medRxiv preprint To describe computation of the contact rate, let Z i (t) = (X i (t), Y i (t), R i (t)) be the location and corresponding horizontal uncertainty radius for device i at time t. A potential contact between devices i and j at time t occurs when the locations of the two devices Z i (t) and Z j (t) are stationary and nearby. Let D ij (t) be the computed distance between the two points i and j. When a potential contact occurs between i and j at time t, let be the probability that these devices are within meters of each other. Let A ad be the set of pairs of devices for which a potential contact event occurred within area a on day d. For a potential contact between a pair {i, j}, let t ij be the time of the potential contact. In area a on day d, the expected number of contacts is the sum of the probabilities of contact, across every potential contact event. We compute two contact rates for each area a and day d. First, we aggregate contact probabilities by the area in which the contact occurred. The contact rate by region of contact is Next, we aggregate contacts by the region (town) of the device's primary dwell location. Let A be the set of all regions and let h(j) be the primary dwell region of device j. The device home contact rate is where the indicator function 1{·} is 1 if its argument is true, and 0 otherwise.

Community case definition
Laboratory tests for SARS-CoV-2 and cases of COVID-19 are reportable to the Connecticut Department of Public Health [2]. A confirmed case of COVID-19 is defined as detection of severe acute respiratory syndrome coronavirus 2 ribonucleic acid (SARS-CoV-2 RNA) in a clinical or autopsy specimen using a molecular amplification test [3]. In this analysis, COVID-19 cases and SARS-CoV-2 tests reported to the Connecticut Department of Public Health among persons residing in long-term care facilities, managed residential communities (e.g., assisted living facilities) or correctional institutions were excluded. Cases and tests among staff working at these locations were not excluded.
The Connecticut Department of Public Health used data from COVID-19 case report forms and laboratory test 4 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint reports to identify cases and tests among residents of congregate settings as follows. For COVID-19 cases, public health follow-up is dependent on whether the person resides in a congregate setting (facility does contact tracing) or not (local and state public health do contact tracing). Therefore, the Connecticut COVID-19 case report form includes a field for healthcare providers to indicate whether the patient resided in a congregate setting, and if so, the type of setting. Because case report forms are not received for all cases, this information might also be completed during case investigations by state or local health department staff. For this analysis, cases among congregate setting residents were excluded by using these case report form data.
To identify tests among congregate setting residents, the Connecticut DPH used a geocoding process because residence in a congregate setting is not included in the laboratory result report form, and the number of tests is too large for public health staff to review individual records. This means that the likelihood of misclassification of congregate setting residence among cases is lower than the likelihood of misclassification for tests.
The home address of each person tested is geocoded by the Connecticut Department of Public Health by using a custom composite locator that first attempts to match on the parcel, then street map data. Approximately 90% of test reports are successfully geocoded. To identify tests performed among residents of congregate settings, geocoded results are compared to a spatial layer of known long term care facilities, managed residential communities, and correctional institutions. Records geocoded to within a 150 foot buffer around these locations are categorized as tests performed among residents of congregate settings.
After the steps above were used to exclude cases and tests among congregate setting residents, the Connecticut Department of Public Health tabulated the aggregate count of cases and tests by town for each day, which were used in this analysis. No individually identifiable information was used in the analysis reported here,

Transmission model for prediction of COVID-19 infections, cases, hospitalizations, and deaths
We developed a county-stratified deterministic model based on the susceptible-exposed-infective-removed (SEIR) framework to represent COVID-19 transmission and predict case counts in Connecticut using the contact rate. Here we provide a brief description of the model and calibration approach. A detailed description of the model is given in [4]. In the model, the population of each county is divided into 10 compartments. Individuals begin in the susceptible (S) compartment. Exposed individuals (E) may develop either asymptomatic (A), mild (I M ), or severe (I S ) infection. Asymptomatic and mild infections resolve without hospitalization and do not lead to death. Mild symptomatic cases self-isolate (move to R M ) shortly after development of symptoms, and transition to recovery (R) when infectiousness ceases. All severe cases require hospitalization (H) unless hospitalization capacity is exhausted, in which case they transition toH representing hospital overflow, then to recovery (R) or death (D). The model assumes a closed population and does not capture non-COVID-19 deaths. Let N i be the population size of county i and J i the set of counties adjacent to county i. Let C (i) represent hospitalization capacity in county i, which may vary over time. Transmission dynamics for county i are given by the following system of ordinary differential equations (ODE): 6 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The function η (i) = 1 + exp(0.5(C (i) − H (i) )) −1 is a "soft" hospitalization capacity overflow function. The models captures infection "community" transmission in non-congregate settings, and excludes cases and deaths occurring in settings like nursing homes and prisons. We assume that recovered individuals remain immune to reinfection for the duration of the study period. The analysis was performed using the R statistical computing environment [5]. We used package deSolve to perform numerical integration of the ODE system [6].

Time-varying model parameters
Recognizing that many of the model parameters are unlikely to be constant over time, we represent the most critical parameters as functions of time. These parameters include: transmission parameter β, rates α I M and α A (reciprocals of average duration of infectiousness among mildly symptomatic and asymptomatic individuals), q I S -proportion of infections, which are severe and require hospitalization, rate of hospital discharge γ H , and hospital case fatality ratio m H .
We use data on close contact to parameterize temporal dynamics of transmission parameter β: where M contact (t) is a measure of close contact at time t relative to the pre-epidemic level, and exp[B(t)] is a function that approximates residual changes in transmission parameter β that are not explained by changes in close contact or other time-varying parameters. B(t) is a smooth function obtained by applying spline smoothing on a piecewise linear function B * (t), where B * (t) is modeled with B * (w) = [(w−t 0 )/14] defined on bi-weekly knots w = {t 0 , t 0 + 14, t 0 + 28, . . .} over the observation period and linearly imputed between the knots. We model the vector of random effects using a random walk of order one: For the hyperparameter σ we let 1 with shape parameter a = 2.5 and rate parameter b = 0.1.

7
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; Rates of isolation or recovery α I M and α A are assumed to depend on the testing volume. Widespread testing is assumed to identify infectious cases sooner, resulting in faster isolation and shorter duration of infectiousness.
Via its relationship to age distribution of cases, probability of severe infection is assumed to change over time, likely due to higher adoption of social distancing measures by individuals with worse prognosis [7]. Timevarying rate of hospital discharge γ H was obtained from the Connecticut Hospital Association (CHA) based on the analysis of insurance claims data. Temporal variation in the hospital case fatality ratio m H was estimated based on data on daily hospital deaths and admissions. Table 1 summarizes the parameters in the model and their meaning. We fix some of the model parameters whose values can be estimated based on prior research.

Calibration data
We calibrate the joint distribution of model parameters at the state level using the observed dynamics of confirmed COVID-19 hospitalizations census, cumulative COVID-19 hospitalizations, and cumulative number of deaths among hospitalized cases, which was obtained from the Connecticut Hospital Association [35]. In addition to close contact data, we use the following data to inform time-varying model parameters: average hospital length of stay by month (source: CHA [35]), testing volume data, and data on the age distribution of confirmed cases (source: Connecticut Department of Public Health (CT DPH) daily reports [36]). We calculated noninstitutionalized county-level population and age structure in Connecticut using the 2014-2018 estimates of the American Community Survey [37]. Daily total available hospital beds (including occupied) in each county were obtained from the Connecticut Hospital Association/CHIMEData [35,38] and used as hospitalization capacity values on a given date.
Since available hospitalizations data does not disaggregate by the patient's place of residence at the time of diagnosis or hospitalization, we estimate the time-varying distribution of hospitalizations and hospital deaths coming from congregate and non-congregate settings based on the daily distribution of COVID-19 death counts occurring in hospitals by the type of residence at the time of diagnosis (congregate vs. non-congregate) and a case fatality ratio among hospitalized residents of congregate settings. These data were obtained from Connecticut Department of Public Health. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

9
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ;

Model calibration and Bayesian posterior inference
We calibrate a joint posterior distribution of model parameters θ = (β 0 , q A , α I S , m H,0 , E 0 , L H , L D , ) and hyperparameters σ = (σ h , σ u , σ d , σ ) to the observed data using a Bayesian approach with a Gaussian likelihood.
We also accommodate reporting lags in observed hospitalizations (L H ) and hospital deaths (L D ). The distributions of hospitalizations census, cumulative hospitalizations, and hospital deaths are given by: where therefore L H and L D should not be strictly interpreted as reporting lags. We assume the date of epidemic onset to be February 16, 2020 -21 days before the first case was officially confirmed in Connecticut on March 8th, 2020, and initialize the model with E 0 exposed individuals at the time of epidemic onset and set the initial size of all downstream compartments to be zero. We assume a prior distribution on E 0 and calibrate it along with other model parameters. The county-level distribution of E 0 is fixed and was estimated based on the county population size and dates of first registered case and death in each county. We put the same independent Inv-Gamma(a, b) prior on the three hyperparameters σ 2 h , σ 2 u , and σ 2 d with a = 50 and b = 5 × 10 7 .
We construct the joint posterior distribution over unknown parameters (θ, σ) as Each likelihood term is weighted by z(t) (observation weight at time t) times the weight assigned to a respective time series. We set w h = 0.89, w u = 0.01, and w d = 0.1. We place a large weight on the hospitalizations 10 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; Sampling from the joint posterior distribution of (θ, σ) given in (11) is performed using Markov Chain Monte Carlo (MCMC). We employ a hybrid algorithm that combines elliptical slice sampling (ESS) [39], Gibbs sampling, and Metropolis-Hastings sampling with random walk proposals. Details of the algorithm implementation are provided in [4]. Bottom row of Figure 1 shows statewide model fit to hospital census, hospital deaths, and total deaths.

Model projections for case counts at the town level
To produce town-level projections, we use parameters estimated from the state-level joint posterior distribution and town-level inputs. Town-level data consist of the normalized close contact function, town population size, and town-specific initial model compartment conditions. Normalized close contact at the town level is obtained by adding an intercept to the town level contact that equals the median difference between the state and the 11 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint town-level contact for all dates t after April 7, 2020. This step enables the application of a random effects vector estimated at the state-level to the town-level contact data. The normalization approach preserves town-level contact dynamics and contact levels in each town early in the epidemic and immediately after the implementation of social distancing measures. Town-specific initial conditions were computed from the statelevel posterior distribution of E 0 based on the town population size and its proximity to New York City.
We simulate epidemic dynamics for each of the 169 Connecticut towns using the model given in (7), modified to include only data from a single town. Using the model-projected dynamics of lagged daily infections within the town, we estimate the time-varying COVID-19 case-finding rate for each town using a log-linear regression model: where Cases it is the 7-day moving average of reported COVID-19 cases in town i at time t, α i is a town effect, γ t is a time effect measured daily and Infections i(t−l) is the model-projected number of daily infections in town i at time t − l, and we set l = 14 days. As a final step to produce model-based estimates of daily cases in each of 169 Connecticut towns, we truncate regression-estimated town-level case-finding rate at 1 and apply this estimated rate to model-projected daily infections. The top row of Figure 1 shows statewide contact measure, projected infections, estimated and observed non-congregate COVID-19 case counts, and estimated case-detection rate. Estimates of statewide case counts is obtained by summing up estimated cases across 169 Connecticut towns.

Space-time regression model
To assess the relationship between close interpersonal contact and COVID-19 cases without assumptions about the dynamics of transmission, we develop a statistical model for describing the associations between town-level contact in previous days and current day COVID-19 cases using a distributed lag negative binomial regression framework. We emphasize that this "phenomenological" regression model does not use any SEIRtype assumptions about the relationship of contact, infections, and cases. Our purpose is to validate the usefulness of contact data in prediction of future case counts throughout the state.
In the model, each town has a unique set of parameters that describe the lagged associations with past contact, as we hypothesize that there may be spatial differences in these associations due to town-level features (e.g., population density). Spatial correlation between these parameters is also modeled since towns close to each 12 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint other may exhibit similar spatial patterns. Specifically, the model is given as where Y it is the number of COVID-19 cases recorded in town i on day t, n is the total number of towns in the analysis (n = 169), and m is the total number of included days of data (m = 351). In our application of the model, the first day of analysis (t = 1) occurs on February 29 th , 2020 and the final day (t = 351) occurs on February 13 th , 2021.
In (12) Contact in town i on day t is denoted as z it , and in our application of the model we include contact up to d = 28 days (i.e., 4 weeks) in the past. Town-specific (η i ) and time-specific (λ t ) random effects are included to account for separable spatial and temporal correlation, respectively, in the case data.
The θ ij parameters are town-and lag-specific, and describe the association between contact j days earlier in the town and current day cases. The collection of all lag parameters from town i, θ i = (θ i1 , . . . , θ i28 ) T , are decomposed into "global" and "local" effects to account for similarities in associations shared across all towns and town-specific deviations from that trend such that where θ = (θ 1 , . . . , θ 28 ) T represents the vector of parameters shared across all towns and θ i = θ i1 , . . . , θ i28 T are town-specific deviations from the global pattern. We anticipate that the global distributed lag parameters closer together in lag time will have similar behavior and model this directly by specifying 13 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint where MVN represents the multivariate normal distribution, 0 28 is a vector of 28 zeros, and the correlation between the global lag parameters is defined by The variance and smoothness of the process are described by τ 2 θ and φ, respectively, where smaller values of φ suggest higher correlation between parameters across the lags.
Spatial and cross-correlations between the town-specific lag parameters ( θ i ) and town-specific intercepts (η i ) are modeled using a multivariate conditional autoregressive model [40] given as where ψ −i is the entire set of ψ j vectors with ψ i removed and w ij is an indicator describing whether towns i and j share a common border or not. As is standard practice, towns are not considered to be neighbors of themselves, resulting in w ii = 0 for all i. This model specifies that a priori, the set of parameters from a specific town have a multivariate normal distribution with a mean vector equal to a weighted mean of its neighbors vectors. The amount of spatial similarity between towns is controlled by ρ ∈ (0, 1) where a value of ρ close to zero indicates near independence and close to one results in strong spatial smoothing. The cross-correlation between parameters within a given town is described by the unstructured covariance matrix Ω.
Finally, λ t represents a flexible time trend that is common to all towns in the analysis and is defined using an autoregressive model such that where λ 0 ≡ 0 and α ∈ (0, 1) controls the level of similarity across time. This effect serves to account for temporal correlation between the case counts and describe global patterns of risk that vary across time.

Prior distributions
We complete the model by specifying prior distributions for all introduced parameters. When possible, we opt for weakly informative priors so that the data are the main drivers of the inference. The priors are given 14 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Competing methods
To determine the importance of previous days contact on explaining trends in current day COVID-19 case counts, we also test two competing modeling options. First, we consider the "Global Contact" model, where (12) is modified such that θ i is replaced with θ (θ previously described). As a result, in Global Contact there are no town-specific lag parameters, only a single set that is shared across all towns. We expect this model to perform poorly compared to the full spatio-temporal model in (12) if there is spatial variability between these associations. Next, we consider the "No Contact" model, which modifies (12) by removing all of the contact information entirely. This model is likely to struggle to explain variability in the case data if prior days contact are important predictors. We note that both of these competing models technically represent subsets of the full spatio-temporal model in (12). For example, when θ i ≡ 0, the full model becomes Global Contact and when θ i ≡ 0 it becomes No Contact. Due to this flexibility, we anticipate that the full model will perform well regardless of the spatio-temporal trends in the data.
We fit each method to the data and compare the findings using multiple methods. First, we use Watanabe-Akaike information criteria (WAIC) [41] to compare the explanatory ability of each method. WAIC is a Bayesian model selection tool that identifies the model, among a small subset of competitors, that has the best balance of fit to the data and complexity, with smaller WAIC values being preferred. Next, we consider a posterior predictive model comparison tool focused on the predictive ability of each method [42]. Specifically, we calculate the posterior mean of the deviance of the negative binomial regression model evaluated at the observed data and data generated from the posterior predictive distribution (from the same design points as the observed data). Smaller values of this metric suggest improved predictive performance of the corresponding method.

Model fitting
We fit all models using Markov chain Monte Carlo sampling techniques, including a Pólya-Gamma latent variable approach for obtaining semi-conjugacy for many of the parameters within the negative binomial regression 15 . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint model [43]. From each model, we collect 100,000 samples from the joint posterior distribution after removing the first 10,000 iterations prior to convergence of the model. We further thin the samples by a factor of 10 to reduce posterior autocorrelation, resulting in 10,000 samples with which to make posterior inference. Convergence of the models was assessed by visually inspecting individual parameter trace plots and calculating Geweke's diagnostic [44] for all relevant parameters. Neither tool suggested any obvious signs of non-convergence.

Mobile device coverage and representativeness in Connecticut
We investigated the coverage of mobile devices in the sample by town population, percent who do not identify as "only white", percent without a high school degree, and percent below the poverty level as defined by the federal government's official poverty threshold, which accounts for household size, number of children, and age of householder. Demographic data were obtained from the American Community Survey [37,45]. Figure 2 shows that device coverage tracks population size throughout the state, but coverage decreases slightly with the percent non-white, the percent that lack a high school degree, and percent below the poverty level. The town of Union is the smallest town in Connecticut, with a total population of 873 people. We do not know whether lower coverage is due to a lower percentage of town residents owning a mobile device, or because of systematic under-coverage of devices from these towns in our sample.
We received mobile device geolocation data in two overlapping batches, using different methods of anonymizing device IDs. In the first batch, covering February 1 -May 31, 2020, we observed a total of 573,452 unique device IDs in Connecticut. The average number of unique device IDs per day was 141,617. Each week an average of 80.5% of the device IDs carried over to the next week. In the second batch, covering May 1, 2020 to January 31, 2021, we observed a total of 788,842 unique device IDs in Connecticut. The average number of unique device IDs per day was 114,946. Each week an average of 77.6% of the device IDs carried over to the next week. Connecticut has a population of 3.565 million people [37,45]. Contacts occurring within a given town are not captured if we cannot calculate a primary device dwell location for both devices involved in the contact. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  • the user uninstalls the applications reporting data to X-Mode, • the user stops using an application that reports location data only while in use, • the user rotates their device ID, or • the user gets a new device with a new advertising ID. Figure 3 shows the number of unique devices per day in the Connecticut sample.

Interactive web application
We created an interactive web application to allow users to explore contact patterns in Connecticut over time, available at https://forrestcrawford.shinyapps.io/ct_social_distancing/. Figure 4 shows a screen shot of the interactive application. The application uses the Shiny framework [46] for the R computing environment [5], Leaflet for mapping [47], and tigris for shapefiles [48]. The map explorer is based in part on Edward Parker's COVID-19 tracker [49] and code [50]. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint  hospitals -that help identify block groups.

Comparison of contact rate to mobility metrics
In order to compare the contact rate described in this paper to other mobility metrics, we acquired Connecticut mobility data from Google [51], Apple [52], Facebook [53], Descartes Labs [54,55], and Cuebiq [56]. Google and Apple provide public data access, while Facebook and Cuebiq provide data through their respective Data For Good programs. We normalized all metrics to a day-of-week baseline using data from January or February depending on availability, and plot their percent change from baseline from February 2020 through January

2021.
Apple state-level data measures Apple Maps routing requests, categorized as transit, walking, or driving. Map routing requests are a proxy for mobility but might not represent actual trips. Movements for which Apple Maps directions are not needed, such as everyday trips for work, school, or shopping, might not be represented in routing request metrics. Figure 5  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ;

19
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in Google state-level mobility data measured visits to areas of interest, categorized as grocery and pharmacy, parks, residential, retail and recreation, transit stations, and workplaces. More detailed information about the definitions of these areas of interest, and the completeness of these categories, is not available. Figure 6 shows mobility metrics published by Google using the day-of-week median from January 3, 2020 to February 6, 2020 as the baseline. All categories other than transit stations and workplaces returned to near baseline levels by summer 2020, and all categories other than residential remained near or below baseline throughout winter 2020.
Facebook county-level mobility data measured the number of 600m by 600m geographic units visited by a device in a day. This metric summarizes how mobile people from different counties are, but might not represent the distance of travel, time away from home, or potential close contacts with others. Figure 7  is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint and was available through November 1, 2020 1 . Figure 8 shows mobility data provided by Cuebiq with dayof-week median during February 2 -February 29, 2020 as the baseline. By July 2020, Cuebiq mobility levels returned to near baseline. Figure 9 shows Cuebiq's metric for "contact", when two or more devices are within 50 feet of each other within five minutes. Information about whether this metric takes spatial error (horizontal uncertainty) into account is not available. In July 2020, Cuebiq contact levels remained further below baseline than the Cuebiq mobility metric. The Cuebiq 50-foot contact metric was closer to baseline than our calculated contact rate during summer and fall 2020.
Finally, Descartes Labs state-level mobility data represents maximum distance devices have moved from the first reported location in a given day. Figure 10 shows the mobility metric provided by Descartes Labs with day-of-week median during February 17 -March 7, 2020 as the baseline. It was exceptional amongst the data sources in that mobility remained notably below baseline during March 2020 -January 2021. However, the percent decline in close contact was consistently larger than the observed percent decline in the Descartes Labs mobility metric.
Every mobility metric we studied returns to baseline faster than the contact rate, though Descartes Labs' mo- 1 Aggregated mobility data were provided by Cuebiq, a location intelligence and measurement platform. Through its Data for Good program, Cuebiq provides access to aggregated mobility data for academic research and humanitarian initiatives. This first-party data is collected from anonymized users who have opted-in to provide access to their location data anonymously, through a GDPR-compliant framework. It is then aggregated to the census-block group level to provide insights on changes in human mobility over time.

21
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint bility data provide the closest match to contact when presented on the percent change from February scale.
In general, mobility metrics rebounded to February 2020 levels during summer 2020, and remained near this baseline or declined slightly through fall and winter 2020. Close contact, however, remained well below baseline levels through January 2021, suggesting that trends in mobility do not necessarily align with trends in close contact.

Space-time regression model results
In Table 2, we display the WAIC and posterior predictive deviance model comparison results. WAIC clearly favors the full spatio-temporal method even though the model is more complex than the two competitors as indicated by the effective number of parameters (p WAIC ). This improvement in model fit however, overpowers this increased penalization term, leading to an improved overall balance. In terms of prediction, the full model is again preferred and seems to predict data with similar features to the observed data better than the competing models. Global Contact fits and predicts better than No Contact, suggesting that including contact information in some form within the model leads to improvements in both areas. Overall, these results show the importance of including prior day contact information in a flexible manner in order to explain variability in COVID-19 cases.
Based on these findings, we further explore results from the full spatio-temporal model.

23
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint   In Table 3, we display posterior inference for the non-contact regression parameters (β j ), exponentiated for a relative risk interpretation. Results suggest that towns with increased population have more cases on average, and that towns with higher median household income and a larger proportion of residents that are non-Hispanic White and ≥ 65 years have fewer cases. None of the other included spatially-varying covariates had 95% credible intervals that excluded one.
In Figure 11, we show the estimated global lag parameters, θ, and 95% credible intervals across the different lag days from the full spatio-temporal version of the model. The figure suggests that increased contact on prior days 3-7 is associated with increased current day COVID-19 cases. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint Daily Lag θ q q q q q q q q q q q q q q q q q q q q q q q q q q q q Figure 11: Estimated daily lag regression parameters (and 95% credible intervals) describing associations between contact and current day COVID-19 cases. Red bars indicate that the credible intervals exclude zero.

25
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint Figure 12: Contact for Connecticut towns and fitted SEIR model predictions with 95% uncertainty intervals.

26
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint Figure 13: Contact for Connecticut towns and fitted SEIR model predictions with 95% uncertainty intervals.

27
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint Figure 14: Contact for Connecticut towns and fitted SEIR model predictions with 95% uncertainty intervals.

28
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint Figure 15: Contact for Connecticut towns and fitted SEIR model predictions with 95% uncertainty intervals.

29
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint   is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint Figure 17: Contact for Connecticut towns and fitted SEIR model predictions with 95% uncertainty intervals.

31
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 12, 2021. ; https://doi.org/10.1101/2021.03.10.21253282 doi: medRxiv preprint