Abstract
During the COVID-19 epidemic, many health professionals started using mass communication on social media to relay critical information and persuade individuals to adopt preventative health behaviors. Our group of clinicians and nurses developed and recorded short video messages to encourage viewers to stay home for the Thanksgiving and Christmas holidays. We then conducted a two-stage clustered randomized controlled trial of a large-scale Facebook ad campaign disseminating these messages in 820 counties (covering 13 states) in the United States. In the first level of randomization, we randomly divided the counties into two groups: high intensity and low intensity. In the second level, we randomly assigned zip codes to either treatment or control such that 75% of zip codes in high-intensity counties received the treatment, while 25% of zip codes in low-intensity counties received the treatment. In each treated zip code, we sent the ad to as many Facebook subscribers as possible (11,954,109 users received at least one ad at Thanksgiving and 23,302,290 users received at least one ad at Christmas). The first primary outcome was aggregate holiday travel, measured using mobile phone location data, available at the county level: we find that average distance travelled in high-intensity counties decreased by 0.993 percentage points relative to low-intensity counties (adjusted difference −0.993, 95% CI −1.616, −0.371, p-value 0.002) in the three days before each holiday. The second primary outcome was COVID-19 infection at the zip-code level: COVID-19 infections recorded in the two-week period starting five days post-holiday declined by 3.5 percent (adjusted 95% CI [−6.2 percent, −0.7 percent], p-value 0.013) in intervention zip codes compared to control zip codes.
One sentence summary: In a large-scale clustered randomized controlled trial, short messages recorded by health professionals before the winter holidays in the United States and sent as ads to social media users led to a significant reduction in holiday travel and to a decrease in subsequent COVID-19 infection at the population level.
Main text
Nurses and physicians are among the most trusted experts in the United States (1,2,3). Beyond the individual relationship with their patients, can these health professionals influence behavior at scale by spreading public health messages using social media?
During the COVID-19 crisis, many healthcare professionals used social media to spread public health messages (3). For example, the Kaiser Family Foundation has sponsored a large project in which doctors recorded videos to explain COVID-19 vaccination and dispel doubts (1). Since individual adoption of preventative behavior, from mask wearing and staying at home to vaccination, is key to the control of this and future pandemics, it is very important to know whether this communication is effective.
In previous work, we have shown, in online experiments, that video messages recorded by a diverse group of doctors affect the knowledge and behaviors of individuals, and that these effects seem to be strong regardless of race, education, or political leanings (4,5). But there is no systematic evaluation of similar messages when distributed as part of large-scale public health campaigns. Furthermore, given the large sample required, it has not been possible so far to test the impact of such public health campaigns on COVID-19 infection, so the clinical significance of those findings was unclear.
In this study, we sought to estimate whether short video messages recorded by nurses and doctors and sent on a massive scale as part of a social media ad campaign could impact both behavior and COVID-19 infections at the population level.
In November 2020, the number of COVID-19 cases was rapidly increasing in the United States. Due to concerns that holiday travel would lead to a surge in the epidemic, the Centers for Disease Control and Prevention (CDC) recommended that people stay home for the holidays.
In this context, we ran two large clustered randomized controlled trials with Facebook users. Before Thanksgiving and Christmas, physicians and nurses (all co-authors of this project) recorded twenty-second videos on their smartphones to encourage viewers to stay home for the holidays. Facebook subscribers in randomly selected zip codes in 820 counties in 13 states received these videos as sponsored content (ads). Over 11 million people received at least one ad before Thanksgiving (35% of users in the targeted regions), and over 23 million did before Christmas (66% of users in the targeted regions).
The purpose of this study was to identify whether these short videos would influence population-level holiday travel in the targeted regions and, in turn, lead to a decline in COVID-19 cases after the holidays.
METHODS
Trial Oversight
The design was approved by the institutional review board of the Massachusetts Institute of Technology (MIT) with Massachusetts General Hospital (MGH), Yale and Harvard ceding authority to MIT IRB. Messages were produced by the research team and approved to run (without modification) after going through Facebook’s internal policy review to ensure compliance with policies. Primary outcomes were registered on ClinicalTrials.gov. There was just one deviation from the pre-registration: we initially planned to construct the mobility outcome from fine-grained data. Since the publicly available mobility data is at the county level, we use county-level mobility data instead.
Intervention
Messages encouraging viewers to stay home for the holidays were recorded on smartphones by six physicians before Thanksgiving, and nine physicians and nurses before Christmas who varied in age, gender, race and ethnicity.
For Thanksgiving, the script of the video was:
“This Thanksgiving, the best way to show your love is to stay home. If you do visit, wear a mask at all times. I’m [Title/ NAME] from [INSTITUTION], and I’m urging you: don’t risk spreading COVID. Stay safe, stay home.”
A similar script was recorded at Christmas. The videos were then disseminated as sponsored content to Facebook users from a page created for the project. The videos and the Facebook page are available on the project website (https://www.povertyactionlab.org/project/covid19psa). In the Supplementary Appendix, we provide details on the campaign and full scripts.
Trial Design, Eligibility, Randomization and Recruitment
Eligibility for the trial and randomization strategy were determined by data availability and power considerations. Movement range data computed by Facebook are publicly available at the county level. COVID-19 case data are available at the zip code level in some states. We thus randomized at both the county and the zip code level to have experimental variation at each level. The CONSORT diagram (Figure 1) describes the factorial design and the allocation of clusters to each arm.
Before the Thanksgiving campaign, we selected 13 states where weekly COVID-19 case-counts data were available at the zip code level (see maps in Figure S1a and S1b) and selected counties within these states where this data was available.
The research team randomly allocated counties to be “high-intensity” (H) or “low-intensity” (L) with probability ½ each. In H counties, the research team randomized zip codes into intervention with probability ¾ and control with probability ¼. In L counties, zip codes were randomized into intervention with probability ¼ and control with probability ¾. Randomization was performed with Stata prior to each of the two interventions.
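The two-stage allocation described above can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not the Stata procedure actually used: it draws each county and each zip code independently with the stated probabilities, and the function name, the toy county and zip identifiers, and the seed are all hypothetical.

```python
import random

def two_stage_assignment(zips_by_county, seed=0):
    """Assign counties to H or L, then zip codes to treatment within each county."""
    rng = random.Random(seed)
    county_arm = {}
    zip_treated = {}
    for county, zips in sorted(zips_by_county.items()):
        # Stage 1: a county is high-intensity (H) or low-intensity (L) with probability 1/2.
        arm = "H" if rng.random() < 0.5 else "L"
        county_arm[county] = arm
        # Stage 2: zip codes are treated with probability 3/4 in H counties, 1/4 in L counties.
        p_treat = 0.75 if arm == "H" else 0.25
        for z in zips:
            zip_treated[z] = rng.random() < p_treat
    return county_arm, zip_treated

# Hypothetical toy data: 200 counties with 10 zip codes each.
toy = {f"county_{c}": [f"zip_{c}_{k}" for k in range(10)] for c in range(200)}
county_arm, zip_treated = two_stage_assignment(toy, seed=42)

def treated_share(arm):
    # Share of treated zip codes among counties assigned to the given arm.
    flags = [zip_treated[z] for c, zs in toy.items() if county_arm[c] == arm for z in zs]
    return sum(flags) / len(flags)

share_h = treated_share("H")
share_l = treated_share("L")
```

In this sketch the realized treated shares fluctuate around 75% and 25%; a design that fixes the shares exactly within each county would need a different draw.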
The lists of zip codes for each intervention were then provided to our marketing partner AdGlow, who managed the advertising campaigns on Facebook. Within the treated zip codes, AdGlow ran ads to allocate the sponsored video content to users, aiming to reach the largest number of people given the advertising budget (see Supplement 1, Section A for further details about Facebook ad campaigns). The video messages were pushed directly into users’ Facebook feeds (three to five times per user on average), and users were then free to either watch, share, react to, or entirely ignore the content. We did not recruit individuals for the study and do not use individual level data. At Thanksgiving, 30,780,409 videos were pushed to 11,954,109 users, and at Christmas, 80,773,006 videos were sent to 23,302,290 users. AdGlow provided us with overall engagement figures: Each time a user had an opportunity to view a campaign message, 12.3% watched at least 3 seconds of the video at Thanksgiving and 12.9% at Christmas, while 1.7% watched at least 15 seconds at Thanksgiving and 1.4% at Christmas. Our engagement rates of 12-13% (measured as the total of clicks, 3-second views, shares, likes, and comments divided by total impressions) were well above industry standard benchmarks for Facebook ads, 1%-2%, and Facebook video posts, 6% (14, 15).
We determined that a sample of 820 counties would provide 80% power to detect effect sizes of 0.2 standard deviations for county-level outcomes, comparing intervention (H) vs. control (L). For outcomes with zip code level data, using intra-class correlations of 0.2 (0.475) a sample of 6,998 zip codes would provide 80% power to detect effect sizes of 0.057 (0.072) standard deviations.
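As a rough check on the county-level figure, the standard minimum detectable effect formula for a two-arm comparison of means can be evaluated directly. The sketch below assumes a two-sided 5% test, equal allocation between H and L counties, and no covariate adjustment; these are our assumptions for illustration. It does not attempt to reproduce the zip-code-level figures, whose exact formula is not stated here.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_effect(n_units, share_treated=0.5, alpha=0.05, power=0.80):
    """MDE in standard-deviation units for a two-arm comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sqrt(1 / (share_treated * (1 - share_treated) * n_units))

mde_county = minimum_detectable_effect(820)
print(round(mde_county, 3))  # roughly 0.196, in line with the 0.2 SD quoted above
```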
Outcomes
Our primary outcomes are county level mobility and zip code level COVID-19 infections reported to state health authorities, which we regularly retrieved from state websites beginning on November 12, 2020 (a list of the websites is provided in Supplement 1, Section B).
The movement range data are produced by aggregating location information obtained from mobile devices of Facebook users that opted to share their precise location with Facebook, and adding some noise for privacy protection (6,7) (see Supplement 1, Section B for further details). The change in movement metric is the percentage change in distance covered in a day compared to the same day of the week in the benchmark period of February 2-29, 2020. The mobility data describe behavior throughout the day for people who were in each county that morning. Since the campaign was targeted based on home location, we can only capture its impact on travel away from home, not back home. Thus, we define holiday travel as travel during the three days preceding each holiday. The stay put metric is the share of people who stay within the small geographical area (a “Bing tile” of 600m x 600m) in which they started the day. We used it to compute the leave home variable as 1 − stay put on the day of the holiday (Thanksgiving Day, Christmas Eve, and Christmas Day).
The second primary outcome we study is the number of new COVID-19 cases per fortnight, calculated from the cumulative case counts we manually retrieved from county or state webpages once or twice a week and cleaned. Our primary outcome is the number of new COVID-19 cases detected in each zip code during the fortnight that starts five days after each holiday: given the incubation period of five days, this is the one two-week period where we should see an impact.
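The construction of the fortnightly outcome from cumulative counts can be sketched as follows. The window logic (start five days after the holiday, length 14 days) follows the text; the rule for handling report dates that do not fall exactly on the window boundaries (use the most recent report before each boundary) is our assumption, and the cumulative counts are hypothetical.

```python
from datetime import date, timedelta

def fortnight_window(holiday, lag_days=5, length_days=14):
    """Return (start, end) of the outcome window; `end` is exclusive."""
    start = holiday + timedelta(days=lag_days)
    return start, start + timedelta(days=length_days)

def new_cases(cumulative_by_date, start, end):
    """New cases in [start, end) from cumulative counts reported on irregular dates."""
    def latest_before(day):
        # Most recent cumulative count reported strictly before `day` (assumed rule).
        earlier = [d for d in cumulative_by_date if d < day]
        return cumulative_by_date[max(earlier)] if earlier else 0
    return latest_before(end) - latest_before(start)

thanksgiving_2020 = date(2020, 11, 26)
start, end = fortnight_window(thanksgiving_2020)

# Hypothetical cumulative counts for one zip code.
cumulative = {
    date(2020, 11, 30): 120,
    date(2020, 12, 3): 131,
    date(2020, 12, 14): 158,
    date(2020, 12, 17): 170,
}
cases = new_cases(cumulative, start, end)  # 158 - 120 = 38
```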
Statistical Analysis
The analysis was performed by original assigned group (intention to treat).
Effect on Mobility (County-level)
At the county level, the analysis compares the “high-intensity” counties to the “low-intensity” counties. Because, on average, only 75% of the zip codes in high-intensity counties received the intervention, and 25% of the zip codes in low-intensity counties also received it, this is an “intention to treat” specification, which gives a lower bound of the effect of treatment.
For any day or set of days, the coefficient of interest is $\beta_1$ in the OLS regression:
$$y_{it} = \beta_0 + \beta_1 \mathrm{HighIntensity}_i + \beta_2 y_{i0} + \varepsilon_{it} \qquad (1)$$
where $y_{it}$ is the outcome of interest on day $t$, $y_{i0}$ is its baseline value, and $\mathrm{HighIntensity}_i$ indicates that county $i$ was assigned to the high-intensity group. This regression is estimated for both campaigns together, and for each separately. Standard errors are adjusted for heteroskedasticity, and for clustering at the zip code level when both campaigns are pooled (we also provide randomization inference p-values) (8). We present a regression controlling for state fixed effects and a set of county level outcomes chosen via machine learning (9) in Table S4 (in supplementary appendix).
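To make the county-level specification concrete, the sketch below estimates an OLS regression of an outcome on a high-intensity indicator and the baseline value of the outcome, using only the Python standard library. It is a minimal illustration, not the R code used for the analysis: the data are hypothetical and constructed without noise, and the sketch computes point estimates only (no robust or clustered standard errors).

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical counties: (high-intensity indicator, baseline outcome y_i0).
counties = [(1, -8.0), (0, -9.5), (1, -7.2), (0, -8.8), (1, -10.1), (0, -6.4)]
X = [[1.0, h, y0] for h, y0 in counties]
# Outcomes generated with a true high-intensity coefficient of -1.0 and no noise.
y = [1.0 - 1.0 * h + 0.5 * y0 for h, y0 in counties]

beta0, beta1, beta2 = ols(X, y)  # beta1 recovers the coefficient on the indicator
```

Controlling for the baseline value does not change what the coefficient on the indicator estimates under randomization, but it typically tightens the estimate when the baseline predicts the outcome.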
Effect on Number of COVID-19 Cases (Zip Code-level)
To measure the effect on COVID-19 cases reported in each zip code, we run the regression:
$$\operatorname{asinh}(\mathrm{FortnightlyCOVID}_{it}) = \beta_0 + \beta_1 \mathrm{Treated}_i + \gamma_{c(i)} + \varepsilon_{it} \qquad (2)$$
where $\mathrm{FortnightlyCOVID}_{it}$ is the number of new cases of COVID-19 detected in the fortnight beginning five days after each holiday (for primary outcome results), $\mathrm{Treated}_i$ is a dummy that indicates that zip code $i$ was a treated zip code, and $\gamma_{c(i)}$ denotes fixed effects for the county containing zip code $i$. The inverse hyperbolic sine transformation is appropriate when the data are approximately lognormal for higher values, but a small number of observations have zero cases (10,11) (also see Supplement 1, Section C). The coefficient on “Treated” can be interpreted as a proportional change. In the supplementary appendix we explore robustness to other commonly used ways to handle zeros. We also investigate robustness by estimating the same regression for other fortnights.
Regression (2) is estimated for both campaigns pooled, and for the Thanksgiving campaign and the Christmas campaign separately, with county fixed effects (the randomization strata).
Standard errors adjust for heteroskedasticity (and clustering for the pooled specification) and we compute p-values with randomization inference. We estimate the impact of treatment overall, and separately in the two strata (high- and low-intensity counties).
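The idea behind randomization inference can be sketched in a few lines: re-draw the treatment assignment many times, recompute the test statistic each time, and report the share of re-draws whose statistic is at least as extreme as the observed one. The sketch below is a simplified stand-in, not the procedure used in the paper: it permutes treatment labels across all units (ignoring the two-stage design and the strata) and uses a simple difference in means rather than a regression coefficient. The data are hypothetical.

```python
import random

def diff_in_means(outcomes, treated):
    t = [y for y, d in zip(outcomes, treated) if d]
    c = [y for y, d in zip(outcomes, treated) if not d]
    return sum(t) / len(t) - sum(c) / len(c)

def randomization_p_value(outcomes, treated, draws=2000, seed=0):
    """Two-sided p-value: share of re-drawn assignments at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(diff_in_means(outcomes, treated))
    labels = list(treated)  # copy so the caller's assignment is left untouched
    extreme = 0
    for _ in range(draws):
        rng.shuffle(labels)  # re-draw the assignment, keeping the number treated fixed
        if abs(diff_in_means(outcomes, labels)) >= observed:
            extreme += 1
    return extreme / draws

# Hypothetical outcomes: the first eight units are treated and have lower values.
outcomes = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.4, 4.1,
            6.0, 5.8, 6.2, 6.1, 5.9, 6.3, 5.7, 6.0]
treated = [True] * 8 + [False] * 8
p_effect = randomization_p_value(outcomes, treated)

# With identical outcomes everywhere, every re-draw ties the observed statistic.
p_null = randomization_p_value([5.0] * 16, treated)
```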
In supplementary material, we also explore heterogeneity of effects by prior COVID-19 circulation and demographic variables. Analyses were performed using R, version 4.0.3, including the following packages (versions): stats (4.0.3), tidyverse (1.3.0), estimatr (0.28.0), readr (1.4.0), dplyr (1.0.5), lubridate (1.7.10), hdm (0.3.1), car (3.0.10), MASS (7.3.53), sandwich (3.0.0), foreign (0.8.80), readstata13 (0.9.2), readxl (1.3.1), quantreg (5.75). The data and all the statistical code will be made available upon publication.
Role of the Funding Source
Facebook provided the ad credits used to show the ads and connected the research team with AdGlow, the marketing partner. The ad content went through the usual internal policy review at Facebook for compliance with policies. Facebook had no other role in the design or conduct of the trial, and no role in the interpretation of the data or preparation of the manuscript.
RESULTS
Trial Population
Of the 8,671 potentially eligible zip codes in the 13 states in the study, 1,554 were removed before the Thanksgiving campaign because of missing COVID-19 infection data, and 119 were removed because they could not be matched to county-level census data, yielding a sample of 6,998 zip codes in 820 counties. Prior to the Christmas campaign, 60 fully rural counties in the top tercile of votes for Donald Trump in the 2020 election were removed from the study. This was done out of caution: the research team was concerned that the messaging campaign might have unintended adverse effects in very rural, heavily Republican-leaning counties given the growing polarization in December. The remaining sample had 767 counties. We included in the campaign all zip codes in the intervention in the selected counties (even if they could not be matched to COVID-19 infection data). For the COVID-19 outcomes, we have a final sample of 6,716 zip codes. The realized sample size of 820 counties at Thanksgiving and 767 counties at Christmas was close enough to the original sample size to not lead to significant loss in power.
Summary statistics on the sample that was randomized are shown in Table 1 (Figures S1a and S1b in the supplementary appendix show their location on the map). Counties had on average 36% Democrats and 62% Republicans (based on election share in 2020), and 46% of zip codes were classified as urban. On November 13, 2020, distance travelled was 8.73% lower than during the benchmark month of February 2020; in the Christmas sample, it was 8.89% lower. In both samples, 82.4% of people left home on November 13, 2020.
Effects of the Campaign on the Mobility of Facebook users
Figure 2 shows day-by-day regression estimates of equation (1). Distance travelled away from the morning location declined a few days before each holiday in high-intensity counties, relative to low-intensity counties.
Table 2 shows that, pooling both campaigns together, distance travelled in the three days before each holiday was 4.384 percent lower than in February 2020 in high-intensity counties, and 3.597 percent lower in low-intensity counties. The adjusted difference was −0.993 percentage points (95% CI −1.616, −0.371, p-value 0.002). The effects were very similar at Thanksgiving (adjusted difference: −0.924 percentage points, 95% CI −1.785, −0.063, p-value 0.035) and Christmas (adjusted difference: −1.041 percentage points, 95% CI −1.847, −0.235, p-value 0.011).
The intervention had no impact on the share of people leaving home on the day of the holiday (Table 2 and supplementary appendix Figure S2). On average, 72.33% of people left their home tile on the day of the holiday in high-intensity counties, and 72.39% in low-intensity counties (adjusted difference: 0.030, 95% CI −0.361, 0.420, p-value 0.881).
Table S4 in the supplement shows that these results are robust to adding control variables chosen by machine learning from a large set of county-level covariates (12).
Effect of the Campaign on COVID-19 Cases
Table 3 shows that the campaigns were followed by a drop in COVID-19 cases in treated zip codes, relative to control zip codes, for the two-week period beginning five days after the holiday. The adjusted difference in asinh(COVID) was −0.035 (adjusted 95% CI [−0.062, −0.007], p-value 0.013), which can be interpreted as a 3.5% reduction in COVID-19 cases. The effects were slightly smaller in magnitude at Thanksgiving (adjusted difference: −0.027, adjusted 95% CI [−0.059, +0.005], p-value 0.097) than at Christmas (adjusted difference: −0.042, 95% CI [−0.073, −0.012], p-value 0.007). These results are robust to alternative ways to treat zeros (Tables S6a, S6b, and S6c in the supplement).
To provide evidence that these differences are indeed due to the campaign, and not to any pre-existing difference, Figure 3 shows the results of estimating equation (2) for a number of two-week periods (omitting the five days following Christmas). There is no significant difference between intervention and comparison zip codes in any period other than the period where we expected an impact. This makes it very unlikely that the difference in COVID-19 cases is due to random chance.
Treatment Effect Heterogeneity
We test for several dimensions of heterogeneity of the effect of the campaign on mobility and COVID-19 infection in Tables S2a-b and S3a-g in the supplementary appendix: baseline COVID-19 infection, urban versus rural counties, education, and majority Republican versus majority Democratic counties.
We found no significant difference in the impact of the campaign either on mobility or COVID-19 cases by level of education, or between Republican and Democratic counties, or between rural and urban counties. We also did not find that the interaction between political leaning and urban designation is significant (Tables S3e and S3f in the supplement). The effects on COVID-19 infections are lower in counties with high infection at baseline.
DISCUSSION
There was widespread concern before the Thanksgiving and Christmas holidays that heavy travel and mixing households would lead to an increase in COVID-19 patients. Indeed, households did travel more around the holidays, though even then mobility remained lower than its February 2020 level.
In counties where a larger proportion of zip codes were randomly assigned to a high-coverage Facebook ad campaign in which clinicians encouraged people to stay home before the Thanksgiving and Christmas holidays, Facebook users reduced the distance they travelled in the three days before the holidays. They were not less likely to leave their homes on the day of the holiday, but the clinical importance of this latter finding is unclear, since they could either have been spending time outside or visiting other households.
A potential concern before the campaign was that in a polarized environment, a campaign such as this one could be effective in some communities and backfire in others (this is why we excluded a few counties in the Christmas campaign). But the effects did not seem to depend on county characteristics, including political leanings. These findings accord with previous research that found that individuals are responsive to physician delivered messages, regardless of political affiliation (5).
We found a significant impact on new COVID-19 infections reported by health authorities 5 to 19 days later. These effects might be under-estimated, because the treatment and control zip codes are very close to each other, and the reductions in infection in treatment zip codes might also have led to a decrease in infection in neighboring places.
There are several limitations of the study. First, it was conducted with Facebook subscribers and mobility is collected for Facebook users. Although Facebook has a remarkable reach, this remains just one type of media. Second, it was an ad campaign. The messages might have been more effective if they had been relayed by celebrities or locally known figures (12,13). Third, we tested one kind of message, recorded by clinicians on smartphones. The results could differ with changes in message content, identity of the messenger, length of message, production value of the videos, or name recognition of the originating organization.
Despite these limitations, the findings provide evidence that clinicians can be an effective channel to communicate life-saving information at scale, through social media. This is a new role that physicians and nurses embraced during the COVID-19 crisis, and we demonstrate that this is another way in which they can prevent illness and save lives.
These findings also demonstrate, in a clustered randomized controlled trial, the impact of a travel reduction, a key non-clinical intervention whose impact had not been evaluated in a randomized controlled trial before.
The findings suggest directions for future work. In particular, would similar messages be effective in encouraging COVID-19 vaccine uptake?
Data Availability
Data is available by request from the authors.
DISCLAIMER
The findings and conclusions expressed are solely those of the authors and do not represent the views of their funders. Tristan Loisel (co-author) conducted the statistical analysis (and was not involved in the design of the trial). Esther Duflo (PI) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
GRANT SUPPORT
Supported by the National Science Foundation under award number 2029880 (MA, ED). Supported by the Physician/Scientist Development Award (PSDA) granted by the Executive Committee on Research (ECOR) at MGH (FCS), NIH P30 DK040561(FCS), L30 DK118710 (FCS).
DATA SHARING STATEMENT
The authors have indicated that they will be sharing data.
Emily Breza, Ph.D.
Harvard Department of Economics
1805 Cambridge Street
Cambridge, MA 02138
Marcella Alsan, M.D. M.P.H. Ph.D.
Harvard Kennedy School
79 John F. Kennedy Street
Cambridge, MA 02138
Burak Alsan, M.D.
Online Care Group
75 State Street
Boston, MA 02109
Abhijit Banerjee, Ph.D.
MIT Department of Economics
77 Massachusetts Avenue
Cambridge, MA 02139
Fatima Cody Stanford, M.D. M.P.P.
MGH Weight Center
50 Staniford Street, Suite 430
Boston, MA 02114
Arun G. Chandrasekhar, Ph.D.
Stanford Department of Economics
579 Jane Stanford Way
Stanford, CA 94305-6072
Sarah Eichmeyer, Ph.D.
University of Munich
Center for Economic Studies (CES)
Schackstr. 4 / I
80539 Munich
Germany
Traci Glushko, M.S.
Bozeman Health Deaconess Hospital
915 Highland Boulevard
Bozeman, MT 59715
Paul Goldsmith-Pinkham, Ph.D.
Yale School of Management
165 Whitney Avenue
New Haven, CT 06511
Kelly Holland, M.D.
Lynn Community Health Center
269 Union Street
Lynn, MA 01901
Emily Hoppe, M.S.
Johns Hopkins School of Nursing
525 N. Wolfe Street
Baltimore, MD 21205
Mohit Karnani, M.Sc.
MIT Department of Economics
77 Massachusetts Avenue
Cambridge, MA 02139
Sarah Liegl, M.D.
St. Anthony North Family Medicine
2551 W 84th Ave
Westminster, Colorado 80031
Tristan Loisel, M.Sc.
Paris School of Economics
48 Boulevard Jourdan
75014 Paris, France
Lucy Ogbu-Nwobodo, M.D.
Massachusetts General Hospital
55 Fruit St
Boston MA 02114
Benjamin A. Olken, Ph.D.
MIT Department of Economics
77 Massachusetts Avenue
Cambridge, MA 02139
Carlos Torres, M.D.
Chelsea HealthCare Center
151 Everett Avenue
Chelsea, MA 02150
Pierre-Luc Vautrey, M.Sc.
MIT Department of Economics
77 Massachusetts Avenue
Cambridge, MA 02139
Erica Warner, Sc.D. M.P.H.
Massachusetts General Hospital
55 Fruit St
Boston, MA 02114
Susan Wootton, M.D.
University of Texas Health Science Center
7000 Fannin Street #1200
Houston, TX 77030
Esther Duflo, Ph.D.
MIT Department of Economics
77 Massachusetts Avenue
Cambridge, MA 02139
Supplementary Appendix
Supplement 1. Methods and Results
Methods
Section A. Facebook Ad Campaigns
We disseminated the messages using a Facebook advertising campaign that was managed by AdGlow, our marketing partner. On the Facebook advertising platform, there are many ways to structure a campaign. We selected a “reach” objective, which attempts to maximize the number of Facebook users seeing the ads, along with the number of times each user sees the ad, over a daily horizon or the lifetime of the campaign given the campaign budget. The Thanksgiving campaign had a daily “reach” objective, while the Christmas campaign had a lifetime “reach” objective. Facebook uses an algorithm to implement the campaign objective. (More information is available at https://www.facebook.com/business/help/218841515201583?id=816009278750214.)
An important element of the algorithm is the Facebook Ads Auction. All active ad campaigns define a target audience. For both of our campaigns, the target audience consisted of all Facebook users in the specified zip codes. Every time there is an opportunity to show an ad to a user, there may be many active campaigns targeting that type of individual. An auction is used to determine the cost of the ad and which ad is shown to the user at that time, and the auction winner is the advertiser with the highest total value. Total value is a combination of three factors: the bid of each advertiser; the estimated action rate (whether the user engages with the ad in the desired way); and ad quality, which is measured by Facebook and reflects feedback from previous viewers and assessments of so-called “low-quality attributes.” Because total value is defined as more than simply the advertiser’s bid, ads that are estimated to create more user engagement or that are of higher quality can beat ads with higher bids in the auction. In this way, the Facebook ad campaign algorithm and Ads Auction led to the delivery of campaign materials to 11,954,109 users at Thanksgiving and 23,302,290 users at Christmas. (More information about the Facebook Ads Auction is available at https://www.facebook.com/business/help/430291176997542?id=561906377587030.)
Section B. Outcomes
County level mobility data
Our mobility outcomes come from the publicly available Facebook Movement Range dataset, which can be downloaded at https://data.humdata.org/dataset/movement-range-maps. The data are constructed from location information collected by Facebook from users who have opted into Location History sharing and are aggregated to the county level. The publicly released data are subjected to a differential privacy framework to maintain the privacy of individual Facebook users. First, regions with fewer than 300 users on a given day are omitted from the data set. Second, random noise is added during the construction of each metric to limit the risk of being able to identify individual users.
We use both the Change in Movement metric and the Stay Put metric in our analysis. Both are calculated daily and cover the period from 8pm to 7:59pm local time. Both metrics are based on changes in locations across level-16 Bing tiles, which each represent an area of approximately 600m x 600m.
Change in Movement is a measure of how many tiles the average Facebook user starting in a given county travels through during the day. More specifically, the variable is constructed for each county, on each day, following 5 steps: 1) the number of tiles visited is calculated for each user and is top-coded at 200; 2) the total number of tiles visited by all users in that county-day observation is calculated by summing over the top-coded tiles measure; 3) random noise is added to the total tiles measure following a Laplace distribution with parameters selected to satisfy Facebook’s differential privacy targets; 4) the noisy total tiles variable is scaled by the number of Facebook users observed in the data to generate an average for that day in each county; 5) finally, the average movement measure is scaled by an average baseline measurement for the county taken on the same day of the week between February 2-29, 2020.
Stay Put is calculated as the fraction of observed users in a given county who do not leave a single level-16 Bing tile for the whole day. Specifically, in constructing the public version of this metric, 5 steps are followed: 1) a binary indicator is calculated for each user based on whether they remained in a single level-16 Bing tile for the entire day; 2) the total number of users in each county staying put is generated; steps 3)-5) from the Change in Movement calculation are followed. When we use the Stay Put metric in our analysis, we instead create Leave Home = 1 - Stay Put so that larger values indicate more movement.
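The five construction steps for Change in Movement, and the Leave Home transformation, can be sketched as follows. This is an illustration only: the noise scale is a hypothetical placeholder (Facebook's actual differential privacy parameters are not given here), the per-user tile counts are made up, and expressing the final step as a relative change from baseline follows our reading of the metric as a percentage change.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def change_in_movement(tiles_per_user, baseline_avg, noise_scale, rng, cap=200):
    capped = [min(t, cap) for t in tiles_per_user]         # step 1: top-code at 200 tiles
    total = sum(capped)                                     # step 2: county-day total
    noisy_total = total + laplace_noise(noise_scale, rng)   # step 3: add Laplace noise
    average = noisy_total / len(tiles_per_user)             # step 4: average per observed user
    return average / baseline_avg - 1                       # step 5: change relative to baseline

def leave_home(stay_put_share):
    # Larger values indicate more movement.
    return 1 - stay_put_share

rng = random.Random(0)
# Hypothetical county-day with four users; the second user is top-coded at 200 tiles.
movement = change_in_movement([10, 250, 30, 5], baseline_avg=70.0, noise_scale=1.0, rng=rng)
left_home = leave_home(0.176)
```

Without noise, the toy county-day above would give (245 / 4) / 70 − 1 = −0.125, that is, movement 12.5% below baseline; the Laplace draw perturbs this slightly.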
The Facebook Movement Range data are described in further detail at https://research.fb.com/blog/2020/06/protecting-privacy-in-facebook-mobility-data-during-the-covid-19-response/.
Zip Code level COVID-19 data
The COVID-19 data were retrieved twice a week from the following State health websites. The data are reported by hospitals or labs to the centralized statewide health department, which publishes the data we collected and used. Most states report positive cases based on PCR tests, but some (AZ, IL, MN) combine confirmed with probable cases.
Different states have different formats to report their data: some had clean spreadsheets, others had spreadsheets that had to be reformatted, and others had PDFs that had to be converted into spreadsheets and cleaned. The data were retrieved manually and organized.
States reported the cumulative cases reported in each zip code. Cases are assigned to a zip code based on the address of the person who tested positive.
Some zip codes were not listed on the states’ websites (we observe around 8k unique zips before dropping the censored ones, whereas the total zip count for these 13 states is a bit over 10k). There are multiple reasons for this, the most common being aggregation of small zip codes into larger ones; there were other situations, such as suppressing Tribal zips or simply suppressing small zips instead of aggregating them. In addition, the data were censored for zip codes with low case counts.
We cleaned and appended all the data we collected, totaling 6,998 unique zip codes with unsuppressed, non-censored data. A list of the websites from which the data were retrieved appears here.
AZ: https://www.azdhs.gov/covid19/data/index.php
FL: https://experience.arcgis.com/experience/96dd742462124fa0b38ddedb9b25e429
IL: https://www.dph.illinois.gov/covid19/covid19-statistics
IN: https://hub.mph.in.gov/dataset?q=COVID
ME: https://www.maine.gov/dhhs/mecdc/infectious-disease/epi/airborne/coronavirus/data.shtml
MD: https://coronavirus.maryland.gov/datasets/mdcovid19-master-zip-code-cases/data
MN: https://www.health.state.mn.us/diseases/coronavirus/stats/index.html
NC: https://covid19.ncdhhs.gov/dashboard
OK: https://looker-dashboards.ok.gov/embed/dashboards/80
OR: https://govstatus.egov.com/OR-OHA-COVID-19
RI: https://ri-department-of-health-covid-19-data-rihealth.hub.arcgis.com/
VA: https://www.vdh.virginia.gov/coronavirus/covid-19-data-insights/
Section C. Regression Models Details
Inverse Hyperbolic Sine function
The hyperbolic sine function is given by $\sinh(x) = \frac{e^{x} - e^{-x}}{2}$, and the inverse hyperbolic sine function is given by $\operatorname{asinh}(x) = \ln\left(x + \sqrt{x^{2} + 1}\right)$.
We chose to transform the fortnightly cases with this function because it is equivalent to $x$ close to 0 and equivalent to $\ln(x)$ when $x \to +\infty$. It behaves like a logarithm for most of our observations, except that there is no singularity at 0.
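These properties are easy to verify numerically. The short sketch below uses Python's standard `math.asinh`; the grid of values is arbitrary. For large x, asinh(x) is close to ln(2x) = ln(x) + ln 2, so differences in asinh are close to differences in logs, which is why a regression coefficient on the transformed outcome reads approximately as a proportional change.

```python
import math

# asinh is defined at 0, is close to x near 0, and tracks ln(2x) for large x.
for x in (0.0, 0.01, 1.0, 10.0, 100.0, 1000.0):
    log_2x = math.log(2 * x) if x > 0 else float("-inf")
    print(f"x={x:>7}: asinh(x)={math.asinh(x):.5f}  ln(2x)={log_2x:.5f}")

def proportional_change(beta):
    """Exact proportional change implied by a log-scale coefficient beta."""
    return math.exp(beta) - 1

# A coefficient of -0.035 corresponds to a change of about -3.4%.
effect = proportional_change(-0.035)
```

For small coefficients the exact value exp(β) − 1 and the coefficient itself nearly coincide, so reading −0.035 as roughly a 3.5% reduction is a close approximation.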
Results
Section D. Figures and Tables
Supplement 2. Statistical Analysis Plan
The Statistical Analysis Plan can be accessed via this link: https://www.dropbox.com/s/ctqdw24vy2g3haq/NEJM_Statistical_Analysis_Plan.pdf?dl=0.
ACKNOWLEDGEMENTS
We thank the health team at Facebook for their in-kind financial support that allowed us to run the campaign, and for their logistical help. In particular, we thank Nisha Deolalikar. We also thank advisors Drew Bernard and Sarah Francis. We thank the team at AdGlow, in particular Camille Orellano and Lauren Novak, for running the campaign. We thank Alex Pompe from Facebook Data for Good for helping us to understand the Facebook mobility data. We thank the team at Damage Control, in particular Pradip Saha, for their tireless work in editing the videos. We thank Nikhil Shankar and Minjeong Joyce Kim for excellent research assistance. We are particularly grateful to all of the members of the “COVID-19 messaging working group” with whom we developed and tested the original messages that led to this at-scale study.
Footnotes
First authors: Dr Breza and Dr Fatima Cody Stanford
References
Section E. References