Are We There Yet? Big Data Significantly Overestimates COVID-19 Vaccination in the US

Public health efforts to control the COVID-19 pandemic rely on accurate surveys. However, estimates of vaccine uptake in the US from Delphi-Facebook, Census Household Pulse, and Axios-Ipsos surveys exhibit the Big Data Paradox: the larger the survey, the further its estimate from the benchmark provided by the Centers for Disease Control and Prevention (CDC). In April 2021, Delphi-Facebook, the largest survey, overestimated vaccine uptake by 20 percentage points. Discrepancies between estimates of vaccine willingness and hesitancy, which have no benchmarks, also grow over time and cannot be explained through selection bias on traditional demographic variables alone. However, a recent framework on investigating Big Data quality (Meng, Annals of Applied Statistics, 2018) allows us to quantify contributing factors, and to provide a data quality-driven scenario analysis for vaccine willingness and hesitancy.

This preprint (which was not certified by peer review) is made available under a CC-BY 4.0 International license. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discrepancies in the estimates are large enough to significantly alter the relative rankings of states by rate of vaccine hesitancy, willingness, and uptake (Fig. 1D-F). For instance, Missouri is the 11th most hesitant state according to Delphi-Facebook, with 24.4% (95% CI: 23.0%-25.6%) of adult residents vaccine hesitant, but the Census Household Pulse estimates that only 16.7% (95% CI: 13.3%-20.1%) of the population is hesitant, making it the 36th most hesitant state.
These estimates also disagree with the uptake rates from the US Centers for Disease Control and Prevention (CDC). There is also little agreement in state-level rankings; for example, Massachusetts is ranked 48th in vaccine uptake by both Delphi-Facebook and Census Household Pulse, but 7th by the CDC. For context, for a state near the herd immunity threshold (70-80% based on recent estimates (7-9)), a discrepancy of 10 percentage points in vaccination rates could be the difference between containment and uncontrolled exponential growth in new SARS-CoV-2 infections.
Which of these surveys can we trust? A recently proposed statistical framework (1) permits us to interrogate and quantify the sources of error in big data. This framework has been applied to COVID case counts (10), and in other non-COVID settings (11). Its full application requires ground-truth benchmark data, which is available for vaccine uptake because vaccine providers in the US are required to report daily vaccine inventory and distribution to the CDC (2,12). We are therefore able to quantify the components of estimation error driving the divergence among the three surveys, apportioning it between data quality (due to bias in sampling, response, and weighting mechanisms), data quantity (driven by sample size and the weighting schemes), and problem difficulty (determined by population heterogeneity). This assessment then allows us to use the magnitude of the data defect observed in vaccine uptake to conduct data-driven scenario analyses for the key survey outcomes, vaccine hesitancy and willingness.

Figure 2: Estimates of vaccine uptake for US adults compared to CDC benchmark data, plotted by the end date (in 2021) of each survey wave. The 95% confidence intervals shown are calculated from each study's reported standard errors and design effects from weighting; those for Delphi-Facebook are too small to be visible.

Conflicting estimates of vaccine uptake and Big Data Paradox
We focus on the Delphi-Facebook, Census Household Pulse and Axios-Ipsos surveys because they are illustrative of surveys run by social media, governmental agencies, and survey firms, respectively. The Delphi-Facebook and Census Household Pulse surveys persistently overestimate vaccine uptake relative to the CDC's benchmark. For example, on April 18, the CDC's uptake rate reached 50% (Fig. 2). Delphi-Facebook estimates would indicate that the US passed this same milestone three weeks earlier, with a purported 52.9% (95% CI: 52.6%-53.1%) rate by March 27. The Census Household Pulse wave ending on March 29 estimated the uptake rate to be 46.8% (95% CI: 45.5%-48.0%), 8 percentage points higher than the CDC's 39% rate on the same day. Despite being the smallest survey by an order of magnitude, Axios-Ipsos' estimates track the CDC rates well, and their 95% confidence intervals contain the benchmark estimate from the CDC over 90% of the time (10 out of 11).

medRxiv preprint doi: https://doi.org/10.1101/2021.06.10.21258694; this version posted June 15, 2021.

The most concerning impact of biased big data is dire overconfidence. Fig. 2 shows 95% confidence intervals for vaccine uptake based on reported sampling standard errors and weighting design effects (13). Axios-Ipsos has the largest confidence intervals, but also the smallest design effects (1.08-1.24), suggesting that its accuracy is driven more by the representativeness of the sample than by post-survey adjustment. Census Household Pulse has small, but visible, 95% confidence intervals that have been greatly inflated by large design effects (4.65-4.85), indicating large weighting adjustments; however, these confidence intervals still fail to include the true rate of vaccine uptake. Most concerningly, confidence intervals for Delphi-Facebook are vanishingly small, driven by large sample size and moderate design effects (1.42-1.53), indicating that although samples are weighted, the adjustment is not nearly enough to correct for selection bias. This is a vivid illustration of the Big Data Paradox (1): the larger the data size, the surer we fool ourselves when we fail to account for data quality. Mathematically, the probability of an incorrectly-centered confidence interval (procedure) covering the truth vanishes quickly as the sample size increases, highlighting the critical importance of emphasizing data quality over data quantity.
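The vanishing coverage of an incorrectly centered interval can be illustrated numerically. The following is a minimal sketch, assuming a normal approximation, simple-random-sampling variance for a binary outcome, and a fixed bias that does not shrink as the sample grows. The function names and all input values are illustrative assumptions, not quantities taken from the surveys.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ci_coverage(bias: float, p: float, n: int, z_crit: float = 1.96) -> float:
    """Probability that a nominal 95% interval covers the true rate p.

    Assumes the estimator is normal with mean p + bias and standard
    error sqrt(p * (1 - p) / n); the interval ignores the bias.
    This setup is a hypothetical illustration.
    """
    se = math.sqrt(p * (1.0 - p) / n)
    shift = bias / se  # bias measured in standard-error units
    return normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)

if __name__ == "__main__":
    # Hypothetical: true rate of 50% and a fixed bias of 2 percentage points.
    for n in (1_000, 10_000, 250_000):
        print(n, ci_coverage(0.02, 0.5, n))
```

With the bias held fixed, the standard error shrinks as the sample grows, so the interval narrows around the wrong value and the coverage falls toward zero.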
Statistically, we can decompose the actual error into quantities capturing data quality, data quantity, and problem difficulty (1). Given a variable of interest, $Y$, in a finite population of units $i = 1, \ldots, N$, of which a sample of size $n$ is observed, where $R_i = 1$ if unit $i$ is recorded in the sample (and $R_i = 0$ otherwise), Meng (1) shows that the error in using the sample mean $\bar{Y}_n$ to estimate the population mean $\bar{Y}_N$ can be written as

$$\bar{Y}_n - \bar{Y}_N = \underbrace{\hat{\rho}_{Y,R}}_{\text{data quality}} \times \underbrace{\sqrt{\frac{N-n}{n}}}_{\text{data quantity}} \times \underbrace{\sigma_Y}_{\text{problem difficulty}}. \qquad (1)$$

It is no surprise that, holding all else fixed, increasing the fraction of the population sampled ($n/N$) will decrease error, or that lower population heterogeneity (a small standard deviation $\sigma_Y$ of $Y$) results in lower estimator variance and hence lower error. However, the quantity $\hat{\rho}_{Y,R}$ is less familiar. It measures the population correlation between the outcome of interest, $Y$, and the indicator that a unit is observed in the sample, $R$. Meng (1) terms $\hat{\rho}_{Y,R} = \mathrm{Cor}(Y, R)$ the data defect correlation (ddc). The ddc captures both the sign and magnitude of selection bias, and is therefore a measure of data quality. Values of $\hat{\rho}_{Y,R}$ close to 0 indicate low (or no) selection bias for a particular outcome $Y$, and therefore low estimator error.
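When a benchmark for the population mean is available, the decomposition can be inverted to recover the ddc. The sketch below rearranges Meng's identity, error = ddc × sqrt((N − n)/n) × σ_Y, for a binary outcome, for which σ_Y = sqrt(p(1 − p)). The function names and the inputs (an adult population of 255 million, a survey estimate of 70% against a benchmark of 50%) are hypothetical values chosen for illustration.

```python
import math

def data_defect_correlation(sample_mean, pop_mean, n, N):
    """Solve error = ddc * sqrt((N - n) / n) * sigma_Y for the ddc.

    Assumes a binary outcome, so that the population standard
    deviation is sqrt(pop_mean * (1 - pop_mean)).
    """
    sigma_y = math.sqrt(pop_mean * (1.0 - pop_mean))  # problem difficulty
    quantity = math.sqrt((N - n) / n)                 # data quantity
    return (sample_mean - pop_mean) / (quantity * sigma_y)

def error_from_ddc(ddc, pop_mean, n, N):
    """Recompose the estimation error from the three factors."""
    sigma_y = math.sqrt(pop_mean * (1.0 - pop_mean))
    return ddc * math.sqrt((N - n) / n) * sigma_y

if __name__ == "__main__":
    N = 255_000_000  # assumed adult population size (hypothetical)
    n = 250_000      # weekly sample size of the order reported for Delphi-Facebook
    ddc = data_defect_correlation(0.70, 0.50, n, N)
    print(f"ddc = {ddc:.4f}")
    print(f"error = {error_from_ddc(ddc, 0.50, n, N):.2f}")
```

Because the sample is a tiny fraction of the population, the data quantity term is large, and even a ddc near 0.01 corresponds to an error of many percentage points.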
This identity also allows us to calculate the size of a simple random sample that we would expect to exhibit the same level of error as what was actually observed in a given study, $n_{\text{eff}}$.
Unlike the classical effective sample size (13) [...]

The data defect correlation, ddc, increases over time for Census Household Pulse and, most significantly, for Delphi-Facebook (Fig. 3D). For Axios-Ipsos, it is much smaller and steady over time, consistent with what one expects from a representative sample. This decomposition suggests that the increasing error in estimates of vaccine uptake in Delphi-Facebook and Census Household Pulse is primarily driven by increasing ddc, or bias in the mechanism governing which population units are observed in each sample.
A ddc of 0.008 (observed in Delphi-Facebook in late April) is large enough to drive the effective sample size ($n_{\text{eff}}$) below 20, even in the scenario of 5% error in the CDC benchmark (Fig. 3E).
Delphi-Facebook records about 250,000 responses per week, so the reduction in effective sample size is over 99.9%. The maximum $\hat{\rho}_{Y,R}$ that we observe for Census Household Pulse is approximately 0.002, yet it still results in a reduction in sample size of more than 99% by the same measure (Fig. 3F). These dramatic reductions are consequences of the Law of Large Populations, which we shall discuss in the concluding section.
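The order of magnitude of these reductions can be reproduced with a back-of-the-envelope calculation. The sketch below uses the approximation n_eff ≈ n / (N − n) × 1 / ddc², obtained by equating the squared error implied by the identity with the variance σ²/n_eff of a simple random sample (ignoring the finite-population correction). The exact definition used for the figures may differ. The adult population size of 255 million and the Census Household Pulse sample size of 75,000 are assumptions made for illustration.

```python
def effective_sample_size(ddc, n, N):
    """Approximate size of a simple random sample with the same squared error.

    Uses n_eff ~= n / (N - n) / ddc**2, which ignores the
    finite-population correction (an approximation).
    """
    if ddc == 0:
        return float("inf")
    return (n / (N - n)) / ddc ** 2

def relative_reduction(ddc, n, N):
    """Fraction of the nominal sample size lost to selection bias."""
    return 1.0 - effective_sample_size(ddc, n, N) / n

if __name__ == "__main__":
    N = 255_000_000  # assumed adult population size (hypothetical)
    for label, ddc, n in [("Delphi-Facebook", 0.008, 250_000),
                          ("Census Household Pulse", 0.002, 75_000)]:
        n_eff = effective_sample_size(ddc, n, N)
        print(label, round(n_eff, 1), f"{relative_reduction(ddc, n, N):.4%}")
```

Under these assumptions, a ddc of 0.008 with 250,000 responses gives an effective sample size below 20, in line with the figure quoted in the text.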

Comparing study designs and demographic subgroups
Sampling frames, survey modes, and weighting schemes are all instrumental to survey reliability. Table 1 compares the three surveys across these dimensions (details in supplementary materials A). All surveys are conducted online, but vary greatly in methods of respondent recruitment: [...] between them is largely inconsequential. All studies weight on age and gender, and Axios-Ipsos and Census Household Pulse also weight on education and race/ethnicity. Education, a known correlate of propensity to respond to surveys (15) and social media use (16), and race/ethnicity are notably absent from Delphi-Facebook's weighting features.

Table 2 illustrates some consequences of these study designs. For education levels, Axios-Ipsos comes closest to the actual proportion of US adults even before weighting. Both Axios-Ipsos and Census Household Pulse weight on some form of education, i.e., they correct for the unrepresentativeness of the original sample with respect to education. Delphi-Facebook does not explicitly weight on education, and hence the education bias persists in its weighted estimates: those without a college degree are underrepresented by nearly 20 percentage points. We observe a similar pattern with respect to race/ethnicity. Delphi-Facebook's weighting scheme does not adjust for race/ethnicity, and hence its weighted sample still over-represents White adults by 8 percentage points, and under-represents Black and Asian adults by around 50 percent of their share of the population.
The three surveys examined here show that people without a 4-year college degree are, compared to those with a degree, both less likely to have been vaccinated and more willing to be vaccinated if a vaccine is available (Table 2). Generalizing findings from these sub-populations to the general population requires the assumption that these measured vaccination behaviors do not differ systematically between non-respondents and respondents, within each education level.
Because people with lower educational attainment are less likely to have been vaccinated, a survey that under-represents them will overestimate vaccine uptake.
The unrepresentativeness with respect to race/ethnicity and education explains part of the discrepancy in outcomes. The racial groups that Delphi-Facebook undersamples tend to be more willing and less vaccinated. Re-weighting the Delphi-Facebook survey to upweight racial minorities may therefore bring its willingness estimates closer to those of Census Household Pulse and its vaccination rate closer to that of the CDC.
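The mechanism can be made concrete with a two-group post-stratification example. The sketch below is illustrative only: the group shares and uptake rates are hypothetical numbers, not values from Table 2, and the adjustment assumes that respondents and non-respondents within each group have the same uptake rate.

```python
def poststratification_weights(sample_shares, pop_shares):
    """Weight for each group: population share divided by sample share."""
    return {g: pop_shares[g] / sample_shares[g] for g in sample_shares}

def weighted_rate(sample_shares, rates, weights=None):
    """Overall rate implied by group-level rates and optional group weights."""
    if weights is None:
        weights = {g: 1.0 for g in sample_shares}
    total = sum(sample_shares[g] * weights[g] for g in sample_shares)
    weighted_sum = sum(sample_shares[g] * weights[g] * rates[g] for g in sample_shares)
    return weighted_sum / total

if __name__ == "__main__":
    # Hypothetical shares and uptake rates by education group.
    sample_shares = {"no_degree": 0.45, "degree": 0.55}
    pop_shares = {"no_degree": 0.65, "degree": 0.35}
    uptake = {"no_degree": 0.50, "degree": 0.70}

    raw = weighted_rate(sample_shares, uptake)
    weights = poststratification_weights(sample_shares, pop_shares)
    adjusted = weighted_rate(sample_shares, uptake, weights)
    print(f"unweighted: {raw:.2f}, post-stratified: {adjusted:.2f}")
```

The adjustment lowers the estimate only as far as the within-group rates are themselves unbiased, which is why weighting on demographics alone may leave a substantial residual error.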
However, demographic composition alone cannot explain all of the discrepancies. Census Household Pulse weights on both ethnicity and education and still over-estimates vaccine uptake by a considerable margin in late May. But adults without a college degree are also more [...]

Therefore, other variables, such as occupation and rurality, may contribute to the differences in estimates, but we are unable to directly examine them because they are either not reported in the surveys or no population benchmark exists. However, we do know from the CDC that there is large variation in vaccination rates by rurality (2), which is known to be correlated with home internet access (17), an important factor influencing the propensity to complete an online survey. Neither the Census Household Pulse nor Delphi-Facebook weights on sub-state geography, which may mean that adults in more rural areas are less likely to be vaccinated and also underrepresented in the surveys, leading to overestimation of vaccine uptake. Analysis of age-group-level ddc (see supplementary materials D.1) further suggests that selection bias in Delphi-Facebook may be correlated with the relative timing in which different age groups became eligible for the vaccine.
Delphi-Facebook and Census Household Pulse may also be non-representative with respect to political partisanship, which has been found to be correlated strongly with vaccine behavior (18,19). Axios-Ipsos incorporates political partisanship in their weighting for about 40% of their waves, but neither Delphi-Facebook nor Census Household Pulse collects partisanship of respondents.

Assessing hesitancy and willingness via scenario analysis
We can leverage our knowledge of the estimation error for vaccination to provide improved estimates for hesitancy and willingness because the proportions of vaccinated ($V$), hesitant ($H$), and willing ($W$) individuals must sum to 1. For example, if $V$ is an overestimate by 20 percentage points, the under-estimates of $H$ and $W$ must together sum to 20 percentage points. Naively, one might derive "corrected" estimates of $H$ and $W$ by increasing each raw estimate by 10 percentage points. However, we can improve upon this approach by using ddc to instead allocate the selection bias in vaccine uptake to each of $H$ and $W$.
As we show in supplementary materials E, the constraint $V + H + W = 1$ implies that the sum of the ddcs of uptake, hesitancy, and willingness (denoted by $\rho_V$, $\rho_H$ and $\rho_W$, respectively) is approximately 0 (it is not exactly zero because different variables can have different variances).
Introducing a tuning parameter $\lambda$ that controls the relative weight given to the selection bias of $H$ and $W$ on the ddc scale, the zero-sum approximation implies that we can set

$$\rho_W = -\lambda \rho_V \quad \text{and} \quad \rho_H = -(1 - \lambda)\rho_V.$$

This allocation scheme allows us to pose scenarios implied by values of $\lambda$ that capture three plausible mechanisms driving selection bias. First, if hesitant ($H$) and willing ($W$) individuals are equally under-sampled ($\lambda \approx 0.5$), leading to over-representation of uptake, correcting for data quality implies that both Willingness and Hesitancy are higher than what surveys report (Fig. 4, yellow bands). We label this the uptake scenario because, among the three components, uptake has the largest absolute ddc. Alternatively, the under-representation of the hesitant population could be the largest source of selection bias, possibly due to under-representation of people with low institutional trust who may be less likely to respond to surveys and more likely to be hesitant. This implies $\lambda \approx 0$ and is shown in the red bands. The last scenario addresses issues of access, where under-representation of people who are willing but not yet vaccinated is the largest source of bias, perhaps due to correlation between barriers to accessing both vaccines and online surveys (e.g., lack of internet access). This implies $\lambda \approx 1$ and upwardly corrects willingness, but does not change hesitancy.
In the most recent waves of Delphi-Facebook and Census Household Pulse, the hesitancy scenario suggests that the actual rate of hesitancy is about 31-33%, almost double that of original estimates. In the uptake scenario, both hesitancy and willingness increase by about 5 percentage points. In the access scenario, the proportion of the adult population that is willing increases from 7-8% to about 21%, tripling in size, and suggesting that almost one fifth of the US population still faces significant barriers to accessing vaccines. This analysis alone cannot determine which scenario is most likely, and scenarios should be validated with other studies. However, we hope that these substantive, mechanism-driven scenarios are useful for policymakers who may need to choose whether to devote scarce resources to the Willing or Hesitant populations. Fig. 4 also shows that when positing these scenarios through a ddc framework, the estimates from Delphi-Facebook and Census Household Pulse disagree to a lesser extent than in the reported estimates (Fig. 1).
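The scenario logic can be sketched in a few lines. This is a minimal sketch and not the procedure of supplementary materials E. It assumes that the ddc of uptake is split between willingness (share λ) and hesitancy (share 1 − λ) with opposite sign, and that each corrected proportion p solves estimate − p = ddc × sqrt((N − n)/n) × sqrt(p(1 − p)), which is Meng's identity applied to a binary outcome. The function names, the bisection solver, and all input values are assumptions made for illustration.

```python
import math

def allocate_ddc(ddc_uptake, lam):
    """Split the uptake ddc between willingness and hesitancy (zero-sum)."""
    return {"willing": -lam * ddc_uptake,
            "hesitant": -(1.0 - lam) * ddc_uptake}

def corrected_proportion(estimate, ddc, n, N, iters=200):
    """Solve estimate - p = ddc * sqrt((N - n) / n) * sqrt(p * (1 - p)) for p.

    Uses bisection on [0, 1]; for 0 < estimate < 1 the gap is positive
    at p = 0, negative at p = 1, and changes sign exactly once.
    """
    k = ddc * math.sqrt((N - n) / n)
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gap = estimate - mid - k * math.sqrt(mid * (1.0 - mid))
        if gap > 0:
            lo = mid  # the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    # Hypothetical survey wave: all numbers are illustrative.
    n, N = 250_000, 255_000_000
    ddc_uptake = 0.008
    reported = {"hesitant": 0.17, "willing": 0.08}
    for name, lam in [("hesitancy", 0.0), ("uptake", 0.5), ("access", 1.0)]:
        ddcs = allocate_ddc(ddc_uptake, lam)
        fixed = {k: corrected_proportion(reported[k], ddcs[k], n, N) for k in reported}
        print(name, {k: round(v, 3) for k, v in fixed.items()})
```

Setting λ to 1 leaves hesitancy at its reported value and moves only willingness, mirroring the access scenario described above.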


Understanding the Law of Large Populations and its consequences
The three surveys discussed in this article demonstrate a seemingly paradoxical phenomenon: the accuracy of our results decreases with the survey size. It is paradoxical because of our long-held intuition that estimation errors decrease with data sizes. However, as proved mathematically in (1), this intuition only applies to probabilistic samples in which the ddc ($\hat{\rho}_{Y,R}$) is vanishingly small. More precisely, the value of the right-hand side of the identity in Equation (1) depends on population size (through the [...]

For example, when concerns of selection bias in data are raised, we often hear a common defense or hope that the revealed selection bias only affects that study, not necessarily other studies that use the same data. The notion of ddc confirms the correctness of this argument at the technical level, but also reveals its potentially misleading nature if it is used as the sole justification for doing business as usual. Indeed, ddc is the correlation between a particular outcome $Y$ and the data recording mechanism $R$, and hence a large ddc for one outcome does not imply it will be similarly large for another. However, ddc reveals that estimator error resulting from selection bias is merely a symptom of unrepresentativeness of the underlying sample, as captured by the $R$-mechanism. Selection bias tells us that respondents are not exchangeable with non-respondents, and hence it may impact all studies to varying degrees. This includes the study of associations (4,20): both Delphi-Facebook and Census Household Pulse significantly overestimate the slope of vaccine uptake over time relative to that of the CDC benchmark (Fig. 2). It also includes ranking: the Census Household Pulse and Delphi-Facebook rankings are more correlated with each other (correlation of 0.64) than either ranking is with that of the CDC (0.31 and 0.33, respectively), as indicated in Fig. 1.
Another common response is that bias is a necessary trade-off for having data that is sufficiently large for conducting high-resolution inference. Again, this is a "double-edged" argument.
It is true that a key advantage of Big Data is that it provides more data for such inference, for example about individualized treatments (21). However, precisely because high-resolution data are hard to come by, we tend to be very reluctant to discount them due to low data quality.
The dramatic impact of ddc on the effective sample size should serve as a wake-up call to our potentially devastating overconfidence in biased Big Data, particularly in studies that can affect many people's lives and livelihoods. This is not the first time that the Big Data Paradox has reared its head, nor will it be the last. One notable example is that of Lazer et al. (2014), which examines how Google Trends predicted more than twice as many doctor visits for influenza-like illnesses as did the CDC in February 2013 (22). The data collection methods of the studies we consider here have been far more carefully designed than Google Trends data, yet are still susceptible to some of the same biases. Delphi-Facebook is a widely-scrutinized survey that, to date, has been used in 6 peer-reviewed publications, most recently in Science (23). The Census Household Pulse survey is conducted in collaboration between the US Census Bureau and eleven statistical government partners, all with enormous resources and survey expertise. Both studies take steps to mitigate potential biases in data collection, but still drastically overestimate vaccine uptake.
This demonstrates just how hard it is to correct for selection bias, even with enormous sample sizes and the resources of Facebook or the US government at one's disposal.
In contrast, Axios-Ipsos records only about 1,000 responses per wave and is likely too small to make reliable inferences for sub-national geographies, but makes more of an effort to prevent selection bias for national estimates (e.g., by purchasing tablets for those who otherwise would be less likely to participate in an online survey). This is a telling example of why, for ensuring accuracy of inferences, data quality matters far more than data quantity, and therefore that investing in data quality (particularly in sampling, but also in weighting) is wiser than relying on data quantity. While much more needs to be done to fully examine the nuances of these three surveys, we hope this first comparative study highlights the alarming implications of the Law of Large Populations: the mathematically proven fact that compensating for low data quality by increasing data quantity is a losing strategy.

Author contributions V.B. conceived and formulated the research questions. All authors contributed to methodology, writing, visualization, editing, and data analysis.

Competing Interests Authors have no competing interests.
Data and materials availability All data used in this analysis is publicly available from sources listed in the references. Code and data to replicate the findings are included in our publicly available GitHub repository for this project: https://github.com/vcbradley/ddc-vaccine-US.


List of Supplementary Materials
Materials and Methods

Supplementary Materials
A Background materials on the four data sets studied

A.1 CDC Data
The CDC benchmark data used in our analysis was downloaded from the CDC's COVID data tracker (12). We use the cumulative count of people who have received at least one dose of COVID-19 vaccine reported in the "Vaccination Trends" tab. This data set contains vaccine uptake counts for all US residents, not only adults. However, the surveys of interest only estimate vaccine uptake for adults. The CDC receives age-group-specific data on vaccine uptake from all states except for Texas on a daily basis, which is also reported cumulatively over time.
Therefore, we must impute the number of adults who have received at least one dose on each day. For our current purposes, we assume Texas is exchangeable with the rest of the states in terms of the age distribution for vaccine uptake. Under this assumption, for each day, we use the age group vaccine uptake data from all states except for Texas to calculate the proportion of cumulative vaccine recipients who are 18 or older, then we multiply that number by the total number of people who have had at least one dose to estimate the number of US adults who have received at least one dose. The CDC performs a similar imputation for the 18+ numbers reported in their COVID data tracker. However the CDC's imputed 18+ number is available only as a snapshot and not a historical time series, hence the need for our imputation.
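The imputation described above amounts to a single ratio adjustment. The sketch below is a minimal illustration under the stated exchangeability assumption. The function name, the data layout (a mapping from state code to a pair of counts), and the numbers are hypothetical and do not reflect the format of the CDC files.

```python
def impute_adult_recipients(total_recipients, age_counts_by_state, excluded=("TX",)):
    """Estimate the number of adults (18+) with at least one dose.

    age_counts_by_state maps a state code to a pair:
    (recipients aged 18 or older, recipients of all ages).
    States listed in `excluded` lack age-specific data and are
    assumed exchangeable with the remaining states.
    """
    kept = {s: c for s, c in age_counts_by_state.items() if s not in excluded}
    adults = sum(a for a, _ in kept.values())
    everyone = sum(t for _, t in kept.values())
    adult_share = adults / everyone  # proportion of recipients who are 18+
    return adult_share * total_recipients

if __name__ == "__main__":
    # Hypothetical cumulative counts for one day.
    counts = {"CA": (900, 1000), "NY": (450, 500), "TX": (None, 800)}
    national_total = 2300
    print(impute_adult_recipients(national_total, counts))
```

Repeating the calculation for each day yields the historical time series of adult recipients that a single-day snapshot cannot provide.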
The CDC does release state-level snapshots of vaccine uptake each day. These have been scraped and released publicly by Our World In Data (24). These state-level numbers are not historically-updated as new reports of vaccines administered on previous days are reported to the CDC, so they underestimate the true rate of state-level vaccine uptake on any given day.
These data are used only to motivate the inaccuracies of the state-level rank orders implied by vaccine uptake estimates from Delphi-Facebook and Census Household Pulse; hence they are [...]

A.2 Axios-Ipsos Data
The Axios-Ipsos Coronavirus tracker is an ongoing, bi-weekly tracker intended to measure attitudes towards COVID-19 of adults in the US. The tracker has been running since March 13, 2020 and has released results from 45 waves as of May 28, 2021. Each wave generally runs over a period of 4 days. The Axios-Ipsos data used in this analysis was scraped from the topline PDF reports released on the Ipsos website (6). The PDF reports also contain Ipsos' design effects, which we have confirmed are calculated as 1 plus the variance of the (scaled) weights.
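The design effect described here, 1 plus the variance of the scaled weights, can be computed directly from a vector of weights. The sketch below scales the weights to have mean 1 and uses the population (divide-by-n) variance, which makes the result equal to n Σw² / (Σw)². Whether Ipsos uses this or the sample variance is an assumption, and the weights shown are hypothetical.

```python
import math

def design_effect(weights):
    """1 plus the variance of the weights after scaling them to mean 1."""
    n = len(weights)
    mean_w = sum(weights) / n
    scaled = [w / mean_w for w in weights]
    variance = sum((s - 1.0) ** 2 for s in scaled) / n  # population variance (assumed)
    return 1.0 + variance

def weighted_standard_error(p, n, deff):
    """Standard error of a proportion, inflated by the design effect."""
    return math.sqrt(deff * p * (1.0 - p) / n)

if __name__ == "__main__":
    weights = [1.0, 1.0, 3.0, 3.0]  # hypothetical survey weights
    deff = design_effect(weights)
    print(deff, weighted_standard_error(0.5, 1000, deff))
```

Equal weights give a design effect of exactly 1, so larger values indicate how much the weighting adjustment inflates the variance of the estimate.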
The question that Axios-Ipsos uses to gauge vaccine hesitancy is worded differently from the questions used in Census Household Pulse and Delphi-Facebook. The question asks about likelihood of receiving a "first generation" COVID-19 vaccine, which may be confusing to respondents. We see that Axios-Ipsos has markedly higher baseline levels of hesitancy than either Census Household Pulse or Delphi-Facebook. While this is likely driven in part by the lower estimated rates of vaccine uptake, it is also likely due in part to question wording. Therefore, we exclude Axios-Ipsos from our scenarios of vaccine hesitancy and willingness.

A.3 Census Household Pulse Data
The [...] the USDA Economic Research Service (ERS) (https://www.census.gov/programs-surveys/household-pulse-survey.html, visited June 5, 2021). Each wave since August 2020 is fielded over a 13-day time window. All data used in this analysis is publicly available on the US Census website. We use the point estimates presented in Data Tables, as well as the standard errors calculated by the Census Bureau using replicate weights. The design effects are not reported; however, we can calculate them as 1 + [...]

[...] if they will receive a COVID vaccine when they become eligible. Approximately 6.6% of all respondents reported being "unsure" in wave 27, and were coded as "vaccine hesitant" rather than "willing."

A.4 Delphi-Facebook COVID symptom survey
The Delphi-Facebook COVID symptom survey is an ongoing survey collaboration between [...] Facebook performs inverse propensity weighting on responses, but the reported standard errors do not include variance increases from weighting, and no estimates of design effects are released publicly. We are therefore grateful to the CMU team for providing us with estimated [...]

A.5 Data Resolution
Both Axios-Ipsos and Census Household Pulse release microdata publicly. Facebook also releases microdata to institutions that have signed Data Use Agreements. We are in the process of acquiring the Facebook microdata. In view of the timely nature of topics and findings, and to keep all three surveys on as equal footing as possible, in this first study we used the aggregated results released by all three surveys rather than their microdata.
In all surveys, data collection happens over a multi-day period (or multi-week in the case of the Census Household Pulse). We calculate error for each survey wave with respect to the CDC-reported proportion of the population vaccinated up to and including the end date of each wave. Some respondents will have actually responded days (or weeks) before the date on which the estimate was released, when the true rate of vaccine uptake was lower. We use the end date instead of a mid-point as we do not have good data on how respondents are distributed over the response window. However, this means that the error we report may underestimate the true error in each survey, particularly those with longer fielding and reporting windows.

The CDC vaccination data includes vaccines administered in Puerto Rico. As of June 9, 2021, approximately 1.6 million adults in Puerto Rico had received at least one dose, just under 1% of the national total (164,576,933). We use the CDC's reported national total, which includes Puerto Rico (we do not have a reliable state-level time series of vaccine uptake), but we use a denominator that does not include Puerto Rico. This means that the CDC's estimate of vaccine uptake used here may slightly overestimate, by about 1%, the true proportion of the US (non-Puerto Rico) adult population that has received at least one dose, which would make the observed ddc for Delphi-Facebook and Census Household Pulse an underestimate of the truth. However, this 1% error is well within the benchmark uncertainty scenarios presented with our results.


B Methods for benchmark uncertainty
To inform our CDC benchmark uncertainty scenarios, we examined changes in vaccine uptake rates reported by the CDC over time. The CDC's cumulative vaccine uptake estimates are updated retroactively as new reports of vaccinations are received; we downloaded versions of these estimates on April 12, April 21, May 5, and May 26, 2021. This allowed us to examine how much the CDC's estimates of vaccine uptake for a particular day change as new reports are received. Fig. S1 compares the estimates of cumulative vaccine uptake for April 3-12, 2021 reported on April 12, 2021 to estimates for those same dates reported on subsequent dates. The plot shows that the cumulative vaccine uptake for April 12, 2021 reported on that same day is adjusted upwards by approximately 6% of the original estimate over the next month and a half. The estimate of vaccine uptake for April 11, reported on April 12, is only further adjusted upward by approximately 4% over the next 45 days. There is little apparent difference in the amount by which estimates from April 3-8 are adjusted upwards after 45 days, indicating that most of the adjustment occurred in the first 4 days after the initial report, which is consistent with the CDC's findings (12). There is still some adjustment that occurs past day 5; after 45 additional days, estimates are adjusted upwards by an additional 2%.
There are many caveats to this analysis of CDC benchmark under-reporting, including that it depends on snapshots of data collected at inconsistent intervals, and that we mainly examine a particular window of time, April 3-12, so our results may not generalize to other windows of time. Such a lack of generalizability is plausible for a number of reasons, including changes to CDC reporting systems and procedures after the start of the mass vaccination program, or the fact that true underlying vaccine uptake is monotonically increasing over time. It is also plausible, if not likely, that the reporting delays are correlated with vaccine providers, which are in turn correlated with the population receiving vaccines at a given time. As the underlying population receiving vaccines changes, so would the severity of reporting delays. Despite these caveats, we believe that this analysis provides reasonable guidance as to the order of magnitude that could be expected from latent systemic errors in the CDC benchmark.
We use these results to inform our choice of benchmark uncertainty scenarios: 5% and 10%.
The benchmark error is incorporated into our analysis by adjusting the benchmark estimates each day up or down by 5% or 10% (i.e. multiplying the CDC's reported estimate by 0.9, 0.95, 1.05, and 1.1). We then calculate ddc on each day for each error scenario, as well as for the CDC reported point estimate.
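The scenario calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the unweighted form of the error identity from Meng (2018), in which the estimation error equals ddc × sqrt((N − n)/n) × σ_Y, with σ_Y computed from the (scaled) benchmark for a binary outcome. The function names and all input values are hypothetical.

```python
import math

def ddc(estimate, benchmark, n, N):
    """Data defect correlation implied by the error identity
    estimate - benchmark = ddc * sqrt((N - n) / n) * sigma_Y,
    where sigma_Y = sqrt(benchmark * (1 - benchmark)) for a binary outcome.
    Assumption: no survey weights (n is the raw sample size)."""
    sigma_y = math.sqrt(benchmark * (1.0 - benchmark))
    return (estimate - benchmark) / (math.sqrt((N - n) / n) * sigma_y)

def ddc_by_scenario(estimate, cdc_reported, n, N,
                    multipliers=(0.90, 0.95, 1.00, 1.05, 1.10)):
    """Recompute ddc with the CDC benchmark scaled by each multiplier."""
    return {m: ddc(estimate, m * cdc_reported, n, N) for m in multipliers}

# Hypothetical inputs for illustration only (not figures from the surveys).
scenarios = ddc_by_scenario(estimate=0.70, cdc_reported=0.50,
                            n=250_000, N=255_000_000)
for multiplier, value in scenarios.items():
    print(f"benchmark x {multiplier:.2f}: ddc = {value:.5f}")
```

When the survey overestimates uptake, scaling the benchmark upward shrinks the implied ddc and scaling it downward inflates it, which is why the scenarios bracket the ddc computed from the reported point estimate.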
However, the benchmark data that we use here has been retroactively adjusted as new reports of vaccine administration are received, so the scenarios we consider are in addition to the initial reporting lag, which has already been accounted for. These scenarios are intended only to demonstrate the robustness of our findings to plausible latent error in the benchmark data rather than to suggest that those scenarios are at all likely. To truly account for errors in the CDC benchmark would require a close collaboration with the CDC, and to have access to its historical

where $\hat{\rho}_{Y,R_w}$ is now the population correlation between $Y_i$ and $R_{w,i} = w_i R_i$ (over $i = 1, \ldots, N$).
The term $n_w$ is the classical "effective sample size" due to weighting (13), i.e., $n_w = n/(1 + \mathrm{CV}_w^2)$, where $\mathrm{CV}_w$ is the coefficient of variation of the weights $w$ (not to be confused with willingness $W$), as defined above.
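As a small illustration of the effective sample size due to weighting, the sketch below computes n/(1 + CV²) from a list of weights. It assumes the coefficient of variation uses the population (divide-by-n) standard deviation, in which case the result coincides with (Σw)²/Σw²; the function name and example weights are hypothetical.

```python
import statistics

def weighting_effective_sample_size(weights):
    """Effective sample size due to weighting: n / (1 + CV_w^2).
    Assumption: CV_w is the population standard deviation of the
    weights divided by their mean."""
    n = len(weights)
    mean_w = statistics.fmean(weights)
    cv_w = statistics.pstdev(weights) / mean_w
    return n / (1.0 + cv_w ** 2)

# Hypothetical weights: three respondents with weight 1, one with weight 3.
example_weights = [1.0, 1.0, 1.0, 3.0]
n_w = weighting_effective_sample_size(example_weights)
print(f"n = {len(example_weights)}, n_w = {n_w:.2f}")
```

Equal weights leave the sample size unchanged, and the more variable the weights, the smaller the effective sample size becomes.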

C.2 Bias-adjusted effective sample size
Meng (2018) derives the following formula for calculating a bias-adjusted effective sample size:

$$n_{\text{eff}} = \frac{n_w}{N - n_w} \times \frac{1}{E[\hat{\rho}^2_{Y,R_w}]}.$$

Given a weighted estimate $\bar{Y}_w$ with expected total mean squared error due to data defect, sampling variability, and weighting, this quantity $n_{\text{eff}}$ represents the size of a simple random sample such that its mean $\bar{Y}$, as an estimator for the same population mean $\bar{Y}_N$, would have the identical mean squared error (which is the same as variance for simple random sampling, because its mean is an unbiased estimator for $\bar{Y}_N$). The term $E[\hat{\rho}^2_{Y,R_w}]$ represents the amount of selection bias expected on average from a particular recording mechanism and a chosen weighting scheme.
Following (1), for each survey wave, we use $\hat{\rho}^2_{Y,R_w}$ to approximate $E[\hat{\rho}^2_{Y,R_w}]$. This estimation itself is subject to error. However, it does not suffer from selection bias because our target is exactly defined by the mean of our estimator, as we aim to capture what actually has happened in this particular survey (including the impact of the weighting scheme). Hence, the only error is the sampling variability (with the caveat that the weighting scheme itself does not vary with the actual observed sample), which is typically negligible for large surveys, such as for Delphi-Facebook and the Census Household Pulse surveys. This estimation error may have more impact for smaller traditional surveys, such as Axios-Ipsos' survey, an issue we will investigate in subsequent work.
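The plug-in calculation can be sketched as follows. This assumes the bias-adjusted effective sample size takes the form given in Meng (2018) with the weighting-adjusted sample size in place of n, i.e. n_eff = n_w/(N − n_w) × 1/ρ̂², and uses the observed squared ddc in place of its expectation as described above. The function name and inputs are hypothetical.

```python
def bias_adjusted_effective_sample_size(ddc_hat, n_w, N):
    """Plug-in bias-adjusted effective sample size:
    n_eff = n_w / (N - n_w) * 1 / ddc_hat**2.
    Assumption: ddc_hat**2 stands in for the expected squared ddc."""
    if ddc_hat == 0:
        raise ValueError("ddc_hat must be non-zero")
    return (n_w / (N - n_w)) / ddc_hat ** 2

# Hypothetical inputs for illustration only (not figures from the surveys).
n_eff = bias_adjusted_effective_sample_size(ddc_hat=0.005,
                                            n_w=250_000,
                                            N=255_000_000)
print(f"n_eff is roughly {n_eff:.1f}")
```

Because the squared ddc sits in the denominator, even a correlation of a fraction of a percent can reduce a sample of hundreds of thousands to the equivalent of a few dozen simple random draws.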

D Estimates of Hesitancy and Willingness by Demographic Groups
We show estimates of our main outcomes by Education, and then by Race, in Table S1. The estimates vary by mode, but the rank ordering of a particular outcome within a single survey is roughly similar across surveys. In Table 2, we show the estimates from Household Pulse.

Table S1: Levels of Vaccination, Willingness, and Hesitancy, estimated by demographic group. For each outcome, we estimate the same quantity from the three surveys. The Axios-Ipsos (AX), Census Household Pulse (HP), and Delphi-Facebook (FB) surveys use the same waves as those in Table 2.

D.1 Separate ddc estimates by Age Group
The CDC also releases vaccination rates by age groups, albeit not always in bins that overlap with the survey. For overlapping bins (seniors and non-seniors) we can calculate ddc specific to each group (Fig. S2). The ddc in the Census Household Pulse increases modestly overall over time.
Delphi-Facebook's ddc is higher overall, and shows a stark divergence between the two age groups after March 2021. The ddc for seniors flattens and starts to decrease after an early March peak, whereas the error rate for younger adults continues to increase through the month of March 2021, and peaks in mid-April, around the time at which all US adults became eligible (28). This is consistent with the hypothesis that barriers to vaccine and online survey access may be driving some of the observed selection bias in Delphi-Facebook. Early in the year, vaccine demand far exceeded supply, and there were considerable barriers to access even for eligible adults, e.g., complicated online sign-up processes, ID requirements, and confusion about cost (29, 30).

Figure S2: The ddc separated by Age Group (18-64 year-olds, and those 65 and over). Using the CDC benchmark broken out by the same age bins, we recomputed separate data defect correlations (ddc) from weighted survey estimates (but without using coefficient of variation adjustments). It should be noted that CDC data by demographics may not be representative of the population, due to certain jurisdictions not reporting results by demographics. The ddc for "Both" is computed from that same CDC data (instead of the overall benchmark shown in Figure 3).
A shortcoming of computing ddc by demographic subgroup is that the CDC benchmark data is less reliable here. The CDC cautions that "These demographic data represent the geographic areas that contributed data and might differ by populations prioritized within each jurisdiction's vaccination phase. Therefore, these data may not be generalizable to the entire US population." For this reason, we do not rely on these data extensively in our main findings.

medRxiv preprint, this version posted June 15, 2021; doi: https://doi.org/10.1101/2021.06.10.21258694

E ddc-based scenario analysis for willingness and hesitancy
The main quantity of interest in the surveys examined here is not uptake, but rather willingness and hesitancy to accept a vaccine when it becomes available. Our analysis of ddc of vaccine uptake cannot offer conclusive corrected estimates of willingness and hesitancy; however, we propose ddc-based scenarios that suggest plausible values of willingness and hesitancy given specific hypotheses about the mechanisms driving selection bias.

E.1 Setting up scenarios
We adopt the following notation for the key random variables we wish to measure:
• $V$ - did you receive a vaccine ("vaccination")?
• $W$ - if no, will you receive a vaccine when available ("willingness")?
• $H = 1 - V - W$ - vaccine "hesitancy"

Just as we have studied the data quality issue for estimating the vaccine uptake, we can apply the same framework to both $W$ and $H$. Unlike uptake, however, we do not have CDC benchmarks for willingness or hesitancy. We only know that $V + W + H = 1$, and therefore that

$$\mathrm{Cov}(R, V) + \mathrm{Cov}(R, W) + \mathrm{Cov}(R, H) = 0.$$

Re-expressing the covariances as correlations, and recognizing that $\mathrm{Corr}(R, \cdot) = \rho_{R,\cdot}$, we obtain

$$\rho_{R,V} \cdot \sigma_V + \rho_{R,W} \cdot \sigma_W + \rho_{R,H} \cdot \sigma_H = 0.$$

It is well-known that for a Bernoulli random variable, its variance is rather stable around 0.25 unless its mean is close to 0 or 1. For simplicity, we then adopt the approximation that $\sigma^2_V \approx \sigma^2_W \approx \sigma^2_H$. Consequently, we have

$$\rho_{R,V} + \rho_{R,W} + \rho_{R,H} \approx 0.$$
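The zero-sum constraint can be checked numerically on a synthetic population. The sketch below is illustrative only: the population shares, the response probabilities, and the variable names are all hypothetical, with R standing for the indicator of being recorded in the survey. It confirms that the covariances of R with V, W, and H sum to zero exactly (because V + W + H = 1 for everyone), and that the same holds for the correlations once each is multiplied by the standard deviation of its outcome, while the plain sum of correlations is only approximately zero.

```python
import math
import random

def covariance(xs, ys):
    """Finite-population covariance (divide by N)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

def std(xs):
    return math.sqrt(covariance(xs, xs))

def corr(xs, ys):
    return covariance(xs, ys) / (std(xs) * std(ys))

random.seed(0)
N = 10_000
V, W, H, R = [], [], [], []
for _ in range(N):
    # Hypothetical population shares for the three mutually exclusive states.
    status = random.choices(["V", "W", "H"], weights=[0.5, 0.3, 0.2])[0]
    V.append(1 if status == "V" else 0)
    W.append(1 if status == "W" else 0)
    H.append(1 if status == "H" else 0)
    # Hypothetical recording mechanism: vaccinated people respond more often.
    p_respond = {"V": 0.02, "W": 0.01, "H": 0.005}[status]
    R.append(1 if random.random() < p_respond else 0)

cov_sum = covariance(R, V) + covariance(R, W) + covariance(R, H)
weighted_corr_sum = sum(corr(R, Y) * std(Y) for Y in (V, W, H))
plain_corr_sum = sum(corr(R, Y) for Y in (V, W, H))

print(f"sum of covariances:          {cov_sum:.2e}")
print(f"sum of corr * sigma:         {weighted_corr_sum:.2e}")
print(f"sum of corr (approximation): {plain_corr_sum:.4f}")
```

The first two sums are zero up to floating-point error; the third shows how far the equal-variance approximation departs from zero for a given set of outcome proportions.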