Sentinel Event Surveillance to Estimate Total SARS-CoV-2 Infections, United States

Human infections with a novel coronavirus (SARS-CoV-2) were first identified via syndromic surveillance in December of 2019 in Wuhan China. Since identification, infections (coronavirus disease-2019; COVID-19) caused by this novel pathogen have spread globally, with more than 180,000 confirmed cases as of March 16, 2020. Effective public health interventions, including social distancing, contact tracing, and isolation/quarantine rely on the rapid and accurate identification of confirmed cases. However, testing capacity (having sufficient tests and laboratory throughput) to support these non-pharmaceutical interventions remains a challenge for containment and mitigation of COVID-19 infections. We undertook a sentinel event strategy (where single health events signal emerging trends) to estimate the incidence of COVID-19 in the US. Data from a recent national conference, the Conservative Political Action Conference, (CPAC) near Washington, DC and from the outbreak in Wuhan, China were used to fit a simple exponential growth model to estimate the total number of incident SARS- CoV-2 infections in the United States on March 1, 2020, and to forecast subsequent infections potentially undetected by current testing strategies. Our analysis and forecasting estimates a total of 54,100 SARS-CoV-2 infections (80 % CI 5,600 to 125,300) have occurred in the United States to March 12, 2020. Our forecast predicts that a very substantial number of infections are undetected, and without extensive and far-reaching non-pharmaceutical interventions, the number of infections should be expected to grow at an exponential rate.


Introduction
To date, testing for COVID-19 cases (clinical and otherwise) has been limited in the US. Hospitals and healthcare facilities that do have tests have limited the testing criteria to those with highest potential risk, to optimize the use of scarce resources [1]. Where widespread testing has occurred, increased prevalences in high-risk groups have been found [2,3].
Evidence-based public health programming depends on data-driven estimates of disease burden to develop interventions. As such, accurate estimate of the underlying burden of SARS-CoV-2 infection within the US can provide critical information for hospital administrators, county-and state-level Departments of Health, and the US Centers for Disease Control and Prevention (CDC) to optimize and prioritize resource allocation.
The Conservative Political Action Conference (CPAC), that took place Feb. 26-29, 2020 outside Washington DC is a large annual event that draws approximately 20,000 participants from across the entire US, thus serving as a representative national sample. A single person was reported as 'presumptive positive' for COVID-19 on Mar 7, 2020, and exposure was reported to be prior to the CPAC event [4].
The point incidence of COVID-19 in this defined cohort was used to estimate the national incidence during this period, and subsequent forecasting of current SARS-CoV-2 infections across the entire US. The number of attendees and number of confirmed cases were used to fit a simple exponential growth model to estimate the total COVID-19 infections per day throughout the United States up to March 15th, 2020.

Overview of Data and Methods
This analysis proceeded in three stages. Firstly, the point incidence of SARS-CoV-2 infection was calculated for the CPAC cohort. Secondly, With an assumption that this group was a near-random sample of Americans, this proportion (daily incidence rate) was then extrapolated to the entire US population (329 million persons) to provide an estimated national daily incidence on March 1, 2020. Finally, this estimate with appropriate error structures, was projected forward in time using epidemiological parameters from the epidemic in Wuhan, PRC to forecast cumulative infections on March 12, 2020.
Data for this analysis were complied from routine sources; press reports from CPAC; and parameters estimated from modeling studies for the early (exponential growth phase) of the epidemic in China. The number of attendees and the number of confirmed COVID-19 cases at CPAC was collected from press reporting. The epidemic doubling time was taken from previous work which estimated from data on the outbreak in Wuhan, China. This doubling time from the Wuhan data was estimated to be 6.4 days, (95 CrI 5.8-7.1), and we used a lognormal distribution for D following literature values for other doubling times [5,6].

Results and Conclusions
Projecting this total infection burden forward (Fig. 1)   2 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 20, 2020. This forecasted cumulative number of infections is consistent with those of Perkins et al., from independent data sources and with entirely different modeling frameworks, and greatly strengthens the validity of both sets of estimates [7]. The large number of predicted infections compared against the currently reported total confirmed COVID-19 cases in the United States suggest an important proportion of infections may be currently undetected, while entering a phase of potentially very rapid growth of the epidemic.

Limitations of this analysis
This analysis is not without limitations. Our forecasts are based on parameter estimates using best-available data at the time of analysis to inform public health policy (coined "outbreak science" [8]). Demographic data for CPAC attendees was not available, so we are unable to assess how representative this sentinel event may be for the entire US population. This analysis also relies on simplistic models that assumes a well-mixed population, and uses data derived from transmission scenarios that may not represent transmission within the US. The upper limit of total infections should be interpreted with caution, as small uncertainties early in the projections also accumulate exponentially. Finally, this analysis does not account for spatial heterogeneity and diverse population mixing patterns throughout the United States, which are a critical driver of observed transmission.

The model
We assumed that cumulative infections up to day t, denoted by I t , followed an exponential growth model: where I 0 is the number of initial infections, t is the time, in days, since March 1, 2020, and D is the doubling time (in days). Estimates of I 0 and D are needed to forecast the cumulative number of infections at time t.

Estimating I 0
We estimated total incidence among the US population on March 1st, denoted by I 0 , as follows, where P denotes the size of the US population (329M, [9]), I (CPAC) 0 the incidence among CPAC attendees and P (CPAC) the number of CPAC attendees. The parameter I 0 represents the assumed fraction of the US population that is infected with COVID19. The incidence estimated from CPAC was extrapolated to the US population based on the assumption of equal incidence in CPAC attendees and the US population.
The initial relative incidence I (s) 0 /P was assumed to follow a beta distribution with a prior over θ, the probability of infection, equal to a B I . The prior is generated by having observed one sample from the general population with incidence "as observed in the CPAC population"

Estimating D
The doubling time of the disease was taken from previous work using data on the outbreak in Wuhan, China, and was estimated to be 6.4 days, (95 CrI 5.8-7.1) and lognormally distributed [5,6] log D (s) ∼ N log(6.4), (1.3/4) 2 .
3 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted March 20, 2020.

Model-based projections
Point estimates for projected incidence were obtained directly using Eq. 1 and the point estimates of doubling time and initial incidence. Uncertainty in the number of infections was obtained by taking 30, 000 Monte Carlo samples of I 0 and D according to their probability densities, and constructing a probability mass function for trajectories, I t , Specifically, the 80% projection interval for I t is given by the 10 th and 90 th percentiles of samples {I Analysis was performed using R [10] for a 16-day window after March 1, in 1-day intervals. Standard 95% and 80% confidence intervals were constructed across the forecast period.

Revision Date
Author(s) Description is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted March 20, 2020. . https://doi.org/10.1101/2020.03.17.20037648 doi: medRxiv preprint