Estimating the early death toll of COVID-19 in the United States

Background Efforts to track the severity and public health impact of the novel coronavirus, COVID-19, in the US have been hampered by testing issues, reporting lags, and inconsistency between states. Evaluating unexplained increases in deaths attributed to broad outcomes, such as pneumonia and influenza (P&I) or all causes, can provide a more complete and consistent picture of the burden caused by COVID-19. Methods We evaluated increases in the occurrence of deaths due to P&I above a seasonal baseline (adjusted for influenza activity) or due to any cause across the United States in February and March 2020. These estimates are compared with reported deaths due to COVID-19 and with testing data. Results There were notable increases in the rate of death due to P&I in February and March 2020. In a number of states, these deaths pre-dated increases in COVID-19 testing rates and were not counted in official records as related to COVID-19. There was substantial variability between states in the discrepancy between reported rates of death due to COVID-19 and the estimated burden of excess deaths due to P&I. The increase in all-cause deaths in New York and New Jersey is 1.5–3 times higher than the official tally of COVID-19 confirmed deaths or the estimated excess death due to P&I. Conclusions Excess P&I deaths provide a conservative estimate of COVID-19 burden and indicate that COVID-19-related deaths are missed in locations with inadequate testing or intense pandemic activity.


Evidence before this study
Deaths due to the novel coronavirus, COVID-19, have been increasing sharply in the United States since mid-March. However, efforts to track the severity and public health impact of COIVD-19 in the US have been hampered by testing issues, reporting lags, and inconsistency between states. As a result, the reported number of deaths likely represents an underestimate of the true burden.

Added Value of this study
We evaluate increases in deaths due to pneumonia across the United States and relate these increases to the number of reported deaths due to COVID-19 in different states and evaluate the trajectories of these increases in relation to the volume of testing and to indicators of  morbidity. This provides a more complete picture of mortality due to COVID-19 in the US and demonstrates how delays in testing led to many coronavirus deaths not being counted in certain states.

Implications of all the available evidence
The number of deaths reported to be due to COVID-19 represents just a fraction of the deaths linked to the pandemic. Monitoring trends in deaths due to pneumonia and all-causes provides a more complete picture of the tool of the disease.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint ABSTRACT Background Efforts to track the severity and public health impact of the novel coronavirus, COVID-19, in the US have been hampered by testing issues, reporting lags, and inconsistency between states.
Evaluating unexplained increases in deaths attributed to broad outcomes, such as pneumonia and influenza (P&I) or all causes, can provide a more complete and consistent picture of the burden caused by COVID-19.

Methods
We evaluated increases in the occurrence of deaths due to P&I above a seasonal baseline (adjusted for influenza activity) or due to any cause across the United States in February and March 2020. These estimates are compared with reported deaths due to COVID-19 and with testing data.

Results
There were notable increases in the rate of death due to P&I in February and March 2020. In a number of states, these deaths pre-dated increases in COVID-19 testing rates and were not counted in official records as related to COVID-19. There was substantial variability between states in the discrepancy between reported rates of death due to COVID-19 and the estimated burden of excess deaths due to P&I. The increase in all-cause deaths in New York and New Jersey is 1.5-3 times higher than the official tally of COVID-19 confirmed deaths or the estimated excess death due to P&I.

Conclusions
Excess P&I deaths provide a conservative estimate of COVID-19 burden and indicate that COVID-19-related deaths are missed in locations with inadequate testing or intense pandemic activity.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint

Introduction:
The novel coronavirus SARS-CoV-2 first emerged in December 2019 in Wuhan, China and rapidly grew into a large-scale global pandemic. 1 Tracking the severity and impact of COVID-19 is, at the time of writing, a critical need, hampered by testing issues and reporting lags for key epidemiological indicators.
Many countries, including the US, were caught off-guard by the speed with which COVID-19 spread from China. Without adequate capacity to test for the SARS-CoV-2 virus causing COVID-19 for much of February and March 2020, available laboratory-confirmed cases captured only an estimated 10-15% of all infections. 2 Although most countries have adopted the strategy to preferentially test severe cases, estimating the number of severe infections and deaths caused by COVID-19 will be a challenge. Typically, a large proportion of deaths caused by infectious diseases are not attributed to a specific pathogen. With the limited availability of testing for the novel coronavirus and imperfect sensitivity of the tests, 3,4 there have undoubtedly been a number of deaths caused by the virus that are not counted in official tallies. Even in situations of ample testing, deaths from viral pathogens, including SARS-CoV-2, can occur indirectly via secondary bacterial infections or exacerbation of chronic conditions. Further, in the midst of a large outbreak, there is an unavoidable delay in compilation of death certificates and ascertainment of cause of deaths, which contributes to uncertainty about severity and burden.
Finally, in the US, there has been a high degree of variability in public health resources, laboratory testing, and recognition of the outbreak at the state level, which could lead to significant under-estimation of the true impact of the outbreak in certain geographies.
To estimate the burden of death due to novel respiratory pathogens, previous studies have compared the observed incidence of influenza-related deaths ascribed to pneumonia and influenza ("P&I") with the baseline incidence of P&I that would be expected at that time of year. 5, 6 These "excess deaths" provide an estimate of pathogen-specific burden. This approach was used in the early months of the 2009 influenza A/H1N1-pdm pandemic; it was estimated that just 1 in 7 pandemic-related deaths was captured by laboratory testing in 2009 in the US. 7,8 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 29, 2020. ; In this study, we estimated the increase in P&I deaths across the United States in each week in excess of a seasonal baseline to capture the direct and indirect mortality burden of COVID-19.
We compare these estimates of excess deaths to the reported numbers of deaths due to COVID-19 in different states and evaluate the trajectories of these increases in relation to the volume of testing and to indicators of COVID-19 morbidity. These analyses provide insights into the burden of COVID-19 in the early months of the outbreak in the United States and serve as a surveillance platform that can be updated as new data accrue.

Data
Data on deaths due to pneumonia and influenza (P&I, ICD-10 codes J09-J18) and all-causes by state and week were obtained from the National Center for Health Statistics' (NCHS) mortality surveillance system. 9 Here we analyze spikes in P&I mortality rather than straight pneumonia to be more comprehensive, as influenza-coded deaths do not necessarily require laboratory confirmation of influenza infection, there is overlap of symptoms between influenza and COVID-19, and P&I mortality has been used in US to monitor the severity of influenza and other respiratory pathogens since the 1918 pandemic. In addition, analysis of spikes in all-cause mortality provides a full picture of the direct and indirect burden of COVID-19.
The NCHS mortality data are available with a 2-week lag and are partially complete for the most recent weeks. The P&I mortality data provide the provisional number of deaths due to pneumonia or influenza (ie, deaths with a code of pneumonia or influenza anywhere in the death certificate) and should be adjusted by the total number of deaths reported in real time each week.
Connecticut, North Carolina, and West Virginia were missing mortality data for recent months and were therefore excluded from the analyses.
The P&I grouping will include individuals who had a listed cause of death as COVID-19 (along with a P&I code) as well as people who did not have COVID-19 listed as a cause of death. It does not capture people who had COVID-19 listed as a cause of death but did not have P&I recorded among the causes.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; We also compiled data on COVID-19-related morbidity to gauge the timing and intensity of the pandemic in different locations. Influenza-like illness (ILI) is a longstanding indicator of morbidity from acute respiratory pathogens, including SARS-CoV-2. Weekly state-level ILI data were obtained from the CDC's ILINet system 10 , which aggregates data from a network of outpatient providers. To adjust for activity of non SARS-CoV-2 respiratory pathogens, we used state-level data on laboratory-confirmed influenza activity from the CDC's National Respiratory and Enteric Virus Surveillance System (NREVSS) 11 . This dataset captures the number of tests performed for influenza and the number that were positive by week and state. The ILI data provide the percent of visits to participating outpatient providers that were for ILI. ILINet and NREVSS data are available with a 1-week lag.
The ILI, NREVSS, and mortality datasets were accessed through the CDC's FluView portal using the cdcfluview package in R. Data from NCHS, ILINet and NREVSS were obtained for the weeks ending January 5, 2013 through March 28, 2020.
To compare our burden estimates with official COVID-19 tallies, we compiled weekly reports of laboratory-confirmed deaths due to COVID-19 in each state from several sources, including the Covid Tracking Project, 12 and NCHS 13 . State-specific testing information was obtained was The Covid Tracking Project 12 .

Excess mortality and morbidity analysis
The 9 largest states by population, and Washington state, were analyzed individually. The remaining states for which we had data were grouped into the corresponding health regions defined by the US Department of Health and Human Services ("HHS regions"). We fit Poisson regression models to the weekly state-level counts of reported deaths due to P&I from January 5, 2015 to February 8, 2020 (see Supplement for details). The baseline was then projected forward until March 28, 2020. We adjusted for seasonality using harmonic variables and included yearto-year baseline variation. Influenza activity was controlled for by adjusting for the percent of tests that were positive for influenza in the previous week. The 1-week lag between the testing data and the mortality data accounts for the delay influenza testing and death. The number of allcause deaths was used as a denominator. Poisson 95% prediction intervals were estimated by sampling from the uncertainty distributions for the estimated model parameters. 14 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint To calculate the number of COVID-19 related excess deaths, we subtracted the expected number of deaths in each week from the observed number of deaths for the period February 9, 2020 to March 28, 2020. Because reporting of deaths for recent weeks is incomplete, NCHS calculates a 'completeness' score (between 0 and 1) based on the number of death reports that have been received from a state and the number expected from that state based on previous years. The excess deaths were divided by the completeness score for each week to get an estimate for excess cases, adjusted for reporting delays.
To evaluate the sensitivity of the estimates of excess mortality to the adjustment for influenza activity, we refit the regression excluding the influenza covariate. We also fit a model with the same structure to all-cause mortality (without a denominator). To get a measurement of outpatient visits related to COVID-19, the same model was fit to data on influenza-like illness.

Code and data availability
The analyses were run using R v 3.6.1. All analysis scripts and archives of the data are available from https://github.com/weinbergerlab/excess_pi_covid RESULTS Many states experienced a notable increase in the proportion of total deaths due to P&I starting in mid-March through March 28 compared to what would be expected based on the time of year and influenza activity (Figure 1, Figure 2). Expressed as the relative increase above the baseline, these increases were particularly notable in New Jersey, Washington, New York, Illinois, and Georgia. The increase in New York was largely driven by spikes in New York City ( Figure S1).
In some states, such as Florida and Georgia, the increase in deaths due to P&I preceded the widespread adoption of testing for the novel coronavirus by several weeks (Figure 3). As a result, the increase in P&I preceded the first reported COVID-19 deaths, and the excess P&I was greater than the number of reported COVID-19 deaths each week. In contrast, in Washington . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The timing of the epidemic varied across states. Some states and regions that have not yet seen large increases in deaths are earlier in the epidemic, though there is some indication of sharp increases in outpatient visits for ILI (eg, Ohio, Texas Figure 4). The increase in P&I mortality typically lags behind the increase in ILI visits, so increases in excess P&I deaths might be expected in coming weeks.
The degree of under-reporting in some states can be seen when comparing the estimated excess deaths due to P&I with the number of reported COVID-19 cases through March 28 (Table 1, Figure S4) was agreement between the reported COVID-19 deaths and the excess P&I deaths (Figure 3, Figure S1). In Michigan, the number of excess P&I deaths was notably lower than the number of reported COVID-19 deaths, potentially reflecting reporting delays in the P&I data. Overall, in the states we evaluated for the period from February 9 to March 28, 2020 there were 3101 (2769, 3433) excess P&I deaths, compared with 1958 reported deaths due to COVID-19 from the COVID Tracking Project. There were 2537 deaths due to COVID-19 provisionally reported by NCHS (for the entire U.S.) during this period.
Deaths due to P&I represent just a fraction of all deaths caused by COVID-19. To highlight this, we compared increases in deaths due to any cause in New York and New Jersey with increases in deaths due to P&I and reported deaths due to COVID-19 ( Figure S2). There were 2-3 times as many excess all-cause deaths as reported COVID-19 deaths or excess P&I deaths. In New York City, this discrepancy was even more stark, with 3-4 times as many excess all-cause deaths as P&I deaths (Figure S3). The observation that excess P&I deaths underestimate the burden of COVID-19 is born out in the NCHS data, where fewer than half of the deaths recorded as being due to COVID-19 had pneumonia listed as a cause of death. 13 The proportion of COVID-19-coded deaths with a pneumonia diagnosis itself varies considerably by state (Table S2).
In sensitivity analyses, we refit the seasonal baseline but did not adjust for influenza activity (Table S3), and we calculated excess P&I mortality without using a correction factor for . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04.15.20066431 doi: medRxiv preprint incomplete reporting (Table S4). These changes influence the specific estimates for some of the states with smaller increases but do not affect the overall pattern.

DISCUSSION
Excess P&I mortality has been used as a method for tracking influenza mortality for more than a century. Here we used a similar strategy to capture COVID-19 deaths that had not been attributed specifically to the pandemic coronavirus. Monitoring trends in broad mortality outcomes, like P&I or all-causes, provides a window into the magnitude of the mortality burden missed in official tallies of COVID-19 deaths. Given the lack of adequate testing and geographical variability in testing intensity, this type of monitoring provides key information on the severity of the epidemic in different geographic regions. It also provides some indication of the degree to which viral testing is missing deaths associated with COVID-19 directly or indirectly. Our findings suggest that the degree to which deaths due to COVID-19 are being correctly attributed to SARS-CoV-2 varies by state. Some states, such as Florida and Pennsylvania, might have missed deaths early on and might be undercounting deaths by a substantial degree currently. Other states, like Washington, have an accurate estimate of the mortality burden of the pandemic virus due to intense testing. And in states that have been hit hard by the pandemic virus, such as New Jersey or New York, the total excess mortality burden is 2-3 times that ascribed to COVID-19 in official statistics. Together, these findings demonstrate that estimates of the death toll of COVID-19 based on excess P&I and all-cause mortality will be more reliable than those relying only on reported deaths, particularly in places that lack widespread testing.
Local epidemics of COVID-19 started at different times across the US. States with early epidemics, like New York and New Jersey, are now reporting large increases in both P&I and all-cause mortality. Other states that had later epidemic onsets are just now entering the period of rapid growth of deaths. Because some states instituted social distancing measures at an earlier phase of the epidemic, they might benefit more from reducing the intensity of the peak in deaths.
Syndromic endpoints, such as deaths due to P&I, outpatient visits for ILI, and emergency department visits for fever, can provide a crude but informative measure of the progression of the . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint outbreak. 15 These measures themselves can be biased by changes in health seeking behavior and how conditions are recorded. However, in the absence of widespread and systematic testing for COVID-19, they provide a useful measure of epidemic progression and the effect of interventions.
To get a complete picture of the burden of the burden of deaths due to COVID-19, it will be necessary to evaluate spikes in all-cause mortality, as we have done for New York and New Jersey here. However, it is difficult to do such analyses reliably in real time with provisional death statistics because the data are incomplete for recent weeks, and the delays in reporting can only be determined retrospectively. Our analyses here suggest that excess P&I deaths represent a fraction of all of the deaths related to COVID-19 (25-50% based on preliminary data), so the P&I excess mortality estimates we present here represent a lower bound of the burden.
There is often a lag in the reporting of death statistics. NCHS reports data from 2 weeks prior, but these provisional data are typically still incomplete. In our main analysis, we addressed this by adjusting deaths due to P&I with all-cause deaths, with the assumption that reporting delays for P&I are similar to reporting delays for other causes of death. When calculating excess deaths, we attempted to adjust for reporting delays by using the NCHS estimate of data completeness as a multiplier. However, this estimate of data completeness is itself based on numbers of deaths during previous years and might not be reliable during the pandemic period. If anything, we would expect that longer reporting lags would tend to underestimate the excess mortality burden of COVID-19 in real-time analyses.
It is estimated that the COVID-19 epidemic in Europe predates that in the US by a few weeks. Several European countries have experienced a high death toll, particularly Italy and In conclusion, monitoring syndromic causes of death can provide crucial additional information on the severity and progression of the COVID-19 pandemic. Estimates of excess deaths due to P&I and all-causes will be less biased by variations in viral testing, but reporting lags have to be properly accounted for. Together with information on official tallies of COVID-. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04.15.20066431 doi: medRxiv preprint 19 deaths, monitoring excess mortality provides a key tool in evaluating the effects of an ongoing pandemic.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; Table 1. Excess deaths due to pneumonia and influenza and deaths due to COVID-19, as reported by the COVID Tracking Project from February 9, 2020 through Mar 28, 2020.    . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint SUPPLEMENTAL FIGURES Figure S1: Excess all-cause deaths. The black line shows the observed number of all deaths per week, regardless of cause. The red line and shaded area represent the 95% Prediction Interval. The latest data is for the week ending 2020-03-28. Note that these are adjusted for percent completeness of the data using the NCHS' estimate of data completeness. There are clear jumps in all-cause mortality in NY and NJ, other states are stable or decreasing, likely due to reporting delays.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint Figure S2: Excess all-cause deaths (black) vs Excess deaths due to pneumonia and influenza (red) and reported COVID-19 deaths from the COVID Tracking Project (blue dashed line) For New York (including New York City) and New Jersey. Figure S3. Excess P&I deaths per week (red) vs all-cause excess deaths (black) in New York City only.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint Figure S4. Map of Excess deaths by state and COVID-19 deaths reported by The COVID Tracking Project . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04.15.20066431 doi: medRxiv preprint Table S2: Proportion of COVID-19 deaths with a pneumonia code, by state, through most recent date. Note these values will be greater than those in other tables, which are 2 weeks behind . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 29, 2020.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04.15.20066431 doi: medRxiv preprint Table S4. Comparison of estimates when the estimates for excess cases are adjusted based on the estimated completeness of the database. Excess deaths due to pneumonia & influenza from February 9, 2020 through Mar 28, 2020 with or without adjustment for delayed reporting

Statistical model
We developed a regression model for P&I deaths in epidemiological year i (July-June) and week t as a function of seasonal parameters and the percent positive influenza tests in the prior week. Models were fit separately for each location with data from January 5, 2015 to February 8, 2020; fitted values were projected for the period until March 28, 2020. Let PI_Deathsi,t be the number of P&I deaths and let Flu_Pct_Posi,t-1 be the percent positive influenza tests. We modeled PI_Deathsi,t ~ Poisson( i,t) where log( i,t/Total_deathi,t) = 0 + 1*sin(ϴt) + 2*cos(ϴt) + 3*sin(ϴt/2) + 4*cos(ϴt/2) + + 6*log(Flu_Pct_Posi,t-1) + i + i*log(Flu_Pct_Posi,t-1) and ϴt=2* *t/52.1775 To compute prediction intervals, we used the following procedure. Once the regression coefficients were estimated, we extracted the estimated asymptotic covariance matrix for the parameters and constructed a multivariate normal distribution approximating the sampling distribution, centered at the estimated parameter values. We drew 100 samples from this parameter distribution, computed the resulting mean value i,t, and then drew 100 samples from . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint the Poisson distribution with this mean. This resulted in 10,000 samples from an empirical predictive distribution of PI_Deathsi,t. Empirical 95% prediction intervals were computed by taking the 2.5th and 97.5th percentiles of this resulting distribution. In further sensitivity analyses, we evaluated a model in which Flu_Pct_Pos was excluded altogether.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 29, 2020. ; https://doi.org/10.1101/2020.04. 15.20066431 doi: medRxiv preprint