Structured Abstract
Objective Determine age-specific infection fatality rates for COVID-19 to inform public health policies and communications that help protect vulnerable age groups.
Methods Studies of COVID-19 prevalence were collected by conducting an online search of published articles, preprints, and government reports. A total of 111 studies were reviewed in depth and screened. Studies of 33 locations satisfied the inclusion criteria and were included in the meta-analysis. Age-specific IFRs were computed using the prevalence data in conjunction with reported fatalities four weeks after the midpoint date of the study, reflecting typical lags in fatalities and reporting. Meta-regression procedures in Stata were used to analyze IFR by age.
Results Our analysis finds an exponential relationship between age and IFR for COVID-19. The estimated age-specific IFRs are very low for children and younger adults but increase progressively to 0.4% at age 55, 1.3% at age 65, 4.2% at age 75, and 14% at age 85. We find that differences in the age structure of the population and the age-specific prevalence of COVID-19 explain nearly 90% of the geographical variation in population IFR.
Discussion These results indicate that COVID-19 is hazardous not only for the elderly but also for middle-aged adults, for whom the infection fatality rate is two orders of magnitude greater than the annualized risk of a fatal automobile accident and far higher than that of seasonal influenza. Moreover, the overall IFR for COVID-19 should not be viewed as a fixed parameter but as intrinsically linked to the age-specific pattern of infections. Consequently, public health measures to mitigate infections in older adults could substantially decrease total deaths.
Introduction
As the COVID-19 pandemic has spread across the globe, some fundamental issues have remained unclear: How dangerous is COVID-19? And to whom? Answering these questions will help inform appropriate decision-making by individuals, families, and communities.
The case fatality rate (CFR), the ratio of deaths to reported cases, is commonly used in gauging disease severity. However, this measure can be highly misleading for SARS-CoV-2, the virus that causes COVID-19, because a high proportion of infections are asymptomatic or mildly symptomatic (especially for younger people) and may not be included in official case reports.[1, 2] Consequently, the infection fatality rate (IFR), the ratio of fatalities to infections, is a more reliable metric than the CFR in assessing the hazards of COVID-19.
Assessing the IFR for COVID-19 is difficult. As shown in Table 1, a recent seroprevalence study by the New York Department of Health estimated ~1·6 million infections among the 8 million residents of NYC, but only one-tenth of those infections were captured in reported COVID-19 cases.[3, 4] About one-fourth of reported cases were severe enough to require hospitalization, many of whom succumbed to the disease. All told, fatalities represented a tenth of reported cases but only a hundredth of all infections.
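The gap between CFR and IFR in the NYC example can be checked with a few lines of arithmetic. The sketch below uses only the rounded proportions quoted in the paragraph above (not the exact counts in Table 1), so the results are approximate.

```python
# Rounded NYC figures quoted in the text (approximate, for illustration only).
infections = 1_600_000               # ~1.6 million estimated infections
reported_cases = infections / 10     # about one-tenth of infections were reported
deaths = reported_cases / 10         # fatalities were about a tenth of reported cases

cfr = deaths / reported_cases        # case fatality rate: deaths per reported case
ifr = deaths / infections            # infection fatality rate: deaths per infection

print(f"CFR = {cfr:.1%}, IFR = {ifr:.1%}")
```

With these rounded inputs the CFR is ten times the IFR, simply because only a tenth of infections appear in the case counts.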
While the NYC data indicate an IFR of ~1%, analyses of other locations have produced a wide array of IFR estimates, e.g., 0·6% in Geneva, 1·5% in England, and 2·3% in Italy. Indeed, a recent meta-analysis noted the high degree of heterogeneity across aggregate estimates of IFR and concluded that research on age-stratified IFR is “urgently needed to inform policymaking.”[5]
In this paper, we consider the hypothesis that the observed variation in IFR across locations may primarily reflect the age specificity of COVID-19 infections and fatalities. Consequently, this paper reports on a systematic review and meta-analysis of age-specific IFRs for COVID-19. Based on our findings, we are able to assess and contextualize the severity of COVID-19 and examine how age-specific prevalence affects population IFR and the total incidence of fatalities.
Methodology
To perform the present meta-analysis, we collected published papers and preprints on the seroprevalence and/or infection fatality rate of COVID-19 that were publicly disseminated prior to 17 September 2020. As described in Supplementary Appendix B, we systematically performed online searches in MedRxiv, Medline, PubMed, Google Scholar, and EMBASE, and we identified other studies listed in reports by government institutions such as the U.K. Parliament Office.[6] Data was extracted from studies by three authors and verified prior to inclusion.
We restricted our meta-analysis to studies of advanced economies, based on current membership in the Organization for Economic Cooperation and Development (OECD), in light of the distinct challenges of health care provision and reporting of fatalities in developing economies.[7] We also excluded studies aimed at measuring prevalence in specific groups such as health care workers.
Our meta-analysis encompasses two distinct approaches for assessing the prevalence of COVID-19: (1) seroprevalence studies that test for antibodies produced in response to the virus, and (2) comprehensive tracing programs using extensive live-virus testing of everyone who has had contact with a potentially infected individual. Seroprevalence estimates are associated with uncertainty related to the sensitivity and specificity of the test method and the extent to which the sampling frame provides an accurate representation of prevalence in the general population; see Supplementary Appendix C. Prevalence measures from comprehensive tracing programs are associated with uncertainty about the extent of inclusion of infected individuals, especially those who are asymptomatic.
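To illustrate how test sensitivity and specificity feed into a prevalence estimate, the following minimal sketch applies the standard Rogan-Gladen correction. This estimator is a common textbook choice and is used here as an assumption; it is not necessarily the adjustment described in Supplementary Appendix C, and the input values are hypothetical.

```python
def rogan_gladen(raw_prevalence, sensitivity, specificity):
    """Adjust a raw positivity rate for imperfect test accuracy.

    Standard Rogan-Gladen estimator (an assumed, illustrative choice):
        adjusted = (raw + specificity - 1) / (sensitivity + specificity - 1)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    adjusted = (raw_prevalence + specificity - 1.0) / denom
    # Clip to [0, 1]: at low prevalence, false positives can push the raw
    # estimate below what the correction allows.
    return min(1.0, max(0.0, adjusted))


# Hypothetical inputs: 5% raw positivity, 90% sensitivity, 98% specificity.
adjusted = rogan_gladen(raw_prevalence=0.05, sensitivity=0.90, specificity=0.98)
print(f"test-adjusted prevalence = {adjusted:.2%}")
```

The correction matters most when prevalence is low, since even a small false-positive rate can then account for a large share of positive results.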
Sampling frame
To assess prevalence in the general population, a study should be specifically designed to utilize a random sample using standard survey procedures such as stratification and weighting by demographic characteristics. Other sampling frames may be useful for specific purposes such as sentinel surveillance but are not well suited to assessing prevalence because of the substantial risk of systematic bias. Consequently, our meta-analysis excludes the following types of studies:
Blood donor studies. Only a small fraction of blood donors are ages 60 and above—a fundamental limitation in assessing COVID-19 prevalence and IFRs for older age groups—and the social behavior of blood donors may be systematically different from their peers.[8, 9] These concerns can be directly investigated by comparing alternative seroprevalence surveys of the same geographical location. As of early June, Public Health England (PHE) reported seroprevalence of 8·5% based on specimens from blood donors, whereas the U.K. Office of National Statistics (ONS) reported markedly lower seroprevalence of 5·4% (CI: 4·3-6·5%) based on its monitoring of a representative sample of the English population.[10, 11]
Hospitals and Urgent Care Clinics. Estimates of seroprevalence among current medical patients are subject to substantial bias, as evident from a pair of studies conducted in Tokyo, Japan: One study found 41 positive cases among 1071 urgent care clinic patients, whereas the other study found only two confirmed positive results in a random sample of nearly 2000 Tokyo residents (seroprevalence estimates of 3·8% vs. 0·1%).[12, 13]
Active Recruitment. Soliciting participants is particularly problematic in contexts of low prevalence, because seroprevalence can be markedly affected by a few individuals who volunteer due to concerns about prior exposure. For example, a Luxembourg study obtained positive antibody results for 35 out of 1,807 participants, but nearly half of those individuals (15 of 35) had previously had a positive live virus test, were residing in a household with someone who had a confirmed positive test, or had direct contact with someone else who had been infected.[14]
Our critical review has also underscored the pitfalls of seroprevalence studies based on “convenience samples” of residual sera collected for other purposes. For example, two studies assessed seroprevalence of Utah residents during spring 2020. The first study analyzed residual sera from two commercial laboratories and obtained a prevalence estimate of 2·2% (CI: 1·2-3·4%), whereas the second study collected specimens from a representative sample and obtained a markedly lower prevalence estimate of 0·96% (CI: 0·4-1·8%).[15, 16] In light of these issues, our meta-analysis includes residual serum studies but we flag such studies as having an elevated risk of bias.
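The raw counts quoted in the exclusion examples above can be turned into proportions with a rough uncertainty band. The sketch below uses the Wilson score interval, a standard choice assumed here for illustration; it is not the interval method used by the cited studies, and it ignores test accuracy and survey weighting.

```python
import math


def wilson_interval(positives, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = positives / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Counts quoted in the text.
tokyo_clinic = 41 / 1071      # urgent care clinic patients, Tokyo
lux_raw = 35 / 1807           # Luxembourg participants with antibodies
lux_prior_share = 15 / 35     # share of positives with known prior exposure

for label, k, n in [("Tokyo clinic", 41, 1071), ("Luxembourg", 35, 1807)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} (approx. 95% interval {lo:.1%} to {hi:.1%})")
print(f"Luxembourg positives with known prior exposure: {lux_prior_share:.0%}")
```

In a low-prevalence setting such as Luxembourg, the handful of participants with known prior exposure makes up a large fraction of all positives, which is why active recruitment can distort the estimate.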
Comprehensive Tracing Programs
Our meta-analysis incorporates data on COVID-19 prevalence and fatalities in countries that have consistently maintained comprehensive tracing programs since the early stages of the pandemic. Such a program was only feasible in places where public health officials could conduct repeated tests of potentially infected individuals and trace those with whom they had direct contact. We identify such countries using a threshold of 300 for the ratio of cumulative tests to reported cases as of 30 April 2020, based on comparisons of prevalence estimates and reported cases in the Czech Republic, Korea, and Iceland; see Supplementary Appendices D and E.[17] Studies of Iceland and Korea found that estimated prevalence was moderately higher than the number of reported cases, especially for younger age groups; hence we make corresponding adjustments for other countries with comprehensive tracing programs, and we identify these estimates as subject to an elevated risk of bias.[18-20]
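The tests-per-case criterion can be sketched as a simple ratio check. The function name and the example counts below are hypothetical, and treating a ratio exactly equal to 300 as meeting the threshold is an assumption, since the text does not say whether the comparison is strict.

```python
TESTS_PER_CASE_THRESHOLD = 300  # threshold stated in the text (as of 30 April 2020)


def has_comprehensive_tracing(cumulative_tests, reported_cases):
    """Return True if cumulative tests per reported case meets the threshold.

    Assumption: a ratio exactly equal to the threshold counts as meeting it.
    """
    if reported_cases <= 0:
        raise ValueError("reported_cases must be positive")
    return cumulative_tests / reported_cases >= TESTS_PER_CASE_THRESHOLD


# Hypothetical counts, for illustration only.
print(has_comprehensive_tracing(cumulative_tests=600_000, reported_cases=1_500))  # ratio 400
print(has_comprehensive_tracing(cumulative_tests=600_000, reported_cases=6_000))  # ratio 100
```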
Measurement of fatalities
Accurately measuring total deaths is a substantial issue in assessing IFR due to time lags from onset of symptoms to death and from death to official reporting. Symptoms typically develop within 6 days after exposure but may develop as early as 2 days or as late as 14 days.[1, 21] More than 95% of symptomatic COVID patients have positive antibody (IgG) titres within 17-19 days of symptom onset, and those antibodies remain elevated over a sustained period.[22-25] The mean time interval from symptom onset to death is 15 days for ages 18–64 and 12 days for ages 65+, with interquartile ranges of 9–24 days and 7–19 days, respectively, while the mean interval from date of death to the reporting of that person’s death is ~7 days with an IQR of 2–19 days; thus, the upper bound of the 95% confidence interval between symptom onset and reporting of fatalities is about six weeks (41 days).[26]
Figure 1 illustrates these findings in a hypothetical scenario where the pandemic was curtailed two weeks prior to the date of the seroprevalence study. This figure shows the results of a simulation calibrated to reflect the estimated distribution for time lags between symptom onset, death, and inclusion in official fatality reports. The histogram shows the frequency of deaths and reported fatalities associated with the infections that occurred on the last day prior to full containment. Consistent with the confidence intervals noted above, 95% of cumulative fatalities are reported within roughly four weeks of the date of the seroprevalence study.
As shown in Table 2, the precise timing of the count of cumulative fatalities is relatively innocuous in locations where the outbreak had been contained for more than a month prior to the date of the seroprevalence study. By contrast, in instances where the outbreak had only recently been contained, the death count continued rising markedly for several more weeks after the midpoint of the seroprevalence study.
Therefore, we construct age-specific IFRs using the seroprevalence data in conjunction with cumulative fatalities four weeks after the midpoint date of each study; see Supplementary Appendix F. We have also conducted sensitivity analysis using cumulative fatalities five weeks after the midpoint date, and we flag studies as having an elevated risk of bias if the change in cumulative fatalities between weeks 4 and 5 exceeds 10%.
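The construction described above can be sketched as two steps: find the date four weeks after the study midpoint, then divide cumulative deaths on that date by estimated infections. The function names and all input values below are hypothetical; the authors' actual procedure is documented in Supplementary Appendix F.

```python
from datetime import date, timedelta


def fatality_count_date(study_start, study_end, lag_weeks=4):
    """Date on which cumulative deaths are read: study midpoint plus a lag."""
    midpoint = study_start + (study_end - study_start) / 2
    return midpoint + timedelta(weeks=lag_weeks)


def age_specific_ifr(cumulative_deaths, prevalence, population):
    """IFR for one age group = deaths / (prevalence * population)."""
    infections = prevalence * population
    if infections <= 0:
        raise ValueError("estimated infections must be positive")
    return cumulative_deaths / infections


# Hypothetical study window and age-group data, for illustration only.
count_date = fatality_count_date(date(2020, 4, 20), date(2020, 5, 4))
ifr = age_specific_ifr(cumulative_deaths=130, prevalence=0.05, population=200_000)
print(count_date.isoformat(), f"IFR = {ifr:.2%}")
```

Passing `lag_weeks=5` gives the alternative date used in the sensitivity analysis.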
By contrast, matching prevalence estimates with subsequent fatalities is not feasible if a seroprevalence study was conducted in the midst of an accelerating outbreak. Therefore, our meta-analysis excludes seroprevalence studies for which the change in cumulative fatalities from week 0 to week 4 exceeds 200%.
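The two thresholds on cumulative fatalities described above amount to a simple screening rule, sketched below. The function names, return labels, and example counts are hypothetical; only the 200% and 10% thresholds come from the text.

```python
def pct_change(earlier, later):
    """Percentage change in cumulative fatalities between two dates."""
    if earlier <= 0:
        raise ValueError("earlier count must be positive")
    return 100.0 * (later - earlier) / earlier


def screen_study(deaths_week0, deaths_week4, deaths_week5):
    """Apply the exclusion and risk-of-bias thresholds stated in the text."""
    # Outbreak still accelerating: deaths more than tripled from week 0 to week 4.
    if pct_change(deaths_week0, deaths_week4) > 200:
        return "exclude"
    # Deaths still rising noticeably between weeks 4 and 5.
    if pct_change(deaths_week4, deaths_week5) > 10:
        return "elevated risk of bias"
    return "include"


# Hypothetical cumulative death counts at the study midpoint, +4 and +5 weeks.
print(screen_study(100, 350, 360))
print(screen_study(100, 150, 170))
print(screen_study(100, 120, 125))
```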
Metaregression procedure
To analyze IFR by age, we use meta-regression with random effects, using the meta regress procedure in Stata v16.[27, 28] We use a random-effects procedure to allow for residual heterogeneity between studies and across age groups, assuming that these divergences are drawn from a Gaussian distribution. Publication bias was assessed using Egger’s regression and the trim-and-fill method. See Supplementary Appendix G for further details.
Role of the funding source
No funding was received for conducting this study.
Results
After an initial screening of 1145 studies, we reviewed the full texts of 111 studies, of which 50 studies were excluded due to lack of age-specific data on COVID-19 prevalence or fatalities.[11-13, 25, 29-75] Seroprevalence estimates were excluded for two locations because the outbreak was still accelerating during the period when the specimens were being collected, and for two other locations because age-specific seroprevalence was not distinguishable from zero.[15, 76-78] Studies of non-representative samples were excluded as follows: 11 studies of blood donors, 4 studies of patients of hospitals and outpatient clinics, 4 studies with active recruitment of participants, and 5 narrow sample groups such as elementary schools.[10, 13, 14, 76, 79-98] Supplementary Appendix H lists all excluded studies.
Consequently, our metaregression analyzes IFR data from 28 locations, which can be classified into three distinct groups:
Representative samples from studies of England, Ireland, Italy, Netherlands, Portugal, Spain, Geneva (Switzerland), and four U.S. locations (Atlanta, Indiana, New York, and Salt Lake City).[16, 99-109]
Convenience samples from studies of Belgium, France, Sweden, and a study of eight U.S. locations (Connecticut, Louisiana, Miami, Minneapolis, Missouri, Philadelphia, San Francisco, and Seattle).[15, 110-112]
Comprehensive tracing programs for Australia, Iceland, Korea, Lithuania, and New Zealand.[113-117]
The metaregression includes results from the very large REACT-2 seroprevalence study of the English population.[104] Thus, to avoid pitfalls of nested or overlapping samples, two other somewhat smaller studies conducted by U.K. Biobank and the U.K. Office of National Statistics are not included in the metaregression but are instead used in out-of-sample analysis of the metaregression results.[11, 118] Similarly, the metaregression includes a large representative sample from Salt Lake City, and hence a smaller convenience sample of Utah residents is included in the out-of-sample analysis along with two other small-scale studies.[15, 16, 119, 120] Data taken from included studies is shown in Supplementary Appendix I. Supplementary Appendix J assesses the risk of bias for each individual study. As indicated in Supplementary Appendix K, no publication bias was found using Egger’s test (p > 0.10), and the trim-and-fill method produced the same estimate as the metaregression.
We obtain the following meta-regression results:
where the standard error for each estimated coefficient is given in parentheses. These estimates are highly significant, with t-statistics of -42·9 and 38·5, respectively, and p-values below 0·0001. The residual heterogeneity is τ² = 0·432 (p-value < 0·0001) and I² = 97·0, confirming that the random effects are essential for capturing unexplained variations across studies and age groups. The adjusted R² is 94·2%.
As noted above, the validity of this meta-regression rests on the condition that the data are consistent with a Gaussian distribution. The validity of that assumption is evident in Figure 3: Nearly all of the observations fall within the 95% prediction interval of the metaregression, and the remainder are moderate outliers.
Figure 4 depicts the exponential relationship between age and the level of IFR in percent, and Figure 5 shows the corresponding forest plot. Evidently, the SARS-CoV-2 virus poses a substantial mortality risk for middle-aged adults and even higher risks for elderly people: The IFR is very low for children and young adults but rises to 0·4% at age 55, 1·3% at age 65, 4·2% at age 75, 14% at age 85, and exceeds 25% for ages 90 and above. These metaregression predictions are well aligned with the out-of-sample IFRs; see Supplementary Appendix L.
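The exponential pattern can be checked informally against the four rounded IFRs quoted above. The sketch below fits an ordinary least-squares line to the natural log of those four values. It is a back-of-the-envelope check only: it is not the random-effects meta-regression, uses no study weights, and its coefficients should not be read as the paper's estimates.

```python
import math

# Rounded point estimates quoted in the text (IFR in percent).
ages = [55, 65, 75, 85]
ifr_pct = [0.4, 1.3, 4.2, 14.0]

# Ordinary least squares of ln(IFR) on age, written out by hand.
log_ifr = [math.log(v) for v in ifr_pct]
n = len(ages)
mean_age = sum(ages) / n
mean_log = sum(log_ifr) / n
slope = sum((a - mean_age) * (y - mean_log) for a, y in zip(ages, log_ifr)) / sum(
    (a - mean_age) ** 2 for a in ages
)
intercept = mean_log - slope * mean_age


def implied_ifr_pct(age):
    """IFR in percent implied by the log-linear fit to the four rounded points."""
    return math.exp(intercept + slope * age)


decade_factor = math.exp(10 * slope)  # multiplicative increase per 10 years of age
print(f"slope per year of age = {slope:.3f}, factor per decade = {decade_factor:.2f}")
print(f"implied IFR at age 90 = {implied_ifr_pct(90):.0f}%")
```

On these rounded inputs the implied IFR roughly triples with each additional decade of age, and extrapolating to age 90 gives about 25%, in line with the figure quoted for the oldest group.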
As shown in Figure 6, the metaregression explains nearly 90% of the geographical variation in population IFR, which ranges from ~0·5% in Salt Lake City and Geneva to 1·5% in Australia and England and 2·7% in Italy. The metaregression explains this variation in terms of differences in the age structure of the population and age-specific prevalence of COVID-19.
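The dependence of population IFR on age structure and age-specific prevalence can be illustrated by writing population IFR as an infection-weighted average of age-specific IFRs. In the sketch below, the IFRs for ages 50 and above are the point estimates quoted in the text, applied to whole decade bands as a simplifying assumption; the under-50 value, the population shares, and the prevalence patterns are all hypothetical.

```python
# Age bands: 0-49, 50-59, 60-69, 70-79, 80+.
# IFRs in percent: the first value is a hypothetical placeholder; the rest are
# the text's estimates at ages 55, 65, 75 and 85, used as stand-ins for bands.
IFR_PCT = [0.02, 0.4, 1.3, 4.2, 14.0]


def population_ifr(pop_shares, prevalence):
    """Population IFR (percent) as an infection-weighted mean of age-specific IFRs."""
    infections = [s * p for s, p in zip(pop_shares, prevalence)]
    deaths = [i * f for i, f in zip(infections, IFR_PCT)]
    return sum(deaths) / sum(infections)


# Hypothetical population shares by age band (each list sums to 1).
younger_population = [0.70, 0.12, 0.10, 0.06, 0.02]
older_population = [0.60, 0.14, 0.12, 0.09, 0.05]

# Hypothetical prevalence patterns by age band.
uniform = [0.10, 0.10, 0.10, 0.10, 0.10]
lower_in_older_groups = [0.10, 0.10, 0.10, 0.03, 0.03]

print(f"younger population, uniform prevalence: {population_ifr(younger_population, uniform):.2f}%")
print(f"older population, uniform prevalence:   {population_ifr(older_population, uniform):.2f}%")
print(f"older population, lower prevalence 70+: {population_ifr(older_population, lower_in_older_groups):.2f}%")
```

The same age-specific IFR curve yields quite different population IFRs once the age mix of infections changes, which is the mechanism the metaregression relies on.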
Discussion
Our meta-analysis indicates that COVID-19 poses a low risk for children and younger adults but is hazardous for middle-aged adults and extremely dangerous for older adults. Table 4 contextualizes these risks by comparing the age-specific IFRs from our meta-regression analysis to the annualized risks of fatal automobile accidents or other unintentional injuries in England and in the United States.[121, 122] For example, an English person aged 55–64 years who gets infected with SARS-CoV-2 faces a fatality risk that is more than 200 times higher than the annual risk of dying in a car accident.
This analysis also confirms that COVID-19 is far more deadly than seasonal flu. For example, during the influenza season of winter 2018–19 the U.S. population had ~63 million infections and 34 thousand fatalities, with a population IFR of 0·05%, an order of magnitude lower than that of COVID-19; see Supplementary Appendix M.
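The influenza comparison follows directly from the two counts quoted above. The sketch below reproduces that arithmetic and compares the result with the low end of the COVID-19 population IFR range reported in the Results (~0·5%); choosing the low end as the comparator is an assumption made here to keep the comparison conservative.

```python
# Figures quoted in the text for the U.S. 2018-19 influenza season.
flu_infections = 63_000_000
flu_deaths = 34_000

flu_ifr_pct = 100 * flu_deaths / flu_infections  # population IFR in percent

# Low end of the COVID-19 population IFR range reported in the Results.
covid_low_end_pct = 0.5
ratio = covid_low_end_pct / flu_ifr_pct

print(f"influenza IFR = {flu_ifr_pct:.3f}%")
print(f"COVID-19 (low end) is about {ratio:.0f}x higher")
```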
These results indicate that the population IFR should not be interpreted as a fixed parameter of COVID-19 but as an outcome that reflects public health measures to limit the incidence of infections among vulnerable age groups. To illustrate these considerations, we have constructed three scenarios for the U.S. trajectory of COVID-19 infections and fatalities; see Supplementary Appendix N. Each scenario assumes that U.S. prevalence rises to a plateau of around 20% but with different patterns of age-specific prevalence. In particular, if prevalence becomes uniform across age groups, this analysis projects that total U.S. fatalities would exceed 500 thousand and that population IFR would converge to around 0·8%. By contrast, a scenario with relatively low incidence of new infections among vulnerable age groups would be associated with less than half as many deaths and a much lower population IFR of ~0·3%.
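The headline numbers in these scenarios follow from multiplying population, plateau prevalence, and population IFR. The sketch below reproduces that arithmetic only; it assumes a rounded U.S. population of 330 million (not stated in the text) and is not the scenario model in Supplementary Appendix N.

```python
US_POPULATION = 330_000_000   # assumption: rounded 2020 U.S. population
PLATEAU_PREVALENCE = 0.20     # prevalence plateau assumed in each scenario


def projected_deaths(population_ifr_pct):
    """Total deaths = population x prevalence x population IFR (given in percent)."""
    infections = US_POPULATION * PLATEAU_PREVALENCE
    return infections * population_ifr_pct / 100


uniform = projected_deaths(0.8)      # prevalence uniform across age groups
protective = projected_deaths(0.3)   # low incidence among vulnerable age groups

print(f"uniform prevalence:  {uniform:,.0f} deaths")
print(f"protective scenario: {protective:,.0f} deaths")
```

Under these rounded inputs the uniform scenario exceeds 500 thousand deaths and the protective scenario yields fewer than half as many, consistent with the projections described above.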
Our critical review underscores the substantial benefits of assessing prevalence using large-scale studies of representative samples of the general population (rather than convenience samples of blood donors or medical patients). Conducting such studies on an ongoing basis will enable public health officials to monitor changes in prevalence among vulnerable age groups and gauge the efficacy of public policy measures. Moreover, such studies will enable researchers to assess the extent to which antibodies to SARS-CoV-2 may gradually diminish over time as well as the extent to which advances in treatment facilitate the reduction of age-specific IFRs.
As shown in Supplementary Appendix O, our metaregression results are broadly consistent with the pathbreaking study of Verity et al. (2020), which was completed at a very early stage of the COVID-19 pandemic and characterized an exponential pattern of age-specific IFRs that was very low for children and much higher for older adults.[123] Our results are also well-aligned with a more recent meta-analysis of population IFR; indeed, our age-specific analysis explains a very high proportion of the dispersion in population IFRs highlighted by that study.[5] In contrast, our findings are markedly different from those of an earlier review of population IFR, mostly due to differences in selection criteria.[124] Finally, the exponential pattern of our age-specific IFR estimates is qualitatively similar to that of age-specific CFRs, but the magnitudes are systematically different, as shown in Supplementary Appendix P.
This meta-analysis has focused on the role of age in determining the IFR of COVID-19 but has not incorporated other factors that may have significant effects on IFR. For example, a recent U.K. study found that mortality outcomes are strongly linked to specific comorbidities such as diabetes and obesity but did not resolve the question of whether those links reflect differences in prevalence or causal effects on IFR.[125] See Supplementary Appendix Q for additional evidence. Likewise, we have not considered the extent to which IFRs may vary with other demographic factors such as race and ethnicity.[29, 59] Further research on these issues is clearly warranted.
It should also be noted that our analysis has focused exclusively on the incidence of fatalities but has not captured the full spectrum of adverse health consequences of COVID-19, some of which may be severe and persistent. Further research is needed to assess age-stratified rates of hospitalization as well as longer-term sequelae attributable to SARS-CoV-2 infections.
In summary, our meta-analysis demonstrates that COVID-19 is dangerous not just for the elderly and infirm but also for healthy middle-aged adults. The metaregression explains nearly 90% of the geographical variation in population IFR, indicating that the population IFR is intrinsically linked to the age-specific pattern of infections. Consequently, public health measures to protect vulnerable age groups could substantially reduce the incidence of mortality.
Data Availability
This study is a meta-analysis using information from published articles, preprints, and government reports; all sources are listed in the bibliography with active URLs. The data and Stata code used in performing the meta-regression analysis are provided as Supplementary Materials.
Declaration of Interests
The authors have no financial interests nor any other conflicts of interest related to this study.
This study was preprinted at: https://www.medrxiv.org/content/10.1101/2020.07.23.20160895v3.
Footnotes
Updated meta-analysis to include recent seroprevalence studies disseminated as of September 17. Computes test-adjusted prevalence estimates and confidence intervals for studies that only reported raw prevalence. Supplemental materials updated.