Abstract
Introduction The infection-fatality rate (IFR) of COVID-19 has been carefully measured and analyzed in high-income countries, whereas there has been no systematic analysis of age-specific seroprevalence or IFR for developing countries.
Methods We systematically reviewed the literature to identify all COVID-19 serology studies in developing countries that were conducted using population representative samples collected by early 2021. For each of the antibody assays used in these serology studies, we identified data on assay characteristics, including the extent of seroreversion over time. We analyzed the serology data using a Bayesian model that incorporates conventional sampling uncertainty as well as uncertainties about assay sensitivity and specificity. We then calculated IFRs using individual case reports or aggregated public health updates, including age-specific estimates whenever feasible.
Results Seroprevalence in many developing country locations was markedly higher than in high-income countries. In most locations, seroprevalence among older adults was similar to that of younger age cohorts, underscoring the limited capacity that these nations have to protect older age groups. Age-specific IFRs were roughly 2x higher than in high-income countries. The median value of the population IFR was about 0.5%, similar to that of high-income countries, because disparities in healthcare access were roughly offset by differences in population age structure.
Conclusion The burden of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to accelerate the provision of vaccine doses to populations in developing countries.
Key Points
- Age-stratified infection fatality rates (IFRs) of COVID-19 in developing countries are about twice those of high-income countries.
- Seroprevalence (as measured by antibodies against SARS-CoV-2) is broadly similar across age cohorts, underscoring the challenges of protecting older age groups in developing countries.
- Population IFR in developing countries is similar to that of high-income countries, because differences in population age structure are roughly offset by disparities in healthcare access as well as elevated infection rates among older age cohorts.
- These results underscore the urgency of disseminating vaccines throughout the developing world.
Introduction
An important unknown during the COVID-19 pandemic has been the relative severity of the disease in developing countries compared to higher-income nations. The incidence of fatalities in many developing countries appeared to be low in the early stages of the pandemic, suggesting that the relatively younger age structure of these countries might have protected them against the harms of the disease. More recently, however, it has become clear that the perceived differences in mortality may have been illusory, reflecting poor vital statistics systems leading to underreporting of COVID-19 deaths (4, 5). Moreover, relatively low mortality outcomes in developing countries would be starkly different from the typical pattern observed for many other communicable diseases, reflecting the generally lower access to good-quality healthcare in these locations (6).
As shown in Table 1, mortality attributable to COVID-19 in many developing locations exceeds 2,000 deaths per million. Of the ten nations with the highest number of deaths attributed to COVID-19, seven are developing countries. Furthermore, these statistics may understate the true death toll in a number of lower- and middle-income countries. Numerous studies of excess mortality have underscored the limitations of vital registration and death reporting, particularly in developing countries (4, 5, 7-11). For example, recent studies of India have found that actual deaths from COVID-19 were about ten times higher than those in official reports (5, 7). Similarly, a study in Zambia found that only 1 in 10 of those who died with COVID-19 symptoms and whose post-mortem COVID-19 test was positive were recorded as COVID-19 deaths in the national registry (12). Strikingly, the continuation of that study has demonstrated the catastrophic impact of COVID-19 in Zambia, raising the overall mortality by as much as five to ten times relative to a normal year (13).
There has, however, been a relative dearth of systematic research concerning the early experience of COVID-19 and the associated infection fatality rate (IFR) in developing countries. Previous evaluations have largely focused on assessing these patterns in high-income countries, where high-quality data on seroprevalence and fatalities have been readily available throughout the pandemic (15, 16). In particular, seroprevalence studies conducted in high-income countries in 2020 found low overall prevalence of antibodies to COVID-19 (generally less than 10%) (17), with much lower prevalence among older adults compared to younger cohorts. Analysis of these data has clearly underscored the extent to which the IFR of COVID-19 increases exponentially with age; that is, the disease is far more dangerous for middle-aged and older adults than for children and young people (3, 15, 16). Two prior meta-analytic studies have considered variations in IFR by age but did not consider the possibility that IFR in developing locations might differ systematically from high-income countries due to healthcare quality, access, and other socioeconomic factors (15, 18).
Objectives
- Determine overall prevalence of COVID-19 infection in locations in developing countries
- Assess age-specific patterns of seroprevalence in these locations
- Estimate age-specific IFRs and compare to benchmark values for high-income countries
- Investigate possible reasons for differences in population IFR between locations
Methods
To perform this meta-analysis, we collected published papers, preprints, and government reports of COVID-19 serology studies for which all specimens were collected before March 1, 2021 and that were publicly disseminated by July 14, 2021. The full search methodology is given in the supplementary appendices. The study was registered on the Open Science Framework: https://osf.io/edpwv/
We restricted the scope of our analysis to locations in developing countries using the classification system of the International Monetary Fund (IMF); that is, we excluded locations that the IMF classifies as “high-income countries” (19). In some contexts developing countries are also described as low- to middle-income countries or as emerging and developing economies.
Inclusion/Exclusion Criteria
Our analysis only included studies that had a random selection of participants from a sample frame representative of the general population (20, 21). Consequently, studies of convenience samples, such as blood donors or residual sera from commercial laboratories, were excluded. Such samples are subject to intrinsic selection biases that may vary across different settings and hence would detract from systematic analysis of the data. Indeed, there is abundant evidence from the pandemic that convenience samples provide inaccurate estimates of seroprevalence, with assessments indicating that they are likely to overestimate the true proportion infected (22, 23). See the supplementary appendices for further details.
Serology Data
A crucial part of our analysis entailed adjusting raw seroprevalence to reflect the sensitivity and specificity of the particular assay used in each serology study, and constructing credible intervals that reflect uncertainty about assay characteristics as well as conventional sampling uncertainty. Where a study report did not include the information needed for this adjustment, namely the start and end dates of specimen collection, the specific assay used, and age-specific serology data, we requested it from the study authors.
Deaths
For locations with publicly-available databases of all individual cases, we tabulated the fatality data to match the age brackets of that serology study, using cumulative fatalities as of 14 days after the midpoint date of specimen collection to reflect the time lags between infection, seropositivity, and fatal outcomes. In the absence of individual case data, we searched for contemporaneous public health reports and tabulated cumulative deaths as of 28 days after the midpoint date of specimen collection to incorporate the additional time lags associated with real-time reporting of COVID-19 fatalities.
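The lag rule described above can be sketched in Python. This is only an illustration: the function name, its arguments, and the choice to round the midpoint down to a whole day are assumptions rather than details taken from the study.

```python
from datetime import date, timedelta


def death_cutoff_date(start: date, end: date, individual_case_data: bool) -> date:
    """Date up to which cumulative deaths are tabulated for one serology study.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # Midpoint of the specimen collection window; adding a timedelta to a
    # date ignores the sub-day part, so odd-length windows round down.
    midpoint = start + (end - start) // 2
    # 14-day lag when individual case data are available,
    # 28-day lag when only aggregated public health reports exist.
    lag_days = 14 if individual_case_data else 28
    return midpoint + timedelta(days=lag_days)
```

For a study that collected specimens from June 1 to June 11, 2020, the midpoint is June 6, so deaths would be counted through June 20 with individual case data and through July 4 with aggregated reports.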
Matching prevalence estimates with subsequent fatalities is not feasible if a serology study was conducted in the midst of an accelerating outbreak. Therefore, as in previous work, we estimated seroprevalence but did not analyze IFRs for locations where the cumulative death toll increased by 3x or more over the four-week period following the midpoint date of specimen collection. For details, see the supplementary appendices. Where deaths could not be matched to serology data, or where an outbreak was accelerating, we used the study for the seroprevalence analysis only.
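The exclusion rule for accelerating outbreaks amounts to a simple ratio check, sketched below. The function name is hypothetical, and the handling of a location with zero deaths at the midpoint is an assumption, since the text does not say how that case was treated.

```python
def is_accelerating_outbreak(
    deaths_at_midpoint: int,
    deaths_four_weeks_later: int,
    threshold: float = 3.0,
) -> bool:
    """Flag a location whose cumulative deaths grew by `threshold`x or more.

    Hypothetical helper illustrating the 3x-over-four-weeks rule.
    """
    if deaths_at_midpoint <= 0:
        # Assumption: with no deaths at the midpoint the ratio is undefined,
        # so any later deaths are treated as an accelerating outbreak.
        return deaths_four_weeks_later > 0
    return deaths_four_weeks_later / deaths_at_midpoint >= threshold
```

Locations flagged by such a check would contribute to the seroprevalence analysis but not to the IFR estimates.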
Additionally, we extracted data on excess deaths for all countries that were included in our analysis. We used two primary sources of estimates on excess mortality: the Institute for Health Metrics and Evaluation (IHME) (2) and the World Mortality Dataset (WMD) (4). The IHME produces national or regional estimates of excess mortality for every location included in this review, while the WMD has estimates for a subset of those locations. We then computed the ratio of excess mortality to reported fatalities for each location.
Covariates
We selected covariates that, based on prior research and expertise, were judged likely to affect either the IFR of COVID-19 itself or the accuracy of official data on COVID-related mortality. Where possible, we extracted these covariates at a state or regional level within a country; otherwise, they were identified at the national level. A full list of covariates and the method of extraction can be found in the supplementary appendices. In instances where a covariate was only available at the national level, we aggregated location-specific seroprevalence and IFRs by weighting each location using the square root of the number of serology specimens collected in that location.
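The square-root weighting can be illustrated as follows. This sketch assumes that the aggregate is a normalized weighted mean; the text specifies the weights but not the exact aggregation formula, and the function name is hypothetical.

```python
import math
from typing import Sequence


def sqrt_weighted_mean(values: Sequence[float], n_specimens: Sequence[int]) -> float:
    """Aggregate location-level values to one national value.

    Each location is weighted by the square root of its number of serology
    specimens. Assumption: weights are normalized to sum to one.
    """
    weights = [math.sqrt(n) for n in n_specimens]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

With two locations reporting seroprevalence of 10% from 100 specimens and 30% from 400 specimens, the weights are 10 and 20, giving an aggregate of about 23.3%. Square-root weights damp the influence of very large studies relative to weighting by sample size directly.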
Statistical Analysis
We use a Bayesian modelling framework to simultaneously estimate age-specific prevalence and infection fatality rates (IFRs) for each location in our study. We model age-specific prevalence for each location at the resolution of the serology data reported. We model the number of people that test positive in a given study location and age group as coming from a binomial distribution with a test positivity probability that is a function of the true prevalence, sensitivity and specificity, accounting for seroconversion and seroreversion (see the supplementary appendices).
As in Carpenter and Gelman (2020) (24), acknowledging the uncertainty in the test assay sensitivity and specificity itself, we consider sensitivity and specificity to be unknown and directly model the lab validation data (e.g., true positives, true negatives, false positives, false negatives) for each test. Independent weakly informative priors are placed on the seroprevalence parameters, and independent, informative priors akin to those in Carpenter and Gelman (2020) (24) are placed on the sensitivity and specificity parameters. To avoid assumptions about the variability of prevalence across age within a serology age bin, we aggregate deaths for each location to match their respective serology age bins. Independent mildly informative priors are assumed on the age group specific IFR parameters.
Prevalence for a given age group and location is estimated by the posterior mean and equal-tailed 95% credible interval. Uniform prevalence across age is deemed plausible for locations where the 95% credible interval for the ratio of seroprevalence for ages 60 and older to seroprevalence for ages 20 to 60 contains 1.
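The uniform-prevalence criterion can be sketched from posterior draws. The code assumes that the two lists hold paired draws from the same posterior iterations, which the text does not state explicitly, and the function names are hypothetical.

```python
from statistics import quantiles
from typing import Sequence, Tuple


def ratio_interval_95(
    old_draws: Sequence[float], young_draws: Sequence[float]
) -> Tuple[float, float]:
    """Equal-tailed 95% interval for the older-to-younger prevalence ratio.

    Assumption: the i-th entries of both inputs come from the same
    posterior draw.
    """
    ratios = [old / young for old, young in zip(old_draws, young_draws)]
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(ratios, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def uniform_prevalence_plausible(
    old_draws: Sequence[float], young_draws: Sequence[float]
) -> bool:
    """True when the 95% interval for the ratio contains 1."""
    lower, upper = ratio_interval_95(old_draws, young_draws)
    return lower <= 1.0 <= upper
```

Working with the ratio of draws, rather than comparing two separate intervals, keeps any posterior correlation between the age groups in the comparison.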
We model the number of individuals at a given location and age group that are reported dying of COVID-19 as Poisson distributed with rate equal to the product of the age group IFR, age group population, and age group prevalence. For locations where deaths were reported separately for different age bins this model provides IFR estimates for specific age groups and for broader population cohorts, including adults aged 18-65 years. For locations where death data was not disaggregated by age the model provides a population IFR. The model was implemented in the programming language R, with posterior sampling computation implemented with the Stan software package (25).
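The death model can be illustrated with a short sketch of the Poisson rate and log-probability. This only evaluates the likelihood for fixed inputs; it is not the Stan implementation used in the study, and the function names are hypothetical.

```python
import math


def expected_deaths(ifr: float, population: float, prevalence: float) -> float:
    """Poisson rate: age-group IFR x age-group population x age-group prevalence."""
    return ifr * population * prevalence


def poisson_log_pmf(deaths: int, rate: float) -> float:
    """Log-probability of observing `deaths` under a Poisson with mean `rate`."""
    return deaths * math.log(rate) - rate - math.lgamma(deaths + 1)
```

For instance, an IFR of 0.5% in a group of one million people with 20% prevalence implies about 1,000 expected deaths; reported counts far from that value would pull the posterior IFR up or down.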
To perform a meta-analysis of age-specific IFRs across locations, we conduct a meta-regression with random effects. In the metaregression, the dependent variable is the estimated IFR for a specific age group in a specific geographical location, the explanatory variable is the median age of that particular age group, and the standard deviation of each idiosyncratic error is taken from the Bayesian analysis described above. We used a random-effects procedure to allow for residual heterogeneity between studies and across age groups by assuming that these divergences are drawn from a Gaussian distribution. We also allowed for fixed effects by location, to account for locations that deviate from the norm. Since the metaregression used IFR estimates based on reported deaths, we compared the location-specific fixed effects to two estimates of the ratio of excess mortality to COVID-19 deaths in each location. We also compared these metaregression results to a prior metaregression of age-specific IFR for high-income countries (3). The metaregression was performed using the meta regress command in Stata v17.
Results
We identified a total of 2,384 study records, with 2,281 records identified from online databases and a further 103 from Twitter and Google Scholar. After excluding 2,062 records, we assessed 322 records for inclusion in the final analyses. There were a total of 89 studies that could be used to describe either seroprevalence or IFR. The final sample for IFR estimates included 62 estimates from 25 developing countries. The search and exclusion process can be seen in the supplementary appendices. The distribution of included seroprevalence estimates can be seen in Figure 1. A full list of studies included in the IFR calculations can be found in Table 2, and the full list of studies and links to each study can be seen on our GitHub repository https://covid-ifr.github.io/.
Map of study locations with specifics of how these locations were used in the study. St. Petersburg, Russia (not shown on the map) has total IFR data.
Seroprevalence
As shown in Figure 2, numerous locations in developing countries had relatively high levels of seroprevalence during the study period (March 2020 thru February 2021). That pattern is strikingly different from the outcomes in high-income countries, where seroprevalence generally remained below 20%.
Map of areas with seroprevalence during the studied period. St. Petersburg, Russia (not shown on the map) had measured seroprevalence of 11% as of June 2020. This represents the same seroprevalence as used in IFR calculations.
In most developing country locations, seroprevalence was roughly uniform across age strata. In particular, infection rates in older age groups were broadly similar to those in younger cohorts, a striking contrast to the typical pattern in high-income countries, where prevalence among older adults was markedly lower than among younger adults (3). Figure 3 shows the heatmap of age-specific seroprevalence across all age cohorts. As shown in Figure 4, the ratio of seroprevalence for older adults (ages 60+ years) compared to middle-aged adults (ages 40 to 59 years) is indistinguishable from unity in most of these locations.
Infection Fatality Ratios
Map of areas with seroprevalence in the studied period
Green shaded area – range of high-income nations for ratio during the studied period (3), orange line – ratio of 1.
Our statistical analysis produced age-specific IFRs and confidence intervals for 28 locations, and population IFRs for those locations as well as an additional 27 places. The full results of this analysis are shown in the supplementary appendices. We obtain the following metaregression results:
Here the standard error for each estimated coefficient is given in parentheses. These estimates are highly significant with t-statistics of -28.7 and 21.0, respectively, and p-values below 0.0001. The residual heterogeneity τ² = 0.039 (p-value < 0.0001) and I² = 92.5, confirming that the random effects are essential for capturing unexplained variations across studies and age groups. The adjusted R² is 91.1%. Location-specific fixed effects are only distinguishable from zero for three locations: Maranhão, Brazil (−0.50); Chennai, India (−0.68); and Karnataka, India (−1.29).
The metaregression results can be seen in Figure 6. Nearly all of the observations fall within the 95% prediction interval. The importance of the location-specific effects is readily apparent. Indeed, these effects imply that the age-specific IFRs for Maranhão are about 1/3 of the metaregression prediction, while those for Chennai and Karnataka are 1/5 and 1/20, respectively.
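The fractions quoted above follow from the fixed effects if the metaregression is specified in base-10 logarithms of IFR. That base is an assumption inferred from the reported values rather than stated in the text; under it, each fixed effect converts to a multiplier on the predicted IFR as sketched below.

```python
# Location-specific fixed effects reported in the Results.
FIXED_EFFECTS = {
    "Maranhão, Brazil": -0.50,
    "Chennai, India": -0.68,
    "Karnataka, India": -1.29,
}


def ifr_multiplier(fixed_effect: float) -> float:
    """Multiplier on the predicted IFR implied by a fixed effect.

    Assumption: the dependent variable is log10(IFR).
    """
    return 10.0 ** fixed_effect


for location, effect in FIXED_EFFECTS.items():
    print(f"{location}: IFR is about {ifr_multiplier(effect):.3f} x the prediction")
```

Under this assumption the multipliers are close to 1/3, 1/5, and 1/20, matching the fractions given in the text.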
This metaregression analysis uses age-specific IFRs based on reported COVID-19 deaths in each location. As a cross-check, Table 3 reports the ratio of excess mortality to reported deaths for each of these locations.
For nearly all of these locations, the ratio is indistinguishable from unity; that is, reported COVID-19 deaths are broadly consistent with the evidence from excess mortality assessments. There were three exceptions (Chennai, Karnataka, and Nairobi, Kenya), two of which had significant location-specific effects in the metaregression.
The precision of IFR estimates varied by age. In younger age groups, the number of deaths is very small, so the IFR is highly uncertain. Conversely, at older ages the number of infections and deaths can be very small in countries with extremely small populations aged over 65, so these estimates are also uncertain. The full figures across all ages can be found in the supplementary appendices.
IFR estimates are presented in Figure 5 for each location. Here the age-adjusted IFR estimates for each location were weighted based on the location-specific prevalence of each age group and a common baseline population structure so that these population IFR estimates are comparable across locations with differing population structure. We also adjust for excess mortality using the ratios shown in Table 3.
IFR estimates for the population aged 18-65 across locations.
Assessment of Death Reporting
For the full set of locations for which population IFR can be assessed, we found that the adequacy of death certification was highly significant in explaining cross-country variations. As shown in Figure 7, the median value of population IFR was about 0.5% in countries where a majority of deaths were well-certified (using SDG assessments conducted prior to the pandemic) compared to only 0.05% in countries with lower proportions of well-certified deaths. In the latter set of countries, adjustments for excess mortality shift the population IFR upwards by an order of magnitude, to a median of 0.6%. Indeed, the population IFR for Zambia increases from 0.23% to 1.96% – the highest value of any country in our sample. In contrast, the excess mortality adjustments make relatively little difference for countries with a majority of well-certified deaths.
Population IFR for regions divided into areas with <50% well-certified deaths and areas with >50% well-certified deaths as per SDGs (1) (purple) and these IFRs adjusted for estimates of excess mortality (blue). For two locations (Bolivia and Nepal) information on well-certified deaths was over ten years old and so these countries were excluded.
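The excess-mortality adjustment discussed above can be sketched as a simple rescaling. This assumes that the adjusted IFR is the reported-death IFR multiplied by the ratio of excess mortality to reported deaths; the function names are hypothetical, and the ratio for Zambia computed here is back-calculated from the two IFR values in the text, not taken from Table 3.

```python
def adjust_ifr(reported_ifr_pct: float, excess_to_reported_ratio: float) -> float:
    """Scale an IFR based on reported deaths by the excess-mortality ratio.

    Assumption: the adjustment is a simple multiplication.
    """
    return reported_ifr_pct * excess_to_reported_ratio


def implied_ratio(reported_ifr_pct: float, adjusted_ifr_pct: float) -> float:
    """Excess-to-reported ratio implied by an IFR before and after adjustment."""
    return adjusted_ifr_pct / reported_ifr_pct


# Zambia: population IFR rises from 0.23% to 1.96% after adjustment.
zambia_ratio = implied_ratio(0.23, 1.96)
print(f"Implied excess-to-reported ratio for Zambia: {zambia_ratio:.1f}")
```

Under this assumption the Zambian figures imply a ratio of roughly 8.5, in line with the order-of-magnitude shift described for countries with poorly certified deaths.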
Finally, we considered the extent to which the adjusted measures of population IFR were robust to alternative estimates of the ratio of excess mortality to reported deaths. As shown in Figure 8, the estimates from IHME and WMD were generally well aligned, with just a small number of exceptions. The adjusted population IFRs had a median value of 0.49% using the IHME estimates and 0.58% using the WMD estimates.
Population IFR for regions adjusted with either IHME (2) or WMD (4) estimates for mortality. Population IFRs adjusted for excess mortality are shown for all locations except Santa Cruz (Bolivia).
Discussion
This analysis shows the enormous impact that COVID-19 has had on developing countries. The risk of infection observed across developing countries is higher than in high-income nations. Prevalence in developing countries is roughly uniform across age groups, in contrast to the typical pattern in high-income countries where seroprevalence is markedly lower among middle-aged and older adults. The IFR is substantially higher in developing countries than higher-income locations.
We showed that at 20 years of age, the mean IFR in developing countries is 2.7 times higher than that in high-income countries and at age 60 the risk is doubled. At the oldest ages, this discrepancy is reduced, with only a modestly increased risk at age 80. These comparisons are shown in Figure 9 below.
Comparison of IFRs at different ages for high-income vs developing countries
These results are consistent with the pattern observed for most other communicable diseases. In locations with little ability to work from home, where quarantine is difficult or impossible, where opportunities for physical distancing and access to sanitation are poor, with lower healthcare resources, and where even basic resources such as supplemental oxygen are in short supply, people have fared substantially worse during the pandemic than in high-income settings. Indeed, in low-income settings where fewer hospital beds and health care workers are available, COVID-19 has caused great devastation and an enormous death toll. With a much higher IFR, particularly in younger people, the ultimate burden for developing nations from COVID-19 is likely to be very high.
Another important facet of our results is that seroprevalence was both higher and consistent across age-groups in developing countries, in contrast to the lower rates of infection seen in high-income areas, particularly in older populations. Evidently, it is very difficult to insulate elderly people from the virus in a slum or a rural village. For example, seroprevalence in slum neighbourhoods of Mumbai was about four times higher than in non-slum neighbourhoods (63). Our analysis indicates that the relatively uniform prevalence of COVID-19 in developing countries has dramatically increased the number of fatalities in these locations.
Our findings reinforce the conclusions of previous studies that have assessed the IFR of COVID-19 (16, 73). In particular, COVID-19 is dangerous for middle-aged adults, not just the elderly and infirm (3). Our results are also well-aligned with IFR estimates produced for specific locations in developing countries (see supplementary appendices).
Our analysis underscores that incomplete death reporting is a crucial source of apparent differences in COVID-19 death rates. In particular, this is related to the proportion of deaths that are assigned to so-called “garbage codes” (1, 74, 75). These deaths are, by definition, not included in national tallies of the population that has died from COVID-19. In places where death reporting systems are adequate to record deaths, the IFR is on average 10x higher than in places where many deaths are left uncertified.
The divergence between population IFRs for locations is similar whether adjusted for death certification or excess mortality. Adjustment for estimates of excess mortality produced location population IFRs that were consistent with IFRs produced in the age-stratified analysis aside from a few minor outliers. The median of these population IFRs for developing nations, once adjusted for potential undercounting of COVID-19 deaths, was either 0.49% or 0.58%, which was very similar to estimates of IFR from earlier in the pandemic (16, 76).
Excess mortality is a useful metric for adjusting IFR estimates in areas where deaths are well-registered but not well-certified, that is, captured in national vital statistics but without a specific cause of death (4). Nonetheless, caution is warranted in applying national estimates of excess mortality to specific regions within a country, recognizing that death reporting systems may vary markedly with the degree of urbanization and other socioeconomic factors. In the case of Ecuador, for example, the national estimate for the ratio of excess mortality to reported COVID-19 deaths in 2020 was 2.6 (2, 4), whereas that ratio was only 1.01 in the province of Azuay (77).
Moreover, estimates of excess mortality may partly reflect indirect effects of the pandemic on other sources of mortality. On the one hand, non-pharmaceutical interventions may reduce mortality from causes such as vehicle accidents (78). Conversely, mortality may be elevated by impaired access to healthcare for non-infectious diseases such as chronic cardiovascular disease or cancer (79).
Finally, the true burden of COVID-19 may be practically impossible to assess in locations where many deaths are never entered into the national vital statistics system (80). For example, total mortality in Kenya was lower in 2020 than in 2019, but those statistics should certainly not be interpreted as suggesting that Kenya was unscathed by the pandemic (2). Indeed, assessments of Kenya’s vital statistics found that only two-thirds of actual deaths were recorded in the system (80). Such considerations may explain other outliers in our analysis, such as Senegal, which remains far below similar locations even when estimates are adjusted for excess mortality.
In the absence of better death reporting, it is challenging to assess the extent to which differences in IFR across locations reflect systematic disparities in healthcare access, socioeconomic status, and other indicators. Nonetheless, such effects have been clearly demonstrated by studies that have assessed distinct socioeconomic groups within specific regions such as Santiago, Chile (9). Moreover, these considerations are almost certainly relevant in interpreting our finding that age-stratified IFR is markedly higher in developing countries compared to high-income countries. Indeed, our results underscore the tragedy that a Zambian young adult with COVID-19 would be far more likely to die than a Swiss person of similar age.
Our analysis makes a novel contribution in providing a systematic and comprehensive assessment of the implications of seroreversion, that is, the proportion of people who develop antibodies but whose tests will fall below the limit of detection at a later date. Prior studies have either ignored this issue or have assumed that seroreversion occurs at a fixed geometric rate regardless of the assay used (15, 16). In contrast, we have collated detailed information about the characteristics of all assays used in the serology studies included in our analysis, including data on seroreversion as well as test specificity and sensitivity; that information is fully described in our supplementary appendices. Our analysis clearly indicates that the extent of seroreversion differs in magnitude depending on the assay used. Moreover, accounting for seroreversion and other assay characteristics is crucial for assessing seroprevalence accurately in many of the locations covered by our analysis.
Our analysis makes a very strong case for swifter action on vaccine equity. While countries have largely sought to protect their own populations, there is increasing commitment to ensuring that key populations in low- and middle-income countries receive vaccines, at a minimum for their front-line health and other personnel. It is widely accepted that failing to control the pandemic across the globe will contribute to the emergence of additional strains of COVID-19, potentially undermining the efficacy of available vaccines (81). Current vaccine distribution efforts are grossly inequitable (82). Current estimates suggest that fewer than 10% of people in low-income countries have received an immunization, while well above 50% of people in high-income countries have had at least one vaccination (14).
As with all research, our study is subject to a number of limitations. Firstly, while we made every effort to capture seroprevalence data, including corresponding with dozens of researchers and public health officials worldwide, it is possible that some studies have been missed. However, it is unlikely that any small number of additional studies would make a material difference to our results.
Our analysis did not incorporate time series data on the evolution of COVID-19 deaths. However, some studies of high-income countries have shown how such data can be useful in refining assessment of IFR to incorporate the stochastic timing of COVID-19 deaths (16, 83). Such analysis should be a priority for future research about IFR in developing countries.
Our work also did not consider non-mortality harms from COVID-19. Recent work has shown that even at younger ages a substantial fraction of infected individuals will have severe, long-lasting adverse effects from COVID-19 (84). Consequently, the impact on the healthcare system and society may be far greater than would be reflected in mortality rates alone. Focusing only on survival rates obscures the large number of deaths that occur from non-COVID-19 causes when many people are infected (85), SARS-CoV-2’s relatively high fatality rate in comparison to other pathogens and other causes of death (86), and non-mortality harms of COVID-19, such as hospitalization from serious disease (84). Future work should address these non-mortality harms, including Long COVID.
Finally, our analysis only includes serology studies where specimen collection was completed by the end of February 2021. Consequently, our results do not reflect any potential changes in IFR that may have resulted from more recent advances in COVID-19 care, most notably, the development of novel antiviral medications and dissemination of vaccines. Of course, the IFR could also shift with the spread of new variants of SARS-CoV-2 (87).
Conclusion
The prevalence and IFR by age of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to accelerate the provision of vaccine doses to vulnerable populations in developing countries. Moreover, many developing countries require ongoing support to upgrade the quality of their vital statistics systems to facilitate public health decisions and actions, not only for the COVID-19 pandemic but for future global health concerns.
Data Availability
Data and code are available online at https://covid-ifr.github.io/
Statements of Competing Interests
This work was not externally funded and the authors report no financial or other conflicts of interest.
Code and Data
All data and code are available publicly online: https://covid-ifr.github.io/
Acknowledgements
Thanks to Ariel Karlinsky for assistance with death registration and mortality data.
Footnotes
Updated text, added review of excess mortality and further sensitivity analyses. Amalgamated supplementary appendices for easier reading.