Abstract
Introduction: The infection fatality rate (IFR) of COVID-19 has been carefully measured and analyzed in high-income countries, whereas there has been no systematic analysis of age-specific seroprevalence or IFR for developing countries. Indeed, it has been suggested that the death rate in developing countries may be far lower than in high-income countries, an outcome that would be starkly different from the typical pattern for many other infectious diseases.
Methods: We systematically reviewed the literature to identify all serology studies in developing countries that were conducted using representative samples of specimens collected by early 2021. For each of the antibody assays used in these serology studies, we identified data on assay characteristics, including the extent of seroreversion over time. We analyzed the serology data using a Bayesian model that incorporates conventional sampling uncertainty as well as uncertainties about assay sensitivity and specificity. We then calculated IFRs using individual case reports or aggregated public health updates, including age-specific estimates whenever feasible.
Results: Seroprevalence in many developing country locations was markedly higher than in high-income countries but still far short of herd immunity. In most locations, seroprevalence among older adults was similar to that of younger age-groups. Age-specific IFRs were 1.3-2.5x higher than in high-income countries. The median value of population IFR was 0.5% among developing countries with satisfactory death reporting as of 2019, compared to a median of 0.05% for other developing countries.
Conclusion: The burden of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to accelerate the provision of vaccine doses to vulnerable populations in developing countries.
Key Points
- Age-specific prevalence and infection fatality rate (IFR) of COVID-19 for developing countries have not been well assessed.
- Seroprevalence in developing countries (as measured by antibodies against SARS-CoV-2) is markedly higher than in high-income countries but still far short of herd immunity.
- Seroprevalence among older adults is broadly similar to that of younger age-groups.
- Age-specific IFRs in developing countries are roughly twice those of high-income countries.
- Population IFR in developing countries with satisfactory death reporting (based on Global Burden of Disease data as of 2019) is ten times higher than in other developing countries.
- These results underscore the urgency of disseminating vaccines to vulnerable people in developing countries.
Introduction
Since the early stages of the COVID-19 pandemic, it has been commonly assumed that the burden of this disease would be substantially lower in developing countries, due to their relatively younger age structure compared to higher-income nations (3). That perspective was reinforced by the apparently low incidence of fatalities in many developing countries during the first wave. More recently, however, it has become clear that the perceived differences in mortality may have been illusory, reflecting poor vital statistics systems leading to underreporting of COVID-19 deaths (4, 5). Moreover, relatively low mortality outcomes in developing countries would be starkly different from the typical pattern observed for many other infectious diseases, reflecting the generally lower access to high-quality healthcare that has been documented in these locations (6).
As shown in Table 1, mortality attributable to COVID-19 in many developing locations exceeds 2,000 deaths per million. Of the ten nations with the most deaths attributed to COVID-19, seven are developing countries. Indeed, even these statistics may well understate the true death toll in some lower-income places. Numerous studies of excess mortality have underscored the issues with death reporting, particularly in developing countries (4, 5, 7–11). For example, recent studies of India have found that actual deaths from COVID-19 were about ten times higher than in official reports (5, 7). Similarly, a study in Zambia found that only 1 in 10 of those who died with COVID-19 symptoms and whose post-mortem COVID-19 test was positive were recorded as COVID-19 deaths in the national registry (12). Strikingly, the continuation of that study has demonstrated the catastrophic impact of COVID-19 in Zambia, raising the overall mortality rate by as much as five to ten times relative to a normal year (13).
Unfortunately, there has been a dearth of systematic research about the spread of disease and the infection fatality rate (IFR) in developing countries. Previous evaluations have largely focused on assessing these patterns in high-income countries, where high-quality data on seroprevalence and fatalities have been readily available throughout the pandemic (15, 16). In particular, seroprevalence studies conducted in high-income countries in 2020 found low overall prevalence of antibodies to SARS-CoV-2 (generally less than 10%) (17), with much lower prevalence among older adults compared to younger cohorts. Analysis of these data has clearly underscored the extent to which the IFR of COVID-19 increases exponentially with age; that is, the disease is far more dangerous for middle-aged and older adults than for children and young people (2, 15, 16). The existing evaluations have generally assumed that IFR varies with age and sex at birth but have not considered the extent to which it could be affected by disparities in socioeconomic status and access to high-quality healthcare (15, 18).
Objectives
- Determine overall prevalence of COVID-19 infection in locations in developing countries
- Assess age-specific patterns of seroprevalence in these locations
- Estimate age-specific IFRs and compare to benchmark values for high-income countries
- Investigate possible reasons for differences in population IFR between locations
Methods
To perform this meta-analysis, we collected published papers, preprints, and government reports of COVID-19 serology studies for which all specimens were collected before March 1, 2021 and that were publicly disseminated by July 14, 2021. The full search methodology is given in Supplementary Appendix 1. The study was registered on the Open Science Framework: https://osf.io/edpwv/
We restricted the scope of our analysis to locations in developing countries using the classification system of the International Monetary Fund (19).
Inclusion/Exclusion Criteria
Our analysis only included studies that had a random selection of participants from a sample frame that was representative of the general population (20, 21). Consequently, studies of convenience samples – such as blood donors or residual sera from commercial laboratories – were excluded. This is detailed more fully in Supplementary Appendix 3. There is abundant evidence from the pandemic that convenience samples provide inaccurate estimates of seroprevalence, with assessments indicating that they are likely to overestimate the true proportion infected (22, 23).
Serology Data
A crucial part of our analysis is adjusting raw seroprevalence to reflect the sensitivity and specificity of the particular assay used in each serology study, and constructing credible intervals that reflect uncertainty about assay characteristics as well as conventional sampling uncertainty. This requires the start and end dates of specimen collection, the specific assay used, and age-specific serology data. In instances where a study did not include that information, we requested it from the study authors.
Deaths
For locations with publicly available databases of all individual cases, we tabulated the fatality data to match the age brackets of that serology study, using cumulative fatalities as of 14 days after the midpoint date of specimen collection to reflect the time lags between infection, seropositivity, and fatal outcomes. In the absence of individual case data, we searched for contemporaneous public health reports and tabulated cumulative deaths as of 28 days after the midpoint date of specimen collection to incorporate the additional time lags associated with real-time reporting of COVID-19 fatalities. Matching prevalence estimates with subsequent fatalities is not feasible if a serology study was conducted in the midst of an accelerating outbreak. Therefore, as in previous work, we estimated seroprevalence but did not analyze IFRs for locations where the cumulative death toll increased threefold or more over the four-week period following the midpoint date of specimen collection. For details, see Supplementary Appendix 2. Likewise, where we were not able to match deaths to serology data, we used the study for the seroprevalence analysis only.
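The matching rules described above can be illustrated with a minimal sketch. The function names, the example midpoint date, and the handling of a zero death count at the midpoint are assumptions introduced here for illustration; they are not taken from the study's code.

```python
from datetime import date, timedelta


def death_cutoff(midpoint: date, individual_case_data: bool) -> date:
    """Date at which cumulative deaths are tabulated for a serology study.

    14 days after the midpoint when individual case data are available,
    28 days after the midpoint when only public health reports exist.
    """
    lag_days = 14 if individual_case_data else 28
    return midpoint + timedelta(days=lag_days)


def is_accelerating(deaths_at_midpoint: int, deaths_four_weeks_later: int) -> bool:
    """True when cumulative deaths rose threefold or more over four weeks.

    Assumption: with zero deaths at the midpoint, any later death is
    treated as an accelerating outbreak.
    """
    if deaths_at_midpoint == 0:
        return deaths_four_weeks_later > 0
    return deaths_four_weeks_later >= 3 * deaths_at_midpoint


# Hypothetical study with a specimen-collection midpoint of 15 August 2020.
midpoint = date(2020, 8, 15)
cutoff_case_data = death_cutoff(midpoint, individual_case_data=True)
cutoff_reports = death_cutoff(midpoint, individual_case_data=False)
```

A location flagged by `is_accelerating` would contribute to the seroprevalence analysis but not to the IFR estimates.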
Covariates
We selected covariates that were judged likely to have an impact either on the IFR of COVID-19 itself or on the accuracy of official data on COVID-related mortality, based on prior research and expertise. Where possible, we extracted these covariates at a state or regional level within a country; otherwise, they were identified at the national level. A full list of covariates and the method of extraction can be found in Supplementary Appendix 4.
Statistical Analysis
We use a Bayesian modelling framework to simultaneously estimate age-specific prevalence and infection fatality rates (IFRs) for each location in our study. We model age-specific prevalence for each location at the resolution of the serology data reported. We model the number of people that test positive in a given study location and age group as coming from a binomial distribution with a test positivity probability that is a function of the true prevalence, sensitivity and specificity, accounting for seroconversion and seroreversion (see Supplementary Appendix 5). As in Carpenter and Gelman (2020) (27), acknowledging the uncertainty in the test assay sensitivity and specificity itself, we consider sensitivity and specificity to be unknown and directly model the lab validation data (e.g., true positives, true negatives, false positives, false negatives) for each test. Independent weakly informative priors are placed on the seroprevalence parameters, and independent, informative priors akin to those in Carpenter and Gelman (2020) (27) are placed on the sensitivity and specificity parameters.
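The link between true prevalence and test positivity can be sketched as follows. This is a simplified illustration only: it treats sensitivity and specificity as fixed known values and ignores seroreversion, whereas the model described above treats them as unknown and estimates them from lab validation data. The inversion shown is the classical Rogan-Gladen point estimate, not the Bayesian procedure used in the study, and all function names and numbers are hypothetical.

```python
import math


def positivity_probability(prevalence, sensitivity, specificity):
    """Probability that a randomly sampled person tests positive."""
    true_pos = prevalence * sensitivity
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos + false_pos


def binomial_log_likelihood(positives, tested, p):
    """Log-probability of observing `positives` out of `tested` (0 < p < 1)."""
    log_choose = (math.lgamma(tested + 1) - math.lgamma(positives + 1)
                  - math.lgamma(tested - positives + 1))
    return (log_choose + positives * math.log(p)
            + (tested - positives) * math.log(1.0 - p))


def adjusted_prevalence(raw_positive_rate, sensitivity, specificity):
    """Rogan-Gladen point estimate of true prevalence, clipped to [0, 1]."""
    estimate = (raw_positive_rate + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(1.0, max(0.0, estimate))


# Hypothetical assay: 90% sensitivity, 98% specificity, 20% true prevalence.
p_positive = positivity_probability(0.20, 0.90, 0.98)
recovered = adjusted_prevalence(p_positive, 0.90, 0.98)
```

In a full Bayesian treatment, `binomial_log_likelihood` would be evaluated jointly with the likelihood of the validation data so that uncertainty about the assay propagates into the prevalence interval.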
Prevalence for a given age group and location is estimated by the posterior mean and equal-tailed 95% credible interval. Uniform prevalence across age is deemed plausible for locations where the 95% credible interval for the ratio of seroprevalence for ages 60 and older to seroprevalence for ages 20 to 60 contains 1.
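The uniform-prevalence criterion can be illustrated with posterior draws. In the sketch below the draws are simulated from Beta distributions purely for illustration; in practice they would come from the fitted model. The function names, the order-statistic rule for the interval, and all numerical values are assumptions.

```python
import math
import random


def equal_tailed_interval(draws, level=0.95):
    """Equal-tailed interval from posterior draws (simple order-statistic rule)."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[math.floor((1.0 - level) / 2.0 * (n - 1))]
    upper = ordered[math.ceil((1.0 + level) / 2.0 * (n - 1))]
    return lower, upper


def uniform_prevalence_plausible(older_draws, younger_draws, level=0.95):
    """True when the interval for the older/younger prevalence ratio contains 1."""
    ratios = [o / y for o, y in zip(older_draws, younger_draws)]
    lower, upper = equal_tailed_interval(ratios, level)
    return lower <= 1.0 <= upper


# Hypothetical posterior draws (stand-ins for model output).
rng = random.Random(0)
n_draws = 4000
younger = [rng.betavariate(210, 790) for _ in range(n_draws)]        # about 21%
older_similar = [rng.betavariate(200, 800) for _ in range(n_draws)]  # about 20%
older_lower = [rng.betavariate(50, 950) for _ in range(n_draws)]     # about 5%

similar_is_uniform = uniform_prevalence_plausible(older_similar, younger)
lower_is_uniform = uniform_prevalence_plausible(older_lower, younger)
```

Computing the ratio draw by draw, rather than dividing two summary estimates, keeps the interval consistent with the joint posterior uncertainty in both age groups.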
In order to avoid assumptions about the variability of prevalence across age within a serology age bin, we aggregate deaths for each location to match their respective serology age bins. We model the number of individuals at a given location and age group that are reported dying of COVID-19 as Poisson distributed with rate equal to the product of the age group IFR, age group population, and age group prevalence. Independent mildly informative priors are assumed on the age group specific IFR parameters. This model provides age-group level IFR estimates for locations where deaths were reported separately for different age bins and an overall IFR estimate for locations with only total death data. The model was implemented in the programming language R, with posterior sampling computation implemented with the Stan software package (28).
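The death model can be illustrated with a short sketch of the Poisson rate and the corresponding point estimate of the IFR when prevalence is treated as known. This is a simplification: in the model described above, prevalence and IFR are estimated jointly with priors, so the calculation below conveys only the structure of the likelihood. Function names and numbers are hypothetical.

```python
import math


def expected_deaths(ifr, population, prevalence):
    """Poisson rate: IFR x age-group population x age-group prevalence."""
    return ifr * population * prevalence


def poisson_log_pmf(deaths, rate):
    """Log-probability of `deaths` under a Poisson distribution with `rate` > 0."""
    return deaths * math.log(rate) - rate - math.lgamma(deaths + 1)


def ifr_point_estimate(deaths, population, prevalence):
    """Maximum-likelihood IFR when prevalence is treated as fixed."""
    return deaths / (population * prevalence)


# Hypothetical age group: 1,000,000 people, 25% infected, 1,250 reported deaths.
population, prevalence, deaths = 1_000_000, 0.25, 1250
ifr_hat = ifr_point_estimate(deaths, population, prevalence)
log_lik_at_estimate = poisson_log_pmf(deaths, expected_deaths(ifr_hat, population, prevalence))
log_lik_elsewhere = poisson_log_pmf(deaths, expected_deaths(0.004, population, prevalence))
```

Because reported deaths enter the numerator directly, any under-recording of deaths lowers the estimated IFR in proportion.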
To examine the curve of age-specific fatalities in developing countries compared to high-income countries, we re-created a metaregression of IFR on age in previously published work (2). The full methodology for this is given in Supplementary Appendix 7.
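The general shape of such an age curve can be sketched with a log-linear fit of IFR on age. The sketch below uses ordinary least squares on base-10 logarithms as a stand-in; the actual metaregression (Supplementary Appendix 7) may use weights and random effects that are not shown here. The coefficients and data points are hypothetical and chosen only to make the example reproducible.

```python
import math


def fit_log_linear(ages, ifrs):
    """Least-squares fit of log10(IFR) = intercept + slope * age."""
    ys = [math.log10(v) for v in ifrs]
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_ifr(intercept, slope, age):
    """IFR (as a proportion) implied by the fitted line at a given age."""
    return 10 ** (intercept + slope * age)


# Hypothetical noise-free data generated from assumed coefficients.
assumed_intercept, assumed_slope = -3.0, 0.05
ages = [20, 30, 40, 50, 60, 70, 80]
ifrs = [10 ** (assumed_intercept + assumed_slope * a) for a in ages]
fitted_intercept, fitted_slope = fit_log_linear(ages, ifrs)
```

Fitting the same form separately to developing and high-income locations and dividing the two predictions at a given age gives the kind of age-specific ratio reported in the results.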
Results
We identified a total of 2,347 study records, with 2,281 records identified from online databases and a further 66 from Twitter and Google Scholar. After excluding 2,061 records, we assessed 286 records for inclusion in the final analyses. There were a total of 88 studies that could be used to describe either seroprevalence or IFR. The final sample for IFR estimates included 56 estimates from 21 developing countries. The search and exclusion process is shown in Supplementary Appendix 10, and the distribution of included seroprevalence estimates in Figure 1. Table 2 lists the studies included in the IFR calculations; the full list of studies, with links to each, is available in our GitHub repository.
Seroprevalence
In contrast to high-income countries, the seroprevalence across developing countries was substantially higher after a single wave. This is shown in the map in Figure 2, where the majority of high-income locations have seroprevalence below 20%, while a large number of developing countries have seroprevalence far exceeding this level.
A major finding of this research was that seroprevalence in the majority of developing areas was consistent across age strata. What this means is that infection rates in older age groups were similar to those in younger age groups, which is in contrast to observed rates of infection seen in high-income countries (2). This is displayed below in Figures 3 and 4. Figure 3 shows the heatmap of age-specific seroprevalence in included locations, demonstrating that for the majority of developing countries the proportion of people with evidence of past infection is consistent across age strata. Figure 4 demonstrates this numerically, showing that the majority of developing countries have seroprevalence consistent with no protection of older age groups (i.e. equal infection rates between older and younger adults).
Population IFRs
The primary output of our model is the population IFR. This is an estimate of the total number of deaths divided by the total number of infections among people aged 18-65 in a given location. These estimates are presented in Figure 5 for each location. Here the age-specific IFR estimates for each location were weighted based on the location-specific prevalence of each age group and a common baseline population structure, so that these population IFR estimates are comparable across locations with differing population structures (see Supplementary Appendix 11).
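One way to read this weighting is as an infection-weighted average of age-specific IFRs under a shared age structure. The sketch below is an interpretation under that assumption; the exact procedure is given in Supplementary Appendix 11 and may differ. The age bands, shares, prevalences, and IFRs are hypothetical.

```python
def population_ifr(age_ifr, age_prevalence, baseline_share):
    """Infection-weighted IFR under a common baseline age structure.

    age_ifr:         IFR (proportion) for each age group
    age_prevalence:  location-specific prevalence for each age group
    baseline_share:  share of each age group in the common baseline population
    """
    infections = [w * p for w, p in zip(baseline_share, age_prevalence)]
    deaths = [i * f for i, f in zip(infections, age_ifr)]
    return sum(deaths) / sum(infections)


# Hypothetical location with three age bands spanning ages 18-65.
age_ifr = [0.0005, 0.003, 0.01]
baseline_share = [0.5, 0.3, 0.2]

# Uniform prevalence: weights reduce to the baseline age shares.
ifr_uniform = population_ifr(age_ifr, [0.3, 0.3, 0.3], baseline_share)

# Lower prevalence in the oldest band pulls the population IFR down.
ifr_older_protected = population_ifr(age_ifr, [0.3, 0.3, 0.1], baseline_share)
```

The comparison between the two calls shows why uniform prevalence across age groups raises the population IFR relative to settings where older adults are infected less often.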
There was a great deal of heterogeneity in these population IFRs. There were 5 locations for which the population IFR for ages 18-65 was lower than earlier estimates for advanced economies (AE), 4 locations for which the results were consistent with earlier AE estimates, and 16 locations for which the results were higher. Most estimates above the predictions for high-income countries were substantially higher, with 8 locations having a population IFR for ages 18-65 more than double that of the high-income prediction. There was also disparity between locations with very similar population characteristics, for example the enormous variation seen in different estimates from areas within Colombia.
The metaregression results can be seen in Figure 6. At 25 years of age, the mean IFR in developing countries is 2.3 times higher than that in high-income countries. At older ages, this discrepancy is reduced, with only a modestly increased risk at age 80. These comparisons are shown in Table 3 below.
Uncertainty in the IFR estimates arises for different reasons at the two ends of the age range. In younger age groups, the number of deaths is very small, so the uncertainty about the IFR is very large. Conversely, at older ages the numbers of infections and deaths can be very small in countries with extremely small populations aged over 65, and thus these estimates are also uncertain. The full figures across all ages can be found in Supplementary Appendix 6.
Covariates
A full examination of covariates considered in this analysis is presented in Supplementary Appendix 4. Using the Sustainable Development Goals (SDGs) definition from 2019 of deaths properly recorded (1), we found that the median population IFR in areas where <50% of deaths were well-certified was 0.05% compared to a median population IFR of 0.46% in areas where >50% of deaths were well-certified. There was a strong correlation between death reporting adequacy prior to the pandemic and the IFR.
Discussion
This analysis shows the enormous impact that COVID-19 has had on developing countries. The risk of infection observed across developing countries is higher than in high-income nations. Prevalence in developing countries is roughly uniform across age groups, in contrast to the typical pattern in high-income countries where seroprevalence is markedly lower among middle-aged and older adults. The IFR is substantially higher in developing countries than higher-income locations.
These results are consistent with the usual pattern that has been observed for other infectious diseases and are completely inconsistent with the hypothesis that COVID-19 infections are mostly dangerous for high-income countries with aging populations. In locations with no ability to work from home, where quarantine is difficult or impossible, with lower healthcare resources, and where even basic resources such as supplemental oxygen are in short supply, people have fared substantially worse during the pandemic than those in high-income places such as the US. Indeed, in low-income areas where hospital beds are only accessible for a small proportion of the total population, it appears that COVID-19 has caused great devastation and an enormous death toll.
Our findings reinforce the conclusions of previous studies that have assessed the IFR of COVID-19 (16, 69). In particular, COVID-19 is dangerous for middle-aged adults, not just the elderly and infirm (2). Our results are also well-aligned with IFR estimates produced for specific locations in developing countries (see Supplementary Table 8).
The implications for estimating the magnitude of COVID-19 fatalities in developing countries are considerable. It has previously been demonstrated that these figures may be substantial underestimates (12), and our study confirms that the most likely explanation for places with very low death rates is simply that these places are not recording COVID-19 deaths adequately.
In particular, this is related to the proportion of deaths that are assigned to so-called “garbage codes” (1, 24, 25). These deaths are, by definition, not included in national tallies of the population that has died from COVID-19. In places where death reporting systems are adequate to record deaths, the IFR is on average 8x higher than in places where many deaths are left uncertified. We can say with some certainty that the true difference between developing countries with many similar characteristics, such as areas of Brazil and India, is probably minimal, and that the apparent difference in COVID-19 death rates is due to incomplete death reporting systems. This relationship has also been demonstrated through other areas of investigation, including excess mortality (4), which adds weight to the view that perceived differences are largely due to reporting systems. This provides an urgent impetus for higher-income nations to assist with the development and implementation of better reporting systems for lower-income areas of the world.
Our model makes a very strong case for swifter action on vaccine equity. While countries have largely sought to protect their own populations, there is increasing commitment to ensuring that key populations in low- and middle-income countries receive vaccines, at a minimum for their front-line health and other personnel. It is widely accepted that failing to control the pandemic across the globe will contribute to the emergence of additional strains of COVID-19, potentially undermining the efficacy of available vaccines (70). Current vaccine distribution efforts are grossly inequitable (71). Current estimates suggest that less than 10% of people in low-income countries have received an immunization, while well above 50% of people in high-income countries have had at least one vaccination (14).
Our research has demonstrated how damaging COVID-19 can be in areas where healthcare resources are strained. While it has been argued that developing countries are likely to have been spared the travails of pandemic disease due to their younger population, our estimates show that this is not precisely true. While younger people are much less likely to die from an infection, in places with very low resources there are large numbers of deaths that may have been prevented with better access to medical services. Focusing only on survival rates also obscures the large number of deaths that occur when many people are infected (72), SARS-CoV-2’s relatively high fatality rate in comparison to other pathogens and other causes of death (73), and non-mortality harms of COVID-19, such as hospitalization from serious disease (74).
Another important facet of our results is that seroprevalence was both higher and consistent across age-groups in developing countries, in contrast to the lower rates of infection seen in high-income areas, particularly in older populations. This demonstrates that, despite efforts, it has not been possible to protect elderly populations in these lower-income settings, which has likely contributed to the terrible toll that COVID-19 has had in these areas. Despite the much higher disease rates in developing countries, they were still far off proposed herd immunity thresholds, underscoring the urgent need for vaccines in these places.
We have also worked through several potential explanations that have been posited for why some developing countries have seemed less impacted by the pandemic. In general, the most likely explanation for large differences in reported IFR appears to simply be the recording of deaths in each region. While other factors such as GDP are correlated with death rates, they are also highly correlated with death reporting, and a likely explanation appears to be that the majority of places with very low IFRs are simply those places that cannot capture COVID-19 deaths adequately. This does not exclude some impact from other covariates, but it is likely that this impact is small.
As with all research, our study is subject to a number of limitations. Firstly, while we made every effort to capture seroprevalence data, including corresponding with dozens of researchers and public health officials worldwide, it is possible that some studies have been missed. However, it is unlikely that any small number of additional studies would make a material difference to our results.
For each location we take the number of deaths as given. Alternatively, one could treat the outcome of death as a random process, which may add considerably to the width of the uncertainty intervals, particularly in places with small populations (75). We also did not incorporate time series data on the evolution of COVID-19 deaths; previous work has shown how such data can be used to analyse the random timing of COVID-19 deaths, but unfortunately complete time series data are not readily available for many specific locations, especially in developing countries (16). Finally, we did not use data on total mortality in years prior to the pandemic; such information has been immensely valuable in constructing estimates of excess mortality during the pandemic (4). However, such estimates have generally been produced at a national level, due to the same limitations on data availability at regional and local levels.
A very substantial limitation of our analysis is that we did not consider the extent to which total mortality may be severely underestimated in developing countries. Indeed, recent research has documented the importance of this issue (76).
Our work also did not consider non-mortality harms from COVID-19. Recent work has shown that even at younger ages a substantial fraction of infected individuals will have severe, long-lasting adverse effects from COVID-19 (74). Consequently, the impact on the healthcare system and society may be far greater than would be reflected in mortality rates alone.
As with all studies of this type, the ecological fallacy is an inherent limitation. Using country or region levels for covariates means that the diversity which is apparent even in subnational units is homogenized, and important granularity may have been lost.
Conclusion
The burden of COVID-19 is far higher in developing countries than in high-income countries, reflecting a combination of elevated transmission to middle-aged and older adults as well as limited access to adequate healthcare. These results underscore the critical need to accelerate the provision of vaccine doses to vulnerable populations in developing countries. Moreover, many developing countries require urgent assistance in upgrading the quality of their vital statistics systems to facilitate public health decisions and actions, not only for the COVID-19 pandemic but for future global health concerns.
Data Availability
Data and code are available online at https://covid-ifr.github.io/
Statements of Competing Interests
This work was not funded and the authors report no financial or other conflicts of interest.
Acknowledgements
Thanks to Ariel Karlinsky for assistance with death registration and mortality data.