## Abstract

**Introduction** Fervorous investigation and dialogue surrounding the true number of SARS-CoV-2 related deaths and implied infection fatality rates in India have been ongoing throughout the pandemic, and especially pronounced during the nation’s devastating second wave. We aim to synthesize the existing literature on the true SARS-CoV-2 excess deaths and infection fatality rates (IFR) in India, through a systematic search followed by viable meta-analysis. We then provide updated epidemiological model-based estimates of the wave 1, wave 2 and combined IFRs using an extension of the Susceptible-Exposed-Infected-Removed (SEIR) model, using data from April 1, 2020 to June 30, 2021.

**Methods** Following PRISMA guidelines, the databases PubMed, Embase, Global Index Medicus, as well as BioRxiv, MedRxiv, and SSRN for preprints (accessed through iSearch), were searched on July 3, 2021 (with results verified through August 15, 2021). Altogether using a two-step approach, 4,765 initial citations were screened resulting in 37 citations included in the narrative review and 19 studies with 41 datapoints included in the quantitative synthesis. Using a random effects model with DerSimonian-Laird estimation, we meta-analyze IFR_{1} which is defined as the ratio of the total number of observed reported deaths divided by the total number of estimated infections and IFR_{2} (which accounts for death underreporting in the numerator of IFR_{1}). For the latter, we provide lower and upper bounds based on the available range of estimates of death undercounting, often arising from an excess death calculation. The primary focus is to estimate pooled nationwide estimates of IFRs with the secondary goal of estimating pooled regional and state-specific estimates for SARS-CoV-2 related IFRs in India. We also try to stratify our empirical results across the first and the second wave. In tandem, we present updated SEIR model estimates of IFRs for waves 1, 2, and combined across the waves with observed case and death count data from April 1, 2020 to June 30, 2021.

**Results** For India countrywide, underreporting factors (URF) for cases (sourced from serosurveys) range from 14.3-29.1 in the four nationwide serosurveys; URFs for deaths (sourced from excess deaths reports) range from 4.4-11.9 with cumulative excess deaths ranging from 1.79-4.9 million (as of June 2021). Nationwide pooled IFR_{1} and IFR_{2} estimates for India are 0.097% (95% confidence interval [CI]: 0.067 – 0.140) and 0.365% (95% CI: 0.264 – 0.504) to 0.485% (95% CI: 0.344 – 0.685), respectively, again noting that IFR_{2} changes as excess deaths estimates vary. Among the included studies in this meta-analysis, the IFR_{1} generally appear to decrease over time from the earliest study end date to the latest study end date (from 4 June 2020 to 6 July 2021, IFR_{1} changed from 0.199 to 0.055%), whereas a similar trend is not as readily evident for IFR_{2} due to the wide variation in excess death estimates (from 4 June 2020 to 6 July 2021, IFR_{2} ranged from (0.290-1.316) to (0.241-0.651) %).

Nationwide SEIR model-based combined estimates for IFR_{1} and IFR_{2} are 0.101% (95% CI: 0.097 – 0.116) and 0.367% (95% CI: 0.358 – 0.383), respectively, which largely reconcile with the empirical findings and concur with the lower end of the excess death estimates. An advantage of such epidemiological models is the ability to produce daily estimates with updated data with the disadvantages being that these estimates are subject to numerous assumptions, arduousness of validation and not directly using the available excess death data. Whether one uses empirical data or model-based estimation, it is evident that IFR_{2} is at least 3.6 times more than IFR_{1.}

**Conclusion** When incorporating case and death underreporting, the meta-analyzed cumulative infection fatality rate in India varies from 0.36%-0.48%, with a case underreporting factor ranging from 25-30 and a death underreporting factor ranging from 4-12. This implies, by June 30, 2021, India may have seen nearly 900 million infections and 1.7-4.9 million deaths when the reported numbers stood at 30.4 million cases and 412 thousand deaths (*covid19india.org*) with an observed case fatality rate (CFR) of 1.35%. We reiterate the need for timely and disaggregated infection and fatality data to examine the burden of the virus by age and other demographics. Large degrees of nationwide and state-specific death undercounting reinforce the call to improve death reporting within India.

## INTRODUCTION

The second wave of SARS-CoV-2 in India–a country broaching 1/5^{th} of the world population, wrought a devastating toll of an astronomical 414 thousand daily reported infections and 4.5 thousand reported daily deaths from COVID-19 (*1*) at its peak in May of 2021, leading to a collapse of healthcare infrastructure (*2*). Concerns have been raised regarding the inadequacy of the healthcare systems of adjacent countries in this region (*3*), as well as tragically realized amidst recent surges of SARS-CoV-2 in some of these countries (*4*).

It is now well-known that only a fraction of SARS-CoV-2 infections are captured, stemming from a large degree of covert infections, access, willingness and availability of testing, and sometimes a desire to maintain public image (*5*) (*6*). Similarly, due to incomplete and lagged reporting, lack of medical certification and misclassification of the cause of deaths, not all COVID-19 related deaths have been captured correctly (*7*) (*8*) (*9*). Infection fatality rates (IFR) are measured as the ratio of the total number of deaths to the total number of infections over a given period. Serological surveys, with rigorous sampling design and cogent analysis, can provide an estimate of the prevalence of antibodies (Ab) formed in response to a past natural infection, and thereby an estimate of the total number of infected cases in an unvaccinated study population, which forms the basis for the denominator of the IFR measure. With introduction of vaccines, seropositivity could arise both from vaccines or from past infections and thus such studies are less informative about the denominator in an IFR calculation. Excess deaths calculation using the death data released during the pandemic period can give us an idea on the number of COVID-19 deaths we failed to report or the extent of death underreporting (*7*) (*9*) (*10*). Compartmental epidemiological models, including extended versions of the susceptible-exposed-infected-removed (SEIR) model and renewal process models can also estimate the total number of latent infections and deaths by making stringent assumptions needed to identify certain key parameters (*11*) (*12*) (*13*).

From a global lens, several systematic reviews have examined this integral measure of SARS-CoV-2 mortality, the IFR, at various points of the pandemic. As of 16 June 2020, a systematic review and meta-analysis of 24 studies estimated a pooled **worldwide** IFR of 0.68% (95% CI: 0.53 – 0.82), while noting considerable heterogeneity among the included studies and being unable to adjust for age, comorbidities, and other demographic factors (limited by the availability of data) (*14*). Other efforts to systematically investigate the infection fatality rates of SARS-CoV-2 include a review and meta-analysis of age-specific IFRs from 27 studies among OECD countries (as of 18 September 2020), which concluded that the IFR increases substantially with age estimating an age-specific IFR of 0.002% among children aged <10, 0.01% among younger adults aged ≥25, 0.4% among adults aged ≥55, 4.6% among adults aged ≥65, and up to 15% among adults aged ≥85 (*15*). Limited to non-developing countries, this review estimated that 90% of the variation in IFR estimates is explained by the age make-up of the underlying populations (*15*). Regarding reconciling SARS-CoV-2 seroprevalence, a key ingredient to know IFR estimates, a systematic review and meta-analysis (as of 14 August 2020) of 47 studies (encompassing 23 countries worldwide) found a worldwide seroprevalence estimate of 3.38% (95% CI: 3.05-3.72) with a range of 0.37% (Malaysia) to 22.1% (Iran) among the included studies, and reported considerable overall heterogeneity among the included studies as well as regional variation in the effect size with 1.45% (95% CI: 0.95-1.84) for South America to 5.27% (95% CI: 3.97-6.57%) for Northern Europe (*16*). India was not examined in these multi-country systematic reviews of SARS-CoV-2 IFR and seroprevalence, and nor were any of its neighboring countries.

Concerning India and its neighboring countries (i.e., Bangladesh, Nepal, Pakistan, and Sri Lanka), certain features common in this region are distinct from the countries considered in the above reviews and thereby beckon a comprehensive investigation of the SARS-CoV-2 fatality within these countries. First, the age structures of these countries are relatively young in comparison to higher income countries (as of 2020, the estimated median age in years are the following: India 28.4, Bangladesh 27.6, Nepal 24.6(*20*), Pakistan 22.8, Sri Lanka 34.0 versus the United Kingdom 40.5, Italy 47.3, Germany 45.7, and the United States 38.3 (*17*)). Considering this difference in the population age composition, a meta-analysis of SARS-CoV-2 infection fatality rates and seroprevalence would aid in discerning the values of these measures specific to South-East Asia. We note that, at the time of this report, no systematic review has examined the existent studies on IFRs within the region encompassing and surrounding India.

Here, we perform a systematic review of existent literature on SARS-CoV-2 infection fatality rates (IFRs) in India and India’s neighboring countries of Bangladesh, Nepal, Pakistan, and Sri Lanka, as of 3 July 2021, with verification of search thoroughness extending to 15 August 2021. Due to the paucity of data and availability of relevant studies in the adjacent countries and considering the intensity of the latest SARS-CoV-2 outbreak in India as detailed above, we focused on India for the quantitative synthesis. Through a meta-analysis of SARS-CoV-2 IFR estimates, we examine the nationwide, regional, and state-specific estimates and heterogeneity over the course of the pandemic within India. As an alternative approach to empirical evidence synthesis and meta-analysis, we use an extended susceptible-exposed-infected-removed (SEIR) model to present updated estimates of SARS-CoV-2 infection fatality rates with data through June 30, 2021, as well as underreporting factors for cases and deaths, corresponding to the first and second waves, as well as cumulative, for COVID-19 in India. This paper concludes with a discussion of takeaways from the systematic review and meta-analysis, as well as from the updated model-based results. We highlight some immediate and future considerations regarding revising the death reporting system in India and why capturing deaths are important for the health and future of the living.

## METHODS

### Terminology and Definitions

Below we provide the definitions, with formulas as needed, of the various terms used throughout this systematic review and meta-analysis.

** Seroprevalence survey (i.e., serosurvey)** in the context of SARS-CoV-2 are large-scale studies aimed at estimating the true prevalence of SARS-CoV-2 (i.e., the cumulative percent infected over a period of time) within the target population utilizing serology testing for IgG or IgM antibody presence (

*18*) scaled to the level of a geographic location (e.g., nationwide, statewide, citywide, or districtwide), a community, or a smaller group. Study designs of serosurveys commonly involve population-based sampling methods (e.g., multi-stage cluster sampling, probability proportionate to size, etc.) applied to obtain a randomly selected, representative sample.

** Case Underreporting Factor (URF (C))** is defined as the estimated total cumulative infections divided by the reported (i.e., observed) cumulative cases at the indicated date.
The numerator (estimated total cumulative infections) is estimated by the population of the target geographic region from which samples were selected, multiplied by the seroprevalence, as sourced from serosurveys. The population estimate is age-adjusted according to the age cut-off for the underlying study by multiplying the total population estimate of the study area by the proportion of the population in the age range included in the study sample.

** Excess Deaths** is defined as the absolute difference between the observed all-cause mortality in a specific time period and the expected all-cause mortality in the same period. When calculating excess deaths, the expected all-cause mortality is commonly projected using prior years of data, an average of prior years, or an alternative benchmark, as it largely depends on the availability and continuity of all-cause mortality data within the location of study.

**is defined as the estimated total cumulative deaths divided by the reported (i.e., observed) cumulative deaths at the indicated date, as collected 14 days after the date indicated. The numerator of the URF (D) (i.e., estimated total cumulative deaths) is the location and time period-specific excess deaths estimate (available at the time of this report), as sourced from media reports and studies on excess deaths. Using all-cause-mortality to calculate excess deaths over a long period has the limitation that the numerator often is not specific to COVID-19 disease. Some model-based studies have alternative ways of estimating this fraction using assumed values of COVID-19 fatality rates.**

*Death Underreporting Factor (URF (D))*** Case Fatality Rate (CFR)** is defined as the reported (i.e., observed) cumulative deaths divided by the reported/observed cumulative cases. Reported cumulative deaths are routinely collected 14 days after the date indicated to allow for fatality delay between symptom onset and death. Whereas reported cumulative cases are collected on the date indicated, as a conservative approach to capturing the cumulative infections up to the date indicated. Further details on the intuition behind the assumption of a 14-day fatality lag can be found in Supplementary Appendix F.

**is defined as the reported (i.e., observed) cumulative deaths (collected 14 days after the indicated date) divided by the estimated total cumulative infections. In other words, IFR**

*Infection Fatality Rate*_{1}(IFR_{1})_{1}is an estimate of the true fatality rate for which the uncertainty in the denominator is adjusted but not the uncertainty in the numerator, namely, death undercounting is not accounted for. In our meta-analysis, for studies without precalculated IFR

_{1}, the denominator (estimated total cumulative infections) is computed as the seroprevalence multiplied by the population estimate of the study location. Again, the population estimate is age-adjusted, according to the age inclusion criterion of the study design, by multiplying the population number by the percent of the population in the specified age range.

** Infection Fatality Rate_{2} (IFR_{2})** is defined as the estimated total cumulative deaths (14 days after the indicated date) divided by the estimated total cumulative infections. In other words, IFR

_{2}is an estimate of the true fatality rate for which both the uncertainty in the denominator and the uncertainty in the numerator (i.e., death undercounting) are adjusted. For studies without precalculated IFR

_{2}, the denominator (estimated total cumulative infections) is the same as the denominator in IFR

_{1}. The numerator (estimated total cumulative infections) is computed as the reported (i.e., observed) cumulative deaths (collected 14 days after the indicated date) multiplied by the death underreporting factor, as either model-derived or sourced directly from media reports and studies on excess deaths.

### Identification of Studies for Systematic Review and Meta-analysis

The present systematic review and meta-analysis was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) statement and guidelines (*19*) (see Supplementary Appendix A for the completed PRISMA checklist). Two reviewers developed the search strategy with the expertise and guidance of information specialists from the University of Michigan Taubman Health Sciences Library. On July 3, 2020, we performed the search for relevant articles publicly disseminated from January 1, 2020 onwards and conducted in any language within the databases PubMed, Embase, World Health Organization’s database Global Index Medicus, as well as BioRxiv, MedRxiv, and SSRN through isearch, an interface that houses SARS-CoV-2 related reports curated by the National Institute of Health (a complete search strategy, including search terms, Boolean operators, and applied filters, for each database is published in Supplementary Appendix B). Up until August 15, we verified that no relevant studies were missed by manually combing through studies referenced in media reports, government press releases, and any additional preprints or reports published after July 3 and through August 15, 2021.

For the narrative review, eligible studies included a nationwide, regional (e.g., state, union territory, city, district, or slum designated area, etc.), or age/sex-specific measure of seroprevalence or infection fatality rate (IFR) for India or adjacent countries of Bangladesh, Nepal, Pakistan, and Sri Lanka. Consistent with a systematic review that examines international infection fatality rates of SARS-CoV-2 among OECD countries (i.e. non-developing countries) (*15*), studies were excluded in the full-text screening (a) if the studies examined narrower subpopulations of the location of interest (e.g., healthcare workers, HIV positive adults, or industrial workers, etc.), (b) if the studies recruited participants from certain facilities (e.g., blood donors, dialysis centers), (c) if the studies considered individuals at hospitals, urgent care centers, intensive care units (ICUs), or tertiary care centers, (d) if the studies performed active recruitment of subjects (i.e. participants volunteered), and (e) if the studies limited their cohort to asymptomatic or symptomatic individuals. These criteria are used to identify studies both with representative samples of the general population of interest and with measurements of the outcomes that reflect seropositivity of SARS-CoV-2 regardless of virus manifestation (or irrespective of clinical presentation and presence of symptoms).

For the meta-analysis of infection fatality rates (IFRs), studies were included if (in addition to meeting the eligibility criteria for inclusion to the qualitative review) (a) 95% confidence interval was available for either the seroprevalence and/or the precalculated IFR, when available, and (b) data on COVID-19 confirmed fatalities was available for the study location from *covid19india.org* (*20*) at the time of this review. Additionally, for studies that do not provide a precalculated IFR_{2} (i.e., do not provide an IFR estimate adjusted for death underreporting), a study was excluded from IFR_{2} examination if no excess deaths or death underreporting factors were available (at the time of this review) for the location of interest (e.g., district or city) in media reports or excess deaths focused publications. Namely, this report utilizes excess deaths from three sources for the nationwide meta-analysis of IFR_{2} (*21*) (*22*) (*23*) and the following sources for the regional and state-level IFR_{2} analysis: for cities and districts Ahmedabad (*22*), Bangalore Rural District (*22*), Chennai (*22*) (*24*), Indore (*22*), Mumbai (*25*) (*22*) (*26*), Ujjain (*22*), and for states Delhi (*27*) (*13*) (*22*), Karnataka (*28*) (*29*), Tamil Nadu (*22*) (*28*). Table A below summarizes the sources used in nationwide excess deaths studies (*21*) (*22*) (*30*) (*23*), as well as in the Data Development Lab (*31*), an interactive open source data platform recently made available for India.

### Two-Step Screening and Data Abstraction

Citation titles and abstracts were screened independently by two reviewers for verifying the inclusion and exclusion criteria. For articles not satisfying the exclusion criterion, the full text was accessed, and the two screeners separately reviewed the methods and results sections of these studies for final inclusion into this systematic review and meta-analysis. The two reviewers subsequently extracted data from the included articles on the following items: (a) study design, location, and time period, as well as inclusion criteria, (b) sample size denoting the number of participants in the study, (c) descriptive statistics of the proportion of the study cohort male versus female and the mean age of the study cohort, (d) estimate of infection fatality ratio (IFR) or seroprevalence for SARS-CoV-2 and corresponding 95% confidence intervals, when available, (e) any statistical sensitivity tests performed.

### Statistical Meta-analysis

A detailed description of the meta-analysis framework is presented in Supplementary Appendix F. To perform the meta-analyses presented herein, the R package *meta* (*33*) was utilized and the R package *metafor* (*34*) was employed to perform the included sensitivity analyses.

In preparation for the meta-analysis, we extract the IFR or seroprevalence estimates, as well as the provided 95% CI, from each study corresponding to the study’s intended overall population. For seroprevalence studies without precalculated IFR_{1}, we collect data on **(a)** COVID-19 confirmed fatalities from *covid19india.org* (*20*) 14 days after the end date of the study period (as previously reasoned), **(b)** 2019 projected population estimates from the World Bank for India (*35*), 2019 projected population estimates from the 2011 census for states (*36*), and 2011 census population estimates for districts and cities (limited by available projections) (*36*), **(c)** 2011 census information on age demographics for the study location (*37*), and **(d)** underreporting factors, as derived or directly reported from media reports and studies on excess deaths for the nation, state, district, or city were collected nearest to the end date of when the study was conducted. With these collected data items, for studies without a precalculated fatality rate, we compute IFR_{1} and IFR_{2} following the definitions provided in the ** Terminology and Definitions** section above. We note that for smaller regional denominations (e.g., cities or districts) for which data on excess deaths were unavailable, we use the statewide death underreporting factor (if available) in computing IFR

_{2}.

For the meta-analysis itself, we estimate three versions of SARS-CoV-2 pooled IFRs– namely (a) IFR_{1}, (b) IFR_{2} with lower limit of the reported estimates of URF (D), and (c) IFR_{2} with higher limit of the reported estimates of URF (D)– for each sub-analysis. First, using a random effects model with inverse variance method and DerSimonian-Laird (DL) estimation, we estimate the nationwide SARS-CoV-2 pooled IFRs, as well as 95% confidence intervals, for India across all available national IFR datapoints (both precalculated and computed herein) included in this review. Next, a regional analysis of IFRs is performed considering India partitioned into the following regions: North, Northeast, East, South, West, and Central. Regions with at least one available eligible study are examined. To compute the regional SARS-CoV-2 pooled IFRs, we first pool the IFR estimates from statewide, citywide, and districtwide studies associated with each state, using the same random effects estimation approach. Then, the regional SARS-CoV-2 pooled IFRs are approximated with the statewide pooled IFR estimates, using the analogous random effects model with the inverse variance approach and DL estimator.

In both the nationwide and regional analyses, **heterogeneity** is assessed using the *I ^{2}* statistic, where a

*I*of 50% to 90% denotes substantial heterogeneity and

^{2}*I*≥ 75% (and up to 100%) indicates considerable heterogeneity among the included studies, as recommended by Cochrane guidelines (

^{2}*38*). For the regional analysis, we interpret the

*I*with caution as we do expect there to be variation in the true effect size among the included studies, as these studies cover varying states and IFRs have been found to vary by state (

^{2}*13*).

To assess **publication bias,** we produce a funnel plot to examine the distribution of p-values and 95% confidence intervals, as well as perform the Egger’s and Begg’s tests to formally test for funnel plot asymmetry. To assess the **risk of bias** in the seroprevalence estimates across the included studies, we utilized the Joanna Briggs Institute (JBI) approach (*39*) (*40*). Upon completing the series of questions for each study, the tool provides a rating of risk of bias (low/ moderate/ high) based on the sampling design and data collection methods. We answer each question according to the provided values (“Yes”/ “No”/ “Unclear”/ “Not applicable”). Remarks are included with respect to the IFR estimates within the returned strata of the risk of bias.

### Model-based Estimation of Fatality Rates in India during Wave 1, Wave 2, and combined over waves from April 1, 2020 to June 30, 2021

The present study follows the same methodology as the previous versions (*13*). In brief, we employ a compartmental epidemiologic model, the extended susceptible-exposed-infected-removed (SEIR-*fansy*), to model the transmission dynamics of SARS-CoV-2. Accounting for the false negative rates of the SARS-CoV-2 antigen test type (e.g., RT-PCR and rapid antigen tests) observed in India, we include unascertained cases and deaths as a compartment. As depicted in Supplementary Appendix K Figure S1, the population is further divided into 10 disjoint compartments: S (Susceptible), E (Exposed), T (Tested), U (Untested), P (Tested Positive), F (Tested False Negative), RR (Reported Recovered), RU (Unreported Recovered), DR (Reported Deaths) and DU (Unreported Deaths). A full description of the methodological framework is available in Supplementary Appendix K. Nine differential equations form the basis for imitating the spread of SARS-CoV-2. Parameters and 95% credible intervals (CrI) are estimated using Bayesian techniques by performing Metropolis–Hastings algorithm-based random sampling from the posterior distribution with a Gaussian proposal density.

Considering the inherent differences between the first and second waves of the pandemic (e.g., the strain on the healthcare infrastructure and dearth of medical resources in wave 2), we examine the entailed epidemiological measures for wave 1 and wave 2 separately, with wave 1 as April 1 2020–January 31, 2021 and wave 2 as February 1, 2021–June 30, 2021 (latest until this report). The start date of wave 2 of February 1, 2021 is reasoned as two weeks before the national effective reproduction number crossed unity on February 14, 2021 to allow for time between virus exposure and symptom onset. Daily infected cases, recovered cases, and deaths are collected from *covid19india.org* (*20*). We compute nationwide and state-wise fatality rates and case–as well as death– underreporting factors for both wave 1 and wave 2. Additionally, we estimate cumulative infected case and death counts across waves 1 and 2 adjusting for the respective underreporting factors. To obtain the combined number of cases across both waves, we multiply the wave-specific infections with the underreporting estimate and sum across the two waves. Similarly, to obtain the combined number of deaths across both waves, we compute the combined number of deaths by multiplying the wave-specific deaths with the underreporting estimate and sum across the two waves. Lastly, an overall infection fatality rate (IFR) for India as of 30 June 2021 is provided as the estimated cumulative number of deaths divided by the estimated cumulative number of infections.

## RESULTS

### Search Results and Included Studies

As shown in Figure 1 below, the search resulted in a total of 5174 citations across the four search engines: PubMed (2940), Embase (1119), WHO Global Index Medicus (Southeast Asia Region: 5, Eastern Mediterranean Region: 2), and isearch (1078). After removing duplicate citations across databases, 4765 articles were screened in the initial level of title/abstract, and we reviewed 137 articles in the second level of full-text screening. Of these studies, 61 studies were excluded as no seroprevalence, nor infection fatality rate estimates were presented. Moreover, 18 studies were excluded as the cohort consisted of healthcare workers, 15 studies were excluded as the cohort consisted of patients from hospitals, urgent care units, or tertiary care centers, 5 studies were excluded because the cohorts were limited to either asymptomatic (4 studies) or symptomatic (1 study) individuals, 4 studies were excluded as the cohort consisted of working individuals, 2 studies were excluded as the cohort consisted of blood donors, 2 studies were excluded as the cohort was sampled from testing centers, 1 study was excluded because the cohort consisted of members from elderly care centers, 1 study was excluded as the participants were actively recruited. Overall, across these subpopulations, 48 studies were collectively excluded for the reason of having narrow or non-representative samples, which in turn impedes generalizability to the broader target population within the location of study.

### Qualitative Review

#### Locations of included articles

Of the 37 articles included in the narrative review, 2/5 countries originally in the search (namely, India and Pakistan) were present in these articles. In other words, no eligible studies were available for Bangladesh, Nepal, nor Sri Lanka. The predominant country studied was India (35/37 included articles) with Pakistan being the other country (2/37 included articles). In the initial screening, nearly all citations were written in English with 1 citation written in French. In the full-text screening, all articles were written in English.

#### Study designs, populations, and objectives

As shown in Supplementary Appendix C Table 1, among the 37 included articles, 27/37 were serosurvey studies (25 serosurveys for India (*42*) (*43*) (*44*) (*45*) (*46*) (*47*) (*48*) (*49*) (*50*) (*51*) (*52*) (*53*) (*54*) (*55*) (*56*) (*57*) (*58*) (*59*) (*60*) (*61*) (*62*) (*63*) (*64*) (*65*) (*66*) and 2 serosurveys for Pakistan (*67*) (*68*)) and 10/37 were other study designs (*69*) (*70*) (*71*) (*72*) (*73*) (*74*) (*13*) (*12*) (*75*) (*76*), such as extended compartmental epidemiological model-based approaches.

The age distributions of the study populations reflect the younger populations in India and Pakistan with 6/11 studies (that reported the study age composition) having a mean age of ≤24 to <40 years and 5/11 studies with a mean age of ≤40 to <50 years (Supplementary Appendix C Table 1). Regarding the participation rates by gender, out of the 19 studies (that reported the study gender composition) 5 studies had a considerably larger proportion of male than female participants, whereas 3 studies had a heavier percentage of female than male participants (Supplementary Appendix C Table 1). The remaining 11/19 had a comparable percentage of male versus female participants.

A nationwide target population accounted for 10/37 studies, namely, 8 for India (*59*) (*60*) (*61*) (*62*) (*74*) (*13*) (*12*) (*75*) (*76*) and 1 for Pakistan (*68*). The predominant population of interest was cities with 19/37 studies focusing on municipal target populations and with the following counts of studies per city: 6 Delhi–India (*47*) (*48*) (*49*) (*50*) (*70*) (*71*), 4 Mumbai–India (*56*) (*57*) (*58*) (*73*), 1 Ahmedabad–India (*42*), 2 Bhubaneswar–India (*44*) (*45*), 2 Chennai–India (*46*) (*69*), 1 Indore–India (*55*), 1 Pimpri-Chinchwad–India (*63*), 1 Ujjain–India (*66*), and 1 Karachi–Pakistan (*67*). The next highest population of interest was state and union territories with 6/37 studies focusing on statewide generalizations and the following counts of studies per state or union territory: 3 Karnataka–India (*52*) (*53*) (*72*), 1 Kashmir–India (*54*), 1 Puducherry–India (*64*), and 1 Tamil Nadu–India (*65*). The prevalence of SARS-CoV-2 in slum dwellings was considered in 8/37 studies with the following counts of studies per slum area: 3 slum sections within Mumbai–India (*56*) (*57*) (*73*), 1 slum sections within Matunga, Chembur West, and Dahisar wards in Mumbai– India (*58*), 1 Devarajeevana Halli (DJ Halli) slum in Bengaluru–India (*51*), 1 slum sections within Pimpri-Chinchiwad–India (*63*) and 2 nationwide slum prevalence approximations from the 2^{nd} and 3^{rd} nationwide serosurveys for India (*60*) (*61*). Seroprevalence of SARS-CoV-2 within rural districts versus urban was examined in 9/37 included studies and with the following counts per area: 1 Bangalore Rural District (with subdistricts–otherwise referred to as taluks of Devanahalli, Doddaballapura, Hoskote, Nelamangala)–India (*43*), 1 Delhi–India (*50*), 1 Karnataka–India (*52*), 1 Kashmir–India (*54*), 1 Puducherry–India (*64*), 1 Tamil Nadu–India (*65*) and 3 nationwide–India (*59*) (*60*) (*61*).

#### SARS-CoV-2 surveillance measurements and levels of granularity

As presenting either an IFR or seroprevalence estimate was an inclusion criterion, all 37 included studies estimated one or both of these measures. The breakdown of measures examined was as follows: 8/37 studies presented only IFR estimates without estimating seroprevalence, 15/37 studies presented only seroprevalence estimates without computing IFRs, and 14/37 studies reported both estimated seroprevalence and IFRs. Among the included studies, seroprevalence was estimated at the following levels: overall region, subregion (e.g., district/ward), type of residence (e.g., urban/rural/slum/non-slum), age, sex, age-sex, and for select comorbidities (e.g., diabetes and hypertension). In contrast, IFRs were estimated largely at the overall level with just 2 studies presenting age-sex stratified estimates (1 serosurvey for Tamil Nadu (*65*) and 1 other study design for Karnataka, Mumbai and Bihar migrants (*72*)).

#### Remarks on excluded articles

The prevailing reason for exclusion of studies in the full-text screening was that the article did not provide concurrent prevalence estimate nor an IFR (61/109). Of these 61 studies presenting other epidemiological measures, a large number examined CFRs. Additionally, a considerable number of studies (18/109) were excluded that examined the subpopulation of healthcare workers and with the following counts of healthcare worker studies per country: India (16) and Pakistan (2). The third notable exclusion criterion was studies that examined hospital, urgent care, or tertiary care centers (15/109), which resulted in the following counts of patient or hospitalized studies per country: India (11), Pakistan (2), Bangladesh (1), and Nepal (1). A complete list of studies excluded with reasonings is presented in the Supplementary Appendix D, as well as a summary table synthesizing the total number of studies excluded per eligibility criterion by country (see Supplementary Appendix E).

### Quantitative Summary for India: Meta-analysis of Infection Fatality Rates (IFR)

Of the 37 eligible studies in the narrative review, 19 and 17 studies were included in the meta-analysis of IFR_{1} and IFR_{2}, respectively. Supplementary Appendix G lists the studies that were not able to be included in the meta-analysis and each respective reason for exclusion. To summarize, the reasons for exclusion include: (a) did not provide a 95% confidence interval for the seroprevalence and/or IFR estimate, (b) fatality data were unavailable for the study location (e.g., city or select districts) from *covid19india.org* (*20*), (b) excess deaths data were unavailable at the state level for the study location in order to derive the death underreporting factor, (d) studies in Pakistan as previously discussed the meta-analysis focuses on India, and (e) studies that stratify by slum versus non-slum and do not provide an overall estimates are excluded, as individuals living in slum areas tend to be of low socioeconomic status relative to persons in non-slum areas.

#### Summary of Nationwide Underreporting of Infections

As shown in Table 1 below, the breadth of case underreporting countrywide has remained high in wave 2 compared to wave 1, as is evidenced by the dip in case underreporting factors (URF) from 29.1 to 14.3 in the 1^{st} to the 2^{nd} nationwide serosurveys compared to the consistency in high case URFs from 25.7 to 24.9 in the 3^{rd} and 4^{th} nationwide serosurveys, respectively, in India (*59*) (*60*) (*61*) (*62*). Please note that the 4^{th} serosurvey did include vaccinated individuals, but by June 30, only roughly 5% of the Indian population was fully vaccinated and under 20% were partially vaccinated, so the estimates reflect largely antibodies from past infections but are likely overestimates.

#### Background on Nation and State Level Excess Deaths

As shown in Table 2, five recent studies have provided nationwide excess deaths estimates for India, as of June 2021 (*21*) (*32*) (*22*) (*30*) (*23*). Of the three studies that examine excess deaths in India broken out by waves 1 and 2 (*21*) (*22*) (*23*), excess deaths are generally found to be considerably higher in wave 2 than wave 1 with the exception of the Anand et al., 2021 study (*23*), where excess deaths decrease from 3.4 (wave 1) to 1.5 million (wave 2). We note that the time period of the second wave in this study (*23*) extends into wave 2, as the national effective reproductive number crossed unity on February 14, 2021 in India, which may in part explain the high excess deaths estimate comparable to the other two studies. The degree of nationwide death underreporting has remained high from waves 1 to 2 with the Wave 1 URF (D) ranging from 3.8-19.6 and the Wave 2 URF (D) ranging from 4.0-11.2 (Table 2). Up to June 2021, the cumulative excess death estimates across waves 1 and 2 for India range from 1.79 million (*21*) to 4.9 million (*23*), suggesting a combined nationwide URF (D) of 4.4-11.9.

As shown below in Table 3, the statewide excess deaths available at the time of this report largely show a similar story of increased excess deaths in wave 2 relative to wave 1, corroborating the severe toll of the second wave of the pandemic on India. In parallel, the state-specific death URFs are consistently higher in wave 2 compared to wave 1 (Table 3), actualizing the widely discussed absence and deterioration of timely and accurate death reporting in wave 2 in various states in India. The exception to this assertion is Kerala where the wave-specific death URFs are not meaningful different and excess deaths have remained low or albeit negative (Table 3). In wave 1, the two states with the highest death URFs are Assam (14.1-19) and Andhra Pradesh (9.1-11.1), whereas the two states with the lowest death underreporting that in fact exhibited negative excess deaths compared to prior year(s) are Gujarat (−70 thousand deaths) and Uttar Pradesh (−37 thousand deaths). In wave 2, the two states with the largest death underreporting are Madhya Pradesh (23.8-42.4) and again Andhra Pradesh (24.6-36.2). The two states with the lowest death URFs in wave 2, where excess deaths were lower than the number of reported COVID-19 deaths, are Kerala and Delhi. It is important to recognize that the timing of the onset of the second wave differs by state and city, wherein Table 3 the categorization of waves is based on the timing of the second wave nationwide in India.

#### Nationwide IFR_{1} and IFR_{2}

Figure 2 presents the results of the nationwide meta-analysis of IFR_{1} for India. Using a random effects approach with DL estimation, the estimated nationwide pooled SARS-CoV-2 IFR_{1} for India is 0.097% (95% CI: 0.067 – 0.140). This suggests a nationwide SARS-CoV-2 fatality ratio for India of 97 deaths per 100,000 infections, when not accounting for underreporting of deaths. As seen in Figure 2, there is considerable variation in the time periods encompassed in the studies belying this overall estimate. The range of IFR_{1} (from highest to lowest) among the included studies is 0.199% (4 June 2020) to 0.055% (6 July 2021).

Figure 3 shows the results of the nationwide meta-analysis of IFR_{2} for India. Using a random effects approach with DL estimation, the estimated nationwide pooled SARS-CoV-2 IFR_{2} for India ranges from 0.365% (95% CI: 0.264 – 0.504) to 0.485% (95% CI: 0.344 – 0.685), when applying the lower and higher ends of the uncertainty bracket for death underreporting, respectively. This can be interpreted as a nationwide SARS-CoV-2 fatality ratio for India of 365-485 deaths per 100,000 infections, when further adjusting for underreporting of deaths.

A detailed forest plot of the nationwide seroprevalences of SARS-CoV-2 in India, as well as the accompanying 95% confidence interval, underlying the computed nationwide IFRs, as directly reported in the included studies, can be found in Supplementary Appendix H Figure 1.

With respect to observed time trends from this meta-analysis, Table 4 includes a comparison of the case fatality rate (CFR) to IFR_{1} and IFR_{2} in waves 1 versus combined across waves 1 and 2. Both the CFR (i.e., not accounting for case nor death underreporting) and IFR_{1} (i.e., not accounting for death underreporting) have decreased overtime. Specifically, the nationwide CFR has decreased from 5.56% in June 2020 to 1.36% in July 2021 (Table 4). The nationwide IFR_{1} has also largely been decreasing over time from 0.199% (4 June 2020) to 0.055% (6 July 2021), based on estimates (both precalculated and computed) from included studies (Table 4). Whereas for IFR_{2} (i.e., when adjusting for both case and death undercounting) this trend is not readily apparent. As a whole, IFR_{2} has changed overtime between 0.290% (4 June 2020) to 0.241-0.651% (6 July 2021), oscillating between 0.241% (6 Jan 2021) to 0.916-1.316% (4 June 2020), as detailed in Table 4. The absence of this trend for IFR_{2} reconciles with the large excess deaths estimates observed for wave 2 compared to wave 1 (Table 2) and, in particular, the increase in the nationwide death URFs of 3.8-4.2 (wave 1) to 10.1-11.2 (wave 2) (Table 2), as derived from the excess deaths estimated in Deshmukh et al., 2021 (*22*).

##### Regional and Statewide IFR_{1} and IFR_{2}

Detailed estimates of regional pooled IFR_{1} and IFR_{2} of SARS-CoV-2 in India and 95% confidence intervals (CI), as well as the statewide pooled IFR_{1} and IFR_{2} estimates inputted for each region, are presented in Figures 4 and 5, respectively. Using the same random effects model with DerSimonian-Laird estimation as in the nationwide analysis, the regional pooled IFR_{1} is estimated to be the following (as ordered from highest to lowest pooled effect size and reporting only regions with more than one state): East 0.136% (95% CI: 0.062 – 0.302), West 0.125% (95% CI: 0.052 – 0.303), North 0.109% (95% CI: 0.037 – 0.325), and South 0.062% (95% CI: 0.023 – 0.164) (Figure 4). Hence, East and West India have the largest estimated IFR_{1} of 0.136% and 0.125%, respectively, suggesting estimated infection fatality rates (when not adjusting for underreporting) for SARS-CoV-2 of 136 deaths per 100,000 persons and 125 deaths pers 100,000 persons in these regions, respectively. The estimate for the East is largely driven by the comparatively high statewide IFR_{1} estimate for West Bengal of 0.322% (95% CI: 0.290 – 0.357), whereas the estimate for the West is driven by Maharashtra (0.244%) (Figure 4). The lowest IFR_{1} estimate is for South India of 0.062 (95% CI: 0.023 – 0.164), backed by the Andhra Pradesh and Karnataka which have the lowest statewide pooled IFR_{1} of 0.019% and 0.019%, respectively (Figure 4). Also considering regions with 1 state, pooled regional IFR_{1} estimates range from 0.062 for South to 0.174% for Central.

We also find considerable heterogeneity exists even within each respective region in India. As examples, the statewide pooled IFR_{1} varies in the North from 0.043% (Jammu and Kashmir) to 0.362% (Punjab), in the South from 0.019% (Andhra Pradesh and Karnataka, respectively) to 0.162% (Puducherry), and in the West from 0.051% (Goa) to 0.244% (Maharashtra).

Using the same random effects model with DL estimation, the range of the regional pooled IFR_{2} is estimated to be the following (as ordered from highest to lowest pooled effect size and considering only regions with >1 state): North 0.477% (95% CI: 0.250 – 0.911) to 0.477% (95% CI: 0.250 – 0.911), West 0.396% (95% CI: 0.294 – 0.534) to 0.399% (95% CI: 0.297 – 0.535), East 0.304% (95% CI: 0.153 – 0.605) to 0.304% (95% CI: 0.153 – 0.605), and South 0.191% (95% CI: 0.165 – 0.223) to 0.205% (95% CI: 0.175 – 0.241) (Figure 5). Hence, North and West India have the largest estimated IFR_{2} of 0.477-0.477% and 0.396-0.399%, respectively, indicating estimated infection fatality rates (when adjusting for death underreporting) for SARS-CoV-2 of 477 deaths per 100,000 persons and 396-399 deaths per 100,000 persons in these regions, respectively. The Northern IFR_{2} estimate is largely driven by the comparatively high statewide IFR_{2} estimate for Punjab (1.010%) which has the second highest statewide pooled IFR_{2} (after Madhya Pradesh at 1.061%) of all states encompassed among the included studies, whereas the estimate for the West is driven by Gujarat (0.607%) (Figure 5). The lowest IFR_{2} estimate is again for South India of 0.191-0.205% (among regions with >1 state), backed by Karnataka and Kerala which have the lowest statewide pooled IFR_{2} of 0.067-0.083% and 0.144%, respectively (Figure 5). In sum, when considering regions with 1 state, pooled regional IFR_{2} estimates range from 0.168% for Northeast to 1.061% for Central.

A detailed forest plot of the state, city and district-specific seroprevalences of SARS-CoV-2 in India, as well as the accompanying 95% confidence interval, belying the computed regional and statewide IFRs, as directly reported in the included studies, can be found in Supplementary Appendix H Figure 1.

### Key Findings

✤

**Regional pooled IFR1 estimates within India range from a low of 0.062% in the South to a high of 0.136% in the East. Regional pooled IFR2 estimates range from [0.191 – 0.205%] in the South to [0.477%] in the North.**✤

**Considerable within region heterogeneity is also observed, as pooled state specific IFR1 and IFR2 estimates are evidenced to vary within regions.**

We present a funnel plot of the standard error of the effect sizes of the included studies in Figure 6 as a standard method for examining publication bias. Although the results show asymmetry, in the context of seroprevalence proportions from serosurveys, which comprise the bulk of the study designs, this does not necessarily indicate the presence of publication bias (*79*). Instead, we suspect that the asymmetry is due to heterogeneity in the actual effect sizes of the prevalence of SARS-CoV-2 among the populations captured by the included studies. A detailed assessment of publication bias is presented in Supplemental Appendix J. The results of the Joanna Briggs Institute (JBI) assessment for risk of bias within each included study can be found in Supplemental Appendix I.

Data files and code used to produce the above results are available on GitHub at the following repository: https://github.com/lzimmermann4/india_c19_ifrs.

### Updated Model-based estimates of Wave 1 and Wave 2 Fatalities

Since wave 1 is defined analogously as in the latest report (*13*) and the underlying data are the same, the wave 1 estimates presented herein are unchanged. Briefly, when adjusting the denominator by the estimated cases underreporting factor of 11.11 (95% CrI 10.71 – 11.47), the nationwide IFR_{1} estimate (defined as observed cumulative deaths/estimated total cumulative infections) for wave 1 is 0.129% (95% CrI 0.125 – 0.134). When further adjusting the numerator by the estimated death underreporting factor of 3.56 (95% CrI 3.48 – 3.64), we obtain a nationwide IFR_{2} estimate (i.e., estimated cumulative total deaths/estimated cumulative total infections) for wave 1 of 0.461% (95% CrI 0.455 – 0.468). The wave 1 IFR_{2} estimate is largely congruent to the estimated effect sizes from the latest nationwide serosurvey in India (*59*) in that it falls within the bounds of the computed IFR_{2} herein for the 4^{th} nationwide serosurvey of 0.241-0.651%, when applying a death underreporting factor range of 4.4 (1.79 million excess deaths (*21*)) to 11.9 (4.9 million excess deaths (*23*)).

Since wave 2 data extend to June 30, 2021, using the same compartmental epidemiological model we estimate case and death underreporting factors of 13.3 (95% CrI 11.4 – 14.6) and 3.46 (95% CrI 3.13 – 3.69), respectively. As such, the nationwide IFR_{1} estimate for wave 2 is 0.095% (95% CrI 0.086 – 0.109), which is slightly higher than the previously reported estimate of 0.032% (95% CrI 0.029% – 0.035%) as of May 15, 2021 (i.e. near the peak of wave 2 in India) (*13*). The nationwide IFR_{2} estimate for wave 1 is 0.326% (95% CrI 0.316 – 0.342), which is higher than our latest reported estimate of 0.183% (95% CrI 0.180 – 0.186) as of May 15, 2021 (*13*). The current estimates of the IFR for wave 2 have increased in magnitude in comparison to than past estimates due to having a time period that better reflects the trajectory of the second wave and data that encompass both the ascent and descent of cases and deaths. Moreover, the updated underreporting factors for cases and deaths are half the size of previous underreporting factors for wave 2 (i.e. 26.73 (95% CrI 24.26 – 28.81) and 5.77 (95% CrI 5.34 – 6.15), respectively, as of May 15) (*13*) and have largely converged in magnitude to the underreporting factors for wave 1.

Figure 6 contains the case and death underreporting factors and estimated number of infections and deaths for waves 1 and 2, respectively, for India. State-level estimates of IFR_{1} and IFR_{2} for waves 1 and 2 are reported in ** Supplementary Appendix N** Figure 2, which shows the between state variation in these measures for wave 1 and wave 2, respectively, for the 20 states with high cases-deaths within India.

### Combined estimates across the waves

As shown in Figures 6 above, the total cases (reported and unreported) is estimated to be 381.6 million and the total deaths (reported and unreported) to be 1.4 million, across waves 1 and 2 for India. Subsequently, the combined infection fatality rate (IFR) across both waves is estimated to be 0.101% for IFR_{1}, and 0.367% for IFR_{2} (as previously defined). State-level case and death underreporting factors for India are included in ** Supplementary Appendix N** Tables 1-3

**Figure 3.**

*and*We also include a state and country level comparison of the estimated case underreporting factors across both waves to the estimated case under counting factors presented by ICSM in the 4^{th} nationwide seroprevalence survey (as of May 31, 2021) for India (*59*) (see ** Supplementary Appendix N** Table 4

*)*. Briefly, the estimated case underreporting factors presented herein are lower at large than the case undercounting factors announced for the 4

^{th}nationwide seroprevalence survey (

*59*), with the exceptions of Andhra Pradesh, Chhattisgarh, Jammu and Kashmir, and Kerala where the case underreporting factors that we estimate are larger, as well as Maharashtra where the case underreporting factor is the same. As such, we present estimated case underreporting factors that are conservative with respect to the recent nationwide serosurvey under counting estimates. The true extent of the magnitude of infections absent in reported case counts is wagered to lie somewhere in between.

### Key Findings

✤

**As of 30**^{th}June 2021, nationwide URF (C) and URF (D) are estimated to be 12.5 and 3.5, respectively by the SEIR model.✤

**Nationwide SEIR model-based estimated total cumulative infections is 381.6 million and estimated total cumulative deaths is 1.4 million (as of 30**^{th}June 2021), whereas reported deaths were 412 thousand and cases were at 30.4 million.✤

**Nationwide model-based cumulative IFR1 and IFR2 estimates are 0.101% (95% CI: 0.097 – 0.116) and 0.367% (95% CI: 0.358 – 0.383), respectively.**✤

**The magnitude of these model-based estimates largely reconcile with the empirical findings. Across empirical data and model-based estimation, IFR2 is nearly 3.6 times the size of IFR**_{1}.

## DISCUSSION AND CONCLUSIONS

### Neighboring Countries

Despite there being insufficient data and sparse eligible studies in the neighboring countries, a brief snapshot into the SARS-CoV-2 situation within each country shows that the case fatality rate (unadjusted for underreporting) in neighboring countries is at par with India, if not higher. In Table 5, we provide a comparison of nationwide CFRs, as of 31 July 2021, within each of the countries included in this review (i.e., India, Bangladesh, Nepal, Pakistan, and Sri Lanka). Of these five countries, Pakistan stands out as having the highest CFR at 2.37% with 1.02 million cumulative reported infections and 24 thousand cumulative reported deaths (31 July 2021). Whereas India in fact has the lowest CFR of these 5 countries at 1.36%, but less surprisingly given its large population has the highest reported death and confirmed infection toll of 430 thousand and 31.6 million (31 July 2021), respectively. We recognize and caution that the CFR is an imperfect measure of the true SARS-CoV-2 fatality rate and that levels of underreporting of deaths and cases vary by country (*8*). In sum, Table 5 also presents a CFR across the countries included in this systematic review which is 1.42% to summarize the observed mortality rate in this region. The CFR among the neighboring countries alone is 1.96% which indicates that the SARS-CoV-2 observed mortality rate of the neighboring countries appears to be similar to that of India, if not slightly higher (whilst cautioning the fog of undercounting).

### Summary on fatality rates in India

For India, there were enough eligible studies to perform a meta-analysis of infection fatality rates. A key result of the meta-analysis concerning the nationwide IFR estimates for India is that both the CFR (i.e., not accounting for infection nor death underreporting) and IFR_{1} (i.e., not accounting for death underreporting) have decreased over time, whereas IFR_{2} appears to have remained same or even increased in some studies over time. In other words, when failing to take into account case and/or death underreporting (i.e., CFR and IFR_{1}), the SARS-CoV-2 mortality rate falsely appears to be decreasing. However, when correctly adjusting for both underreporting of infections and deaths (i.e., IFR_{2}), we do not observe that less deaths are resulting from the same number of infections through wave 2 compared to wave 1. This aligns with the catastrophic collapse of the healthcare system observed at large in India during the second wave of the pandemic (*2*). In sum, use of either the CFR or IFR_{1} (not accounting for death underreporting) alone may be deceptively misleading, and IFR_{2} should also be utilized in public health programming and policy decision making.

The narrative review of general population SARS-CoV-2 infection fatality rate or seroprevalence studies brought forth some trends concerning virus spread and fatality. In India, dense urban areas and slum dwellings have been found to have markedly higher prevalence rates in comparison to non-slum or rural locations (e.g., 58.4% (95% CI: 56.8– 59.9) among slums versus 17.3% (95% CI: 16.0–18.7) for Mumbai (*58*) among non-slums, and 57.9% (95% CI: 53.4–62.3) among a slum in Bengaluru (*51*)). Trends in the infection fatality rates (IFRs) are less obvious among the albeit limited number of slum population-focused seroprevalence studies in this region. A study in Mumbai found the IFR of SARS-CoV-2 in slums to be one-third the size of that of non-slums (i.e. 0.076% among slums versus 0.263% among non-slums (*58*)). Similarly, another study on a slum in Bengaluru estimated the IFR to be 2.94 deaths per 10,000 infections (or 0.029%) (*51*).

When estimating regional and statewide IFR estimates for India, the latest available studies for certain regions was during the outset of the pandemic (e.g., Central India with both input studies around August 2020). Such variation in study time periods highlights the need for updated seroprevalence studies in select regions. The findings from this regional meta-analysis also underscore the geographical differences in data collection/research spread across India. A large number of the included studies in this systematic review and meta-analysis were in the South and North, with substantially less coverage in the Central and Northeast regions. Moreover, no eligible serosurveys were available for the region Northeast India (which includes the following states: Arunachal Pradesh, Assam, Manipur, Meghalaya, Mizoram, Nagaland, Sikkim and Tripura). This leads us to the limitations of the meta-analysis presented herein.

### Limitations of Meta-analysis

We were not able to incorporate Bangladesh, Nepal, Pakistan, nor Sri Lanka in the meta-analysis presented in this paper as no eligible studies were available for these countries (aside from Pakistan, which had 2 included seroprevalence surveys), as previously discussed in the *Results* section. There were insufficient age-as well as sex-disaggregated IFRs for India (and for the neighboring countries) to examine heterogeneity in IFR estimates by either demographic. Furthermore, although age and sex-disaggregated seroprevalence estimates were extracted from the included studies, disaggregated deaths and cases by these demographics are not available for India nor for its states, cities, nor districts, at the time of this review. Additionally, there were insufficient data on excess deaths for select states (e.g., Jammu and Kashmir and Puducherry) that precluded our ability to compute IFR_{2} for studies encompassed in these states and, as such, we were unable to include these studies in the meta-analysis for the regional IFR_{2}. Additionally, we caution that several of the included studies in the quantitative synthesis are in the process of being peer-reviewed, and so multiple underlying datapoints have not yet been verified.

Lastly, we note that global studies that examine India along with multiple other countries (as part of a country level global analysis) but do not explicitly refer to India in the title/abstract would not be captured by our published search strategy. Therefore, a limitation of the meta-analysis is that these IFR estimates for India from global studies are not comprehensively reflected in the empirical results of IFRs and similarly in the excess deaths synthesis. For example, a recent study by Rahmandad et al. (2021) (*82*) that *appears* (exact estimates are not reported) to estimate for India a ratio of estimated to reported deaths on the log scale in the ballpark of 1.2-1.5 (i.e. 3.3-4.4 when exponentiated) and an IFR near 0.4-0.5% for India (as of 22 December 2020), as part of a global analysis of 92 nations, was not captured (no explicit reference to India in title/abstract). Hence, this is an example of model-based estimates not reflected in the empirical findings evidenced in this report.

### Recommendations moving forward

Based on the empirical and model-based findings evidenced in this report, we provide some high-level recommendations, primarily, as it concerns India. To reiterate, investigations aimed at estimating the true extent of COVID-19 attributed deaths in India have relied on a myriad of data sources (Table A). Such data sources do not come without their challenges (e.g., non-response bias and selection bias), and often lead to considerable variation in the ultimate results. For example, Ramachandran and Malani (2021) provide a higher estimate of excess deaths for India of 5.21 million (compared to their 3.36 million excess deaths estimate) for January 2020 to June 2021, when further considering households that do not respond consecutively (*30*). Given that no current all-cause mortality data exist for India at the time of this report with the latest release in 2010-2013, linking data sources in this manner is a common alternative. We encourage the release of timely and disaggregated data on SARS-CoV-2 cases and deaths, as are needed to assess stratified effects by age-sex-geography(*5*). Despite geographic heterogeneity, quality of case and death reporting remains of great concern for India and continues to mask the true fatality rate as well as impede ongoing epidemiological investigations. In the absence of all-cause and disaggregated COVID-19 mortality data, tools (e.g., a personal digital health identifier) (*5*) and fortified vital surveillance methods (e.g., monitoring inactive bank accounts) and practices (e.g., increased community engagement) (*5*) are needed to facilitate and validate the use of the previously mentioned data sources and others. Strengthening the nationwide vital surveillance system in these ways would enable timely and pointed interventions from public health and government officials to ultimately prevent further overload of healthcare systems and loss of life. To put the excess deaths synthesis into context, the range of death underreporting factor for India of 4.4-11.9 (based on 1.79-11.9 million excess deaths from (*21*) and (*23*), respectively) falls toward the higher end of the global death undercounting factor of 3.3 (95% CI: 2.1-4) estimated by the Economist (as of May 2021) (*83*). This indicates that the extent of death underreporting, as it pertains to SARS-CoV-2 remains particularly acute for India as compared to that of globally.

### Why death matters?

From an economic perspective, why is it necessary to do these mortality calculations? Death matters because life matters – prolonging life is a goal, not merely a means to generate economic value. The widely used human development index (HDI) of the World Bank takes a multi-dimensional approach to measuring welfare. Material wellbeing (GDP) is only one of its three components; the others involve knowledge (literacy rate) and life (life expectancy at birth).

Nevertheless, for the purpose of cost-benefit analysis and economic evaluation of public health measures, it is useful to be able to assign a monetary value to every life lost or saved. Measuring the value of a life by lost earnings or productivity is deeply flawed – it would, for example, have the absurd implication that the lives of non-working individuals have no value at all. The most commonly used approach to estimating the value of a statistical life (VSL) adopts the revealed preference or willingness-to-pay (WTP) principle. Hedonic wage regressions estimate the compensating differential (*84*) in market wages for jobs that involve a higher probability of fatal accidents – a measure of how much income the average worker is willing to sacrifice in order to reduce mortality risk. Sacrificed earnings are then aggregated over the lifetime using present discounted values and scaled up to a probability of 1 to arrive at VSL.

There are several estimates of VSL from developed countries^{1} but studies on developing countries including India are few and far between. For our purposes, we will use Majumder and Madheswaran (2018) (*85*) who estimate VSL for the average male industrial worker in India to be INR 44.69 million (US $0.61 million at current exchange rates)^{2}. This may introduce upward as well as downward biases into our calculations. For example, people dying of COVID-19 are, on average, older than the industrial workforce, which implies the mortality cost presented below is an overestimate. On the other hand, wages are only one part of a worker’s economic contribution. A proper accounting of the social value of a life must also account for profits, rents and consumers’ surplus generated by the economic activity of that individual, as well as non-pecuniary benefits conferred on relatives, friends and acquaintances. These are not reflected in private WTP and are therefore a source of underestimating VSL. The calculations presented below are ballpark figures that will reflect reality closely if the above-mentioned biases largely cancel each other out.

Table 6 below monetizes the death toll of COVID-19 in India till June 2021. Based on reported death figures alone, the economic cost of Covid-induced mortality comes to around 9% of India’s annual GDP. If we use the model-based estimates or the lower end of the range of excess death estimates, we incur a notional cost of 30-40% of annual GDP. Note that the official decline in India’s GDP in the fiscal year 2020-21 is 7.7%, which is a gross underestimate of the true loss once the value of lives is taken into account. By various estimates, the fiscal outlay of the government in its financial rescue package was about 1% of GDP. This is a very small fraction of what is at stake in this pandemic. Total healthcare spending in 2018 by the central and state governments taken together was only 1.28% of GDP, which again illustrates the serious under-investment in mitigating health shocks like COVID-19.

Other than national estimates of mortality and IFR, there is considerable value to measuring infections and deaths across gender, caste, income levels, regions, etc. These can be useful guides for policymakers, telling them where scarce healthcare resources and aid should be concentrated. The within-country heterogeneity in both the epidemiological and economic impact of the disease has been noted in studies from all parts of the world. Unfortunately, absence of disaggregated data has tied the hands of researchers in India. Nevertheless, our meta-analysis uncovers substantial regional variations in IFR_{2}, from a low of 0.191 – 0.205% in the South to a high of 0.477% in the North. These differences are plausibly created by socio-economic factors (like differences in quality of governance and healthcare) rather than biological ones and provide policymakers the information necessary for welfare arbitrage and efficient resource allocation.

In closing, premature mortality is not only an indicator of health but also of economic cost and productivity, as discussed above and as has been evidenced for other health conditions in India (*91*). Furthermore, unforeseen death within a family or community may disturb the livelihood and income generation structure, thereby pervading to the broader welfare of impacted individuals, although reverse causation is present (*92*). Aside from the health lens, premature mortality and infectious disease mortality’s consequences for economic stability and ensuing welfare policy further support the need for continued attention to assessing the mortality burden of COVID-19 within India.

## Data Availability

Data files and code used to produce the results herein are available on GitHub at the below repository.

## Funding

The research was sponsored by funding from the University of Michigan School of Public Health and Center for Precision Health Data Science.

## Author Contributions

Conceptualization: LZ, BM

Methodology: LZ, BM

Investigation: LZ, SB, SP, RK, RB, PG, BM

Supervision: BM

Writing – original draft: LZ, BM

Writing – review & editing: LZ, SB, SP, RK, RB, PG, BM

## Competing interests

Authors have no competing interests

## Acknowledgments

The authors thank the librarians from the University of Michigan Taubman Health Sciences Library for their instruction on developing the search strategy for this rapid review.

## Supplementary Appendices

## Appendix A. PRISMA checklist *[1]*

## Appendix B. Systematic review search procedure

To conduct the present systematic review, we systematically collected publications and preprints concerning infection fatality rates (IFRs) from SARS-CoV-2 in India and neighboring countries of Bangladesh, Nepal, Pakistan, and Sri Lanka. To identify relevant papers, we searched four databases: PubMed, Embase, Global Index Medicus, and isearch for preprints (encompassing bioRxiv, medRxiv, and SSRN). The search was conducted on July 3, 2021 and, as such, the results reflect published studies and preprints available from January 1, 2020 to July 3, 2021. Results were further verified through August 15, 2021 through reviewing media reports, government press releases, and manual search of preprints and publications. Data were extracted from the online search engines into the reference manager Zotero, deduplicated, and imported into Excel for screening.

The title/abstract screening, the full-text screening, and data abstraction were independently performed by two screeners to verify which studies met inclusion and exclusion criteria and to very data collected. When the two screeners disagreed on the marking for a citation, the screeners reached a consensus on whether to advance the citation to the next level of screening. Below we publish the full search strategies for each database.

**PubMed (National Library of Medicine)**

**Date searched:** 7/3/2021

**Number of results:** 2,940

**Date filter:** January 1, 2020 to [blank]

**Other filters applied:** None

1.

covid-19[tw] OR COVID19[tw] OR SARS-CoV-2[tw] OR SARS-CoV2[tw] OR severe acute respiratory syndrome coronavirus 2[tw] OR 2019-nCoV[tw] OR 2019nCoV[tw] OR coronavirus[tw] OR coronavirus[mh] OR covid-19[mh]

2.

india[text word] OR india[mesh] OR indian[text word] OR pakistan[text word] OR pakistani[text word] OR pakistan[mesh] OR bangladesh[text word] OR bangladeshi[text word] OR bangladesh[mesh] OR nepal[text word] OR nepal[mesh] OR “sri lanka”[text word] OR “sri lankan”[text word] OR “sri lanka”[mesh]

3.

IFR[text word] OR infection*[text word] OR CFR[text word] OR case*[text word] OR transmission*[text word] OR mortalit*[text word] OR mortality[mesh] OR fatalit*[text word] OR lethalit*[text word] OR death*[text word] OR burden[text word] OR underreporting[text word] OR “under-reporting”[text word] OR seroprevalence[text word] OR serosurvey[text word] OR serology[text word] OR serology[mesh] OR seroconversion[text word] OR seroconversion[mesh] OR “serosurveillance”[text word] OR Seroepidemiologic studies[mesh] OR seroepid*[text word] OR seropositiv*[text word] OR antibod*[text word] OR

antibodies[mesh] OR surveillance[text word] OR SIR[text word] OR SEIR[text word] OR “susceptible-exposed-infected-removed”[text word] OR “susceptible-infected-removed”[text word]

(1 AND 2 AND 3)

**Embase (Elsevier)**

**Date searched:** 7/3/2021

**Number of results:** 1,119

**Date filter:** 2020 to 2021

**Other filters applied:** Embase only and not Medline (as Medline is included in PubMed)

1.

covid-19:ti,ab,kw OR COVID19:ti,ab,kw OR SARS-CoV-2:ti,ab,kw OR SARS-CoV2:ti,ab,kw OR “severe acute respiratory syndrome coronavirus 2”:ti,ab,kw OR 2019-nCoV:ti,ab,kw OR 2019nCoV:ti,ab,kw OR coronavirus:ti,ab,kw OR ‘Coronavirinae’/exp OR ‘coronavirus disease 2019’/exp

2.

india:ti,ab,kw OR ‘india’/exp OR indian:ti,ab,kw OR pakistan:ti,ab,kw OR pakistani:ti,ab,kw OR ‘pakistan’/exp OR bangladesh:ti,ab,kw OR bangladeshi:ti,ab,kw OR ‘bangladesh’/de OR nepal:ti,ab,kw OR ‘nepal’/de OR “sri lanka”:ti,ab,kw OR “sri lankan”:ti,ab,kw OR ‘sri lanka’/de

3.

IFR:ti,ab,kw OR infection*:ti,ab,kw OR CFR:ti,ab,kw OR case*:ti,ab,kw OR transmission*:ti,ab,kw OR mortalit*:ti,ab,kw OR ‘mortality’/exp OR fatalit*:ti,ab,kw OR lethalit*:ti,ab,kw OR death*:ti,ab,kw OR burden:ti,ab,kw OR underreporting:ti,ab,kw OR under-reporting:ti,ab,kw OR seroprevalence:ti,ab,kw OR serosurv*:ti,ab,kw OR serology:ti,ab,kw OR ‘serology’/exp OR seroconversion:ti,ab,kw OR ‘seroconversion’/de OR ‘Seroepidemiology’/exp OR seroepid*:ti,ab,kw OR seropositiv*:ti,ab,kw OR antibod*:ti,ab,kw OR ‘antibody’/exp OR surveillance:ti,ab,kw OR SIR:ti,ab,kw OR SEIR:ti,ab,kw OR susceptible-exposed-infected-removed:ti,ab,kw OR susceptible-infected-removed:ti,ab,kw

(1 AND 2 AND 3)

**isearch (National Library of Medicine)**

**Date searched:** 7/3/2021

**Number of results:** 1,078

**Date filter:** January 1, 2020 to [blank]

**Other filters applied:** filtered to facets bioRxiv, medRxiv, SSRN; searched through title and abstract.

Note: Since isearch is curated to include COVID-19 related studies only, we translated the PubMed concept blocks 2 and 3.

(india OR indian OR pakistan OR pakistani OR bangladesh OR bangladeshi OR nepal OR “sri lanka” OR “sri lankan”) AND (IFR OR infection* OR CFR OR case* OR transmission* OR mortalit* OR fatalit* OR lethalit* OR death* OR burden OR underreporting OR “under-reporting” OR seroprevalence OR serosurvey OR serology OR seroconversion OR “serosurveillance” OR seroepid* OR seropositiv* OR antibod* OR surveillance OR SIR OR SEIR OR “susceptible-exposed-infected-removed” OR “susceptible-infected-removed”)

**Global Index Medicus--SEAR & EMR (World Health Organization)**

*Note:* We searched in IMSEAR (Index Medicus for the South-East Asia Region) for India, Bangladesh, Nepal, and Sri Lanka, and we searched in IMEMR (Index Medicus for the Eastern Mediterranean Region) for Pakistan.

**For IMSEAR:**

**Date searched:** 7/3/2021

**Number of results:** 35

**Date filter:** 2020 to [blank]

**Other filters applied:** Index filtered to “IMSEAR (South-East Asia)”; searched through title, abstract, subject.

(tw:(covid-19) OR tw:(COVID19) OR tw:(SARS-CoV-2) OR tw:(SARS-CoV2) OR tw:(“severe acute respiratory syndrome coronavirus 2”) OR tw:(2019-nCoV) OR tw:(2019nCoV) OR tw:(coronavirus)) AND (tw:(india) OR tw:(indian) OR tw:(bangladesh) OR tw:(bangladeshi) OR tw:(nepal) OR tw:(“sri lanka”) OR tw:(“sri lankan”)) AND (tw:(IFR) OR tw:(infection*) OR tw:(CFR) OR tw:(case*) OR tw:(transmission*) OR tw:(mortalit*) OR tw:(fatalit*) OR tw:(lethalit*) OR tw:(death*) OR tw:(burden) OR tw:(underreporting) OR tw:(“under-reporting”) OR tw:(seroprevalence) OR tw:(serosurvey) OR tw:(serology) OR tw:(seroconversion) OR tw:(“serosurveillance”) OR tw:(seroepid*) OR tw:(seropositiv*) OR tw:(antibod*) OR tw:(surveillance) OR tw:(SIR) OR tw:(SEIR) OR tw:(“susceptible-exposed-infected-removed”) OR tw:(“susceptible-infected-removed”))

**For IMEMR:**

**Date searched:** 7/3/2021

**Number of results:** 2

**Date filter:** 2020 to [blank]

**Other filters applied:** Index filtered to “IMEMR (Eastern Mediterranean)”; searched through title, abstract, subject.

(tw:(covid-19) OR tw:(COVID19) OR tw:(SARS-CoV-2) OR tw:(SARS-CoV2) OR tw:(“severe acute respiratory syndrome coronavirus 2”) OR tw:(2019-nCoV) OR tw:(2019nCoV) OR tw:(coronavirus)) AND (tw:(pakistan) OR tw:(pakistani)) AND (tw:(IFR) OR tw:(infection*) OR tw:(CFR) OR tw:(case*) OR tw:(transmission*) OR tw:(mortalit*) OR tw:(fatalit*) OR tw:(lethalit*) OR tw:(death*) OR tw:(burden) OR tw:(underreporting) OR tw:(“under-reporting”) OR tw:(seroprevalence) OR tw:(serosurvey) OR tw:(serology) OR tw:(seroconversion) OR tw:(“serosurveillance”) OR tw:(seroepid*) OR tw:(seropositiv*) OR tw:(antibod*) OR tw:(surveillance) OR tw:(SIR) OR tw:(SEIR) OR tw:(“susceptible-exposed-infected-removed”) OR tw:(“susceptible-infected-removed”))

## Appendix C. Summary of included articles

## Appendix D. List of excluded articles

## Appendix E. Summary of excluded articles

## Appendix F. Meta-analysis methodology

The aims of the meta-analysis are two-fold (1) estimate a nationwide IFR_{1} and IFR_{2} with lower and upper bounds based on nationwide excess deaths, and (2) estimate regional IFR_{1} and IFR_{2} with lower and upper bounds based on state/city/district-specific excess deaths.

### Data collection and preparation

A description of the data collection and preparation of the datafile for the meta-analysis is provided in the ** Methods** section. Here we elaborate on aspects that require further explanation and clarification.

For included studies with a pre-calculated infection fatality rate, the IFR_{1} and/or IFR_{2}, along with the 95% confidence interval, are directly extracted from the included study. For studies that provide a pre-calculated IFR_{1} but no IFR_{2} (i.e., do not further account for death underreporting), we compute the IFR_{2} through multiplying the numerator of the pre-calculated IFR_{1} by the appropriate range of excess deaths estimates. For studies that report a pre-calculated IFR_{2} but no IFR_{1} (i.e., do not provide preliminary infection fatality estimate without accounting for death reporting), we compute the IFR_{1}, using the seroprevalence estimate quoted within the included study and following the same steps described below to compute IFR_{1}.

For studies without a pre-calculated IFR_{1} and/or IFR_{2}, IFR_{1} and/or IFR_{2} is computed, as given in the below formulas

Where

and

For the denominator in formulas (*a*) and (*b*), the seroprevalence estimate corresponding to the general study population, as well as the 95% confidence interval, are directly extracted from the included study. We retrieved seroprevalence estimates that were adjusted (within the serosurvey design) for the test performance and weighted to be representative of the study’s general population (typically, for the demographics age and sex, among others, such as rural versus urban), if available. For studies that did not both weight and account for test performance, we extracted the solely weighted or solely test performance adjusted seroprevalence estimate, as provided. For studies that did not report either a weighted or a test performance adjusted seroprevalence estimate, we retrieved the provided crude seroprevalence estimates. The age-adjusted population estimate is calculated as the 2019 projected population estimate on the 2011 census website multiplied by the proportion of the population above the age-cutoff of the included study (e.g., proportion of the population of Karnataka aged ≥ 18 years), as obtained from the age composition for the study area from the 2011 census. As noted in the ** Results** section, for select cities and districts, no 2019 projected population estimate was available and as such we use the 2011 census population estimates for the following cities and districts: Ahmedabad, Chennai, Bangalore Rural District, Indore, Ujjain. We note that this may lead to an overestimation of IFR

_{1}nor IFR

_{2}from these studies, and in turn slightly inflated pooled estimates for the regions containing these study locations.

For the numerator in formulas (*a*) and (*b*), COVID-19 reported cumulative deaths are sourced from *covid19india.org* and collected 14 days after the study end date to account for delay in death from SARS-CoV-2 symptom onset. In practice, reported (i.e., observed) deaths for the target population are obtained some specified number of days after the end date of the serological study that varies between studies (e.g., from 2 days [35] to 21 days [14] [18], among others). Levin et al., 2021 [148] perform a simulation-based sensitivity analysis to derive an appropriate fatality delay, and propose and adopt in their systematic review of age-specific infection fatality rates a fatality delay of 4 weeks after the midpoint of the serosurvey. As previously discussed, reported deaths were not available for select cities or districts in *covid19india.org* and so we were not able to compute IFR_{1} nor IFR_{2} for the cities or districts-level studies Berhampur, Bhubaneswar, Pimpri-Chinchiwad, Rourkela and Devarajeevana Halli slum in Bengaluru, and thereby were not able to include these studies in the quantitative meta-analysis (as listed in Appendix G).

For the numerator in formula (*b*), the death under reporting factor is either directly extracted from media reports and excess deaths studies available at the time of this report (as are listed in the ** Methods** section) or calculated using excess cumulative deaths estimates provided within these sources. For the latter, URF (D) is computed as the provided excess deaths estimate divided by the COVID-19 reported deaths 14 days after the end date of the study end date, as previously reasoned. As previously noted, excess deaths were not available for the following states at the time of this study and so we were unable to compute IFR

_{2}, as well as include in the regional analysis, the following states and corresponding studies: Jammu and Kashmir (ref) and Puducherry (ref).

Being that the infection fatality rate (IFR) measure is understood to be a rate and that upon inspection its distribution was heavily right skewed, a log transformation is applied to the sampling data to approximate a normal distribution.

For the 95% confidence interval for *IFR*_{1}, first we obtain the standard error for seroprevalence from the directly provided 95% confidence interval from each included study as

Now that we have the standard error for seroprevalence, the standard error for the log of IFR_{1} can be obtained as IFR_{1} relies on *Seroprevalence* as detailed in formula (*a*) above. First, notice that

By rules of logarithmic operations, it follows that

By the definition in formula (*a*) and assuming *Reported Deaths* do not contribute to variability and are thereby fixed, it follows that

where *n* is the study sample size and is the seroprevalence estimate from the included study.

By rules of logarithmic operations, we have that

Assuming *n* does not contribute to variability and fixing at some constant

Therefore, . Let us consider the variance of *log (IFR _{1})*.

Substituting in for *log (IFR _{1})* from above, we have

From the Taylor Series expansion, it follows that

Then, the standard error for the log of IFR_{1} is given by

where *se _{sero}* is defined as in formula (

*c*) above and is the directly provided seroprevalence estimate.

Then, letting denote the estimate of *log (IFR _{1})*, the asymptotic approximate 95% confidence interval for

*log (IFR*is as follows:

_{1})The resulting confidence intervals are then exponentiated to back-transform from the logarithmic scale.

For the 95% confidence interval for *IFR _{2}*, as the upper and lower bounds of the 95% confidence interval for the associated

*URF*(

*D*) are often not available (i.e., the range of uncertainty associated with excess deaths estimates is not consistently available), the asymptotic approximate 95% confidence interval for

*IFR*is obtained by multiplying the 95% confidence interval for

_{2}*IFR*by the associated

_{1}*UR*F (

*D*).

### Meta-analysis framework

A random effects model is used with the DerSimonian-Laird (DL) estimator for *τ*^{2} (also denoted as *tau*^{2}), the variance of the true effect sizes. The DL estimator, is given by

where *Q _{W}* denotes the appropriate test statistic with

*k-1*denoting the degrees of freedom and

*w*

_{i}denotes the sampling weight for the

*C*

^{tℎ}included study datapoint.

The inverse variance approach is then used to obtain the pooled estimates (nationwide, regional, and state-specific within India). This means that the weighting in the random effects model is the inverse of the sampling variance, as follows

where i denotes the *i*^{tℎ} included study datapoint, is the standard error from the *i*^{tℎ} included study estimate, and is the DL estimated random effects variance component, as defined in (*e*) above.

Using a random effects model with inverse variance method and DL estimator, the estimate for the pooled effect size is then given as follows:

As previously mentioned, a log transformation is applied to the sampling data to approximate a normal distribution. In other words, we log transform both IFR_{1} and IFR_{2} in the meta-analysis and appropriately back-transform the resulting point estimates and standard errors by exponentiating the log-transformed values. Hence, in (*e*) above in this context is and, similarly, .

Then, to estimate the nationwide pooled infection fatality rates (IFR_{1} and IFR_{2}) for India, countrywide IFR estimates (pre-calculated or computed) among included studies (as verified through August 15, 2021) are pooled, as provided in (*e*) above using the random effects framework detailed. The nationwide pooled infection fatality estimate includes the computed IFRs from each of the four nationwide seroprevalence surveys conducted consecutively for India, as stratifying by time periods (i.e., the nationwide first and second waves of SARS-CoV-2 in India) is of particular interest.

To estimate the regional pooled infection fatality rates (IFR_{1} and IFR_{2}) in India, IFR estimates (pre-calculated or computed) from included studies within a state are pooled, as provided in (*e*) above with the same random effects approach outlined. Then, the regional IFR is estimated as the pooled IFR across pooled state level IFRs, as provided in (*e*) except where *i* now denotes the *i*^{tℎ} state. Since the regional analysis does not involve stratifying by waves (and thereby time points), for serial (or repeated) serosurveys for which multiple seroprevalence estimates are provided for a given study location at various time points, the most recent estimate is considered and included in the regional analysis.

Using the *meta* package in R, pooled effect sizes are estimated, as well as 95% confidence intervals, following the methodological framework above.

## Appendix G. Summary of excluded articles from quantitative summary

## Appendix H. Summary of seroprevalence estimates within India

Below is a forest plot with the seroprevalence estimates used to compute IFR_{1} and in turn IFR_{2} (except for studies for which IFR_{2} was pre-calculated) in the meta-analysis of nationwide and regional IFRs.

## Appendix I. Risk of bias assessment across included articles

See supplementary file *Supplementary_RiskofBias.xlsx* for results from the risk of bias assessment among the included studies in the meta-analysis, using the Joanna Briggs Institute (JBI) tool. Responses to each question in the JBI approach, as well as the cumulative score and rank of risk of bias, are detailed for each of the 19 studies (i.e. 15 serosurvey studies and 4 other study designs).

## Appendix J. Assessment of publication bias

To formally test for funnel plot asymmetry, the Egger’s test (i.e., linear regression) is performed with a resulting p-value of 0.0501. Seeing as the significance level of the funnel plot intercept in the Egger’s test nearly meets the benchmark of 0.05, we further conduct the Begg’s test (i.e., rank correlation test) and with a p-value<0.0001, conclude that the funnel plot is asymmetric. As discussed within the ** Results** section, despite the results of these diagnostic tests, we do not suspect that publication bias is the driving factor behind the observed asymmetry. Firstly, this is because the bulk of the included studies are seroprevalence studies, which are inherently large studies with rigorous study designs, thereby tending toward high precision (i.e., low standard errors). Secondly, there may be heterogeneity in the true effect size between the included studies for reasons such as geographic variation that may be attributing to the largely horizontal dispersion of the standard errors in Figure 6 (the funnel plot) in the

**section, and funnel plots assume a single true effect size.**

*Results*## Appendix K. SEIR-fansy model framework

*(Note: Explanation below is unchanged from Supplementary Materials in previous submission ^{1}.)*

### Introduction

Here we are using the SEIR-fansy model^{1,2} and software package^{3} which uses a compartmental model accounting for false negative rates and preferential diagnostic testing for SARS-CoV-2 infections. The SEIR-fansy model can be represented by the compartmental model in Figure S1.

1. Purkayastha S, Kundu R, Bhaduri R, Barker D, Kleinsasser M, Debashree R, Mukherjee B. Estimating the wave 1 and wave 2 infection fatality rates from SARS-CoV-2 in India*. BMC Res Notes*. 14, 262 (2021). doi:10.1186/s13104-021-05652-2.

2. Bhaduri R, Kundu R, Purkayastha S, Kleinsasser M, Beesley LJ, Mukherjee B. *Extending the Susceptible-Exposed-Infected-Removed (SEIR) model to handle the high false negative rate and symptom-based administration of Covid-19 diagnostic tests: SEIR-fansy.* medRxiv [Preprint]. 2020 Sep 25:2020.09.24.20200238. doi: 10.1101/2020.09.24.20200238. PMID: 32995829; PMCID: PMC7523173.

3. Ritwik Bhaduri, Ritoban Kundu, Soumik Purkayastha, Lauren Beesley and Bhramar Mukherjee (2020). *SEIRfansy: Extended Susceptible-Exposed-Infected-Recovery Model.* R package version 1.1.0. https://CRAN.R-project.org/package=SEIRfansy

### Mathematical framework

The following differential equations summarize the transmission dynamics being modeled.

Using the Next Generation Matrix Method (28), we have calculated the basic reproduction number

where *S*_{0} = λ/*μ* = 1 since we have assumed that natural birth and death rates are equal within this short period of time. In this setting, both *β* and *R* are time-varying parameters which are estimated using the Metropolis-Hastings MCMC method. To estimate the parameters, we at first need to solve the differential equations, which is difficult to perform in this continuous-time setting. It is also worth noting that we do not require the values of the variables for each time point. Instead, we only need their values at discrete time steps, i.e., for each day. Thus, we approximate the above set of differential equations by a set of recurrence relations. For any compartment *X*, the instantaneous rate of change with respect to time *t* (given by ) is approximated by the difference between the counts of that compartment on the (*t* + 1)^{tℎ} day and the *t*^{tℎ} day, that is *X*(*t* + 1) − *X*(*t*). Starting with an initial value for each of the compartments on the Day 1 and using the discrete-time recurrence relations, we can then obtain the solutions of interest. Some examples of these discrete-time recurrence relations are presented below.

The rest of the differential equations can each be similarly approximated by a discrete-time recurrence relation.

### Likelihood assumptions and estimation

We use Bayesian estimation techniques and Markov chain Monte Carlo (MCMC) methods (namely, Metropolis-Hastings method with Gaussian proposal distribution) for estimating the parameters. First, we approximated the above set of differential equations using a discrete time approximation using daily differences. So, after we started with an initial value for each of the compartments on the day 1, using the discrete time recurrence relations we can find the counts for each of the compartments on the next days. To proceed with the MCMC-based estimation, we specify the likelihood explicitly. We assume (conditional on the parameters) the number of new confirmed cases on day *RR* depend only on the number of exposed individuals on the previous day. Specifically, we use multinomial modeling to incorporate the data on recovered and deceased cases as well. The joint conditional distribution is

A multinomial distribution-like structure is then defined,

*Note:* the expected values of *E*(*t* − 1) and *P*(*t* − 1) are obtained by solving the discrete time differential equations as described earlier.

### Prior assumptions and MCMC

For the parameter *r*, we assume a *U*(0,1) prior, while for *β*, we assume an improper non-informative flat prior with the set of positive real numbers as support. After specifying the likelihood and the prior distributions of the parameters, we draw samples from the posterior distribution of the parameters using the Metropolis-Hastings algorithm with a Gaussian proposal distribution. We run the algorithm for 200,000 iterations with a burn-in period of 100,000. Finally, the mean of the parameters in each of the iterations are obtained as the final estimates of *β* and *r* for the different time periods. To obtain confidence intervals of various estimates we predict the number of individuals in each compartment given a set of parameters which are drawn using MCMC. This is done for 100,000 iterations. Using these values, we obtain the 95% Bayesian Credible Intervals of the estimates (such as infection fatality rates and underreporting factors)

### Estimation of parameters of interest

Our main parameters of interest here are Underreporting factors for cases and deaths and Infection Fatality rate. Underreporting factors (URF) for cases and deaths are defined as follows:

Here, total cumulative cases refers to all Cumulative cases including both reported and unreported cases. Similarly total cumulative deaths includes both reported and unreported deaths. Since we are unable to observe unreported cases or deaths we estimate total cumulative cases and deaths as follows:

Total Cumulative cases at time

*t*=*P(t)+U(t)+F(t)+RR(t)+RU(t)+DR(t)+DU(t)*Total Cumulative deaths at time

*t*=*DR(t)+DU(t)*

To estimate the true fatality rate of COVID-19, we calculate 2 different infection fatality rates IFR1 and IFR2 as defined in formulas (*a*) and (*b*) respectively, in Supplementary Appendix F.

We also calculate the Case fatality rate as defined in the ** Methods** section.

Now *Cumulative Deaths* follows a *Bin*(*Observerd cumulative cases, CFR _{true}*) distribution, with the estimate of

*CFR*given by CFR, making CFR is a binomial proportion. Let and

_{true}*n*=

*Observerd cumulative cases*. So the asymptotic approximate 95% confidence interval is given by

The estimates of infection fatality rates and underreporting factors are based on Bayesian credible intervals constructed from the exact posterior draws, as described before.

## Appendix L. Data source and model-based results

The data has been sourced from *covid19india.org*. We used daily case-recovery-death count data from April 1, 2020 to January 31, 2021 for wave 1 and from February 1, 2021 – June 30, 2021 for wave 2. The predicted number of reported and total cases and deaths for January 31, 2021 (wave 1) and June 30 (for wave 2 and waves 1 and 2 combined) are shown in Tables S1, S2, and S3 respectively.

The mean estimates and the 95% CrI’s of underreporting factors for cases and deaths on January 31, 2021 are shown in Figure S2. Relevant wave 2 values are presented in Figure S3.

## Footnotes

↵1 The paucity of data in poorer countries has prompted some economists to adopt the benefit transfer method (BMT) – VSL estimates from richer countries are taken off-the-shelf and linearly scaled down to produce an estimate for poorer countries using the per capita GDP ratio. As Majumder and Madheswaran (2018) (

*85*) note, this is likely to generate a significant underestimate since available data suggest the income elasticity of the demand for safety is considerably less than one, i.e., the poor are willing to pay a much higher proportion of their incomes to increase the chance of staying alive, in similar spirit to Engel’s Law for food.↵2 Other estimates of VSL in India are as follows: INR 14 - 19 million (

*86*), INR 6.4 - 15 million (*87*), INR 14.8 – 15.4 million (*88*), INR 10.28 million (*89*).

## REFERENCES

- 1.↵
- 2.↵
- 3.↵
- 4.↵
- 5.↵
- 6.↵
- 7.↵
- 8.↵
- 9.↵
- 10.↵
- 11.↵
- 12.↵
- 13.↵
- 14.↵
- 15.↵
- 16.↵
- 17.↵
- 18.↵
- 19.↵
- 20.↵
- 21.↵
- 22.↵
- 23.↵
- 24.↵
- 25.↵
- 26.↵
- 27.↵
- 28.↵
- 29.↵
- 30.↵
- 31.↵
- 32.↵
- 33.↵
- 34.↵
- 35.↵
- 36.↵
- 37.↵
- 38.↵
- 39.↵
- 40.↵
- 41.
- 42.↵
- 43.↵
- 44.↵
- 45.↵
- 46.↵
- 47.↵
- 48.↵
- 49.↵
- 50.↵
- 51.↵
- 52.↵
- 53.↵
- 54.↵
- 55.↵
- 56.↵
- 57.↵
- 58.↵
- 59.↵
- 60.↵
- 61.↵
- 62.↵
- 63.↵
- 64.↵
- 65.↵
- 66.↵
- 67.↵
- 68.↵
- 69.↵
- 70.↵
- 71.↵
- 72.↵
- 73.↵
- 74.↵
- 75.↵
- 76.↵
- 77.
- 78.
- 79.↵
- 80.
- 81.
- 82.↵
- 83.↵
- 84.↵
- 85.↵
- 86.↵
- 87.↵
- 88.↵
- 89.↵
- 90.
- 91.↵
- 92.↵

## References

- [1].
- [2].
- [3].
- [4].
- [5].
- [6].
- [7].
- [8].
- [9].
- [10].
- [11].
- [12].
- [13].
- [14].
- [15].
- [16].
- [17].
- [18].
- [19].
- [20].
- [21].
- [22].
- [23].
- [24].
- [25].
- [26].
- [27].
- [28].
- [29].
- [30].
- [31].
- [32].
- [33].
- [34].
- [35].
- [36].
- [37].
- [38].
- [39].
- [40].
- [41].
- [42].
- [43].
- [44].
- [45].
- [46].
- [47].
- [48].
- [49].
- [50].
- [51].
- [52].
- [53].
- [54].
- [55].
- [56].
- [57].
- [58].
- [59].
- [60].
- [61].
- [62].
- [63].
- [64].
- [65].
- [66].
- [67].
- [68].
- [69].
- [70].
- [71].
- [72].
- [73].
- [74].
- [75].
- [76].
- [77].
- [78].
- [79].
- [80].
- [81].
- [82].
- [83].
- [84].
- [85].
- [86].
- [87].
- [88].
- [89].
- [90].
- [91].
- [92].
- [93].
- [94].
- [95].
- [96].
- [97].
- [98].
- [99].
- [100].
- [101].
- [102].
- [103].
- [104].
- [105].
- [106].
- [107].
- [108].
- [109].
- [110].
- [111].
- [112].
- [113].
- [114].
- [115].
- [116].
- [117].
- [118].
- [119].
- [120].
- [121].
- [122].
- [123].
- [124].
- [125].
- [126].
- [127].
- [128].
- [129].
- [130].
- [131].
- [132].
- [133].
- [134].
- [135].
- [136].
- [137].
- [138].
- [139].
- [140].
- [141].
- [142].
- [143].
- [144].
- [145].
- [146].
- [147].
- [148].