An estimate of the COVID-19 infection fatality rate in Brazil based on a seroprevalence survey

We infer the infection fatality rate (IFR) of SARS-CoV-2 in Brazil by combining three datasets. We compute the prevalence via the population-based seroprevalence survey EPICOVID19-BR, which tested 89000 people in 3 stages over a period of 5 weeks. This randomized survey selected people from 133 cities (accounting for 35.5% of the Brazilian population) and tested them for IgM/IgG antibodies using a rapid test. We estimate the time delay between the development of antibodies and subsequent fatality using the public SIVEP-Gripe dataset. The number of fatalities is obtained using the public Painel Coronavírus dataset. The IFR is computed for each survey stage and for the 27 federal states. We infer a country-wide average IFR of 1.05% (95% CI: 0.96-1.17%) and find evidence for its increase starting in June 2020.

The infection fatality rate (IFR) is one of the most important quantities of any new disease. An accurate estimate of both the case fatality rate (CFR) and the IFR is usually a challenge before the end of a pandemic. 1 It is, nevertheless, a very important endeavor, as it has direct implications for the amount of resources and effort that should be allocated to prevent the spread of the disease, and it helps steer policy-making in general. For instance, using the United States as reference, Perlroth et al. 2 concluded that a CFR below 1% makes school closures and social distancing not cost-effective.
In order to estimate the IFR one needs not only an estimate of the number of deaths, but also of the total infected population, and then to compare both within the same time period. It is, therefore, a difficult task, as many cases are asymptomatic or develop only mild symptoms and are often unaccounted for. It is also hampered by the lack of testing in many countries. 3
The total number of deaths during an epidemic can be biased by the mislabeling of undiagnosed fatalities. To circumvent this possibility, one can rely on statistical estimates from the study of the excess deaths in a given period of time. In the case of COVID-19 this method is being pursued by many groups, 4-6 including the mainstream media, 7-9 as a method which is complementary to the officially reported numbers. However, this approach invariably suffers from important modeling uncertainties. 5 This may be especially true during the current pandemic, which has seen an unprecedented disruption of economic activity and social behavior, including a large fraction of the population undertaking social distancing measures. 10
One of the first detailed analyses of the IFR of COVID-19 was based on around 70 thousand clinically diagnosed cases in China. After adjusting for demography and under-ascertainment Verity et al. [...] 15 On the other hand, a report by the group at Imperial College London estimated much higher values for the 16 Brazilian states they considered, 16 which, combined, suggest an overall IFR of 0.9%. The incompatible estimates above highlight the inherent uncertainty in modeling a new disease that has caused such an unprecedented change in lifestyle worldwide.
medRxiv preprint (which was not certified by peer review); this version posted August 21, 2020. The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
This is the main reason why one should rely on seroprevalence estimates in order to estimate the IFR of COVID-19. In a recent study, relying on antibody screening of blood donors, the IFR was estimated to be much lower, less than 0.21% at 95% CL. 17 Such an approach is, however, limited by the fact that blood donors may not be representative of the population. In particular, all donors are younger than 70 and healthy. The ideal approach to circumvent the limitations above is to conduct random serology studies in the population. One such study, conducted in Geneva, Switzerland, with 2766 participants, found that for every reported COVID-19 case there were another 10.6 unreported ones, 18 a large discrepancy which again stresses the difficulties that models have to deal with. The same group reported an IFR of 0.64% (95% CI: 0.38-0.98%). 19 A much larger survey with 61075 participants was conducted in Spain, but IFR estimates were not reported. 20 A meta-analysis of 36 seroprevalence studies performed by Ioannidis 21 found that the IFR values ranged from 0.00% to 1.31%, and among 32 different locations the median IFR was 0.24%. Another meta-analysis of 25 IFR studies found an IFR of 0.68% (95% CI: 0.53-0.82%). 22 These results hint at a possibly large variation in IFR values around the globe, although data from different countries were reported to be highly heterogeneous.
In Brazil, a large random seroprevalence study was performed by the EPICOVID19-BR team 23 which aimed to test 250 individuals in each of the 133 selected large sentinel cities. It has so far been carried out in 3 rounds using the Wondfo lateral flow test for immunoglobulin M and G antibodies against SARS-CoV-2. Round 1 was conducted between May 14 and 21, 2020, but did not reach its target number of samples: at least 200 tests were performed in only 90 of the 133 cities. The total number of tests in all cities was 25025. Round 2 was conducted from June 4 to 7 and reached over 200 tests in 120 cities, with a total of 31165 individuals tested across all cities. Round 3 was performed between June 21 and 24 and reached over 200 tests in all 133 cities, for a total of 33207 tests. The total number of tests in all rounds was 89397, see Figure 1.
The COVID-19 pandemic has strongly affected Brazil. 24 The federal government response has been heavily criticized, 25 and in August 2020 the number of confirmed cases and deaths crossed 3 million and 100 thousand, respectively, second only to the USA in raw numbers. Furthermore, strong ethnic and regional variations in hospital mortality were found, casting doubts on the availability of public health care for the sections of society that cannot afford private care. 26 This dire situation further motivates the need for an estimate of the IFR which is as accurate as possible, in order to trigger an adequate political response to the crisis.
As summarized by Figure 1, in order to estimate the IFR we make use of three complementary datasets. We compute the percentage $p_a(t)$ of Brazilians that have been infected by SARS-CoV-2 at the city, state and country levels via the EPICOVID19-BR data. We robustly correct for false positive and negative rates and combine prevalences from different cities without neglecting the non-Gaussian nature of the distributions (details in the Supplementary Materials). The result is shown in Figure 2 and in Table I (the explanation of the federal state acronyms and the full numerical tables can be found in the Supplementary Materials). We note a sharp increase in prevalence between rounds 1 and 2, and a subsequent stabilization between rounds 2 and 3. The state of Pará (PA) exhibits an unexpected sharp decrease in prevalence in the last round, possibly due to a heterogeneity in the sampled population.
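The correction for false positive and false negative rates can be illustrated with the standard Rogan-Gladen point estimator. This is only a minimal sketch, not the non-Gaussian procedure described in the Supplementary Materials; the function name and the sensitivity and specificity values are hypothetical and are not taken from any validation of the Wondfo test.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct a raw positive fraction for imperfect test accuracy.

    Standard Rogan-Gladen point estimate, clipped to the interval [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0.0:
        raise ValueError("test must be better than random (sens + spec > 1)")
    corrected = (apparent_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, corrected))


# Hypothetical inputs: 3% raw positives, 85% sensitivity, 99% specificity.
example_prevalence = rogan_gladen(0.03, sensitivity=0.85, specificity=0.99)
```

With these illustrative inputs the corrected prevalence is lower than the raw positive fraction, because the assumed false positives outweigh the assumed false negatives at low prevalence.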
We obtain the number of fatalities via the public Painel Coronavírus dataset. 27 Painel Coronavírus is the Brazilian reference to keep track of the pandemic at the federal level and provides the deaths by COVID-19 with their geographic location.
We cannot compute the IFR directly via the ratio of $p_d$ (the fraction of the population that has died of COVID-19) and $p_a$ because, at a given time $\bar{t}$, there are patients who have developed antibodies but have not yet died from the disease. 28 In order to estimate the time delay $\tau_{ad}$ between the development of antibodies and subsequent fatality we use the public SIVEP-Gripe dataset ("Sistema de Informação da Vigilância Epidemiológica da Gripe"), a prospectively collected respiratory infection registry that is maintained by the Ministry of Health for the purpose of recording cases of Severe Acute Respiratory Syndrome (SARS) across both public and private hospitals. The SIVEP-Gripe dataset contains the dates of symptom onset and death for patients with a positive SARS-CoV-2 RT-PCR test, together with their geographic location, which allows us to estimate the time delay $\tau_{sd}$ between the development of symptoms and subsequent fatality. We also make use of an empirical distribution of the time between the first symptoms and the development of antibodies 29 to estimate the mean time delay $\tau_{sa}$ between both events. Together, these estimates allow us to obtain the time delay $\tau_{ad} \simeq \tau_{sd} - \tau_{sa}$. For Brazil as a whole we find $\tau_{ad} \simeq 10.3$ days. Table II summarizes all the estimated time delays which are used in our calculations (details in the Supplementary Materials).
medRxiv preprint doi: https://doi.org/10.1101/2020.08.18.20177626
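The combination of delays can be sketched as follows. This is a simplified illustration that uses plain means and omits the $\tau_\Delta$ correction discussed later in the text; the helper name, the dates and the value of $\tau_{sa}$ are all hypothetical and are not SIVEP-Gripe data or results from Table II.

```python
from datetime import date
from statistics import mean


def mean_delay_days(start_dates, end_dates):
    """Mean number of days between paired start and end dates."""
    if len(start_dates) != len(end_dates) or not start_dates:
        raise ValueError("need two non-empty lists of equal length")
    return mean((end - start).days for start, end in zip(start_dates, end_dates))


# Hypothetical records (symptom onset, death); not SIVEP-Gripe data.
onsets = [date(2020, 5, 1), date(2020, 5, 3), date(2020, 5, 10)]
deaths = [date(2020, 5, 19), date(2020, 5, 23), date(2020, 6, 1)]

tau_sd = mean_delay_days(onsets, deaths)  # symptoms -> death
tau_sa = 10.0                             # symptoms -> antibodies (assumed value)
tau_ad = tau_sd - tau_sa                  # antibodies -> death
```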
Using this combined information we can then compute the IFR at the state and country levels:
$$\mathrm{IFR} = \frac{p_d(\bar{t} + \tau_{ad})}{p_a(\bar{t})},$$
where $\bar{t}$ is the time of a given EPICOVID19-BR phase.
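As a minimal sketch of this calculation, we assume that the IFR is the ratio of the cumulative deaths at the delayed time $\bar{t} + \tau_{ad}$ to the number of infected people at the survey time $\bar{t}$. The function name, the rounding of the delay to whole days and all numbers below are hypothetical; they are not EPICOVID19-BR or Painel Coronavírus data.

```python
from datetime import date, timedelta


def ifr_with_delay(cumulative_deaths, population, prevalence, survey_date, tau_ad_days):
    """IFR = deaths at (survey date + delay) / infected at survey date.

    cumulative_deaths: dict mapping date -> cumulative death count.
    prevalence: infected fraction of the population at survey_date.
    """
    # Assumption: the delay is rounded to the nearest whole day.
    delayed_date = survey_date + timedelta(days=round(tau_ad_days))
    deaths = cumulative_deaths[delayed_date]
    infected = prevalence * population
    if infected <= 0:
        raise ValueError("prevalence and population must be positive")
    return deaths / infected


# Hypothetical numbers for a single city.
survey_day = date(2020, 6, 5)
toy_deaths = {survey_day: 150, survey_day + timedelta(days=10): 200}
toy_ifr = ifr_with_delay(toy_deaths, population=1_000_000, prevalence=0.02,
                         survey_date=survey_day, tau_ad_days=10.3)
```

In this toy example, using the deaths on the survey day itself (150) instead of the delayed count (200) would lower the estimate by a quarter, which is why the delay matters while the epidemic is growing.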
The results for Brazil are given in Table I and Figure 3, and the ones for the states (combining all rounds) in Figure 4. We see in Figure 3 that round 3 exhibits a considerably higher IFR, which we explore below. We note significant statistical tensions in the data of two states: Pará (PA) and Roraima (RR). We therefore consider their IFR estimates unreliable, but due to their small populations they have a negligible impact on the IFR estimates at the country level. The numerical results for all the states and for the three rounds separately can be found in the Supplementary Materials. The confidence interval was computed by combining the statistical sources of error and including the non-Gaussian nature of the distributions.
Our overall estimate of the IFR of 1.05% (95% CI: 0.96-1.17%) is in agreement with some, but not all, of the previous world estimates discussed earlier. In particular, at the country level, our combined estimate agrees qualitatively with the one by the Imperial College COVID-19 Response Team, 16 even though their result falls outside our 95% CI. At the state level we also find some disagreement between their values and our 95% CIs, see Figure 4. Our estimate is also very precise: the statistical error is smaller than that of the aforementioned similar studies. However, it may suffer from a number of systematic biases related to each of the three datasets, which we now discuss. First, we are assuming that SARS-CoV-2 antibodies remain present in the patients. There are, however, reports that IgG levels fade in recovered patients on a timescale of a few months. 30 Even if confirmed, this effect should have only a small impact on our results since the last round of EPICOVID19-BR was performed on June 22, 2020, still early in the Brazilian epidemic progression. In fact, the number of confirmed cases in Brazil had increased over 20-fold in the preceding 2 months, which means that our IFR estimates must be dominated by recent infections. In any case, the fading of IgG levels leads to an underestimate of $p_a$ and an overestimate of the IFR.
One may speculate that the observed increase in IFR in round 3 may be a result of fading antibody levels, since antibody prevalences are cumulative. We checked this hypothesis and found that the higher IFR of round 3 persists if we count only the deaths that occurred between the delayed time $\bar{t} + \tau_{ad}$ and 90 days earlier, which is equivalent to assuming a sharp drop of IgG levels after such a time period. Consequently, we rule out this explanation and rather propose that the higher IFR is due to hospital bed saturation in June and July. 31 This may be particularly relevant in the context of public health care, which serves 75% of the population but whose total spending is similar to that of private health care, implying that, on average, a patient in a private hospital costs three times more than one in a public hospital. 32
Second, not all COVID-19 related deaths may be registered in Painel Coronavírus. One expects this to happen for out-of-hospital fatalities and to be stronger in the poorest areas, which have a weaker health care infrastructure. As we are analyzing the 133 large sentinel cities that entered the EPICOVID19-BR survey, this bias is not expected to be sizable. Its effect is, nonetheless, an underestimation of the IFR.
Figure 4. Combined IFR using all 3 rounds (maximum likelihood and 95% CI). The black dots represent model-based results by the Imperial College COVID-19 Response Team. 16 The horizontal red line is the IFR estimate for Brazil given in Table I. Two states have unreliable combined IFR results and are shown in light gray: PA has a significant decrease in $p_a$ in round 3 that cannot be a simple fluctuation; RR has a very low IFR in round 2 which is in tension with the other rounds and artificially shrinks the CI.
A third potential bias comes from the fact that the time in Painel Coronavírus is not the actual time of death but rather the time of notification. In order to alleviate this issue and also average out oscillations due to weekends, we smooth the $dn_d/dt$ data with a forward 7-day moving average (details in the Supplementary Materials).
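A forward moving average of this kind can be sketched as below. We assume here that "forward" means each day is averaged with the six days that follow it; the exact convention is given in the Supplementary Materials and may differ. The function name and the daily counts are hypothetical.

```python
def forward_moving_average(daily_values, window=7):
    """Average each value with the (window - 1) values that follow it.

    Returns a list shorter than the input by window - 1 entries, since the
    last days lack a full forward window.
    """
    if window < 1 or window > len(daily_values):
        raise ValueError("window must be between 1 and len(daily_values)")
    return [
        sum(daily_values[i:i + window]) / window
        for i in range(len(daily_values) - window + 1)
    ]


# Hypothetical daily notified deaths with a weekend dip (sixth and seventh days).
toy_daily_deaths = [10, 12, 11, 13, 12, 4, 3, 14, 15, 13]
smoothed = forward_moving_average(toy_daily_deaths)
```

Because every 7-day window contains exactly one weekend, the dip in weekend notifications is spread evenly over the smoothed series.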
Finally, the SIVEP-Gripe dataset is biased towards cases with severe symptoms. Indeed, a significant number of cases are already hospitalized when their symptoms are notified (see Supplementary Materials). We took this into account via a delay parameter $\tau_\Delta = \tau_{sd} - \tau_{sd}^{\rm sivep} = 2 \pm 1$ days (see Table II), which models the time that a patient takes to go from symptom onset to severe symptoms (details in the Supplementary Materials). Had we set $\tau_\Delta = 0$, we would have obtained an IFR for Brazil of 1.02% (95% CI: 0.93-1.13%), a 3% lower estimate.
It is important to stress that the overall IFR we computed refers to the 133 large cities that were tested by the EPICOVID19-BR survey. These cities account for 35.5% of the Brazilian population, and the IFR may be different in smaller cities and in rural or poorer areas.
As new medications and treatment protocols for the disease are discovered and become available it is hoped that the IFR will decrease. Since our data comes from the first months of the pandemic, our results therefore also set a baseline for future comparisons of the fight against COVID-19 in Brazil.
In conclusion, we hope that our careful evaluation of the IFR in Brazil will help reinforce, at the federal, state and municipal levels, the seriousness of the COVID-19 pandemic and the urgency of taking the proper actions in order to reduce its societal and economic impact.