## Abstract

In this work we are going to use estimates of Infection Fatality Rate (IFR) for Covid-19 in order to predict the evolution of Covid-19 in Brazil. To this aim, we are going to fit the parameters of the SIR model using the official deceased data available by governmental agencies. Furthermore, we are going to analyse the impact of social distancing policies on the transmission parameters.

## 1 Introduction

Currently, the world is experiencing an unprecedented serious pandemic crisis, which has led to the collapse of health systems in several countries, with the aggravating lack of hospital beds and individual protection material, as well as insufficient testing supplies for the diagnosis of patients. In this sense, the data of infected patients does not always correspond to reality, since there is a much larger number of cases not reported by governmental health agencies, which is called under-reporting. The estimate of under-reporting in most countries has been 10 for each patient notified, based on the fatal infected rate (IFR) compared to this rate before the pandemic was established (Ferguson et al., 2020). Due to the limited information available, most parameters describing the dynamics of the disease spread involve significant uncertainties (Lachmann et al., 2020). Healthcare systems in most countries are not capable of monitoring the exponential growth of a virus in this manner. South Korea, as of writing, has one of the most extensive capabilities of testing individuals per capita, with a capacity of more than 20,000 tests a day. Hence, South Korea represents the best benchmark country in order to predict the COVID-19 CFR Lachmann et al. (2020). Another important factor to be considered is the effect of social distancing on the value of the transmission parameter of the disease, which is not considered in classic epidemiology models. This is the relevant aspect of the model that we are considering in this paper.

Severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV) are two highly transmissible and pathogenic viruses that emerged in humans at the beginning of the 21st century (Cui et al., 2018). The recently emerged Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic, with first cases reported in Wuhan, China in late December, 2019, quickly spread to other countries and was declared by the World Health Organization (WHO) as of January 30, 2020, an Public Health Emergency of International Concern (PHEIC). PHEIC are extraordinary events which pose a large scale public health risk with international spreading and which, in general, require a coordinated response. In Brazil, national public health emergencies (NPHE) are defined according to Brazilian Ministry of Health (MoH) as events that represent risks to public health and that occur in situations of outbreaks or epidemics (as a result of unexpected agents or reintroduction of eradicated diseases or with high severity), disasters and of lack of assistance to the population, which go beyond the response capacity of the state (Croda et al., 2020).

The coronavirus 2019 (COVID-19) pandemic has been spreading globally for months, yet the infection fatality ratio (IFR) of the disease is still uncertain. This is partly because of inconsistencies in testing and death reporting standards across countries (Rinaldi and Paradisi, 2020). IFR is the ratio of deaths divided by the number of actual infections with SARS-CoV-2 (Condit, 2020). However for the coronavirus 2019 (COVID-19) infection with its broad clinical spectrum from asymptomatic to severe disease courses, the IFR is the more reliable parameter to predict the consequences of the pandemic. Case fatality rate (CFR) is the proportion of the number of deaths divided by the number of confirmed patients of a disease, which has been used to assess and compare the severity of the epidemic between countries. The rates can also be used to assess the healthcare capacity in response to the outbreak (Kim et al., 2020). According to Condit (2020), the IFR is likely to be significantly lower than the CFR because usually only people with severe symptoms are being tested but exist a large number of infections with SARS-CoV-2 result in mild or even asymptomatic disease.

The first case reported in Brazil was on February 26, 2020, in the city of São Paulo, of a 61-year-old man who had made a trip to Italy at that time SanarMED (2020). Until May 20, the state of São Paulo recorded 5,363 deaths from the new coronavirus, with 216 deaths confirmed in the last 24 hours. The state also totals 69,859 confirmed cases of COVID-19, with one or more people infected in 484 cities. At least one fatal victim was recorded in 223 municipalities. Another contribution to the understanding of this pandemic has been made by our research group Diniz (2020), in order to analyze the number of confirmed cases and number of deaths, proportionally the population of each state, that is necessary to view these number in the same scale, avoiding the serious mistake of comparing the absolute numbers. This methodology allowed to make a ranking with all the federate units, including Brazil that served for the national average.

In order to mitigate the effects of the pandemic in the health system, governments have been adopting several policies of social distancing world-wide. In general, these social distancing policies range from just avoiding overcrowding, to closing school, bars, restaurants and so on to complete lock-down. Furthermore, some countries and cities have been adjusting their social distancing policies over time, depending on to the pressure over the health system. On the other hand, people not always follows the rules imposed by governments. With this in mind, we are going to assume that the fluctuations on the levels of social distancing policies can be analyzed looking at changes in the transmission parameter of the SIR model. In order to analyze this changes, we are going to split the data into disjoint subsets, fit the model to the data in each one of this subset and carry out an analysis of variance on the parameters.

## 2 The SIR model

A classic mathematical model to describe the spread of an infectious disease on a population as function of time is the SIR model. In this model, individuals of a population are labeled in three categories: susceptible (S), infected (I) and recovered (R). Considering the proportion of each category in relation to the total population, such proportions evolve in time according to the following equation
in which *α* and *β* are both positive parameters Edelstein-Keshet (2005).

In the equation (1), parameter *α* is responsible for how often a susceptible-infected interactions results in a new infection by units of time. Thus, *α* proportional to the probability of transmitting disease between a susceptible and an infectious individual. Yet *β*, is a recovery rate, measured by units of time, so that 1*/β* is the the average duration of infection in the organism of individuals.

The behavior of SIR model solution depends on the values of *α* and *β*, as well as in the initial proportions *S*_{0}, *I*_{0} and *R*_{0}. And so, an important issue is this kind of model is in what conditions the infection spreads on the population. In order for this to happen, the derivative of *I*(*t*) must be positive, that is,

Since at the beginning almost all individuals are susceptible, then *S ≈* 1 and thus, last equation becomes

The ratio *α/β* is called *ℛ*_{0} and can be interpreted as the number of secondary infections from a single case in a completely susceptible population.

## 3 The fitting procedure and simulation study

In order to obtain the parameters to the SIR model we are going to use official data updated daily by the governmental agencies. Due to reliability, we are especially interested in the cumulative number of deaths over time. From now on, time *t* = 0 is at March 3rd of 2020 and *t* = *N* is at June 20 of 2020.

For the fitting procedure we are assuming that the recovery time *τ* is a random variable uniformly distributed on the interval [*τ*_{1}, *τ*_{2}] while the IFR for Covid-19 is a random variable uniformly distributed on the interval [*µ*_{1}, *µ*_{2}]. Therefore, if *µ ∈*[*µ*_{1}, *µ*_{2}] is the IFR for Covid-19 and *d*_{i} is the cumulative number of deaths at time *i* then, by definition, the number of recovered is given by .

Let *R*_{i} be the number of recovered at time *i* given by the solution of SIR model. As the actual number of infected are not known, to estimate the transmission parameter *α* we minimize the square error function given by

In all cases, we use *R*_{0} = *r*_{0} and *S*_{0} = 1 *− R*_{0} *− I*_{0}.

As mentioned before, we are considering uncertainties in both the IFR and the time to recovery *τ* so that we randomize over these parameters. So, the fitting procedure follows the steps:

Considering

*µ*and*τ*distributed over their domains, randomly select*µ ∈*[*µ*_{1},*µ*_{2}],*τ ∈*[*τ*_{1},*τ*_{2}], defining*β*= 1*/τ*and*r*_{i}=*d*_{i}*/µ*;Randomly select

*i*_{1},*i*_{2},*…, i*_{K}*∈ {*0, 3,*…, N}*, allowing repetition;Minimize function 2 using the data .

The steps 1

*−*3 are repeated*M*times.

For each one of the *M* iterations we set *S*_{0} = 1 *− I*_{0} *− R*_{0} and *R*_{0} = *r*_{0}. In the minimizing step 3, we are using the *fminsearch* function from Matlab, and thus, we consider the minimizing steps reaches a local minimum in cases in which the options *Tolfun* and *TolX* are less than 1*e −* 5.

### 3.1 Capturing social distancing effects

As described in the Introduction, we want also to capture the effects of social distancing over time. In this case, we are assuming that social distancing policies confer changes in the value of the transmission parameter *α* and, consequently, in the reproductive number of the disease. Thus, looking at different subsets of the data are going to produce a different fitting and we can compare the changes in the parameters.

In this case, we adapt the previous fitting procedure performing simulations as the following:

Considering

*µ*and*τ*distributed over their domains, randomly select*µ ∈*[*µ*_{1},*µ*_{2}],*τ ∈*[*τ*_{1},*τ*_{2}], defining and ;For each

*i ∈ {*1, 2,*…, N −*1*}*, minimize the function 2 using the data*r*_{i−1},*r*_{i}and*r*_{i+1}, setting iterations we set*S*_{0}= 1*− I*_{i}*− R*_{i}and*R*_{0}=*r*_{i}.The steps 1

*−*2 are repeated*M*times.

Each one of these *M* simulations produces a value of *α* at some *t* = *i, i ∈ {*1, 2, *…, N −* 1*}*. That is, each of these *M* simulations finds a SIR solution that approaches better to the data *r*_{i−1}, *r*_{i} and *r*_{i+1}. Thus, in order to compare these values we calculate the reproductive number *ℛ*_{t} at *t* = *i* by

Next section we are going to use the procedures describe in this section in order to analyze the dynamics of the pandemic.

### 3.2 Analysis of the simulations

To analyse the results obtained in the simulations we calculate the averages of the estimates and the confidence intervals (95%), the root-mean-square errors and the empirical standard error (ESE). For ** θ** = (

*α, ℛ*

_{t}), the simulated root-mean-square errors for the

*M*simulations is obtained from the expression

The empirical standard errors (ESE) are obtained from the root of variance for the *M* simulations given by
in that and represents the estimated of *θ*_{j} in the *i*-th simulation.

## 4 Results

In this section, we present and discuss the results obtained using the methods described in last section.

### 4.1 Under-reporting

To the results of this section we set *M* = 100 and *K* = 65, which corresponds to 75% of the data available. Furthermore, we consider in which *τ* is a random variable uniformly distributed over [9, 19] and *µ* is a random variable uniformly distributed over [0.005, 0.01].

To measure the under-reporting we consider the mean simulation results for cumulative cases as described in Section 3 and calculate the ratio over the official cumulative cases reported by governmental agencies. That is, we have been considering *C*_{i} = *R*_{i} + *I*_{i} the mean cumulative cases predicted by the SIR model and *c*_{i} the reported number of total cases, we compute the under-reporting ratio by

In Figure 1 we can see the evolution of the under-reporting ratio over time.

Using the expression given in equation (3), we calculate the daily underreporting rates for Brazil considering the period from March 28 to June 22. With these values, we generate Figure 1 and calculate the average underreporting for Brazil that was approximately 18.16 with a standard deviation of approximately 7.52 and a coefficient of variation of approximately 41.39%. That is, each reported case corresponds to a 1*/*18.16 of the actual number of people that have contact with the virus.

However, it is worthy to note that the under-reporting have been decreasing in last days. In Figure 3 we can see the under-reporting ratio is about 10 (9.78 is the average in last 10 days) meaning that each reported case represent (1/10) of actual of people that have contact with the virus.

### 4.2 Social distancing in Brazil

In this section we present the results for reproductive number *ℛ*_{t} and six distance measurements available at Google LLC (2020).

To the results of the reproductive number *ℛ*_{t} of this section we set

*M* = 10000 and *K* = 65, which corresponds to 75% of the data available.

Furthermore, we consider in which *τ* is a random variable uniformly distributed over [9, 19] and *µ* is a random variable uniformly distributed over [0.005, 0.01].

The data available at Google LLC (2020) contain movement trends over time by geography, across different categories of places such as retail and recreation, groceries and pharmacies, parks, transit stations, workplaces, and residential. These measurements are:

*M*_{1} : Mobility trends for places like restaurants, cafes, shopping centers, theme parks, museums, libraries, and movie theaters.

*M*_{2} : Mobility trends for places like grocery markets, food warehouses, farmers markets, specialty food shops, drug stores, and pharmacies.

*M*_{3} : Mobility trends for places like national parks, public beaches, marinas, dog parks, plazas, and public gardens.

*M*_{4} : Mobility trends for places like public transport hubs such as subway, bus, and train stations.

*M*_{5} : Mobility trends for places of work.

*M*_{6} : Mobility trends for places of residence.

The measurements (*M*_{i}, *i* = 1, …, 6) shows how visits and length of stay in different locations change compared to a reference value. The changes for each day are compared with a reference value corresponding to the same day of the week. The reference value is the median of the corresponding day of the week, during the period of five weeks from January 3 to February 6, 2020 (Google LLC, 2020).

According to Google LLC (2020), these measurements is intended to be used to mitigate the impact of Covid-19. It shouldn’t be used for medical diagnostic, prognostic, or treatment purposes. It also isn’t intended to be used for guidance on personal travel plans. Each Community Mobility Report data-set is presented by location and highlights the percent change in visits to places like grocery stores and parks within a geographic area. How to use this report. Location accuracy and the understanding of categorized places varies from region to region, so it isn’t recommended using this data to compare changes between countries, or between regions with different characteristics (e.g. rural versus urban areas).

To seek for a correlation between those measurements and their effects on the dynamics of the epidemic in Brazil we group the data by week considering average values. Thus, the first column in 1 shows the week number using 2020 as the reference year. Yet the values of *ℛ*_{t} are average values of the *ℛ*_{t} values, obtained by using the procedure described in subsection 3.1, in which the *ℛ*_{t} values are grouped by week. similarly, each *M*_{i} value shown in Table 1 represents a weekly averages of the values generated by the site Google LLC (2020).

By the dynamics of the reporting process and processing data from by authorities, we are assuming that there is a time-lag between time of the infection. That is, fluctuations in the measurements *M*_{i} are going to affect the reproductive number *ℛ*_{t} later on in time. Thus, to account for that time-lag we introduce a variable *τ ∈* [*−*2, 0], in which the lower bound *−*2 is chosen due to the data constraints in measurements *M*_{i}.

In this way, in order to find a correlation between the values of *ℛ*_{t} and *M*_{i} we calculated the Pearson’s correlation coefficient between (*ℛ*_{9}, *ℛ*_{10}, …, *ℛ*_{25}) and For a fixed *τ ∈* [*−*2, 0] the value of . is obtained by linear interpolation. The first row in Table 2 shows the time-lag in which the maximum value for is attainable. In the second row of Table 2 we see the value of at *τ*.

According to Table 2 we see that there is a moderate to high positive linear correlation between distance measures (*M*_{i}, *i* = 1, …, 5) and *ℛ*_{t}, that is, as the movement in locations specified by Google LLC (2020) increases, *ℛ*_{t}tends to increase. On the other hand, last measurement (*M*_{6}) is negatively correlated with *ℛ*_{t}. As we discuss previously, the measurement *M*_{6} refers to mobility trends for places of residence, that is, decreasing traffic on streets in residential neighborhoods there is also a decrease in *ℛ*_{t}.

### 4.3 Epidemiological curve

One of the most important issue in epidemiological dynamics is to determine the time in which the number of infected start to decrease. In Figure 2 we show the epidemiological curve for the number of infected people. In Figure 3 we show the cumulative number of cases in proportion to the total population.

According to the procedure adopted in this work, the number of infectious is expected to increase up to mid-September, after which the infectious start do decrease. Figure 3 shows us that about 50% of the population must be infected til April 2021.

The *ℛ*_{0} for the epidemiological curve in Figure 2 is, on average, approximately 1.36 with a standard deviation of approximately 0.07 and a coefficient of variation of approximately 5.11%

## 5 Conclusion

In this work we use the Infection Fatality Rate (IFR) in order to predict the evolution of the SARS-CoV-2 in Brazil. The main assumption to use the IFR is that we believe the accuracy of deaths is greater than the accuracy of the other data reported by governmental authorities.

By this approach we infer that the cumulative numbers of cases in Brazil is under-estimated. The ratio of under-reporting is about 18.16 meaning that each reported case corresponds to a 1*/*18.16 of the actual number of people that have contact with the virus. However, it is worthy to note that the under-reporting have been decreasing in last days. In Figure 3 we can see the that under-reporting ratio has been steady in about 10 (9.78 is the average in last 10 days) meaning that each reported case represents (1/10) of actual of number people that have contact with the virus.

Another important result in this work concerns about the effects of social distancing policies on the dissemination of the pandemic in Brazil. We infer that there is strong correlation between the reproductive number *ℛ*_{t} and social distancing as measured by Google LLC (2020) (presented in Table 1) in which the lowest value of *ℛ*_{t} occurred with the greatest adherence to social distance, and this conclusion is measured by the Pearson correlation coefficient as presented in Table 2. This is important result that can be used by authorities in order to measure how much must be the social distancing measurements in order to attain some target in the spread of the pandemic.

Finally, we believe that this approach can be helpful not just to predict the dynamics of the epidemic but also to guide policies that can control the spread of SARS-CoV-2.