## Abstract

In this short paper, the logistic growth model and classic susceptible-infected-recovered dynamic model are used to estimate the final size of the coronavirus epidemic.

## 1 Introduction

One of the common questions regarding an epidemic is its final size. To answer this question various models are used: analytical (Danby 1985, Brauer 2019a, b, Murray 2002), stochastic (Miller 2012), and phenomenological (Fisman D 2014, Pell et al. 2018).

In this note, we attempt to estimate the final epidemic size using the phenomenological logistic growth model (Pell et al. 2018, Chowell G 2014) and the classic susceptible-infected-recovered (SIR) model (Hethcote 2000). With both the models, we obtain a series of daily predictions. The final sizes are then predicted using iterated Shanks transformation (Shanks 1955, Bender and Orszag 1999). The data used for the calculations are taken from *worldmeters* ^{1}.

Before proceeding, we note that the final size of the epidemic in its early stage was discussed by Wu et al. (Wu, Leung, and Leung 2020) using the susceptible-exposed-infected-resistant model, by Xiong and Yan (Xiong and Yan 2020) using the exposed-infected-resistant model, by Nesteruk (Nesteruk 2020) using the SIR model, and by Anastassopoulou et al. (Anastassopoulou et al. 2020) using the SIR/death model. These early predictions range from 65000 to a million cases. Roosa et al recently gave short-term forecasts of the epidemic (Roosa et al. 2020).

## 2 Logistic growth model

The logistic growth model originates from population dynamics (Haberman 1998). The underlying assumption of the model is that the rate of change in the number of new cases per capita linearly decreases with the number of cases. Hence, if *C* is the number of cases, and *t* is the time, then the model is expressed as
where *r* is infection rate, and *K is* the final epidemic size. If *C* (0) = *C* _{0} is the initial number of cases, then the solution of (1) is
where . The growth rate, , reaches its maximum when . From this condition, we obtain that the growth rate peaks at time t_{p.}

At this time, the number of cases and growth rate are

Now, if *C*_{1}, *C*_{2}, …,*C*_{n} are the number of cases at times *t*_{1}, *t*_{2}, …,*t*_{n}, then the final size predictions of the epidemic based on these data are *K*_{1}, *K*_{2}, …, *K*_{n}. By using Shanks transformation, the predicted final epidemic size is

For the practical calculation of the parameters *K* and *r*, we use the MATLAB functions *lsqcurvefit* and *fitnlm*.

## 3 SIR model

The model equations are
where *t* is time, *S* (*t*) is the number of susceptible persons at time *t, I* = *I* (*t*) is the number of infected persons at time *t, R* (*t*) is the number of recovered persons in time *t, β* is the contact rate, and 1 *ϒ* is the average infectious period. From (1), (2), and (3) we obtain the total population size, *N*.

The initial conditions are *S* (0) = *S*_{0}, *I* (0) = *I* _{0}, and *R* (0) = *R*_{0}.

Eliminating *I* from (1) and (3) yields

In the limit *t* → ∞, the number of susceptible people left, *S*_{∞}, is
where *R*_{∞}is the final number of recovered persons. As the final number of infected people is zero, we have, using (4),

From this and (6), the equation for *R*_{∞}is

To use the model, we must estimate the model parameters *β, ϒ*, and the initial values *S*_{0} and *I*_{0} from the available data (we set *R* = 0 and *I*_{0} *= C*_{0}).

Now the available data is a time series of the total number of cases *C*, i.e.,

We can estimate the parameters and initial values by minimizing the difference between the actual and predicted number of cases, i.e., by minimizing

Where *C*_{t} = (*C*_{1},*C*_{2},…,*C*_{n}) are the number of cases at times *t*_{1}, *t*_{2}, …,*t*_{n} and are the corresponding estimates calculated by the model. For practical calculation, we use the MATLAB function ` fminsearch`. For the integration of the model equation, we use the MATLAB function

*ode45*.With a series of predicted final number of recovered persons, *R*_{∞,1}, *R*_{∞,2},…, *R*_{∞,n}, we can estimate the series limit by Shanks transformation.

## 4 Results

The results of logistic regression and the SIR model simulation are given in Tables 1 and 2, respectively. The comparison of the predicted final sizes is shown in the graph in Figure 1. We see that both methods converge and with more data, the discrepancy between the predicted values becomes less than 5%. From Table 1, we see that the peak of the epidemic was probably on 9 Feb, 2020.

In Figure 2, the time evaluation of the cases is shown, where we can see a good agreement between the models and the actual data. From Table 3, we see that the logistic regression model has a high coefficient of determination of 0.996, while the p-value (< 0.000) indicates that all the regression parameters are statistically significant.

In Tables 3 and 4, the iterated Shanks transformations for the predicted series of the final epidemic size are given. It appears that the predictions of the logistic model tend to the final size of 83231 cases, while the SIR model predictions converge to 83640 cases. Thus, the discrepancy is less than 0.5%.

## 5 Short term forecasting

The models used are data-driven, so they are as reliable as data are. Namely, as can be seen from the graph in Figure 2 at the beginning, we have exponential growth. Then until 11 Feb, one can predict the final epidemic size of about 55000 cases. However, the collection of data changes and we have a jump of about 15000 new cases on 12. Feb. On 20 Feb we have another change in trend; the data begin to shows almost linear trend (See Fig 3). While the above models show that the epidemic is slow down, the linear trend predicts about 873 new cases per day (see Table 5).

## 6 Conclusion

On the basis of the available data, we can now predict that the final size of the coronavirus epidemic using the logistic model will be approximately 83700 (±1300)?cases and that the peak of the epidemic was on 9 Feb 2020. A more optimistic final size of 83300 cases is obtained using the Shanks transformation. Similar figures are obtained using the SIR model, where the predicted size of the epidemic is approximately 84500, and the Shanks transformation lowers this number to about 83700 cases. Naturally, the degree of accuracy of these estimates remains to be seen.

In conclusion, qualitatively, both models show that the epidemic is moderating, but recent data show a linear upward trend. The next few days will, therefore, indicate in which direction the epidemic is heading.

PS. Today it is more or less clear that the predictions of the article apply only to China. By February 20, 99% of the case was from China. The linear trend in data from Feb 20 onward meant a decreasing number of infected in China and increasing infected elsewhere in the world. In other words, in China, the epidemic is slowing down, however, it is now developing elsewhere in the world. We note that the forecasting methods used in this article are inapplicable in the early stages of an epidemic.

## Data Availability

The data used are from https://www.worldometers.info/coronavirus/