Forecasting daily COVID-19 confirmed, deaths and recovered cases using univariate time series models: A case of Pakistan study

The increasing confirmed case and death counts of coronavirus disease 2019 (COVID-19) in Pakistan have disturbed not only the health sector but also all other sectors of the country. For precise policy making, accurate and efficient forecasts of confirmed cases and death counts are important. In this work, we used five univariate time series models, namely the Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Nonparametric Autoregressive (NPAR) and Simple Exponential Smoothing (SES) models, to forecast confirmed, death and recovered cases. These models were applied to Pakistan COVID-19 data covering the period from 10 March to 3 July 2020. To evaluate model accuracy, we computed two standard error measures, the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE). The findings show that the time series models are useful in predicting COVID-19 confirmed, death and recovered cases. Furthermore, the MA model outperformed all other models for predicting confirmed and death counts, with ARMA the second-best model. The SES model appears superior to the other models for predicting recovered counts, although MA is competitive. On the basis of the best selected models, we forecast from 4 July to 14 August 2020, which will be helpful for decision making in public health and other sectors of Pakistan.


Introduction
the spread of COVID-19 using the case of Malaysia and scrutinized its linkage with some external factors, e.g. inadequate medical resources and incorrect diagnosis problems. They used an epidemiological model and a dynamical systems technique and observed that these factors might misrepresent the evaluation of the severity of COVID-19 under complexities. To produce forecasts in agreement with the publicly available data, the work in [21] used a fractional time delay dynamic system (FTDD). The author in [22] used a generalized logistic model and found the pandemic growth in China to be exponential in nature. The authors in [23] used genetic programming (GP) models for confirmed cases and deaths in the three Indian states most affected by COVID-19, i.e. Maharashtra, Gujarat and Delhi, and in India as a whole. They statistically validated the evolved models and found that the proposed models based on GEP use simple interactive functions and can be highly relied upon for time series forecasting of COVID-19 cases in the context of India. Based on the spreading behaviour of COVID-19 in the population, [24] estimated three novel quarantine epidemic models. They found that isolation at home and quarantine in hospitals are the two most effective control strategies under the current circumstances, when the disease has no known available treatment. The work in [25] used positive cases over 50 days of disease progression in Pakistan, analysed the graphical trend and, using exponential growth, forecasted the behaviour of disease progression for the next 30 days. The authors assumed different possible trajectories and projected an estimated 20k-456k positive cases within 80 days of disease spread in Pakistan.
Due to the mutating nature of the virus, the situation has become graver; with little known about a cure, there remains great uncertainty about the probable timeline of this disease. Hence, short-term forecasting is immensely important for anticipating the flattening of the curve and the revival of routine social and economic life [26]. Statistical models using evidence from real-world data can help predict the location, timing and size of outbreaks, allowing governments to allocate resources more effectively, to conduct scenario and signal analysis, and to determine policy approaches. Epidemiological tools can then be applied to limit the scope and spread of outbreaks. However, these approaches are sensitive to the underlying assumptions, and hence their impact varies [27]. It is important to ensure oversight, check modelling assumptions, and ensure the veracity, reliability and accountability of these tools in order to address bias and other potential harms. In this work, we examine projections of COVID-19 infections in Pakistan using a number of different univariate time series methods.
The rest of the article is arranged as follows: Section two describes the forecasting models and Section three discusses the out-of-sample and forecasting results. Finally, Section four comprises the conclusion and discussion.
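medRxiv preprint doi: https://doi.org/10.1101/2020.09.20.20198150; this version posted September 22, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.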

Forecasting Models
In this work, we consider five univariate time series models: Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Nonparametric Autoregressive (NPAR) and Simple Exponential Smoothing (SES). These models are described in detail below.

Autoregressive Process
A linear Autoregressive (AR) process describes the series $M_t$ as a linear function of its previous $n$ observations and is defined as:

$$M_t = \alpha + \sum_{i=1}^{n} \gamma_i M_{t-i} + \varepsilon_t$$

where $\alpha$ and $\gamma_i$ ($i = 1, 2, \cdots, n$) are the intercept and slope coefficients of the underlying AR process and $\varepsilon_t$ is the disturbance term. After a graphical analysis (plotting the series residuals, ACF and PACF), we fit an AR(2) model to each time series $M_t$.
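As an illustration of how an AR(2) model of this form might be estimated, the following minimal sketch fits the intercept and the two slope coefficients by ordinary least squares. The paper does not state which estimation method or software was used, so least squares is an assumption, and the function names (`solve_linear`, `fit_ar2`, `forecast_ar2`) and the toy series are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ar2(series):
    """Least-squares estimates (alpha, gamma1, gamma2) of an AR(2) model."""
    # Each row holds the regressors (1, M_{t-1}, M_{t-2}) for target M_t.
    rows = [(1.0, series[t - 1], series[t - 2]) for t in range(2, len(series))]
    targets = series[2:]
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return tuple(solve_linear(xtx, xty))


def forecast_ar2(params, last, second_last):
    """One-step-ahead forecast from the two most recent observations."""
    alpha, gamma1, gamma2 = params
    return alpha + gamma1 * last + gamma2 * second_last


# Toy, noise-free series generated from known coefficients (not COVID-19 data).
toy = [0.0, 5.0]
for _ in range(8):
    toy.append(1.0 + 0.5 * toy[-1] + 0.2 * toy[-2])

params = fit_ar2(toy)
next_value = forecast_ar2(params, toy[-1], toy[-2])
```

Because the toy series contains no disturbance term, the fit recovers the generating coefficients up to rounding; with real, differenced case counts the estimates would of course carry sampling error.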

Moving Average Model
The Moving Average (MA) model is primarily used to remove periodic fluctuations in time series data, for example fluctuations due to seasonality. Mathematically, the MA model can be written as:

$$M_t = \alpha + \varepsilon_t + \sum_{j=1}^{s} \phi_j \varepsilon_{t-j}$$

where $\alpha$ is the constant (intercept), $\phi_j$ ($j = 1, 2, \cdots, s$) are the parameters of the MA model and $\varepsilon_t$ is a white noise process. The value of $s$ gives the order of the MA process.

NonParametric Autoregressive Model
The additive nonparametric counterpart of the AR process leads to an additive model in which the association between $M_t$ and its previous lags is non-linear. It may be described as:

$$M_t = \sum_{i=1}^{n} g_i(M_{t-i}) + \varepsilon_t$$

where the $g_i$ are smoothing functions that describe the association between $M_t$ and its previous values. In the present case, the functions $g_i$ are represented by cubic regression splines. As in the parametric case, we used 2 lags when estimating the NPAR model.

Autoregressive Moving Average Model
The Autoregressive Moving Average (ARMA) model regresses the response variable $M_t$ on its previous $n$ lags as well as on lagged residuals (errors). Mathematically,

$$M_t = \alpha + \sum_{i=1}^{n} \gamma_i M_{t-i} + \varepsilon_t + \sum_{k=1}^{m} \phi_k \varepsilon_{t-k}$$

where $\alpha$ denotes the intercept, $\gamma_i$ ($i = 1, 2, \cdots, n$) and $\phi_k$ ($k = 1, 2, \cdots, m$) are the parameters of the AR and MA parts respectively, and $\varepsilon_t$ is a Gaussian white noise series with mean zero and variance $\sigma^2$. The ARMA model order is selected by inspecting the correlograms, i.e. the autocorrelation and partial autocorrelation functions (ACF and PACF). In our case, we fit an ARMA(1, 1) model to each series $M_t$.
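To make the role of the lagged error term concrete, the sketch below computes one-step-ahead forecasts from an ARMA(1, 1) model whose parameters are taken as given. It is only an illustration: the parameter values and toy series are hypothetical, the pre-sample disturbance is assumed to be zero, and the paper does not describe how its parameters were estimated.

```python
def arma11_filter(series, alpha, gamma1, phi1):
    """One-step-ahead forecasts and residuals for an ARMA(1, 1) model.

    forecasts[i] predicts series[i + 1]; the final entry is the forecast
    for the first period after the sample.
    """
    forecasts, residuals = [], []
    prev_error = 0.0  # assumption: the pre-sample disturbance is zero
    for i, value in enumerate(series):
        forecast = alpha + gamma1 * value + phi1 * prev_error
        forecasts.append(forecast)
        if i + 1 < len(series):
            # The realised forecast error feeds the next forecast.
            prev_error = series[i + 1] - forecast
            residuals.append(prev_error)
    return forecasts, residuals


# Hypothetical parameters and disturbances, used only to build a toy series.
alpha, gamma1, phi1 = 1.0, 0.6, 0.4
shocks = [0.0, 0.5, -0.3, 0.2, 0.1]
toy = [2.0]
for t in range(1, len(shocks)):
    toy.append(alpha + gamma1 * toy[-1] + shocks[t] + phi1 * shocks[t - 1])

forecasts, residuals = arma11_filter(toy, alpha, gamma1, phi1)
```

Setting `gamma1 = 0` reduces the same recursion to an MA(1) model, and setting `phi1 = 0` reduces it to an AR(1) model.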


Simple Exponential Smoothing Model
The Simple Exponential Smoothing (SES) model allows researchers to smooth the time series data and then use the smoothed series for out-of-sample forecasting. The SES model is applicable when the data are stationary, i.e. have no trend and no seasonal pattern, but the level of the series changes gradually over time.
The SES forecast is given by:

$$\hat{M}_{t+1,k} = \gamma_1 M_t + (1 - \gamma_1)\hat{M}_{t,k}$$

where $\gamma_1$ is the smoothing constant, $M_t$ is the actual series, $\hat{M}_{t,k}$ is the forecasted value of the underlying series for period $t$ and $\hat{M}_{t+1,k}$ is the forecasted value for period $t + 1$. This method assigns weights that decrease exponentially moving back from the most recent value.

For modelling purposes, a prime assumption about time series data is stationarity. A stationary process is one whose mean, variance and autocorrelation structure are time invariant. If the underlying series is non-stationary, it must be transformed to a stationary one. In the literature, different techniques are used to achieve stationarity, for example taking the natural log, differencing the series or applying a Box-Cox transformation [28]. In this work, the COVID-19 confirmed, death and recovered count time series are plotted in Figure 1 as daily cases (left column) and cumulative cases (right column). All three daily time series clearly show an upward linear trend, which indicates that the series are non-stationary and hence need to be made stationary by differencing. In addition, to check for a unit root in the confirmed, death and recovered series, we apply the Augmented Dickey-Fuller (ADF) test. The results are tabulated in Table 1 and suggest that all three series are non-stationary at level. After first-order differencing, however, the series turn out to be stationary. The first-order differenced series of daily confirmed, death and recovered cases are plotted in Figure 2; the series no longer contain any trend and hence have become stationary.

[Table 1: ADF test for the case series. Columns: Variables; At level (Constant, with trend); At first difference (Constant, with trend); Conclusion.]
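The two operations described in this section, first-order differencing and the SES recursion, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names and toy counts are hypothetical, and initialising the first forecast with the first observation is an assumption, since the paper does not state how the recursion was started or how the smoothing constant was chosen.

```python
def first_difference(series):
    """Return the first-order differences M_t - M_{t-1}."""
    return [current - previous for previous, current in zip(series, series[1:])]


def ses_forecasts(series, gamma1):
    """Simple exponential smoothing forecasts.

    forecasts[t] is the forecast for period t; the last entry is the
    forecast for the first period after the sample.
    """
    if not 0.0 <= gamma1 <= 1.0:
        raise ValueError("smoothing constant must lie in [0, 1]")
    forecasts = [series[0]]  # assumption: initial forecast = first observation
    for observed in series:
        # New forecast = weighted average of the latest observation
        # and the previous forecast.
        forecasts.append(gamma1 * observed + (1.0 - gamma1) * forecasts[-1])
    return forecasts


# Illustrative numbers only, not taken from the Pakistan dataset.
toy_counts = [10.0, 12.0, 11.0, 15.0]
diffs = first_difference(toy_counts)
smoothed = ses_forecasts(toy_counts, gamma1=0.5)
```

A smoothing constant close to 1 makes the forecast track the most recent observation, while a value close to 0 gives more weight to older observations.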


Results
In this paper, we used daily COVID-19 confirmed, death and recovered cases for Pakistan. The dataset was obtained from the WHO [3]; each series ranges from 10 March 2020 to 3 July 2020. The complete dataset covers 116 days, of which the data from 10 March 2020 to 19 May 2020 (71 days) were used for model training and the data from 20 May to 3 July 2020 (45 days) for one-day-ahead post-sample (testing) predictions. To assess predictive accuracy, two accuracy measures, the Root Mean Square Error (RMSE) and the Mean Absolute Error (MAE), were computed for each model as follows:

$$\mathrm{MAE} = \frac{1}{45}\sum_{t=1}^{45} \left| M_t - \hat{M}_t \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{45}\sum_{t=1}^{45} \left( M_t - \hat{M}_t \right)^2}$$

where $M_t$ is the observed and $\hat{M}_t$ the predicted value for the $t$-th day ($t = 1, 2, \cdots, 45$).
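The two accuracy measures can be computed directly from paired observed and predicted values. The sketch below uses the standard definitions of MAE and RMSE; the function names and the three-point example are illustrative and not taken from the paper.

```python
import math


def _errors(observed, predicted):
    """Return the forecast errors M_t - M_hat_t after validating the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return [o - p for o, p in zip(observed, predicted)]


def mae(observed, predicted):
    """Mean Absolute Error."""
    errors = _errors(observed, predicted)
    return sum(abs(e) for e in errors) / len(errors)


def rmse(observed, predicted):
    """Root Mean Square Error."""
    errors = _errors(observed, predicted)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Illustrative values only.
toy_observed = [3.0, 5.0, 7.0]
toy_predicted = [2.0, 5.0, 9.0]
toy_mae = mae(toy_observed, toy_predicted)
toy_rmse = rmse(toy_observed, toy_predicted)
```

RMSE penalises large errors more heavily than MAE, so the two measures can rank models differently when a model makes a few large misses.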
To identify the best of the previously described models for each series, we computed the two standard accuracy measures and present the outcomes in Table 2 and Figure 3, where the superiority of the MA model (confirmed and death cases) and the SES model (recovered cases) is evident in both the training and testing exercises.
The day-specific confirmed, death and recovered cases are plotted in Figure 4 over the period from 21 March to 19 June 2020. Figure 4 (left column) shows the variation among the different weeks, while Figure 4 (right column) plots the mean for each day of the week for confirmed, death and recovered cases. An increasing pattern from Saturday to Friday is clearly seen, which shows the effect of working and non-working days.
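A day-of-week profile of this kind can be obtained by grouping daily counts by weekday and averaging. The following sketch assumes the counts are supplied as a plain list starting on a known date; the function name and the two-week toy series are hypothetical. The start date 21 March 2020 falls on a Saturday, which matches the Saturday-to-Friday ordering used in the text.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def weekday_means(start, daily_counts):
    """Average the daily counts separately for each day of the week."""
    grouped = {}
    for offset, count in enumerate(daily_counts):
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        day = DAY_NAMES[(start + timedelta(days=offset)).weekday()]
        grouped.setdefault(day, []).append(count)
    return {day: sum(values) / len(values) for day, values in grouped.items()}


# Two illustrative weeks of counts, Saturday to Friday (not real data).
start = date(2020, 3, 21)
toy_counts = [1, 2, 3, 4, 5, 6, 7, 3, 4, 5, 6, 7, 8, 9]
means = weekday_means(start, toy_counts)
```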
Once the best models had been identified through the out-of-sample error measures (RMSE, MAE), we proceeded to forecast future values with the superior model in each case. We used MA for confirmed and death cases and SES for recovered cases, and forecast from 4 July to 14 August 2020 for both daily and cumulative cases. The forecasted values are shown in Figure 5 and clearly reveal that death and recovered cases are monotonically increasing, while confirmed counts are not. On 14 August 2020, the expected daily confirmed cases are 7,325 and the cumulative confirmed cases 413,639; the expected daily deaths are 121 and the cumulative deaths 9,279; and the expected daily recovered cases are 10,730 and the cumulative recovered cases 455,661. Overall, the results suggest that the growth in confirmed cases is gradually slowing, which was the outcome of earlier government-imposed measures ranging from cancelled conferences to travel restrictions, border closures and flight suspensions, which disrupted supply chains and severely affected the travel industry, and, within the country, disrupted work and the closing of shopping malls, schools, colleges and universities. To raise public awareness, different TV programmes, commercials and advertisements were organized. Face masks and sanitizer were used by each and every person.
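The relationship between daily and cumulative forecasts can be illustrated with a short sketch. The paper does not state how its cumulative forecasts were produced; the code below simply assumes that they are the last observed cumulative count plus the running sum of the daily forecasts. The function names and toy numbers are hypothetical.

```python
from datetime import date
from itertools import accumulate


def forecast_horizon(first_day, last_day):
    """Number of days in the forecast window, both end points included."""
    return (last_day - first_day).days + 1


def cumulative_from_daily(last_observed_cumulative, daily_forecasts):
    """Cumulative path implied by a sequence of daily forecasts (assumed rule)."""
    return [last_observed_cumulative + total
            for total in accumulate(daily_forecasts)]


# Forecast window used in the text: 4 July to 14 August 2020.
horizon = forecast_horizon(date(2020, 7, 4), date(2020, 8, 14))

# Illustrative numbers only, not the forecasts reported in the paper.
toy_cumulative = cumulative_from_daily(100, [5, 7, 3])
```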


Conclusion
The main purpose of this work was to forecast confirmed, death and recovered cases of COVID-19 for Pakistan using five univariate time series models: Autoregressive (AR), Moving Average (MA), Autoregressive Moving Average (ARMA), Nonparametric Autoregressive (NPAR) and Simple Exponential Smoothing (SES). The dataset of confirmed, death and recovered cases ranging from 10 March to 3 July 2020 was used. Data from 10 March 2020 to 19 May 2020 were used for model estimation (training) and data from 20 May to 3 July 2020 for one-day-ahead out-of-sample predictions. To check the predictive performance of all models, we used RMSE and MAE as error measures. The MA model beat all other models for predicting confirmed and death counts, and SES appears superior to the other models for predicting recovered cases. Finally, on the basis of these best models, we forecast from 4 July to 14 August 2020, which can help decision making in public health and other sectors for the entire country. Furthermore, this work may help in remedying the present socio-economic and psychosocial misery caused by COVID-19 among the public in Pakistan.
