Modeling of the COVID-19 Cases in Gulf Cooperation Council (GCC) countries using ARIMA and MA-ARIMA models

Coronavirus disease 2019 (COVID-19) remains a major pandemic spreading around the world. In the Gulf Cooperation Council (GCC) countries, there were 1015269 confirmed COVID-19 cases, 969424 recovery cases, and 9328 deaths as of 30 Nov. 2020. This paper therefore fits several statistical models to the daily reported counts of these three variables, namely the classical ARIMA, k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA models, to study the trend and to provide long-term forecasts of the confirmed, recovery, and death cases of the COVID-19 pandemic in the GCC countries. The data analyzed in this study cover the period from the first case of coronavirus reported in each GCC country to Nov 30, 2020. To compute the best parameter estimates, each model was fitted to 90% of the available data in each country (the in-sample forecast or training data), and the remaining 10% was used for the out-of-sample forecast (the testing data). The AIC was applied to the training data as the criterion for selecting the best model. Furthermore, the RMSE was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding, in general, is that the WMA-ARIMA and EWMA-ARIMA models, together with the cubic linear regression model, gave better in-sample and out-of-sample forecasts than the classical ARIMA models for the confirmed and recovery cases, while no specific model was best for the death cases.

The main objective of this article is to model the confirmed, recovery, and death cases of COVID-19 in the GCC countries using the classical ARIMA model together with three types of k th Moving Average-ARIMA (k th MA-ARIMA) models: the k th Simple Moving Average-ARIMA (k th SMA-ARIMA), the k th Weighted Moving Average-ARIMA (k th WMA-ARIMA), and the k th Exponential Weighted Moving Average-ARIMA (k th EWMA-ARIMA). The study period runs from the first case of coronavirus reported in each GCC country to Nov 30, 2020. This article's main contribution is that it is considered the only study that uses the classical ARIMA together with the k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA models to model the three variables (confirmed, recovery, and death cases of COVID-19) in the GCC countries.
The organization of the paper is as follows. Section 2 describes the study area and data collection. Section 3 briefs the methodology used in the study. The article ends with the results and discussion in Section 4, and conclusions in Section 5.

Study Area and Data collection
To achieve this study's objectives, all six GCC countries were included (Saudi Arabia, United Arab Emirates, Qatar, Kuwait, Bahrain, and Oman). The sample data consist of the daily reported COVID-19 counts for three variables (confirmed, recovery, and death cases) in each country. The data cover the period from the first confirmed case of COVID-19 reported in each country to Nov 30, 2020. The data were extracted from the WHO situation reports, the Sehhty website, and Wikipedia.

Methodology
This paper's main goal is to model 3 variables involving daily confirmed, recovery, and death cases in GCC countries using classical ARIMA besides the three types of k th MA-ARIMA including k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA. Therefore, this section investigates each of these models, discussing model building and model evaluation.

ARIMA Model
The ARIMA model, developed by Box and Jenkins (1994), is a statistical model that uses time series data to study the trend and to forecast future values of the series. For a given non-stationary time series $x_t$, the classical $\mathrm{ARIMA}(p, d, q)$ model is defined as

$$\phi(B)(1 - B)^d x_t = \theta(B)\varepsilon_t \tag{1}$$

where $B$ is the backward shift operator, $(1 - B)$ is the difference filter, $d$ is the number of times the series needs to be differenced to make the data stationary, $p$ is the order of autoregression, $q$ is the order of the moving average, $\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \dots - \phi_p B^p$, $\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \dots + \theta_q B^q$, and $\varepsilon_t \sim N(0, 1)$.
The ARIMA model is a generalized model that integrates the autoregressive model $\mathrm{AR}(p)$ and the moving average model $\mathrm{MA}(q)$. ARIMA models that do not require differencing are ARMA models; therefore, with $d = 0$, model (1) can be expressed as the autoregressive model $\mathrm{AR}(p)$, the moving average model $\mathrm{MA}(q)$, and their combination $\mathrm{ARMA}(p, q)$ as

$$\phi(B) x_t = \varepsilon_t \tag{2}$$

$$x_t = \theta(B)\varepsilon_t \tag{3}$$

$$\phi(B) x_t = \theta(B)\varepsilon_t \tag{4}$$
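To make the role of the difference filter $(1 - B)^d$ in model (1) concrete, the following minimal Python sketch applies first differencing $d$ times to a short series. The function name `difference` and the toy values are hypothetical illustrations and are not taken from the study.

```python
def difference(series, d=1):
    """Apply the difference filter (1 - B)^d to a list of numbers."""
    out = list(series)
    for _ in range(d):
        # (1 - B) x_t = x_t - x_{t-1}; each pass shortens the series by one
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Toy series with a quadratic trend (illustrative values only)
toy = [1, 4, 9, 16, 25]
first_diff = difference(toy, d=1)   # still trending
second_diff = difference(toy, d=2)  # constant, i.e. the trend is removed
```

In this toy case two passes are needed to remove the trend; for the series in this study, the order of differencing is decided by the stationarity checks described later.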

The k th SMA-ARIMA Model
The k th SMA-ARIMA process of a time series and its corresponding back-shift operator are defined by Eq. (5) and Eq. (6), respectively.
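A minimal Python sketch of a k th simple moving average and its inversion is given here for orientation. It assumes the usual definition (the unweighted mean of the $k$ most recent observations); the function names `kth_sma` and `sma_back_shift` are hypothetical, and the exact notation of the paper's equations is not reproduced.

```python
def kth_sma(x, k):
    """k-th simple moving average: mean of each window of k consecutive values."""
    return [sum(x[t - k + 1 : t + 1]) / k for t in range(k - 1, len(x))]


def sma_back_shift(y_t, prev_x, k):
    """Recover x_t from the smoothed value y_t and the k-1 previous observations.

    Assumes y_t = (x_t + x_{t-1} + ... + x_{t-k+1}) / k.
    """
    recent = prev_x[len(prev_x) - (k - 1):]  # x_{t-k+1}, ..., x_{t-1}
    return k * y_t - sum(recent)


# Toy data (illustrative values only)
x = [2, 4, 6, 8, 10]
y = kth_sma(x, k=3)
x_last = sma_back_shift(y[-1], x[:-1], k=3)  # maps the smoothed value back to x_t
```

The inversion step is what allows a forecast made on the smoothed scale to be expressed on the scale of the original daily counts.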

The k th WMA-ARIMA Model
The k th WMA-ARIMA process of a time series and its corresponding back-shift operator are given by Eq. (7) and Eq. (8), respectively.
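The sketch below illustrates one common form of a k th weighted moving average and its inversion. It assumes linearly increasing weights $1, 2, \dots, k$ (the newest observation receives weight $k$) normalized by $k(k+1)/2$; this weighting scheme and the function names `kth_wma` and `wma_back_shift` are assumptions made for illustration, not a transcription of the paper's equations.

```python
def kth_wma(x, k):
    """k-th weighted moving average with weights 1..k (newest value weighted k)."""
    denom = k * (k + 1) / 2
    weights = range(1, k + 1)
    return [
        sum(w * v for w, v in zip(weights, x[t - k + 1 : t + 1])) / denom
        for t in range(k - 1, len(x))
    ]


def wma_back_shift(z_t, prev_x, k):
    """Recover x_t from the smoothed value z_t and the k-1 previous observations."""
    denom = k * (k + 1) / 2
    recent = prev_x[len(prev_x) - (k - 1):]  # x_{t-k+1}, ..., x_{t-1}
    # Past values carry weights 1, ..., k-1; x_t itself carries weight k
    weighted_past = sum(w * v for w, v in zip(range(1, k), recent))
    return (denom * z_t - weighted_past) / k


# Toy data (illustrative values only)
x = [2, 4, 6, 8, 10]
z = kth_wma(x, k=3)
x_last = wma_back_shift(z[-1], x[:-1], k=3)
```

Compared with the simple moving average, this scheme reacts faster to recent changes because the newest observation dominates each window.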

The k th EWMA-ARIMA Model
The k th EWMA-ARIMA process of a time series and its corresponding back-shift operator are computed by Eq. (9) and Eq. (10), respectively.
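A hedged Python sketch of a k th exponentially weighted moving average follows. It assumes geometrically decaying weights $(1 - \alpha)^j$ for lag $j = 0, \dots, k - 1$, normalized by their sum, with the smoothing factor $\alpha = 2/(k + 1)$ as a default; the text does not state the value of $\alpha$, so this default, like the function names, is an assumption.

```python
def ewma_weights(k, alpha=None):
    """Weights (1 - alpha)^j for lags j = 0..k-1 (j = 0 is the newest value)."""
    if alpha is None:
        alpha = 2 / (k + 1)  # assumed convention, not stated in the text
    return [(1 - alpha) ** j for j in range(k)]


def kth_ewma(x, k, alpha=None):
    """k-th exponentially weighted moving average over windows of length k."""
    w = ewma_weights(k, alpha)
    total = sum(w)
    return [
        sum(w[j] * x[t - j] for j in range(k)) / total
        for t in range(k - 1, len(x))
    ]


def ewma_back_shift(v_t, prev_x, k, alpha=None):
    """Recover x_t from the smoothed value v_t and the k-1 previous observations."""
    w = ewma_weights(k, alpha)
    past = sum(w[j] * prev_x[-j] for j in range(1, k))  # prev_x[-j] is x_{t-j}
    return v_t * sum(w) - past  # the weight of x_t itself is (1 - alpha)^0 = 1


# Toy data (illustrative values only)
x = [2, 4, 6, 8, 10]
v = kth_ewma(x, k=3)
x_last = ewma_back_shift(v[-1], x[:-1], k=3)
```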

Model selection criteria
Model selection criteria are rules used to select a statistical model among a set of candidate models based on the observed data. The Akaike information criterion (AIC) is a widely used model selection tool due to its computational simplicity and effective performance in many modeling frameworks. The AIC is given as (Akaike, 1974)

$$\mathrm{AIC} = -2\log L + 2m \tag{11}$$

where $L$ is the likelihood of the model and $m$ is the total number of estimated parameters in the model. A good model is the one that has the minimum AIC among all candidate models.
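The following short Python sketch shows how Eq. (11) can be evaluated and used to rank candidates. The log-likelihood values, parameter counts, and model labels are made up for illustration and are not results from this study; how the number of parameters is counted (for example, whether the error variance is included) varies between software packages.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 m."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: label -> (log-likelihood, number of parameters)
candidates = {
    "model_A": (-120.5, 3),
    "model_B": (-118.0, 6),
}
scores = {name: aic(ll, m) for name, (ll, m) in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the smallest AIC
```

Here the second candidate has the higher likelihood but is penalized for its three extra parameters, so the simpler candidate is preferred.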

Measures of forecast accuracy
A widely used measure of forecast accuracy for univariate time series data is the Root Mean Square Error (RMSE) (Hyndman and Koehler, 2006). The RMSE is computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(x_t - \hat{x}_t\right)^2} \tag{12}$$

where $x_t$ and $\hat{x}_t$ are the actual and predicted values at time $t$, respectively, and $n$ is the number of time points. A lower RMSE indicates better calibration and, therefore, better performance.
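As a small worked illustration of the RMSE, the Python sketch below computes it for two equal-length sequences. The function name `rmse` and the toy numbers are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(sum(squared_errors) / len(actual))


# Toy example: errors of 3 and 4 give sqrt((9 + 16) / 2)
toy_rmse = rmse([0, 0], [3, 4])
```

Because the errors are squared before averaging, a few large forecast misses influence the RMSE more than many small ones.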
medRxiv preprint doi: https://doi.org/10.1101/2021.05.27.21257916; this version posted May 29, 2021. The copyright holder has placed this preprint (which was not certified by peer review) in the Public Domain. It is no longer restricted by copyright. Anyone can legally share, reuse, remix, or adapt this material for any purpose without crediting the original authors.

Checking model's goodness of fit
After the ARIMA or k th MA-ARIMA model considered most appropriate among the alternatives has been selected, it can be tested for goodness of fit. The model is assumed to be a good fit if its residuals behave approximately like white noise. The essential tools are the plots of the ACF and PACF of the residuals. The Box-Ljung test is a diagnostic tool used to test for lack of fit of a time series model. It is applied to the residuals after fitting an ARIMA or k th MA-ARIMA model to the data, and it examines the autocorrelations of the residuals. The null and alternative hypotheses for this test are:
$H_0$: The model does not exhibit a lack of fit, i.e., there is no serial correlation among the residuals at the tested lags (the residuals are approximately white noise).
$H_1$: The model exhibits a lack of fit, i.e., the residuals are serially correlated.
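The Box-Ljung statistic can be sketched in a few lines of Python. The code below computes $Q = n(n + 2)\sum_{h=1}^{H} \hat{\rho}_h^2 / (n - h)$ from the sample autocorrelations $\hat{\rho}_h$ of the residuals; the function names and the toy residuals are hypothetical, and the p-value (from a chi-square distribution, with degrees of freedom usually reduced by the number of fitted ARMA parameters) is left to a statistics package.

```python
def autocorr(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum(
        (residuals[t] - mean) * (residuals[t - lag] - mean)
        for t in range(lag, n)
    )
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Box-Ljung Q statistic over lags 1..max_lag."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, h) ** 2 / (n - h) for h in range(1, max_lag + 1)
    )


# Toy residuals that alternate in sign, i.e. strongly autocorrelated at lag 1
toy_residuals = [1, -1] * 10
q_stat = ljung_box_q(toy_residuals, max_lag=1)
```

For these toy residuals the statistic is far above 3.841, the 5% critical value of a chi-square distribution with one degree of freedom, so the hypothesis of no serial correlation would be rejected.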

Results and Discussion
This section first demonstrates summary statistics for the three variables, confirmed, recovery, and death cases in each GCC country, then reports and discusses the results obtained from applying the ARIMA and k th MA-ARIMA models on these variables. Table 1 shows the summary statistics measures, including mean and standard deviation of the confirmed, recovery, and death cases of COVID-19 among the GCC countries.

Summary Statistics for COVID-19 confirmed, recovery and death cases
Moreover, Table 1 also reports the prevalence of confirmed cases per 100000 population for the first four weeks. For confirmed cases, the country with the highest mean has a standard deviation of 1210.86 and is followed by UAE, Kuwait, and Oman; at the other end, Bahrain has the lowest mean (306.18), with a standard deviation of 215.08. For recovery cases, KSA has the highest mean, followed by UAE, Kuwait, Qatar, and Oman, while Bahrain has the lowest. KSA also has the highest mean of reported death cases, followed by Oman and Kuwait, whereas Qatar has the lowest. It can also be seen that in the first 4 weeks of the COVID-19 outbreak, Qatar and Bahrain had the highest prevalence of confirmed cases, at 18 and 16 infected persons per 1000000, respectively. In contrast, UAE and Oman had the lowest, at 1 and 1.1 per 1000000, respectively (see Figure 1).

Prediction model for COVID-19 confirmed, recovery and death cases
This paper uses the time series of daily COVID-19 confirmed, recovery, and death cases in each GCC country. Each series can therefore be written as $x_1, x_2, \ldots, x_n$, where $x_t$ represents the confirmed, recovery, or death cases at day $t$, and $t = 1$ denotes the date of the first case of COVID-19 detected in a given country. The time-series plots of the daily COVID-19 confirmed, recovery, and death cases for the GCC countries are presented in Figure 2, Figure 3, and Figure 4, respectively.

Prediction Method
To compute the best parameter estimates of the ARIMA, k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA models, each model was fitted to 90% of the available data in each country (the in-sample forecast or training data), and the remaining 10% was used for the out-of-sample forecast (the testing data). The AIC of Eq. (11) was applied to the training data as the criterion for selecting the best model. Furthermore, the RMSE of Eq. (12) was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The calculations were performed using RStudio version 1.2.5033 and EViews 10.
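The 90%/10% split described above can be sketched in Python as follows. The split is chronological (no shuffling), which is the natural choice for time series; the function name `split_series`, the integer-percentage interface, and the toy data are assumptions made for illustration.

```python
def split_series(series, train_percent=90):
    """Chronological split: first train_percent% for training, rest for testing."""
    # Integer arithmetic avoids floating-point surprises in the cut position
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


# Toy series of 250 daily values (illustrative only)
toy_series = list(range(250))
train, test = split_series(toy_series)
```

The candidate models are then fitted on `train` only, and their forecasts are compared with `test` using the RMSE.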

ARIMA model for COVID-19 confirmed, recovery and death cases
To check whether the daily COVID-19 confirmed, recovery, and death case time series in each country were stationary, we carried out the Augmented Dickey-Fuller (ADF) unit root test. The results of the ADF unit root test are reported in Table A.1 in the Appendix. Based on Table A.1, we conclude that all variables are stationary with constant and trend at first differences throughout the study period; therefore, the ARIMA model can be applied. After the stationarity of the confirmed, recovery, and death case time series in each country was established, the ARIMA model that best fit each of these 3 variables, with the minimum AIC on the training data and the lowest RMSE, was selected. Table 2 summarizes the best ARIMA model for the confirmed, recovery, and death cases in each country and the corresponding RMSE and AIC. Table 2 can be interpreted in the same manner.

k th MA-ARIMA model for COVID-19 confirmed, recovery and death cases
The process of developing the k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA models can be summarized as follows:
1. Transform the original time series into the corresponding smoothed series for $k = 2, 3, \ldots, 5$ using Eq. (5), Eq. (7), and Eq. (9), respectively.
2. Check the stationarity of each transformed series using the ACF test, differencing until stationarity is achieved.
3. Apply the classical $\mathrm{ARIMA}(p, d, q)$ model to the transformed series obtained from step 2, where $p + q \le 5$.
After taking first differences of the transformed data to make them stationary, we fitted 72 models for each of the 3 types of k th MA-ARIMA model (6 countries × 3 variables × 4 values of $k$, $k = 2, 3, 4, 5$). The best 18 of the 72 combinations of k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA models for the confirmed, recovery, and death cases of COVID-19, with the corresponding RMSE and AIC for each country, are presented in Table 3, Table 4, and Table 5, respectively. Based on the results in Table 3, the 2 nd SMA-ARIMA(2,1,3), 2 nd SMA-ARIMA(2,1,2), and 2 nd SMA-ARIMA(3,1,1) were selected as the best models for the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively. The remaining results of Table 3 and the outputs in Table 4 and Table 5 can be interpreted in the same manner. Table 6 reviews the best models among the k th MA-ARIMA models based on the smallest RMSE, while Table 7 shows the best models among the classical ARIMA and k th MA-ARIMA models based on the smallest RMSE.
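The size of the model grid described above (6 countries × 3 variables × 4 values of $k$) can be checked with a short Python sketch. The enumeration of candidate orders with $p + q \le 5$ assumes that $p$ and $q$ may each be zero, which the text does not state; the variable names are hypothetical.

```python
from itertools import product

countries = ["Saudi Arabia", "United Arab Emirates", "Qatar",
             "Kuwait", "Bahrain", "Oman"]
variables = ["confirmed", "recovery", "death"]
k_values = [2, 3, 4, 5]

# One (country, variable, k) combination per fitted model of a given MA type
grid = list(product(countries, variables, k_values))

# Candidate ARMA orders under the constraint p + q <= 5 (zero orders assumed allowed)
max_order = 5
orders = [
    (p, q)
    for p in range(max_order + 1)
    for q in range(max_order + 1)
    if p + q <= max_order
]
```

Each element of `grid` corresponds to one transformed series; for each of them, the order in `orders` with the best AIC and RMSE would be retained.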
After identifying the best model among the classical ARIMA and k th MA-ARIMA models for the confirmed, recovery, and death cases in each country (see Table 7), the next step is to check the pattern of the residuals from each selected model by plotting the ACF of the residuals and conducting the Box-Ljung test to examine goodness of fit. Figures 5.a.1 to 5.c.6 show the ACF plots for all the best models listed in Table 7, while Table 8 reports the outputs of the Box-Ljung test. The ACF plots in the sub-figures of Figure 5 show that, for the first 30 lags, most of the autocorrelations lie inside the 95% confidence bounds, indicating that the residuals behave like white noise, except in Figure a2 and Figure a4, which deviate slightly from randomness. The outputs of the Ljung-Box test in Table 8 confirm that no autocorrelation is left in the residuals of any model in Table 7 except the two models for the confirmed cases in UAE and Qatar. For all other models, the null hypothesis that the residuals are white noise was not rejected, and these models therefore exhibit a good fit.
Thus, each model in Table 7 has passed the required checks and is ready for forecasting, except the two models 5 th WMA-ARIMA(2,1,3) and ARIMA(2,1,3), corresponding to the confirmed cases in UAE and Qatar, respectively. Therefore, the forecast values of confirmed cases in UAE and Qatar shown in Table 9 were computed based on the cubic linear regression model (the regression output reports an F statistic of 150.3, significant at the 0.001 level).
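For readers unfamiliar with the term, a cubic linear regression fits $x_t = b_0 + b_1 t + b_2 t^2 + b_3 t^3$ by ordinary least squares; it is linear in the coefficients even though it is cubic in time. The Python sketch below solves the normal equations directly on toy data. The function names and data are hypothetical, and this is not the authors' implementation.

```python
def fit_cubic(t, y):
    """Least-squares fit of y = b0 + b1*t + b2*t^2 + b3*t^3 via normal equations."""
    n_coef = 4
    rows = [[ti ** j for j in range(n_coef)] for ti in t]
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n_coef)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(n_coef)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(n_coef):
        pivot = max(range(col, n_coef), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n_coef):
            factor = a[r][col] / a[col][col]
            for c in range(col, n_coef + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution
    coef = [0.0] * n_coef
    for i in reversed(range(n_coef)):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, n_coef))
        coef[i] = (a[i][n_coef] - known) / a[i][i]
    return coef


def predict_cubic(coef, t):
    """Evaluate the fitted cubic at time t."""
    return sum(c * t ** j for j, c in enumerate(coef))


# Toy data generated from an exact cubic (illustrative only)
t_obs = list(range(6))
y_obs = [1 + 2 * ti - ti ** 2 + ti ** 3 for ti in t_obs]
coef = fit_cubic(t_obs, y_obs)
next_value = predict_cubic(coef, 6)  # one-step-ahead extrapolation
```

Polynomial trends can extrapolate poorly far beyond the observed range, so forecasts of this kind are usually restricted to short horizons.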

Conclusions
Four models, namely the classical ARIMA, k th SMA-ARIMA, k th WMA-ARIMA, and k th EWMA-ARIMA, have been considered for predicting the confirmed, recovery, and death cases of the COVID-19 pandemic in the GCC countries. These models were applied to the daily data from the first case reported in each country until Nov 30, 2020.
To compute the best parameter estimates, each model was fitted to 90% of the available data in each country (the in-sample forecast or training data), and the remaining 10% was used for the out-of-sample forecast (the testing data). The AIC was applied to the training data as the criterion for selecting the best model. Furthermore, the RMSE was computed on the testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding, in general, is that the WMA-ARIMA and EWMA-ARIMA models, together with the cubic linear regression model, gave better in-sample and out-of-sample forecasts than the classical ARIMA models for the confirmed and recovery cases, while no specific model was best for the death cases.

Patient consent
No written consent has been obtained from the patients as there is no patient identifiable data included in this study.

Conflicts of Interest
The authors declare that they have no conflicts of interest.

APPENDIX