Abstract
Coronavirus disease 2019 (COVID-19) is still a great pandemic presently spreading all around the world. In Gulf Cooperation Council (GCC) countries, there were 1015269 COVID-19 confirmed cases, 969424 recovery cases, and 9328 deaths as of 30th Nov. 2020. This paper, therefore, subjected the daily reported COVID-19 cases of these three variables to some statistical models including classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA to study the trend and to provide the long-term forecasting of the confirmed, recovery, and death cases of the novel COVID-19 pandemic in the GCC countries. The data analyzed in this study covered the period starting from the first case of coronavirus reported in each GCC country to Nov 30, 2020. To compute the best parameter estimates, each model was fitted for 90% of the available data in each country, which is called the in-sample forecast or training data, and the remaining 10% was used for the out-of-sample forecast or testing model. The AIC was applied to the training data as a criterion method to select the best model. Furthermore, the statistical measure RMSE was utilized for testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding, in general, is that the two models WMA-ARIMA and EWMA-ARIMA, besides the cubic linear regression model have given better results for in-sample and out-of-sample forecasts than the classical ARIMA models in fitting the confirmed and recovery cases while the death cases haven’t specific models.
1. Introduction
Coronavirus disease 2019 (COVID-19) is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The first case was registered in Wuhan, China, in December 2019. It has since spread worldwide, leading to an ongoing pandemic. Most of the people around the world have been infected with the new COVID-19 virus nowadays; the people of the Gulf Cooperation Council (GCC) countries were no exception to that. GCC countries adopted many severe strategies and policies to face the new crisis, such as early diagnosis, isolated infected people, and social distancing. The first COVID-19 confirmed in the GCC area was reported in UAE on Jan 29, 2020, and flowed by Bahrain on Feb 21, 2020, Kuwait, and Oman on Feb 24, 2020, Qatar on Feb 27, 2020, and KSA on Mar 2, 2020. Confirmed, recovery, and death cases are recorded daily in each of these countries. Accordingly, the ultimate requirement is to create an efficient statistical methodology that can efficiently predict morbidity cases. Thus, it can aid in decision-making and logistical planning in healthcare systems for coming challenges. The statistical forecast models are always conducted to predict the disease’s behavior in the future, and in this way, it may decrease and restrain in the pandemic.
Recently, many studies of COVID-19 have been conducted in the GCC countries; some of these studies used measures of descriptive statistics to explain the disease prevalence, incidence, and underline prevention and control techniques applied in these countries; while others used mathematical models to analyze and predict the evolution of the disease. Alandijany et al. (2020) reviewed the status of COVID-19 in GCC countries, summarized the control measures taken by each government, and highlighted some future challenges. They recommended that these countries must take severe appropriate precautions to limit the spread of infection. Sharif et al. (2020) utilized the SIRD and smoothing spline regression models to predict the number of cases in the Eastern Mediterranean Region including Saudi Arabia, Iran, and Pakistan. They concluded that the cumulative infected cases were expected to grow exponentially during the study period from Jan 29 till Apr 14, 2020. Zuo et al. (2020) provided a brief comparison of the COVID-19 events, which involve the total confirmed cases, total deaths, total recovery, and active cases that have been reported in the Asian countries up to Apr 8, 2020. Moreover, they introduced a new family of statistical models and proposed a particular submodel called flexible extended Weibull distribution. Abuhasel et al. (2020) applied the classical SIR model besides the ARIMA model to forecast the prevalence and recovery rates of the COVID-19 pandemic; the two models were applied to the daily data from Mar 3 to Jun 30, 2020. Ayinde et al. (2020) used the statistical curve model to model the daily cumulative confirmed, discharged, and death cases in Nigeria for the period beginning from Feb 27, 2020, until Apr 30, 2020. They concluded that the Cubic Linear Regression with AR (1) models are the best ones. Elhassan and Gaafar (2020) employed ARIMA and logistic growth models to predict the cumulative confirmed, recovery, and death cases of COVID-19 in Saudi Arabia between 2nd March 2, 2000, and Jun 21, 2020. They inferred that ARIMA(0,2,0), ARIMA(1,2,0), and ARIMA(1,2,3) were the most useful models to fit the cumulative confirmed, recovery, and death cases, respectively. Singh et al. (2020) developed the ARIMA to model the daily confirmed cases reported in Malaysia using training data of observed cases from Jan 22 to Mar 31, 2020. Subsequently, they validated using data on cases from Apr 1 to Apr 17, 2020, deducing that the ARIMA(0,1,0) model produced the best fit to the observed data. Ding et al. (2020) analyzed the epidemic data from Feb 24 to Mar 30, 2020, in Italy, based on ARIMA model. They selected ARIMA(2,1,0) to fit the logarithmic sequence of cumulative diagnoses. Hernandez-Matamoros et al. (2020) constructed a model for 145 countries, which are distributed in 6 geographic regions using the ARIMA parameters, the population per 1M people, the number of cases, and polynomial functions, the study period was until Apr 25. Dawoud (2020) Applied classical ARIMA together with the kth MA - ARIMA to model the COVID-19 cumulative confirmed cases in Palestine from Mar 5, 2020, through Aug 27, 2020. He inferred that ARIMA (1,2,4) and 5th EWMA-ARIMA (2,2,3) were the best models. Duong et al. (2020) applied the ARIMA model for the total daily confirmed cases worldwide from Jan 21, 2020, to Mar 16, 2020. They found that ARIMA(1,2,1) could describe and predict the epidemiological trend of COVID-19. Roy et al.(2020) used ARIMA to fit the COVID-2019 disease from Jan 26, 2020, to May 9, 2020, in Indian. They selected ARIMA(1,0,2) model to fit the sequence of diagnoses. Verma et al. (2020) developed some models based on ARIMA and FUZZY time series methodology to forecast the COVID-19 infection, mortality, and recovery in India throughout the phase between Mar 2, 2020, to May 17, 2020. They deduced that the ARIMA and FUZZY time series models’ forecasts would be useful for the decision-makers.
The main objective of this article is to model confirmed, recovery, and death cases of COVID-19 using classical ARIMA besides the three types of kth Moving Average-ARIMA (kth MA-ARIMA), including kth Simple Moving Average-ARIMA (kth SMA-ARIMA), kth Weighted Moving Average-ARIMA (kth WMA-ARIMA) and kth Exponential Weighted Moving Average-ARIMA (kth EWMA-ARIMA) in the GCC countries. This study starts from the first case of coronavirus reported in each GCC country to Nov 30, 2020. This article’s main contribution is that it considers the only study that used the classical ARIMA together with kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA to model the three variables confirmed, recovery, and death cases of COVID-19 in the GCC countries.
The organization of the paper is as follows. Section 2 describes the study area and data collection. Section 3 briefs the methodology used in the study. The article ends with the results and discussion in Section 4, and conclusions in Section 5.
2. Study Area and Data collection
To achieve this study’s objectives, all six countries within the GCC were included (Saudi Arabia, United Arab Emirates, Qatar, Kuwait, Bahrain, and Oman). The sample data consist of daily reported COVID-19 cases of 3 variables involving confirmed, recovery, and deaths in each country. The data cover the period starting from the first confirmed case of COVID-19 reported in each country to Nov 30, 2020. The data extracted from the WHO situation reports, Sehhty website, and Wikipedia.
3. Methodology
This paper’s main goal is to model 3 variables involving daily confirmed, recovery, and death cases in GCC countries using classical ARIMA besides the three types of kth MA-ARIMA including kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA. Therefore, this section investigates each of these models, discussing model building and model evaluation.
3.1. ARIMA Model
ARIMA model, which was developed by Box and Jenkins (1994), is a statistical model that uses time series data to study the trend and generate future forecasting of time series data.
For a given non-stationary time series Xt, the classical ARIMA (p, d, q) model is defined as
Where β is the backward shift operator, (1 − β)d is the difference filter, d is a number of times, need to differentiate Xt to make the data stationary, p is the order of autoregression, q is the order of moving average, ϕp(β) = 1 − ϕ1β − ϕ2β2 − … − ϕpβp, ψq(β) = 1 + ψ1β + ψ2β2 + ⋯ + +ψqβq and ϵt ∼ N(0,1).
ARIMA model is a generalized model that integrates the autoregressive model AR(p) and the moving average model MA(q), ARIMA models that do not require differencing are considered as ARMA models, therefore model (1) can be expressed as polynomials of autoregressive AR(p), residuals MA(q), and a combination of them ARMA(p, q) as
3.2. The kth Moving Average-ARIMA Time Series Models
kth Moving average ARIMA model technique (kth MA-ARIMA) proposed by Shih and Tsokos (2008) and Tsokos (2010). This model is based on modifying a given time series into a new k-time moving average time series, and then developing the autoregressive integrated moving average (ARIMA) model using the Box and Jenkins method. Once the new time series’s forecasting model is built, a back-shift operator is applied to obtain estimates of the original phenomenon. The kth MA-ARIMA involves kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA.
3.2.1. The kth SMA-ARIMA Model
The kth SMA-ARIMA process of a time series xt and it is the corresponding back-shift operator are defined, respectively, by
3.2.2. The kth WMA-ARIMA Model
The kth WMA-ARIMA process of a time series xt and it is the corresponding back-shift operator are given, respectively, as
3.2.3. The kth EWMA-ARIMA Model
The kth EWMA-ARIMA process of a time series xt and it is the corresponding back-shift operator are computed, respectively, as
3.3. Model selection criteria
Model selection criteria are rules used to select a statistical model among a set of candidate models based on the observed data. The Akaike information criterion (AIC) is a widely used model selection tool due to its computational simplicity and effective performance in many modeling frameworks. The AIC is given as (Akaike, 1974)
Where L is the likelihood of the model and M is the total number of estimated parameters in the model. A good model is the one that has the minimum AIC among all other models.
3.4. Measures of forecast accuracy
The most popular measure of forecast accuracy in univariate time series data is the Root Mean Square Error (RMSE) proposed by Hyndman and Koehler (2006). The RMSE is computed as
where xt and
are the actual and predicted values at time t, respectively, and n is a sequence of time points. The lower value of RMSE indicates better calibration and, therefore, better performance.
3.5. Checking model’s goodness of fit
After the ARIMA or kth MA-ARIMA model, which is considered appropriate among the alternatives, is put in place, it can be tested for a goodness fit, which entails testing its efficiency. The model is assumed to be a good fit if the residuals are approximately equal to the white noise. The essential tools are the plots of ACF and PACF. The Box-Ljung test is a diagnostic tool used to test the lack of fit of a time series model. This test is applied to the residuals of a time series after fitting an ARIMA or kth MA-ARIMA model to the data. The test examines m autocorrelations of the residuals. The null and alternative hypothesis for this test is
H0: The model does not exhibit a lack of fit, or there is no serial correlation among m lags
H1: The model exhibits a lack of fit, or the residuals are approximately equal to the white noise.
4. Results and Discussion
This section first demonstrates summary statistics for the three variables, confirmed, recovery, and death cases in each GCC country, then reports and discusses the results obtained from applying the ARIMA and kth MA-ARIMA models on these variables.
4.1. Summary Statistics for COVID-19 confirmed, recovery and death cases
Table 1 shows the summary statistics measures, including mean and standard deviation of the confirmed, recovery, and death cases of COVID-19 among the GCC countries. Moreover, Table 1 also demonstrates the prevalence of confirmed cases per 100000 population for the first four weeks.
Mean and standard deviation of confirmed, recovery and death cases in GCC countries among the study periods and prevalence of cumulative confirmed cases.
Based on Table 1, it is observed that KSA has the highest mean of confirmed cases (1303.85) with a standard deviation of (1210.86), followed by UAE, Kuwait, and Oman; on the other side, Bahrain has the lowest mean (306.18) with a standard deviation of (215.08). For recovery cases, KSA has the highest mean, followed by UAE, Kuwait, Qatar, and Oman, but Bahrain has the lowest one. KSA has the highest mean of reported death cases, followed by Oman, Kuwait. On the other hand, Qatar has the lowest one. It can be also seen that in the first 4 weeks of COVID-19 outbreak, Qatar and Bahrain have the highest prevalence of confirmed cases of 18 and 16 infected persons per 1000000, respectively. In contrast, UAE and Oman have the lowest ones of 1 and 1.1 per 1000000, respectively (see Figure 1).
The prevalence of confirmed cases per 100000 population for the first four weeks in GCC countries
4.2. Prediction model for COVID-19 confirmed, recovery and death cases
This paper uses the time series, daily COVID-19 confirmed, recovery, and death cases in each GCC country. Therefore, we have a time series presented as follows:
where xt represents the confirmed, recovery or death cases at day t and T1 denotes the date of the first case of COVID-19 detected in a given country. The time-series plot of the daily COVID-19 confirmed, recovery, and death cases for GCC countries is presented in Figure 2, Figure 3, and Figure 4, respectively.
Time-series of daily COVID-19 confirmed cases in each country
Time-series of daily COVID-19 recovery cases in each country
Time-series of daily COVID-19 death cases in each country
4.3. Prediction Method
To compute the best parameters estimates of ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models, these models were fitted for 90% of the available data in each country which is called the in-sample forecast or training data and the remaining 10% was used for the out-of-sample forecast or testing the model. The AIC of Eq. (11) was applied to the training data as a criterion method to select the best model. Furthermore, the statistical measure RMSE of Eq.(12) was utilized for testing data, and the model with the minimum AIC and minimum RMSE was selected. The calculations were performed using R studio version 1.2.5033 and EViews 10.
4.4. ARIMA model for COVID-19 confirmed, recovery and death cases
To check whether the daily COVID-19 confirmed, recovery and death cases time series in each country were stationary; we carried on ADF root test. The results of the ADF unit root test are demonstrated in Table A.1 in the Appendix. Based on Table A.1, we conclude that all variables are stationary with constant and trend at first differences throughout the study period; therefore, the ARIMA model can be done. After the stationarity of the confirmed, recovery, and death cases time series in each country were determined, the best ARIMA model that fit these 3 variables well for training data with the minimum AIC and lowest RMSE were selected. Table 2 summarizes the best ARIMA model for the confirmed recovery and death cases in each country and their corresponding RMSE and AIC.
The best ARIMA model fitting confirmed, recovery and death cases and their corresponding RMSE, and AIC.
Based on the results in Table 2, we observed that ARIMA (2,1,3), ARIMA (1,1,1), and ARIMA (1,1,2) consider as the best models to fit the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively; these models have the minimum AICs (3607.47, 4326.68, 1448.66) and lowest RMSEs (52.226, 37.38, 2.175) values among all models. These results imply that ARIMA (2,1,3), ARIMA (1,1,1), and ARIMA (1,1,2) are more efficient than other ARIMA models with p + q ≤ 5 for in-sample forecasts. Consequently, a new confirmed, recovery, and death case can be interpreted based on the current case and the most recent change of the COVID-19 trend. The remaining results of Table 2 can be interpreted in the same manner.
4.5. kth MA-ARIMA model for COVID-19 confirmed, recovery and death cases
We can summarize the process of developing the kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models as follows:
Transforming the original time series xt into the new one (yt, zt, vt) for k = 2,3, …, 5 by using Eq. (5), Eq. (7), and Eq. (9), respectively.
Checking the stationary of time series (yt, zt, vt) using the ACF test until we achieve stationarity.
Applying the classical ARIMA(p, d, q) for the yt, zt or vt determined in step 2, where p + q ≤ 5.
Compute the AIC for each model, and choose the one with the smallest AIC.
Solve the estimates of the original time series (fitted values) by using the back-shift operator of Eq. (6), Eq. (8), and Eq. (10), respectively.
Computing the RMSE for each model, and choose one with the smallest RMSE.
After taking the first differences of the transformed data to make it stationarity, we fitted 72 models for each type of the 3 kth MA-ARIMA models (6 countries × 3 variables × 4 values of k (k = 2,3,4, 5)). The best 18 out of 72 different combinations of kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA models fitting the confirmed, recovery and death cases of COVID-19 well with the corresponding RMSE and AIC for each country are presented in Table 3, Table 4, and Table 5, respectively.
The best kth SMA-ARIMA models fitting confirmed, recovery and death cases and their corresponding RMSE and AIC in each country.
The best kth WMA-ARIMA models fitting confirmed, recovery and death cases and their corresponding RMSE and AIC in each country.
The best kth EWMA-ARIMA models fitting confirmed, recovery and death cases and their corresponding RMSE and AIC in each country.
Depending on the results in Table 3, it can be concluded that the 2nd SMA-ARIMA(2,1,3), 2nd SMA-ARIMA(2,1,2), and 2nd SMA-ARIMA(3,1,1) were selected as the best models to fit the confirmed, recovery, and death cases of COVID-19 in Saudi Arabia, respectively. These models have the minimum AICs (3223.7, 3942.94, 1072.40) and lowest RMSEs (50.96, 41.96, 2.25) values among all SMA-ARIMA models. These results imply that the selected models are more efficient than other SMA-ARIMA models with k = 2,3,4, 5 and p + q ≤ 5 for in-sample forecasts. Accordingly, a new confirmed, recovery, and death case can be interpreted based on the current case and the most recent change of the COVID-19 trend. The remaining results of Table 3 and the outputs in Table 4 and Table 5 can be interpreted in the same manner.
Table 6 reviews the best models among the kth MA-ARIMA models based on the smallest RMSE. In contrast, Table 7 shows the best models among classical ARIMA besides the kth MA-ARIMA based on the smallest RMSE.
The best kth MA-ARIMA models fit confirmed, recovery, and death cases in each country.
The best models among the classical ARIMA and kth MA-ARIMA fitting confirmed, recovery, and death cases in each country.
After identifying the best model within the classical ARIMA and kth MA-ARIMA models fitting confirmed, recovery and death cases for each country (see: Table 7), the next step is to check the pattern followed by residuals from the specific model by plotting the ACF of the residuals and conducting the Box-Ljung test to examine the goodness of fit for each models. Figures (5.a.1 to 5.c6) show ACF plots for all the best models located in Table 7, while Table 8 demonstrates the outputs of the Box-Ljung test.
The Box-Ljung test applied for the best models among ARIMA and kth MA -ARIMA models fitting confirmed, recovery and death cases in each country.
By looking at the ACF plots in all sub-Figures of Figure 5, it is observed that for the first 30 lags, most of the autocorrelations are inside the 95% confidence interval bounds indicating that they are white noise and normally distributed except ACF of Figure a2 and Figure a4 which have deviated a little from normality and randomized. The outputs of the Ljung-Box test in Table 8 confirm that there is no autocorrelation left on the residuals for all models in Table 7 except the two models concerning confirmed cases in UAE and Qatar, and the null hypothesis that the residuals were white noise was not rejected and therefore, all models were exhibited goodness of fit. Thus, each model in Table 7 has passed the required checks and is ready for forecasting except the two models 5th WMA-ARIMA(2,1,3) and ARIMA(2,1,3) corresponding to the confirmed cases in UAE and Qatar respectively.
Residuals’ ACF from the best models confirmed, recovery and death cases in each country.
Tables 9, 10, and 11, respectively, demonstrate the forecasting result of confirmed, recovery and death cases for COVID-19 in each GCC country from Dec. 1, 2020 to Dec. 10, 2020 (10 values) based on each corresponding model listed in Table 7 (expected confirmed cases in UAE and Qatar). On the other hand, the suitable model for the confirmed cases in UAE and Qatar is the cubic linear regression model, and the estimated model for confirmed cases in UAE is
while the estimated cubic linear regression model for confirmed cases in Qatar is
Forecast values for daily COVID 19 confirmed cases from 01/12/2020 to 01/12/2020 based on the best models in each country.
Forecast values for daily COVID 19 recovery cases from 01/12/2020 to 01/12/2020 based on the best models in each country.
Forecast values for daily COVID 19 death cases from 01/12/2020 to 01/12/2020 based on the best models in each country.
Therefore, the forecast values of confirmed cases in USA and Qatar shown in Table 9 were computed based on the cubic linear regression model.
5. Conclusions
Four important models including classical ARIMA, kth SMA-ARIMA, kth WMA-ARIMA, and kth EWMA-ARIMA have been considered in the prediction of the confirmed, recovery, and death cases of the novel COVID-19 pandemic in the GCC countries, these models have been applied on the daily data from the first case reported in each country until Nov 30, 2020. To compute the best parameter estimates, each model was fitted for 90% of the available data in each country, which is called the in-sample forecast or training data, and the remaining 10% was used for the out-of-sample forecast or testing the model. The AIC was applied to the training data as a criterion method to select the best model. Furthermore, the statistical measure RMSE was utilized for testing data, and the model with the minimum AIC and minimum RMSE was selected. The main finding, in general, is that the two models WMA-ARIMA and EWMA-ARIMA, besides the cubic linear regression model have given better results for in-sample and out-of-sample forecasts than the classical ARIMA models in fitting the confirmed and recovery cases while the death cases haven’t specific models.
Patient consent
No written consent has been obtained from the patients as there is no patient identifiable data included in this study.
Data Availability
The data that the findings of this study are openly available.
https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
Data Availability
The data that the findings of this study are openly available, at the web addresses https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports. https://sehhty.com/
Conflicts of Interest
The authors declare that they have no conflicts of interest.
APPENDIX
ADF unit root test for confirmed, recovery and death cases in each country
Footnotes
Email: rahmtallay{at}yahoo.com