Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Prediction of COVID-19 Pandemic of Top Ten Countries in the World Establishing a Hybrid AARNN LTM Model

Padmabati Gahan, View ORCID ProfileMonalisha Pattnaik, Agnibrata Nayak, Monee Kieran Roul
doi: https://doi.org/10.1101/2020.12.31.20249105
Padmabati Gahan
1Dept. of Business Administration, Sambalpur University, Sambalpur, India-768019, Mail:
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: pgahan7@gmail.com
Monalisha Pattnaik
2Dept. of Statistics, School of Mathematical Sciences, Sambalpur University, Sambalpur, India-768019, Mail:
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Monalisha Pattnaik
  • For correspondence: monalisha_1977@yahoo.com monalisha_1977@yahoo.com
Agnibrata Nayak
3Principal Associate, Capital One LLC, Richmond, Virginia-USA, Mail:
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: agninba@gmail.com
Monee Kieran Roul
4Process Engineer, Lam Research Corp., Portland-USA, Mail:
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: monee.roul@gmail.com
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Data/Code
  • Preview PDF
Loading

Abstract

The novel COVID-19 global pandemic has become a public health emergency of international concern affecting 215 countries and territories around the globe. As of 28 November 2020, it has caused a pandemic outbreak with a total of more than 6,171,5119 confirmed infections and more than 1,44,4235 confirmed deaths reported worldwide. The main focus of this paper is to generate LTM real-time out of sample forecasts of the future COVID-19 confirmed and death cases respectively for the top ten profoundly affected countries including for the world. To solve this problem we introduced a novel hybrid approach AARNN model based on ARIMA and ARNN forecasting model that can generate LTM (fifty days ahead) out of sample forecasts of the number of daily confirmed and death COVID-19 cases for the ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and also for the world respectively. The predictions of the future outbreak for different countries will be useful for the effective allocation of health care resources and will act as early-warning system for health warriors, corporate leaders, economists, government/public-policy makers, and scientific experts.

1. Introduction

The COVID-19 global pandemic is overwhelming hospitals and emergency response systems in many countries like, USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia in and is expected to influence provision of care years to come. Health + Hospitals (H+H) have played a crucial role in entire globe’s response to COVID-19. As more than a billion public health care system with more than billion acute-care hospitals and post-acute care facilities, global Health + Hospitals (H+H) has played a significant role in responding to the COVID-19 pandemic outbreak, particularly for the countries like USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and also for the World. We describe our experience developing a forecasting model of COVID-19 confirmed and death cases for top ten profoundly affected countries including world dataset.

In December 2019, Wuhan city of China became the hub of an outbreak of pneumonia of mysterious cause, latter named as COVID-19, which raised intense attention not only within China but globally (Chakraborty et al., 2020, Guan et al., 2020, Ordonez et al., 2019 and Jung et al., 2020). The COVID-19 pandemic is the most significant global crisis that affected almost all the countries of the globe (Boccaletti et al., 2020). As of 28 Nov. 2020, an outbreak of COVID-19 has resulted in a total of more than 6,171,5119 confirmed infections and more than 1,44,4235 confirmed deaths reported worldwide (Nishiura et al., 2020). On March 11, WHO publicly characterized COVID-19 as a “global pandemic”, and shortly after that, the United States declared COVID-19 outbreaks a national emergency. The COVID-19 has caused a great threat to the health and safety of people all over the world due to its widespread and prospective harm. Thus, the studies of the novel COVID-19 pandemic and its future development trend has become a state-of-the-art research topic at this moment. We are therefore stimulated to ask: Can we generate real-time forecasts of daily new COVID-19 confirmed and death cases for the top ten profoundly affected countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and also for the World? To answer this question, we study classical and modern forecasting techniques for which the prediction accuracy largely depends on the availability of data (Petropoulos et al., 2020). In outbreaks of COVID-19 pandemic, there are limited data available, making predictions generally uncertain. From previous studies, it was evident that the timing and location of the outbreak facilitated the rapid transmission of the virus within a highly mobile population (Roosa et al., 2020, Boldog et al., 2020 and Russel et al., 2020). In most of the affected countries, the governments implemented a strict lockdown in subsequent days of initial transmission of the virus and within hospitals; patients who fulfil clinical and epidemiological characteristics of COVID-19 are immediately quarantined. The constant increase in the global number of COVID-19 confirmed cases is putting a substantial burden on the health care system for these top ten countries. To anticipate additional resources to combat the pandemic, various mathematical and statistical forecasting tools (Li et al., 2020 and Wu et al., 2020) and outside China (Fanelli et al., 2020, Kucharski et al., 2020) were applied to generate short-term and long-term forecasts of reported cases. These model predictions have shown a wide range of fluctuations. Since the time series datasets of COVID-19 confirmed and death cases contain nonlinearity, nonstationary and non-Gaussian patterns, therefore, making decisions based on an individual model would be critical. In this study, we propose a hybrid modeling approach to generate LTM out-of-sample forecasts for multiple countries. In traditional time series forecasting, the ARIMA model is used predominantly for forecasting linear time series (Box et al., 2015). Machine learning models such as ANN, ARNN and WBF (Paul a al., 2017, Nury et al., 2017, Aminghafari et al., 2007, Beraouda et al., 2006 and Fay et al., 2007) are proved to perform well for nonlinear data structures in time series data. COVID-19 data sets are neither purely linear nor nonlinear. They usually contain both linear and nonlinear patterns. If this is the case, then the individual ARIMA or ARNN or ANN or WBF is not enough to model such situations. Therefore, the combination of linear and nonlinear models can be well suited for accurately modeling such complex autocorrelation structures (Percival et al., 2020 and Chakraborty et al, 2020) which reduce the bias and variances of the prediction residuals of the component models. Several hybrid methodologies were discussed in literature to solve a variety of time series problems arose in economic parameters, economic growth, GDP and wind speed (Tumer et al., 2018, Kordanuli et al. 2017, Khashei et al., 2011, Milacic et al., 2017, Cadenas et al., 2010, Maleki et al. 2018 and Chakraborty et al., 2019, James et al., 2013 and Pattnaik et al. 2018). Zhang’s hybrid ARIMA-ANN model (Zhang et al., 2003) has gained popularity due to its capacity to forecast complex time series accurately. The pitfall of hybrid ARIMA-ANN model lies in the selection of the number of hidden layers in the ANN architecture that involves subjective judgment. To ignore this drawback, we took recourse to the ARNN model approach. ARNN fits a feed-forward neural network model with only one hidden layer to a time series with lagged values of the series as inputs. The advantage of ARNN over ANN and WBF is that ARNN is a nonlinear autoregressive model, and it provides less complexity, easy interpretability and better prediction as compared to others in many situations. Due to this, ARNN is getting more attention in recent literature of non-stationary time series forecasting (Hydmann, 2018). In the absence of vaccines or antiviral drugs for COVID-19, these estimates will present an insight into the resource allocations for the extremely influential countries to keep this global pandemic under control. Besides detaching light on the dynamics of COVID-19 dissemination, the practical intent of this real-time data-driven analysis is to provide health warriors, corporate leaders, economists, government/public- policy makers, and scientific experts with realistic estimates for the magnitude of the pandemic for policy-making. The main inspiration of this paper is to handle the dynamism of the characteristics of the time series COVID-19 data. Taking a final decision in policy making based on a component model may be risky in such a severe problem like COVID-19 pandemic confirmed and death cases forecasting where one frequently observes changes in the dynamic properties of the variable being measured. Hybridization of two or more models is the most common solution to this problem (Oliveira et al., 2014). Even from the practitioners’ point of view, hybrid models are more effective when the complete data characteristics are not known (Kuncheva, 2004). Motivated from these discussions, this paper proposes a novel hybrid ARIMA-ARNN (AARNN) model that captures complex data structures and linear plus nonlinear behavior of COVID-19 data sets. In the first phase of our proposed model, ARIMA catches the linear patterns of the data set. Then the ARNN model is employed to capture the nonlinear patterns in the data using residual values obtained from the base ARIMA model. The proposed model has easy interpretability, robust predictability and can adapt seasonality indices as well. Through experimental evaluation, we have shown the excellent performance of the proposed hybrid AARNN model for the COVID-19 global pandemics forecasting for profoundly affected top ten countries including world data sets.

The rest of the paper is structured as follows: In Section 2, country-wise COVID-19 confirmed and death cases datasets, preliminary of data analysis and performance evaluation metrics are presented. In Section 3, we discuss the proper formulation of the proposed hybrid AARNN model. We discuss the experimental evaluations and results for LTM out-of-sample forecasts of COVID-19 global pandemic for the top ten countries including for the world in Section 4. The discussions about the results and practical implications are presented in Section 5. Finally, in Section 6, concluding remarks of this study are presented.

2. Data and preliminary analysis

We explore on the daily confirmed and death cases of COVID-19 pandemic for ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and for the World respectively. The datasets are retrieved by the Global Change Data Lab1. The starting date of all the countries is different but the ending date is same for all i.e. November 28, 2020. Present section includes the brief discussion about the datasets, the development of the proposed hybrid model and finally, the application of the proposed model to generate long-term-memory (LTM) out-of-sample forecasts of the COVID-19 confirmed and death cases for the above ten countries including the world.

2.1 COVID-19 datasets

Eleven univariate time series datasets of both confirmed and death cases of COVID-19 are collected for the real-time prediction purpose for the ten countries including the world. Different earlier studies have forecasted future COVID-19 confirmed cases only for different countries using mathematical and traditional time series models [].We try to investigate accurate prediction both of COVID-19 confirmed and death cases of top ten profoundly affected countries. Initial day of daily laboratory-confirmed and –death cases for all the countries are different but the last day of observation for all the countries is considered same timeline i.e. 28.11.2020. Table 2 shows the description of COVID-19 data sets of confirmed cases and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and for world respectively. The dataset of confirmed cases for the USA, India, Russia, France, Spain, UK and world contains a total of more than 300 observations whereas the dataset for Brazil, Argentina and Colombia contains less than 300 observations. The dataset of death cases for all the studied countries are in between a total of 205 and 300 observations except for world.

View this table:
  • View inline
  • View popup
Table 1:

Descriptions of COVID-19 data sets of confirmed cases and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World respectively

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 2:

Algorithm of Proposed Novel Hybrid AARNN Model

2.2 Preliminary data analysis

A summary of the COVID-19 data sets of confirmed and death cases are shown in Table 1. From p-values it indicates that all the eleven COVID-19 data sets of both confirmed and death cases are not normally distributed. Table 3 and 4 (A-1 & A-2) show the daily confirmed and death cases of COVID-19 pandemic of ten countries including world respectively. The epidemic curves still not showing the spiky declining nature. We bound our attention for trend and non-seasonal models. We consider a novel approach which assumes that the trend will tend to infinity in the future. To fit the traditional ARIMA model, hybrid ARIMA-ANN, WARIMA and proposed AARNN models of COVID-19 confirmed and death cases, the parameters namely p and q are specified by plotting ACF and PACF plots (see Table 3 and 4).

View this table:
  • View inline
  • View popup
Table 3:

Training COVID-19 data sets of confirmed cases and corresponding ACF, PACF plots for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

View this table:
  • View inline
  • View popup
Table 4:

Training COVID-19 data sets of death cases and corresponding ACF, PACF plots for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

2.3 Performance evaluation metrics

The performance of different forecasting models is evaluated based on RMSE and MAE for the COVID-19 data sets (Ahmed et al., 2010 and Chakraborty et al., 2020). The mathematical expressions of these performance evaluation metrics are as follows. Embedded Image where, yi is the actual output, ŷi is the predicted output, and denotes the number of data points. By definition, the lower the value of these performance metrics, the better is the performance of the concerned forecasting model.

3. Methodology

To forecast COVID-19 confirmed and death cases, we adopt hybrid time series forecasting approaches combining ARIMA and ARNN techniques. The proposed AARNN hybrid model overcomes the deficiencies of the individual linear and nonlinear and other hybrid time series models. Before describing the methodology, we give a brief description of the individual traditional and advanced models to be used in the hybridization.

3.1 ARIMA Model

The autoregressive integrated moving average (ARIMA) model, developed by (Box and Jenkin 2015), is a linear regression model indulged to eliminate linear tendencies in stationary time series data. The model is expressed as ARIMA (p,d,q) where p, d, and q are integer parameter values that decide the structure of the model. More specifically, p and q are the order of the AR model and the MA model respectively, and parameter d is the level of differencing applied to the time series data. The mathematical expression of the ARIMA model (Chakraborty et al. 2019) is as follows: Embedded Image where yt is the actual value, εt is the random error at time t, φi and θjare the coefficients of the model.

It is assumed that εt−1 (εt−1 = yt−1 − ŷ t−1)has zero mean with constant variance, and satisfies the i.i.d condition. The methodology consists of four iterative steps:

  1. Model identification and model selection

  2. Parameter estimation of the model parameters

  3. Model diagnostics checking (namely, residual analysis) are performed to find the ‘best’ fitted model

  4. Model forecasting

In the model identification and model selection step, differencing is applied once or twice to achieve stationarity for non-stationary data. As stationarity condition is satisfied, the ACF plot and the PACF plot are examined to select the AR and MA model types. The parameter estimation step involves an optimization process utilizing metrics such as the AIC and/or the BIC. Finally, in the model checking step, the residual analysis is carried out to finalize the ‘best’ fitted ARIMA model. ARIMA model is a data-dependent approach that can adapt to the structure of the data set. But it has the major disadvantage that any significant nonlinear data and non-Gaussian set can restrict the ARIMA model. Therefore, some individual and hybrid models uses ARNN, ANN and WBF models to deal with the nonlinear data patterns for forecasting intricate time series structure.

3.2 ARNN Model

ANN is a widely used supervised learning model, highly useful for sophisticated nonlinear time series forecasting. Neural nets are based on simple mathematical models of the brain, used for intricate nonlinear forecasting. Any neural net architecture can be described as a network of “neurons”, arranged in layers, namely the input layer, hidden layer, and output layer. The information from one layer to another layer is passed using weights that are selected using a risk minimization based ‘learning algorithm’ (Chakraborty et al. 2020). The ARNN model is a modified neural network model especially designed for time series datasets which uses a pre-specified number of hidden neurons in its architecture (Faraway and Chatfield 1998). It uses lagged values of the time series as inputs to the model. ARNN (p, k) is a nonlinear feed-forward neural net model with one hidden layer (having p lagged inputs) and k hidden units Embedded Image for non-seasonal data in one hidden layer (Hyndman and Athanasopoulos 2018). It also uses AIC as the criterion for comparing different models created by ARNN. Here ŷt is computed using selected past observations yt−1, yt−2, …… yt−p as the inputs. Weights of the ARNN model are trained using a gradient descent back propagation algorithm (Rumelhart et al. 1985).

3.3 Proposed AARNN Model

ARIMA model is one of the traditional statistical models for linear time series prediction. On the other hand, the advanced ARNN model can capture nonlinear trends in the data set. So, the two models are consecutively combined to encompass both linear and nonlinear tendencies in the model (Aladag et al., 2009). A hybrid strategy that has both linear and nonlinear modeling abilities is a good alternative for forecasting dengue cases. Both the ARIMA and the ARNN models have different capabilities to capture data characteristics in linear or nonlinear domains. Thus, the hybrid approach can model linear and nonlinear patterns with improved overall forecasting performance. There exists numerous time series models in the literature, and several research shows forecast accuracy improves in hybrid models. The aim of developing a novel hybridization is to harness the advantages of individual models and reduce the risk of failures of these models. The underlying assumption of the hybrid approach based on linear and nonlinear model assumption is that the relationship between linear and nonlinear components are additive. The strength of individual models for hybridization is very important, and this selection is essential to show the consistent improvement over single models. This paper presents a novel hybridization of ARIMA and ARNN (AARNN) to overcome the limitation of individual models and utilize their strengths. With the combination of linear and nonlinear models, the proposed methodology can guarantee better performance as compared to the component models.

The hybrid model (Zt) can be represented as follows: Embedded Image where Yt is the linear part and Nt is the nonlinear part of the hybrid model. Both Yt and Nt are estimated from the data set. Let, Ŷt be the forecast value of the ARIMA model at time t and εt represent the residual at time t as obtained from the ARIMA model; then. εt = Zt−Ŷt

The residuals are modeled by the ARNN model and can be represented as follows Embedded Image where f is a nonlinear function modeled by the ARNN approach and ξt is the random error. Therefore, the combined forecast is Embedded Image, where, Embedded Image is the forecast value of the ARNN model. The rationale behind the use of residuals in the diagnosis of the sufficiency of the proposed hybrid model is that there is still autocorrelation left in the residuals which ARIMA could not model. This work is performed by the ARNN model which can capture the nonlinear autocorrelation relationship. In summary, the proposed hybrid AARNN model works in two phases. In the first phase, an ARIMA model is applied to analyze the linear part of the model. In the next stage, an ARNN model is employed to model the residuals of the ARIMA model. The hybrid model also reduces the model uncertainty which occurs in inferential statistics and forecasting time series. The algorithmic representation of the proposed hybrid AARNN model is also given in Table 2 (Chakraborty et al., 2020).

4. Experimental evaluations and results

It is a challenging task to study the characteristics and nature of the limited COVID-19 data sets. With the availability of COVID-19 datasets of both confirmed and death cases for ten different countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina and Colombia including world, we consider one significant problem relevant to ongoing COVID-19 global pandemic. This problem deals with the real-time data-driven forecasts of the daily COVID-19 confirmed and death cases of the above top ten profoundly affected countries. We propose a novel hybrid ARIMA-ARNN (AARNN) model to forecast daily COVID-19 confirmed and death cases of these countries. This proposed novel hybrid model can be treated as the best to notify the health practitioners, corporate leaders, economists, government/public-policy makers, and scientific experts to fight against the COVID-19 global pandemic.

Eleven univariate time series COVID-19 datasets of daily confirmed and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, and Colombia including world are considered for training the proposed novel hybrid AARNN model and the traditional (ARIMA), advanced (ARNN, ANN and WBF) and other hybrid models (ARIMA- ANN and WARIMA). The datasets are nonlinear, nonstationary, and non-Gaussian in nature and statistical tests confirmed this (see Section 2.2 and refer to Table 1). We experimentally evaluate the performances of ARIMA, ARNN, ANN, WBF, hybrid ARIMA-ANN, WARIMA model (Chakraborty et al., 2020), in comparison with our proposed novel hybrid AARNN model (Zhang, 2003 and Chakraborty et al., 2020) for all these eleven COVID-19 datasets. We have used RMSE and MAE, to evaluate the predictive performance of the models used in this study (James et al., 2013 and Chakraborty et al., 2020). Since the number of data points of COVID-19 pandemic datasets is limited thus implementing the advanced deep learning techniques will simply over-fit the datasets (Hastie et al., 2009 and Chakraborty et al., 2020).

We start the experimental evaluation for all the eleven datasets with the ARIMA model using ‘forecast’ (Hyndman et al., 2020) statistical package in R statistical software. We fit the ARIMA model using auto.arima function. As the ARIMA model is fitted, predictions are generated for 50-time steps ahead for LTM forecasts for all the eleven COVID-19 confirmed and death cases datasets respectively. We also compute training data predicted values and calculate the residual errors using the traditional individual model ARIMA model. Plots of the predicted and residual series of both confirmed and death cases of COVID-19 pandemic are given in Table 5 (A-3). As the ARIMA model is fitted on the residual time series, predictions of both confirmed and death cases are generated for the next fifty (29th November 2020 to 17th January 2021) time steps respectively. Further, both the ARIMA, ANN, WBF and proposed ARNN residual forecasts are added together to get the final out-of-sample forecasts of both confirmed and death cases COVID-19 outbreaks using the traditional hybrid ARIMA-ANN, WARIMA and the proposed hybrid AARNN model for the next fifty (29th November 2020 to 17th January 2021) days respectively. The proposed hybrid AARNN model for actual and fittings (training data) of COVID-19 confirmed and death cases for these ten countries including world data respectively are displayed in Table 6 (A-4). The real-time LTM out of sample forecasts (50 days ahead) of COVID-19 confirmed and death cases using ARIMA, ARNN and proposed hybrid AARNN model for these ten countries including world are displayed in Table 7 and 8 (A-5 & A-6) respectively.

View this table:
  • View inline
  • View popup
Table 5:

Plots of predicted and residuals of COVID-19 confirmed cases and death cases using ARIMA model for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

View this table:
  • View inline
  • View popup
Table 6:

Actual and Predicted values of COVID-19 confirmed cases and death cases using hybrid ARIMA-ARNN (AARNN) model for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

View this table:
  • View inline
  • View popup
Table 7:

Real-time out of sample forecasts (50 days ahead) of COVID-19 confirmed cases using different forecasting models for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

View this table:
  • View inline
  • View popup
Table 8:

Real-time out of sample forecasts (50 days ahead) of COVID-19 death cases using different forecasting models for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

For comparison purposes, we applied traditional and advanced individual models like ARIMA, ARNN, ANN, WBF along with the hybrid models like ARIMA-ANN and WARIMA models (Chakraborty et al., 2020) the non-linearity, non-stationarity and non- Gaussian COVID-19 confirmed and death cases datasets of these ten countries including world respectively. All the experimental results are reported in Table 9. The performance of the proposed hybrid AARNN model is superior as compared to all the traditional, advanced individual and hybrid models under study. In comparison to other individual and hybrid models, nine for confirmed cases and seven for death cases out of eleven datasets of COVID- 19 respectively, our proposed hybrid AARNN model outperformed all the hybrid, traditional and advanced individual models in the significant edge. The consistency and adequacy in experimental results empirically approves the same. Thus, the efficacy of the proposed methodology of the proposed hybrid model is experimentally validated.

View this table:
  • View inline
  • View popup
  • Download powerpoint
Table 9:

Quantitative measures of performance for different forecasting models on eleven time series (training data sets only) of COVID-19 confirmed cases and death cases for USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia and World

5. Practical implications and discussion

In this study, a novel hybrid AARNN model is established that performs considerably well for both confirmed and death cases of COVID-19 forecasting for the top ten countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia including world respectively. The proposed novel hybrid AARNN model filters linearity using the ARIMA model and on the whole, it can explain the linearity, nonlinearity and nonstationary tendencies present in both the confirmed cases and death cases of COVID-19 data sets of ten different countries including world better as compared to the traditional, advanced individual and hybrid models. The proposed hybrid AARNN model also yields better forecast accuracy than various traditional, advanced individual and hybrid models like ARIMA, ARNN, ANN, WBF, ARIMA-ANN and WARIMA for nine and seven out of eleven data sets of both confirmed and death cases of COVID-19 respectively considered in this study. The proposal will be useful in decision and policy makings for government officials and policymakers to allocate adequate health care resources for the coming days in responding to the crisis. Although any epidemics time series can oscillate heavily because of various epidemiological factors and these fluctuations are generally challenging to be fittingly captured for precise forecasting. However, the proposed novel hybrid model may still predict with better accuracy. LTM of fifty days ahead out of sample forecasts of both confirmed and death cases of COVID-19 separately are provided for all the ten countries including world. The proposed model can be used as a premature word of warning to struggle against the COVID-19 global pandemic. A list of suggestions based on the results of the real-time forecasts using a novel data-driven hybrid methodology is presented below:

  1. Since we presented a real-time forecast system unlike ex-post analysis, thus one can regularly update both the actual daily confirmed and death cases and update the predictions, just like it happens in weather forecasting or other epidemics forecasting.

  2. The forecasts on the whole follow oscillating behaviour for the out of sample forecast of 50 days ahead respectively for the ten different countries namely USA, India, Brazil, Russia, France, Spain, UK, Italy, Argentina, Colombia including world and reflect the impact of the broad spectrum of social distancing, wearing masks, containment zone, lock down, shutdown, quarantine and sanitizing properly measures implemented by the governments, which likely promoted stable the epidemic.

  3. The LTM out of sample forecasts of confirmed cases show oscillatory behaviour with upward trend and don’t show any stiff decay sooner except France and Spain and stiff amplify behaviour with upward trend for Russia. The LTM out of sample forecasts of death cases show oscillatory behaviour with both upward and downward trend and don’t show any stiff decay sooner except Russia. India, Spain and Argentina don’t show any oscillatory behavior. All the six countries including world except France, Spain, UK and Italy are going to face unlike uplifts in the number of new confirmed cases of COVID-19 pandemic.

  4. Followed by the LTM out of sample forecasts reported in this paper, the wearing masks, sanitization, immunity recreation, social distancing, containment zone, self precaution etc. can be implemented accordingly to tackle the uncertain and robust situations of COVID- 19 global pandemic.

6. Conclusions

From the data-driven experimental analysis, it is established that all the traditional and advanced individual and hybrid models like ARIMA, ARNN, ANN, WBF, ARIMA-ANN and WARIMA models are unable to completely capture the behaviour of the nonlinearity, non-stationarity and non-Gaussian of the stochastic time series datasets containing intrinsic random shock or moving average component. This indicates the need to develop a novel hybrid model with AARNN to predict the subsequent both the confirmed and death cases of COVID-19 outbreaks accurately and respond to global pandemics more powerfully.

Data Availability

https://ourworldindia.org/coronavirus

Conflict of Interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

A-1

A-2

A-3

A-4

A-5

A-6

Footnotes

  • ↵1 https://ourworldindia.org/coronavirus

Abbreviations

COVID-19
Coronavirus Disease 2019
ARIMA
Auto Regressive Integrated Moving Average
ARNN
Auto Regressive Neural Network
AARNN
ARIMA-ARNN
LTM
Long Term Memory
ANN
Artificial Neural Network
WBF
Wavelet Based Forecasting
USA
United States of America
UK
United Kingdom
JB
Jarque Bera test
RMSE
Root Mean Square Error
MAE
Mean Absolute Error
ACF
Auto- correlation function
PACF
Partial Auto-correlation function
AIC
Akaike Information Criterion
BIC
Bayesian Information Criterion
WARIMA
WBF-ARIMA
AR
Auto Regressive
MA
Moving Average

References

  1. 1.↵
    Aladag, C.H., Egrioglu, E., Kadilar, C. Forecasting nonlinear time series with a hybrid methodology. Appl Math Lett, 2, 22(9), (2009),1467–1470.
    OpenUrl
  2. 2.↵
    Aminghafari, M., Poggi, J.M. Forecasting time series using wavelets. Int J Wavelets Multiresolut Inf Process, 5(5), (2007) 709–724.
    OpenUrl
  3. 3.
    Benaouda, D., Murtagh, F., Starck, J.L., Renaud, O. Wavelet-based nonlinear mul tiscale decomposition model for electricity load forecasting. Neurocomputing, 70(1–3), (2006), 139–154.
    OpenUrl
  4. 4.↵
    Boccaletti, S., Ditto, W., Mindlin, G., Atangana, A. Modeling and forecasting of epidemic spreading: The case of covid-19 and beyond. (2020).
  5. 5.↵
    Boldog, P., Tekeli, T., Vizi, Z., Dénes, A., Bartha, F.A., Röst, G. Risk assessment of novel coronavirus covid-19 outbreaks outside China. J Clin Med, 9(2), (2020) 571.
    OpenUrl
  6. 6.↵
    Box, G.E., Jenkins, G.M., Reinsel, G.C., Ljung, G.M. Time series analysis: forecasting and control. John Wiley & Sons, (2015).
  7. 7.
    Cadenas, E., Rivera, W. Wind speed forecasting in three different regions of Mexico, using a hybrid ARIMA–ANN model, Renew. Energy, 35 (12), (2010), 2732–2738.
    OpenUrl
  8. 8.↵
    Chakraborty, T., Chattopadhyay, S., Ghosh, I. Forecasting dengue epidemics using a hybrid methodology. Physica A (2019), 121266.
  9. 9.
    Chakraborty, T., Ghosh, I. Real-time forecasts and risk assessment of novel coronavirus (COVID-19) cases: A data-driven analysis Chaos, Solitons & Fractals, 135, (June 2020) 1-10. 109850; https://doi.org/10.1016/j.chaos.2020.109850
    OpenUrl
  10. 10.↵
    Chakraborty, T.,Chakraborty, A.K., Biswas, M., Banerjee, S., Bhattacharya, S. Unemployment Rate Forecasting: A Hybrid Approach; Computational Economics. (2020). https://doi.org/10.1007/s10614-020-10040-2.
  11. 11.
    Chatfield, C. The analysis of time series: an introduction. Chapman and Hall/CRC, (2016).
  12. 12.
    Fanelli, D., Piazza, F. Analysis and forecast of covid-19 spreading in China, Italy and France. Chaos Solitons Fractals, 134, (2020), 109761.
    OpenUrl
  13. 13.↵
    Fay, D., Ringwood, J. A wavelet transfer model for time series forecasting. Int J Bifurcation Chaos, 17(10), (2007), 3691–3696.
    OpenUrl
  14. 14.
    Firmino, P.R.A., de Mattos Neto, P.S., Ferreira, T.A. Correcting and combining time series forecasters. Neural Network, 50, (2014), 1–11.
    OpenUrl
  15. 15.↵
    Guan, W.J., Ni, Z.Y., Hu, Y., Liang, W.H., Ou, C.Q., He, J.X. et al. Clinical characteristics of coronavirus disease 2019 in China. N top N Engl J Med, (2020).
  16. 16.↵
    Hastie, T., Tibshirani, R., Friedman, J. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, (2009).
  17. 17.↵
    Hyndman, R.J., Athanasopoulos, G. Forecasting: principles and practice. OTexts, (2018).
  18. 18.↵
    Hyndman, R.J., Athanasopoulos, G., Bergmeir, C., Caceres, G., Chhay L., O’Hara-Wild M., et al. Package forecast. Online] https://cran.r-project.org/web/packages/forecast/forecast.pdf, (2020).
  19. 19.↵
    James, G., Witten, D., Hastie, T., Tibshirani, R. An introduction to statistical learning, 112. Springer, (2013).
  20. 20.↵
    Jung, S.M., Akhmetzhanov, A.R., Hayashi, K., Linton, N.M., Yang, Y., Yuan, B., et al. Real time estimation of the risk of death from novel coronavirus (covid-19) infection: inference using exported cases. J Clin Med, 9(2), (2020) p, 523.
    OpenUrl
  21. 21.↵
    Khashei, M., Bijari, M., A novel hybridization of artificial neural networks and ARIMA models for time series forecasting, Appl. Soft Comput. 11(2), (2011), 2664–2675.
    OpenUrlCrossRef
  22. 22.
    Khashei, M., Hajirahimi, Z. A comparative study of series arima/mlp hybrid models for stock price forecasting. Commun Stat-SimulComput, (2018),1–16.
  23. 23.↵
    Kordanuli, B., Barjaktarović, L., Jeremić, L., Alizamir, M., Appraisal of artificial neural network for forecasting of economic parameters, Physica A, 465, (2017), 515–519.
    OpenUrl
  24. 24.↵
    Kucharski, A.J., Russell, T.W., Diamond, C., Liu, Y., Edmunds, J., Funk, S., et al. Early dynamics of transmission and control of covid-19: a mathematical modelling study. Lancet Infect Dis, (2020).
  25. 25.↵
    Kuncheva, L.I. Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, (2004).
  26. 26.↵
    Li, Q., Guan, X., Wu, P., Wang, X., Zhou, L., Tong, Y., et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N top N Engl J Med, (2020).
  27. 27.↵
    Maleki, S., Nasseri, M.S., Aminabad, M., Hadi, Comparison of ARIMA and NNAR models for forecasting water treatment plant’s influent characteristics, KSCE J. Civ. Eng. 22 (9), (2018), 3233–3245.
    OpenUrl
  28. 28.
    Milacić, L., Jović, S., Vujović, T., Miljković, J. Application of artificial neural network with extreme learning machine for economic growth estimation, Physica A 465, (2017), 285–288.
    OpenUrl
  29. 29.↵
    Nishiura, H., Linton, N.M., Akhmetzhanov, A.R. Serial interval of novel coronavirus (covid-19) infections. Int J Infect Dis. (2020).
  30. 30.↵
    Nury, A.H., Hasan, K., Alam, M.J.B. Comparative study of wavelet-arima and wavelet-ann models for temperature time series data in northeastern Bangladesh. J King Saud Univ-Sci, 29(1), (2017), 47–61.
    OpenUrl
  31. 31.↵
    Oliveira, M.R., Torgo, L., Ensembles for time series forecasting, J. Mach. Learn. Res. 39, (2014), 360–370.
    OpenUrl
  32. 32.
    Ordóñez, C., Lasheras, F.S., Roca-Pardiñas, J., de Cos Juez, F.J. A hybrid arima–svm model for the study of the remaining useful life of aircraft engines. J Comput Appl Math, 346, (2019), 184–191.
    OpenUrl
  33. 33.
    Pattnaik, M., Gahan, P. Stock market-growth link in Asian emerging countries: evidence from granger causality and co-integration tests, Advances in Growth Curve and Structural Equation Modeling, (2018), 21–44.
  34. 34.
    Paul, R.K., Samanta, S., Paul, M.R.K., LazyData, T. Package waveletarima. Seed, 500, (2017),1–5.
    OpenUrl
  35. 35.
    Percival, D.B., Walden, A.T. Wavelet methods for time series analysis, 4. Cambridge university press; (2000).
  36. 36.↵
    Petropoulos, F., Makridakis, S. Forecasting the novel coronavirus covid-19. PLoS ONE, 15(3), (2020), e0231236.
    OpenUrl
  37. 37.↵
    Roosa, K., Lee, Y., Luo, R., Kirpich, A., Rothenberg, R., Hyman, J., et al. Real-time fore-casts of the covid-19 epidemic in china from february 5th to february 24th, Infect Dis Modell. 5, (2020), 256–263.
    OpenUrl
  38. 38.
    Russell, T.W., Hellewell, J., Jarvis, C.I., van Zandvoort, K., Abbott, S., Ratnayake, R., et al. Estimating the infection and case fatality ratio for covid-19 using age-adjusted data from the outbreak on the diamond princess cruise ship. medRxiv (2020).
  39. 39.
    Tümer, A.E., Akkuş, A. Forecasting gross domestic product per capita using artificial neural networks with non-economical parameters, Physica A 512, (2018), 468–473.
    OpenUrl
  40. 40.↵
    Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model, Neurocomputing, 50, (2003), 159–175.
    OpenUrlCrossRef
Back to top
PreviousNext
Posted January 05, 2021.
Download PDF
Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Prediction of COVID-19 Pandemic of Top Ten Countries in the World Establishing a Hybrid AARNN LTM Model
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Prediction of COVID-19 Pandemic of Top Ten Countries in the World Establishing a Hybrid AARNN LTM Model
Padmabati Gahan, Monalisha Pattnaik, Agnibrata Nayak, Monee Kieran Roul
medRxiv 2020.12.31.20249105; doi: https://doi.org/10.1101/2020.12.31.20249105
Digg logo Reddit logo Twitter logo CiteULike logo Facebook logo Google logo Mendeley logo
Citation Tools
Prediction of COVID-19 Pandemic of Top Ten Countries in the World Establishing a Hybrid AARNN LTM Model
Padmabati Gahan, Monalisha Pattnaik, Agnibrata Nayak, Monee Kieran Roul
medRxiv 2020.12.31.20249105; doi: https://doi.org/10.1101/2020.12.31.20249105

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Epidemiology
Subject Areas
All Articles
  • Addiction Medicine (76)
  • Allergy and Immunology (202)
  • Anesthesia (55)
  • Cardiovascular Medicine (495)
  • Dentistry and Oral Medicine (91)
  • Dermatology (57)
  • Emergency Medicine (170)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (217)
  • Epidemiology (5744)
  • Forensic Medicine (3)
  • Gastroenterology (220)
  • Genetic and Genomic Medicine (881)
  • Geriatric Medicine (89)
  • Health Economics (233)
  • Health Informatics (776)
  • Health Policy (399)
  • Health Systems and Quality Improvement (256)
  • Hematology (105)
  • HIV/AIDS (186)
  • Infectious Diseases (except HIV/AIDS) (6576)
  • Intensive Care and Critical Care Medicine (396)
  • Medical Education (119)
  • Medical Ethics (28)
  • Nephrology (94)
  • Neurology (856)
  • Nursing (45)
  • Nutrition (143)
  • Obstetrics and Gynecology (166)
  • Occupational and Environmental Health (265)
  • Oncology (520)
  • Ophthalmology (168)
  • Orthopedics (44)
  • Otolaryngology (107)
  • Pain Medicine (48)
  • Palliative Medicine (22)
  • Pathology (150)
  • Pediatrics (256)
  • Pharmacology and Therapeutics (147)
  • Primary Care Research (116)
  • Psychiatry and Clinical Psychology (989)
  • Public and Global Health (2263)
  • Radiology and Imaging (380)
  • Rehabilitation Medicine and Physical Therapy (175)
  • Respiratory Medicine (313)
  • Rheumatology (109)
  • Sexual and Reproductive Health (83)
  • Sports Medicine (83)
  • Surgery (118)
  • Toxicology (25)
  • Transplantation (34)
  • Urology (42)