Predicting Trends of Coronavirus Disease (COVID19) Using SIRD and Gaussian-SIRD Models

Eruption of COVID-19 patients in 215 countries worldwide have urged for robust predictive methods that can detect as early as possible size and duration of the contagious disease and also providing precision predictions. In many recent literatures reported on COVID-19, one or more essential parts of such investigation were missed. One of crucial elements for any predictive method is that such methods should fit simultaneously as many data as possible; these data could be total infected cases, daily hospitalized cases, cumulative recovered cases and deceased cases and so on. Other crucial elements include sensitivity and precision of such predictive methods on amount of data as the contagious disease evolved day by day. To show importance of these aspects, we have evaluated the standard SIRD model and a newly introduced Gaussian-SIRD model on development of COVID-19 in Kuwait. It is observed that SIRD model quickly pick up main trends of COVID-19 development; but Gaussian-SIRD model provides precise prediction at longer period of time.

I. INTRODUCTION COVID-19 outbreak was first reported as contiguous disease created by novel corona virus in Wuhan, China in late December 2019. Soon after, the world has faced with the most rapidly transmitting pandemic disease which halted our normal way of living with 11,312,265 infected and 531,257 deceased cases worldwide on 5th July 2020 [1].
Quick and precise prediction of endemic/pandemic contagious diseases dynamics is very important for health authorities and governments to tackle the disease in the most efficient and economical way. COVID-19 have caused many businesses broken down and several million people worldwide out of their jobs. SIR model is well known and widely used method introduced by Kermack andMcKendrick [2] in epidemiological studies. SIR method is based on 3-set of ordinary differential equations expressing susceptible, infected, and removed populations in a community with constant total population. SIR model assumes that sex, age, social behaviour, and similar factors has no effects on development of a contagious disease. Exact solutions to SIR model may improve predictive capabilities of this method by employing optimization techniques. However, there are a limited exact solution to SIR model in simplified forms. Bailey [3] provided a simplified form of SIR equations. Bohner et al. [4] reported exact solution to Bailey's model of SIR equations. In his work, recovered population is not considered in susceptible and infection equations. Harko et al. [5] considered birth and death rates in an exact solution to SIR model; however, the validity of their method was not evaluated against actual epidemic data. Maliki [6] and Shabbir et al. [7] reported separately exact solutions to some simplified SIS and SIR equations. These simplified models do not include the removed populations and only consider susceptible and infected populations.
In the present work, SIRD model consist of susceptible, infected, recovered, and deceased populations correspond to 4set of ordinary differential equations (ODE) are considered. In the new Gaussian-SIRD model, a Gaussian function is assumed for infected population; hence, 4-set of explicit analytical solutions are found for SIRD populations. MATLAB optimization are applied to fit the SIRD and Gaussian-SIRD models with COVID-19 data in Kuwait. Goodness of fit functions for both methods were evaluated using the coefficient of determination (R2). Sensitivity and precision of both methods are compared for COVID-19 development in Kuwait and advantageous and pitfalls of each method are discussed and conclusions of this study are drawn.

II. MATERIALS AND METHDOS
To tackle a pandemic, correct evaluation of transmission and growth rates of disease and affected populations are essential. SIR model is the widely used method in epidemiology studies. In SIR type models, total population size is considered constant during an endemic/pandemic. SIR model does not consider human factors including sex of patients, age and social behaviour of patients, and also location of infected cases. These are shortcomings of SIR model for predicting pandemics such as COVID-19 for which strong dependency observed in individual age or sex and social behaviour [8,9].

A. SIRD Model
A modified version of SIR model to include deceased population referred as SIRD model here. SIRD model consist of 4-set of ordinary differential equations on susceptible, infected, recovered, and deceased population as follows [10,11]: It is assumed that initial total population (N) is constant during an endemic/pandemic: = + + + = .
(5) In equations (1)(2)(3)(4)(5), is the transmission rate, is the recovery rate, is the death rate in an endemic/pandemic. (= + ) expresses the removing rate of recovered and deceased populations from susceptible population. An important factor in studying epidemiological field is called the reproduction number 0 (= ⁄ ), which indicates severity of an outbreak. If 0 value is larger than one then it is expected that infection will be spread in susceptible population. The larger 0 will indicate harder to control of spread of infection.

B. Gaussian-SIRD Model
Normal or Gaussian distribution function is widely used in statistics to deal with continuous probability of real value of a random variable [12]. The normal probability density function is often used in studying natural and social subjects when real value of the random variables is not known. Gaussian distribution function of two parameter family ( , ) may be reformulated for infected population of a pandemic as follows [13]: In equation (6), ( ) is the Gaussian normal distribution function of daily infected population, is the total population, μ is the mean (or expectation) of the distribution and is the standard deviation. The , and are model parameters and can be obtained by fitting Gaussian equation (6) with infectious population from a pandemic data.
Exact solutions to recovered (R) and deceased (D) is simply obtained by plugging equation (6) into equations (3) and (4) to obtain cumulative solution as follows [13]: In equations (6) and (7), the error function erf( ) is a special function and is evaluated numerically. Substituting equation (6) into equation (1), one may obtain a solution for susceptible population as follows: By changing the total susceptible population (N) in SIRD model, the recovered and deceased will be massively changed. To fine-tune expected recovered and deceased cases, we propose to use variable recovery rate ( ) and deceased rate ( ) as follows: The power factor (n) is any number and can be adequately fine-tuned to best fit the recovered and deceased populations data.

C. Evaluation on goodness of fit: Regression coefficient
Regression coefficient (R 2 ) is a statistical measure to compare predicted values (y) from SIRD or Gaussian-SIRD models here against actual data (x) for COVID-19. The evaluation of R 2 is done separately for each population: susceptible, infected, recovered, and death as follows [16]: ̅ in equation (14) is the average of actual COVID-19 data values. Bette fit functions indicated the regression coefficient (R 2 ) close to unity.

D. Sensitivity analysis
To measure sensitivity of a predictive method, a comparison of actual data values (x) and predicted values (y) can be expressed in percentage of error as follows [16]: Error percentage values closer to zero provides a guide on sensitivity and precisions of these methods.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted November 30, 2020. ;

III. RESULTS
We have investigated sensitivity of SIRD and Gaussian-SIRD models on predicting population size and peak day of infectious using 20, 40, 60, 80, 100, and 116 subsequent days of COVID-19 data in Kuwait [17]. Results of optimized SIRD [18] and Gaussian-SIRD models are presented and compared in Fig. 1 for susceptible, infected, recovered, and deceased populations.
As seen in Figs.1a-1d, the peak day of infection is overpredicted by Gaussian-SIRD model. Gaussian-SIRD model cannot predicts correctly the peak of infection day until 60 days after outbreak. Figs. 1a-1h indicate that SIRD model have started from an overpredicted size of active infected cases until it merges to smaller size; whilst Gaussian-SIRD behaviour is opposite started from small population size to more realistic size after 80 days.   is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted November 30, 2020. ; Figs. 1a-1j also indicate that SIRD model have predicted a nearly fixed population size for recovered cases; but Gaussian-SIRD model overpredicted the recovered cases until 100 days from the outbreak. Figs. 1k-1l, shows that both methods have nearly predicated same results after 116 days from outbreak; although better fit is observed using Gaussian-SIRD model to COVID-19 data. Table 1 shows accuracy and optimized coefficients of SIRD and Gaussian-SIRD equations for 20, 40, 60, 80, 100, and 116 consequent days of COVID-19 data in Kuwait. COVID-19 data were obtained from the ministry of health (MOH) web-link in Kuwait. From Table 1, it is observed that for 20 days data SIRD model cannot provide any meaningful value for regression coefficient; however, Gausian-SIRD model cannot provide only for death cases (zero case data). Also, in SIRD model, the total population of N=50,000 is used; but Gaussian-SIRD model requires larger total population size of N=700,000 to properly fit COVID-19 data. As seen in Table 1, SIRD model gives a large re-production number at start of the pandemic; therefore, it is a best candidate for detecting severity of a pandemic. Gaussian-SIRD method is somehow insensitive in terms of detecting severity of the pandemic and cannot be a good indicator. It gives the reproduction number greater than one after 100 days of outbreak. Figure 2 compares results of SIRD and Gaussian-SIRD models on predicting peak day of infection of COVID-19 in Kuwait. Fig. 2a shows that SIRD model started with low susceptible, high active infected, low recovered, high deceased and high total infected cases using 20 days of COVID-19 data. The values are gradually moderated as days of pandemic passed by. All high values are giving true warning on spread of disease; although not realistic. In contrast, Gaussian-SIRD model in Fig.  2b started with least alarming results except with large number of total infected cases with zero death cases on peak day of infection. Fig. 2c shows that how SIRD model predictions are moderated gradually by adding more COVID-19 data, whilst Fig. 2d shows that Gaussian-SIRD model have picked up true dynamics of the pandemic after 60 days. Fig. 2e indicates that predicted SIRD populations are widely changes as time evolved; however, Fig. 2f indicate that Gaussian-SIRD model provide more precise prediction on population size after 60 days. Fig. 2g indicates that percentage error of SIRD model remain high for deceased population except after 116 days of data; although other population size prediction errors are not converging to zero after 116 days of data. This is maybe due to high sensitivity of SIRD model to data values than Gaussian-SIRD model (see Fig. 2h). The Gaussian-SIRD model converged to error values below 10% except for deceased population after 116 days. Table 2 compares percentage error on prediction of population sizes on peak day of infection using both methods. Gaussian-SIRD model offers more precise fit after 116 days of COVID-19; yet too late for any practical use.