Adaptive short term COVID-19 prediction for India

In this paper, a data-driven adaptive model of COVID-19 infection is formulated to predict the confirmed total cases and active cases of an area over 4 weeks. The parameters of the model are continually updated based on daily observations. It is found that short-term prediction of up to 3-4 weeks is possible with good accuracy. A detailed analysis of the predicted and actual values of confirmed total cases and active cases for India from 1st June to 3rd July is provided. Prediction over 7, 14, 21, and 28 days has an error of about 0.73% ± 1.97%, 1.92% ± 2.95%, 4.34% ± 3.91%, and 6.40% ± 9.26% of the actual value of confirmed total cases, respectively. Similarly, the 7, 14, 21, and 28 day predictions have an error of about 1.24% ± 6.57%, 3.04% ± 10%, 6.33% ± 16.12%, and 10.2% ± 24.14% of the actual value of confirmed active cases, respectively.


Introduction
Accurate predictions of the spread of the novel coronavirus are essential for the planning and management of medical resources as well as for lockdown strategy. A good prediction can avoid shortages of critical medical resources and the economic loss associated with unnecessary lockdowns. Mathematical [...] time windows such as 7 days, 14 days, 21 days, and 28 days. In the case of the 28-day time window, the prediction error is found to be within 6.40 % ± 9.26 % and 10.22 % ± 24.14 % for total cases and active cases, respectively.
The rest of the paper is organized as follows. The basic model is described in Section 2. The model parameters are estimated in Section 3. The proposed model is validated with COVID-19 statistics of India in Section 5. Expected total and active cases for a month are predicted in Section 7. The prediction results are discussed and concluded in the Discussions and Conclusions sections, respectively.

[...] due to non-development of symptoms in the infected person. In Fig. 1, the different categories of cases are shown. We will consider the population as [...] (Fig. 2). Infectious cases are divided into two categories, active cases ($A$) and active unreported cases ($A_{ur}$). Similarly, recovered and deceased cases are classified as reported and unreported. Here $N$ is the total population, $R$ and $R_{ur}$ are the reported and unreported recovered cases respectively, and $D$ and $D_{ur}$ are the reported and unreported deceased cases respectively. As per the standard SIR model, $\beta$ is the infection rate; $\gamma(t)$ and $\mu(t)$ are the recovery and death rates among the reported cases; and $\gamma_{ur}$ and $\mu_{ur}$ are the recovery and death rates among the unreported cases.

This preprint (which was not certified by peer review) is the version posted on medRxiv on July 21, 2020, doi: https://doi.org/10.1101/2020.07.18.20156745. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
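The compartments described above can be written out as balance and rate equations. The following is only a plausible sketch under standard SIR-type assumptions, not necessarily the exact form used by the authors: the symbol $S$ for the susceptible population and the mixing term $\beta S (A + A_{ur})/N$ are assumptions introduced here for illustration.

```latex
\begin{align}
N &= S + A + A_{ur} + R + R_{ur} + D + D_{ur}, \\
\frac{dS}{dt} &= -\beta \, \frac{S \,(A + A_{ur})}{N}, \\
\frac{dR}{dt} &= \gamma(t)\, A, \qquad \frac{dD}{dt} = \mu(t)\, A, \\
\frac{dR_{ur}}{dt} &= \gamma_{ur}\, A_{ur}, \qquad \frac{dD_{ur}}{dt} = \mu_{ur}\, A_{ur}.
\end{align}
```

In this reading, the reported rates $\gamma(t)$ and $\mu(t)$ vary with time because they are re-estimated from daily data, while the unreported rates are written as constants because the text gives them no time argument.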

Parameter estimation
The parameters of the above model are estimated from the observed data and dynamically adjusted using the daily observations. The active, recovered, and deceased counts for the reported cases are readily available, whereas the corresponding statistics for the unreported cases need to be estimated from random antibody or serological tests. The confirmed active cases on each day are obtained by subtracting the reported recovered and deceased cases from the total confirmed cases $T$, that is, $A = T - R - D$.
From the previously reported daily statistics of new infected cases, recovered cases, and deceased cases, the growth rates of the total cases, recovered cases, and deceased cases are obtained.
The growth rate of total cases is obtained by taking the derivative of the variable $T$, which can be represented by different functions. In the case of India, it is observed that $T$ can be approximated by a fourth-order polynomial. Let $T$ be

$$T(t) = a_T t^4 + b_T t^3 + c_T t^2 + d_T t + e_T.$$

The coefficients ($a_T$, $b_T$, $c_T$, $d_T$, $e_T$) of the function $T$ are obtained using the weighted least squares method by minimizing the error between the predicted values and the observed daily values. The weighted least squares method is used to give more weight to recent values, which in turn reflect the status of the lockdown and the social distancing measures followed by citizens. The following cost function is used:

$$J = \sum_t w_t \,(y_t - T_t)^2,$$

where $y_t$ is the observed cumulative total cases, $T$ is expressed as $T = X\zeta$, and $\zeta$ is the vector of polynomial coefficients. The weights $w_t$ are selected from [...], where $W$ is the diagonal matrix consisting of the weights $w_t$. It is observed that the reported recovered and deceased cases can be approximated using fourth-order and second-order polynomials respectively.
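The weighted least squares fit described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the weighting rule is not recoverable from the text, so the geometric forgetting factor `decay` is a hypothetical choice, as are the function names and the rescaling of time to [0, 1] (done only for numerical conditioning).

```python
import numpy as np

def fit_weighted_polynomial(y, degree=4, decay=0.95):
    """Fit a polynomial to daily cumulative counts y by weighted least squares.

    decay is a HYPOTHETICAL forgetting factor: the latest observation gets
    weight 1 and each earlier day is down-weighted by a further factor of decay.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Rescale day indices to [0, 1] so that high powers stay well conditioned.
    t = np.arange(n) / max(n - 1, 1)
    X = np.vander(t, degree + 1)            # columns: t^degree, ..., t, 1
    w = decay ** np.arange(n - 1, -1, -1)   # oldest day has the smallest weight
    # Minimizing sum_t w_t (y_t - (X zeta)_t)^2 is ordinary least squares
    # after scaling each row of X and y by sqrt(w_t).
    sw = np.sqrt(w)
    zeta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return zeta

def predict(zeta, n_obs, horizon):
    """Evaluate the fitted polynomial `horizon` days after the last observation."""
    t_future = (n_obs - 1 + horizon) / max(n_obs - 1, 1)
    return float(np.polyval(zeta, t_future))
```

Calling `fit_weighted_polynomial(total_cases)` and then `predict(zeta, len(total_cases), 7)` would give a 7-day-ahead estimate; the same routine with `degree=2` could be used for the deceased series, in line with the polynomial orders stated in the text.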

Prediction
Prediction values are updated based on the current observations. The data on total cases, recovered cases, and deceased cases from 14th March to 3rd July for India are considered in the analysis. The data are collected from the [...] Table 1.
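The adaptive updating described here, where the fit is repeated each day on all data seen so far, can be sketched as a rolling forecast. This is an illustrative sketch under assumptions: the function names, the `min_days` warm-up length, and the geometric weighting with `decay` are hypothetical and are not taken from the paper.

```python
import numpy as np

def forecast(history, horizon, degree=4, decay=0.95):
    """Refit a weighted polynomial on the history and extrapolate `horizon` days."""
    y = np.asarray(history, dtype=float)
    n = len(y)
    t = np.arange(n) / (n - 1)              # rescaled time, last day at t = 1
    w = decay ** np.arange(n - 1, -1, -1)   # hypothetical recency weights
    # np.polyfit applies w to the unsquared residuals, so pass sqrt(w)
    # to minimize sum_t w_t * residual_t^2.
    coeffs = np.polyfit(t, y, degree, w=np.sqrt(w))
    return float(np.polyval(coeffs, (n - 1 + horizon) / (n - 1)))

def rolling_forecasts(series, horizons=(7, 14, 21, 28), min_days=30):
    """For each day with enough history, predict every horizon from data up to that day.

    Returns {horizon: {target_day_index: predicted_value}}.
    """
    series = np.asarray(series, dtype=float)
    predictions = {h: {} for h in horizons}
    for today in range(min_days - 1, len(series)):
        seen = series[: today + 1]          # only observations available on `today`
        for h in horizons:
            predictions[h][today + h] = forecast(seen, h)
    return predictions
```

Keying the results by target day makes it straightforward to line up each prediction with the value later observed on that day, which is how the validation tables compare predicted and actual counts.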


Total case prediction validation
The difference between the predicted value and the actual value is compared to check the efficiency of the prediction algorithm. For each day, the predictions made 7, 14, 21, and 28 days earlier are considered for comparison. The predictions from June 1st to July 3rd are considered for detailed analysis. The total case prediction over a 7-day time window and the corresponding actual value are shown in Table 2. On 26th June, the total cases prediction for 3rd July was 61130 and the actual value reported is 627065. So, the error in prediction is 2.54 % of the actual value of 627065. From Table 2, over the complete duration of June 1st to July 3rd, the error between the actual and predicted value is 0.73 % ± 1.97 % of the actual value.
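The error figures quoted above can be reproduced from a table of predicted and actual values with a short helper. This sketch assumes that the reported "a % ± b %" denotes the mean and sample standard deviation of the daily percentage errors, and that the error is signed as actual minus predicted; neither convention is stated explicitly in the text, and the function name is hypothetical.

```python
import numpy as np

def percent_error_summary(predicted, actual):
    """Return (mean, sample std) of the daily errors as a percentage of the actual value.

    ASSUMPTION: error = actual - predicted, expressed relative to the actual value.
    """
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    pct = 100.0 * (actual - predicted) / actual
    return pct.mean(), pct.std(ddof=1)

# Hypothetical numbers for illustration only, not data from the paper.
mean_err, std_err = percent_error_summary(
    predicted=[980.0, 1015.0, 1190.0],
    actual=[1000.0, 1000.0, 1200.0],
)
print(f"{mean_err:.2f} % ± {std_err:.2f} %")
```

Applying the helper separately to the 7, 14, 21, and 28 day columns would give one mean and spread per prediction window.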
The predicted and actual values over a 14-day prediction window from June 1st are tabulated in Table 3. As per Table 3, for the 14-day window the error between the actual and the predicted value is found to be 1.92 % ± 2.95 % of the actual value. Similarly, prediction over 21 days and 28 [...] The error in prediction from mid-April to July 3rd is plotted in Figure 5a.

[...] Table 6, Table 7, Table 8, and Table 9 respectively. The difference between the predicted and the actual active cases is found to be 1.24 % ± 6.57 %, 3.04 % ± 10 %, 6.33 % ± 16.12 %, and 10.2 % ± 24.14 % of the actual value for the 7, 14, 21, and 28 day windows respectively.

The prediction is good for the smaller time windows, and the error between the actual and predicted cases grows with larger prediction windows. The prediction error from June 1st to 3rd July is tabulated in Table 10. The prediction is better for total cases than for active cases. The reason behind the higher error bound could be the frequent changes in discharge policy by the various state governments to adjust the growth of active patients in hospitals, which in turn affected the recovery rates. Also, the deceased cases are sometimes adjusted in a single day after detailed accounting, which caused a high jump in the deceased curve. Discharge policy by the governments and reporting of deceased cases have caused large day-to-day variation in the actual value of active cases. However, the reporting of the total cases is smooth (Figure 3a); therefore, the bound of the prediction error is also smaller.

Future prediction
Future predictions of total cases for India from the proposed algorithm over different time windows are provided in Table 11, Table 12, Table 13, and Table 14. We will compare these tables with future observations to further validate our algorithm.
Similar predictions for the active cases are tabulated in Table 15, Table 16, Table 17, and Table 18. The prediction for the next 28 days based on up-to-date observations is also included in Table 19 for reference.

Conclusions
In this paper, a data-driven prediction model is proposed to predict the total cases and active cases of an area. The model can be applied to a given area. It is observed that the error between the actual value and the predicted value is 4.34 % ± 3.91 % and 6.33 % ± 16.12 % for total cases and active cases respectively over the prediction window of 21 days. We have also provided the prediction for the next 28 days from today for further validation of the proposed algorithm. The short-term prediction can be used in the allocation of scarce medical resources among different units and in optimal lockdown planning.