COVID-19 in India: State-wise Analysis and Prediction

Coronavirus disease 2019 (COVID-19), a highly infectious disease, was first detected in Wuhan, China, in December 2019. The disease has spread to 210 countries and territories around the world, with more than two million confirmed infections. In India, the disease was first detected on 30 January 2020 in Kerala, in a student who had returned from Wuhan. As of 19 April 2020, the total (cumulative) number of confirmed infected people across India is 17615. Most research and newspaper articles focus on the number of infected people in India as a whole. However, given the size and diversity of India, it may be a good idea to look at the spread of the disease in each state separately, along with the entire country. For example, Maharashtra currently has more than 2500 confirmed cumulative infected cases, whereas West Bengal has fewer than 300 (16 April 2020). Because resources are limited, the approaches to addressing the pandemic in the two states must be different. In this article, we focus on the infected people in each state (restricting to only those states with enough data for prediction) and build three growth models to predict the number of infected people in each state over the next 30 days. The impact of preventive measures on the daily infection-rate is discussed for each state.


Introduction
The world is now facing an unprecedented crisis due to the novel coronavirus, first detected in Wuhan, China, in December 2019.

Highlights of the Analysis
• Data considered for analysis: up to 16 April 2020.
• A single model can mislead us. Here, we consider the exponential, the logistic and the SIS models along with the daily infection-rate (DIR), and we interpret the results from all models jointly rather than individually.
• We expect the DIR to be zero or negative before concluding that COVID-19 is not spreading in a state. Even a small positive DIR (say 0.01) indicates that the virus is spreading in the community, and the DIR can rise again at any time.
• The states without a decreasing trend in DIR are Maharashtra, Delhi, Gujarat, Madhya Pradesh, Rajasthan, Uttar Pradesh and West Bengal.
• The states with a mostly decreasing trend in DIR are Kerala, Andhra Pradesh, Haryana, Jammu and Kashmir, Karnataka, Punjab, Tamil Nadu and Telangana.
• States with a non-decreasing DIR need to strengthen their preventive measures immediately to combat the COVID-19 pandemic. On the other hand, the states with a decreasing DIR should maintain their current measures until the DIR remains zero or negative for 14 consecutive days, at which point the end of the pandemic in that state can be declared.
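The 14-day criterion in the last highlight can be checked mechanically. The sketch below assumes the DIR values for a state are already available as a list ordered by date; the function names `nonpositive_streak` and `spread_appears_stopped` and the example series are hypothetical, for illustration only.

```python
def nonpositive_streak(dirs):
    """Number of most recent consecutive days whose DIR is zero or negative."""
    streak = 0
    for rate in reversed(dirs):
        if rate > 0:
            break
        streak += 1
    return streak


def spread_appears_stopped(dirs, days=14):
    """True if the DIR has been zero or negative for the last `days` days."""
    return nonpositive_streak(dirs) >= days


# Hypothetical DIR series: positive at first, then 14 non-positive days.
example = [0.35, 0.20, 0.05] + [0.0, -0.02] * 7
print(nonpositive_streak(example))               # 14
print(spread_appears_stopped(example))           # True
print(spread_appears_stopped(example + [0.01]))  # False
```

A single positive day resets the streak, which matches the highlight's point that even a small positive DIR means the virus is still spreading.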
Many news agencies are repeatedly saying or asking whether India is now at stage 3 [10, 12, 13]. Different Indian states are, or will be, at various stages of infection at different points in time, so labeling a single COVID-19 stage for India as a whole is problematic and risks spreading misinformation among the public. States at stage 3 require more rapid action than others, whereas states in stages 1 and 2 need to focus on stopping community spread of COVID-19. In this article, we first discuss the importance of a state-wise analysis, considering all the states together. Then, we focus on the infected people in each state (considering only those states with enough data for prediction) and build growth models to predict the number of infected people in each state over the next 30 days.
Why State-wise Consideration?
India is a vast country with a geographic area of 3,287,240 square kilometers and a total population of about 1.3 billion [14]. Most Indian states are large in both geographic area and population.
Analyzing coronavirus infection data with all of India treated as a single unit may not give us the right picture, because the date of first infection, the new-infection rate, the progression over time, and the preventive measures taken by the state governments and the public differ from state to state. We need to address each state separately, which will enable the government to utilize its limited resources optimally. For example, Maharashtra already has more than 2500 confirmed infected cases, whereas West Bengal has fewer than 300 confirmed cases (16 April 2020). Because resources are limited, the approaches to the two states must be different. One way to separate the state-wise trajectories is to look at when each state was first infected.

This version posted April 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license. doi: https://doi.org/10.1101/2020.04.24.20077792 (medRxiv preprint)
In Figure 2, we present the first infection date in each of the Indian states, along with the travel history of the infected person. All the states and union territories, except Assam, Tripura, Nagaland, Meghalaya and Arunachal Pradesh, observed their first confirmed case in a person with a travel history to one or more countries already affected by COVID-19. The Indian government imposed a complete ban on international flights to India on 22 March 2020 [15]. Figure 1 supports the government's decision to suspend international flights; had the ban been imposed earlier, the disease might have been restricted to only a few states, compared with the current scenario. Maharashtra has more than 2500 cases. Kerala, the first state to have a confirmed COVID-19 case, seems to have brought its growth rate under control. A few states have between 100 and 500 cumulative infected cases; depending on how strictly those states follow the preventive measures, they may see a rise in confirmed cases.

Statistical Models
In this article, we consider the exponential model, the logistic model, and the Susceptible Infectious Susceptible (SIS) model.

Exponential Model: We can write the exponential model as

$$P(t) = P_0 e^{\mu t},$$

where $P(t)$ is the cumulative number of confirmed cases at a specific time $t$ (date), $P_0$ is the initial population, $\mu$ is the maximum growth rate, and time $t$ is the number of days from the first confirmed infection [23].
Logistic Model: Some pandemics follow an S-shaped (sigmoid) curve. In other words, the pandemic may start slowly, then its growth rate (infection rate) increases, and finally the curve flattens over time. The following logistic model can capture this behavior [23]:

$$P(t) = \frac{K P_0 e^{\mu t}}{K + P_0\left(e^{\mu t} - 1\right)},$$

where $K$ is the maximum population size; the other parameters have the same meaning as in the exponential model.
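As a minimal sketch, the two growth curves can be written as plain Python functions, assuming the standard forms P(t) = P0·e^(μt) for the exponential model and P(t) = K·P0·e^(μt) / (K + P0·(e^(μt) - 1)) for the logistic model with maximum size K. The parameter values below are illustrative and are not fitted to any state. For the same P0 and μ, the logistic curve starts almost on top of the exponential one and then levels off at K.

```python
import math


def exponential(t, p0, mu):
    """Exponential growth: P(t) = P0 * exp(mu * t)."""
    return p0 * math.exp(mu * t)


def logistic(t, p0, mu, k):
    """Logistic growth with initial size P0, rate mu and maximum size K.

    P(t) = K * P0 * exp(mu*t) / (K + P0 * (exp(mu*t) - 1))
    """
    growth = math.exp(mu * t)
    return k * p0 * growth / (k + p0 * (growth - 1.0))


# Illustrative parameters only (not fitted to any state's data).
p0, mu, k = 10.0, 0.2, 5000.0
for day in (0, 10, 30, 60):
    print(day, round(exponential(day, p0, mu)), round(logistic(day, p0, mu, k)))
```

Because the logistic denominator is never smaller than K when K > P0, the logistic value never exceeds the exponential value for the same parameters.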

Susceptible Infectious Susceptible (SIS) model:
The SIS model is used for a closed population whose members are susceptible to a particular disease, can become infected, and can transmit the infection within the community [22]. It is a dynamic model in which the numbers of susceptible and infected people change over time across two compartments, characterized by two differential equations:

$$\frac{dS}{dt} = -\frac{\beta S I}{N} + \gamma I, \qquad \frac{dI}{dt} = \frac{\beta S I}{N} - \gamma I.$$

In the above two differential equations, we observe the rate of change of the susceptible (S) population towards infection, $dS/dt$, and the rate of change of the infected (I) persons, $dI/dt$. The model assumes two parameters, namely $\beta$, which is the average number of contacts per person per unit time, and $\gamma$, which is obtained as $\gamma = 1/D$, with $D$ being the recovery time (specifically, the time during which a particular patient can infect others). Here $N$ denotes the total population size, with $N = S + I$.
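The SIS equations can be integrated numerically. The following is a minimal forward-Euler sketch, assuming dS/dt = -βSI/N + γI and dI/dt = βSI/N - γI with γ = 1/D; the recovery time D = 14 days, the population size and the β values in the example are illustrative assumptions, not values estimated in this article. When β > γ, the number of infected people settles at the endemic level N(1 - γ/β); when β < γ, the infection dies out.

```python
def simulate_sis(beta, gamma, n, i0, days, steps_per_day=10):
    """Forward-Euler integration of the SIS model.

    dS/dt = -beta*S*I/N + gamma*I
    dI/dt =  beta*S*I/N - gamma*I
    Returns the number of infected people at the end of each day (day 0 first).
    """
    dt = 1.0 / steps_per_day
    s, i = float(n - i0), float(i0)
    infected = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n
            recoveries = gamma * i
            # S loses what I gains, so S + I stays equal to N.
            s += (recoveries - new_infections) * dt
            i += (new_infections - recoveries) * dt
        infected.append(i)
    return infected


recovery_days = 14                 # assumed recovery time D
gamma = 1 / recovery_days          # gamma = 1/D
n = 1_000_000                      # hypothetical closed population

spreading = simulate_sis(beta=0.3, gamma=gamma, n=n, i0=10, days=365)
fading = simulate_sis(beta=0.05, gamma=gamma, n=n, i0=10, days=60)

print(round(spreading[-1]), round(n * (1 - gamma / 0.3)))  # both near the endemic level
print(fading[-1] < 10)                                      # True: infection dies out
```

Forward Euler is used only for transparency; a smaller step (larger `steps_per_day`) or an adaptive ODE solver would give more accurate trajectories.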
Using the above models on state-level data: The above three models provide different predictive perspectives for each state. The exponential model-based prediction gives a picture of what the cumulative number of infected people could be in the next two months if we do not take any preventive measures.
We can consider the forecast from the exponential model as an estimate of the upper bound on the total number of infected people in the next two months. The logistic model-based prediction captures the effect of the preventive measures that have already been taken by the respective state governments as well as the central government. As pointed out earlier, the logistic model assumes that the infection rate will slow down in the future, giving an overall S-shaped growth curve.
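The article does not spell out its fitting procedure in this section, so the following is only one simple way to obtain 30-day forecasts from both growth models, not necessarily the authors' method. It assumes the exponential model P(t) = P0·e^(μt) and the logistic curve with maximum size K, fits the exponential model by least squares on log P(t), and fits the logistic model by a grid search over K (for a fixed K, log(P/(K - P)) is linear in t). All function names and the synthetic series in the usage example are hypothetical, and case counts must be positive.

```python
import math


def logistic(t, p0, mu, k):
    """Logistic curve P(t) = K*P0*e^(mu t) / (K + P0*(e^(mu t) - 1))."""
    growth = math.exp(mu * t)
    return k * p0 * growth / (k + p0 * (growth - 1.0))


def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_exponential(days, cases):
    """Fit P(t) = P0*e^(mu t) by regressing log P on t; returns (P0, mu)."""
    a, mu = linear_fit(days, [math.log(c) for c in cases])
    return math.exp(a), mu


def fit_logistic(days, cases, grid_size=3000, step=0.001):
    """Fit the logistic curve by profiling over K; returns (P0, mu, K).

    Each candidate K is slightly above the largest observation. For that K,
    log(P / (K - P)) = log(P0 / (K - P0)) + mu*t is fitted by least squares,
    and the K with the smallest squared error on the case scale is kept.
    """
    top = max(cases)
    best = None
    for j in range(1, grid_size + 1):
        k = top * (1.0 + step * j)
        a, mu = linear_fit(days, [math.log(c / (k - c)) for c in cases])
        p0 = k / (1.0 + math.exp(-a))  # invert a = log(P0 / (K - P0))
        sse = sum((logistic(t, p0, mu, k) - c) ** 2 for t, c in zip(days, cases))
        if best is None or sse < best[0]:
            best = (sse, p0, mu, k)
    return best[1], best[2], best[3]


# Synthetic cumulative counts (not real data), generated from a logistic curve.
days = list(range(21))
cases = [logistic(t, 10.0, 0.3, 1000.0) for t in days]

p0_e, mu_e = fit_exponential(days, cases)
p0_l, mu_l, k_l = fit_logistic(days, cases)
horizon = days[-1] + 30
exp_forecast = p0_e * math.exp(mu_e * horizon)
log_forecast = logistic(horizon, p0_l, mu_l, k_l)
print("exponential, day", horizon, round(exp_forecast))
print("logistic,    day", horizon, round(log_forecast))
```

On this synthetic series the exponential forecast is far larger than the logistic one, which illustrates why the exponential projection is read as an upper bound while the logistic projection reflects saturation.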

The DIR takes a positive value when the number of active COVID-19 cases increases from the previous day, zero when the number of active cases is unchanged from the previous day, and a negative value when the total number of active cases decreases from the previous day. The DIR can also exceed 1, particularly during the initial days of infection in a state. For example, when the total number of active cases increases from 5 yesterday to 20 today, the DIR value is (20 - 5)/5 = 3.

If the current part of the graph of observed active infected patients (red line) is above the 75th percentile line, then there is a major concern for that state. The lockdown period in a state may need to be extended if we do not see a decline in the graph of observed active infected patients (red line).
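A minimal sketch of the DIR computation, assuming the DIR on day t is the relative change in active cases, (A_t - A_(t-1)) / A_(t-1). This assumed definition is positive when active cases rise, zero when they are unchanged, negative when they fall, and can exceed 1 early on; for 5 active cases yesterday and 20 today it gives 3. The function name is hypothetical.

```python
def daily_infection_rates(active):
    """Return DIR_t = (A_t - A_{t-1}) / A_{t-1} for consecutive days.

    `active` is a list of daily active-case counts ordered by date. A day whose
    previous count is zero gets None, because the relative change is undefined.
    """
    rates = []
    for yesterday, today in zip(active, active[1:]):
        if yesterday == 0:
            rates.append(None)
        else:
            rates.append((today - yesterday) / yesterday)
    return rates


print(daily_infection_rates([5, 20]))       # [3.0]
print(daily_infection_rates([20, 20, 18]))  # [0.0, -0.1]
```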
India implemented a nationwide lockdown from 25 March 2020. To study the effect of the lockdown, we first considered the incubation period of the novel coronavirus. The incubation period of an infectious disease is defined as the time between infection and the first appearance of signs and symptoms [26]. Using the incubation period, health researchers can decide on quarantine periods and halt a potential pandemic without the aid of a vaccine or treatment [27]. The estimated median incubation period for COVID-19 is 5.1 days (95% CI: 4.5 to 5.8 days), and 97.5% of those who develop symptoms will do so within 11.5 days (CI: 8.2 to 15.6 days) of infection [28]. The WHO recommends that a person exposed to a patient with laboratory-confirmed COVID-19 be quarantined for 14 days from the last time they were exposed to the patient [29]. Therefore, a person who was infected before the lockdown (25 March 2020) and stayed entirely inside their house for more than 14 days should not have infected anyone other than their family members. The WHO also recommends that people maintain a distance of at least 1 meter from each other in public places to avoid COVID-19 infection. Effective social distancing can stop the spread of the virus from an infected person, even when they go out for essential work. However, given the high population density in most of India, particularly in cities, it may not always be possible to maintain adequate social distance.
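The reasoning above suggests comparing infection rates before and after the point at which infections acquired before the lockdown should have surfaced. As a sketch, assuming that point is the lockdown start plus the 14-day quarantine period, the cutoff falls on 8 April 2020; the helper `mean_dir_before_after` and the example DIR series are hypothetical and only illustrate the comparison.

```python
from datetime import date, timedelta

LOCKDOWN_START = date(2020, 3, 25)
QUARANTINE_DAYS = 14  # WHO-recommended quarantine period

cutoff = LOCKDOWN_START + timedelta(days=QUARANTINE_DAYS)
print(cutoff)  # 2020-04-08


def _mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def mean_dir_before_after(dates, dirs, cutoff):
    """Mean DIR on days before `cutoff` and on or after `cutoff`."""
    before = [r for d, r in zip(dates, dirs) if d < cutoff]
    after = [r for d, r in zip(dates, dirs) if d >= cutoff]
    return _mean(before), _mean(after)


# Hypothetical DIR values for 4 to 11 April 2020 (not real state data).
dates = [date(2020, 4, 4) + timedelta(days=k) for k in range(8)]
dirs = [0.30, 0.25, 0.28, 0.22, 0.10, 0.08, 0.05, 0.03]
print(mean_dir_before_after(dates, dirs, cutoff))
```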

State-wise Analysis and Prediction Report
In this section, we rely on inputs from the exponential, logistic, and SIS models, along with the daily infection rates, for each state. Keeping in mind the words of the famous statistician George Box, "All models are wrong, but some are useful," we interpret the results from the different models jointly. We consider the states in descending order of the number of cumulative infected cases. For each state, we present four graphs, based on state-level data up to 16 April 2020. The first and second graphs are based on the logistic and the exponential models, respectively, with predictions for the next 30 days. The third graph is the plot of the daily infection rate for the state. Finally, the fourth graph shows the growth of the active infected patients using the SIS model prediction ("pred") along with the observed active infected patients. We do not show the next 30-day prediction from the SIS model, to keep the different line-graphs distinguishable.
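The fourth graph compares observed active cases with SIS trajectories whose β is set to a percentile of the observed daily infection rates. The sketch below is one way to produce such curves and to flag the "above the 75th percentile line" warning sign, under stated assumptions: β is taken directly as a percentile of the state's DIR values, percentiles use linear interpolation, and γ, the population N, the starting count and the DIR series are illustrative values, not those used in the article. All function names are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics (0 <= q <= 100)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def sis_infected(beta, gamma, n, i0, days, steps_per_day=10):
    """Forward-Euler SIS trajectory; one infected count per day, day 0 first."""
    dt = 1.0 / steps_per_day
    s, i = float(n - i0), float(i0)
    out = [i]
    for _ in range(days):
        for _ in range(steps_per_day):
            flow = (beta * s * i / n - gamma * i) * dt  # net S -> I transfer
            s -= flow
            i += flow
        out.append(i)
    return out


def percentile_curves(dirs, gamma, n, i0, days, qs=(25, 50, 75)):
    """SIS curves with beta set to the given percentiles of the observed DIRs."""
    observed_rates = [r for r in dirs if r is not None]
    return {q: sis_infected(percentile(observed_rates, q), gamma, n, i0, days)
            for q in qs}


def above_curve(active, curves, q=75):
    """True if the latest observed active count exceeds the q-th percentile curve."""
    day = len(active) - 1
    return active[day] > curves[q][day]


# Illustrative inputs only.
dirs = [0.1, 0.2, 0.3, 0.4, 0.5]
curves = percentile_curves(dirs, gamma=1 / 14, n=1_000_000, i0=10, days=30)
print(above_curve([10, 20], curves))  # True: faster than the 75th-percentile curve
print(above_curve([10, 11], curves))  # False
```

Higher percentiles give larger β and therefore steeper curves, so an observed line sitting above the 75th-percentile curve means the state is growing faster than three quarters of its own recent daily rates would suggest.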

Maharashtra:
The situation in Maharashtra is currently very severe with respect to the number of active cases. As of 16 April 2020, the total number of active cases is 2619. The logistic model indicates that in another 30 days, the state could observe around 7500 cumulative infected cases. The daily infection rates for this state have been consistently above 0.1 in the last few days, and the rate exceeded 0.4 on two days at the beginning of April. The SIS line-graphs are alarming, as the line of observed active infected patients (red line, 4th panel) is far above the predicted line with the estimated infection rate at the 80th percentile (β = 0.26). It is apparent from the graphs that even after 20 days of lockdown, Maharashtra has not seen any decline in the number of active cases. This may also indicate that a large number of people in the community could be carrying the virus without knowing it.
Delhi: Delhi, a state with a high population density, has already observed 1578 confirmed COVID-19 cases. Based on the logistic model, the cumulative number of infected cases could reach around 2661 in the next 30 days. The daily infection rate (DIR) has not shown a downward trend in the past few days. Except for the last two days, the line-graph of observed active infected patients (red line, 4th panel) reflects exponential growth, especially after the lockdown. The observed infection rate is currently between 0.23 and 0.36, which is quite high considering that preventive measures are already in place. The high infection rate may suggest that many people in the community are already infected with COVID-19 without knowing it. The state could be heading towards community spread of COVID-19 (stage 3).
Tamil Nadu: The cumulative number of infected cases in Tamil Nadu is 1242. The state observed a high daily infection rate of more than 0.7 on some days in March. Tamil Nadu is one of the states where the effect of the lockdown is clearly visible in the declining daily infection rates over the last two weeks. The line-graph of observed active infected patients (red line, 4th panel) is still showing an increasing trend; however, it is now far below the curve based on the estimated 75th percentile of observed infection rates (β = 0.47). Based on the logistic model, the cumulative number of infected cases may saturate at around 1400.

Madhya Pradesh:
The state currently has 880 active COVID-19 cases. It may seem that the situation is under control, as the number of active cases is under a thousand for a large state like Madhya Pradesh. However, a closer look may reveal a different picture. In the later part of the lockdown, after 10 April, the state observed a few days with an infection rate above 0.4. So far, there is no sign of a declining trend in the daily infection rates. The same conclusion can be drawn from the SIS line-graphs: the line-graph of observed active infected patients (red line, 4th panel) lies between the lines corresponding to the 25th and 50th percentiles, and it exhibits exponential growth after 10 April.
Notice that, for Madhya Pradesh, the 25th percentile of observed infection rates is 0.27, which is higher than the 50th percentile for some other states. The high growth of active cases in the latter part of the lockdown is a major concern for this state and could be a signal of community spread of COVID-19.

Rajasthan:
The western Indian state of Rajasthan has reported 1023 cumulative infected COVID-19 cases. The logistic model indicates that in another 30 days, the state could observe around 2000 cumulative infected cases. The state has not shown any clear trend in the daily infection rates during the lockdown period. The line-graph of observed active infected patients (red line, 4th panel) is increasing and lies between the SIS curves for the 50th and 75th percentiles of observed infection rates (0.18-0.33). The current infection rates for Rajasthan are still on the higher side despite the lockdown.

Gujarat:
The state is currently experiencing exponential growth, with 871 cumulative COVID-19 cases. Unless the spread of the virus is controlled, the logistic model predicts that the cumulative number of cases could exceed one lakh (100,000) in the next 30 days, which could be an overestimation.
However, the daily infection rate has not shown any declining trend. The line-graph of observed active infected patients (red line, 4th panel) is close to the curve of the estimated 75th percentile of observed infection rates.

… a clear decreasing trend in the last two weeks. This may indicate that there are many unreported cases, considering the large size of Uttar Pradesh. In the absence of preventive measures, unreported cases can contribute to spreading the virus in the community.

Telangana:
The southern Indian state of Telangana has so far reported 698 cumulative COVID-19 cases. The logistic model predicts that the number of cases for the state will reach around 750 in the next few days and stop increasing after that. The line-graph of observed active infected patients (red line, 4th panel) shows that the number of active cases has consistently remained below the curve of the 75th percentile of the observed infection rate (β = 0.26). There has also been a weak decreasing trend in the daily infection rate. This may indicate that, without the preventive measures, the numbers could have increased manifold.

Andhra Pradesh:
The line-graph of observed active infected patients (red line, 4th panel) shows that the number of active cases is far below the curve for the 75th percentile of the observed daily infection rate (β = 0.39). This state has seen a clearly decreasing daily infection rate in the latter part of the lockdown. The logistic model indicates that the maximum number of cumulative infected people will be around 550, which seems unlikely to be true. However, with effective implementation of preventive measures, the state can control the spread of COVID-19.

Kerala:
The southern state of Kerala is one of the few states in India where the effect of the lockdown is strongly observed. The state reported the first COVID-19 case in India; however, Kerala has so far been able to control the spread of the virus to a large extent. The cumulative number of cases reported until now is 388. Using the logistic model, the predicted number of cumulative confirmed cases could be around 400 in the next 30 days. It is also the only state where the line-graph of observed active infected patients (red line, 4th panel) has started to go down, which shows that the lockdown and other preventive measures have been effective for this state. The daily infection rate has declined steadily from positive to negative values. With the lockdown now extended, the number of active cases can be expected to become small in the coming days.

… infection rate is on the lower side. The daily infection rate has fluctuated since early April, with an upper bound of 0.2. However, the preventive measures need to be maintained to control the spread of the virus.

Jammu and Kashmir:
The northernmost state of Jammu and Kashmir has seen 300 cumulative infected cases so far. The line-graph of observed active infected patients (red line, 4th panel) has been far below the curve for the 75th percentile of the observed daily infection rate (β = 0.46). The daily infection rate has been decreasing since 9 April. However, there could be many unreported cases, allowing the infection to spread even during the lockdown period.
Over a similar duration from the first case, the USA reported more than 400,000 confirmed COVID-19 cases, and Spain and Italy each reported more than 150,000.
For further perspective, note that the USA has around one-fourth of India's population.
Therefore, according to the data reported so far, India seems to have managed the COVID-19 pandemic better than many other countries. One can argue that India has conducted too few tests relative to its population size [30]. However, limited testing may not be the only reason behind the low number of confirmed COVID-19 cases in India so far. India took many preventive measures to combat COVID-19 at a much earlier stage than other countries, including the nationwide lockdown from 25 March 2020. Apart from the lockdown, there are several conjectures about possible reasons behind India's relative success, e.g., the relatively early travel ban, BCG vaccination against tuberculosis in the population, which may have secondary effects against COVID-19 [31, 32], exposure to …
However, as of now, there is no concrete evidence to support these conjectures, although some clinical trials are currently underway to investigate some of them [36].
Note that India may have seen fewer COVID-19 cases so far, but the war is not over yet. Many states, such as Maharashtra, Delhi, Madhya Pradesh, Rajasthan, Gujarat, Uttar Pradesh, and West Bengal, are still at high risk. These states may see a huge jump in confirmed COVID-19 cases in the coming days if preventive measures are not implemented properly. On the positive side, Kerala has shown how to effectively "flatten" or even "crush" the curve of COVID-19 cases. We hope India can be free of COVID-19 through the strong determination already shown by the central and state governments.
There are a few works based explicitly on Indian COVID-19 data. Das [37] used an epidemiological model to estimate the basic reproduction number at the national level and for some states. Ray et al. [38] used a predictive model for case counts in India; they also discussed hypothetical interventions of various intensities and provided projections over a time horizon. Both articles use the SIR (susceptible-infected-removed) model for their analysis and prediction. As discussed earlier, considering India's great diversity in every respect, along with its vast population, it would be a much better idea to look at each state individually. Studying each state individually would help decide further actions to contain the spread of the disease, which may be crucial for specific states.
In this article, we have mainly focused on the SIS model, along with the logistic and the exponential models, for each state (restricting to only those states with enough data for prediction). The SIS model takes into account the possibility that an infected individual can return to the susceptible class on recovery because … On the other hand, the states with a decreasing DIR should maintain their current measures until the DIR remains zero or negative for 14 consecutive days, at which point the end of the pandemic can be declared.
