Forecasting COVID-19 and Analyzing the Effect of Government Interventions

Background: During the COVID-19 epidemic, governments around the world have implemented unprecedented non-pharmaceutical measures to control its spread. As these measures carry significant economic and humanitarian costs, it is important to investigate the efficacy of different policies and to accurately project the future spread under such policies. Methods: We developed a novel epidemiological model, DELPHI, based on the established SEIR model, that explicitly captures government interventions, underdetection, and many other realistic effects. We estimate key biological parameters using a meta-analysis of over 190 COVID-19 research papers and fit DELPHI to over 167 geographical areas since early April. We extract the inferred government intervention effect from DELPHI. Findings: Our epidemiological model recorded 6% and 11% two-week out-of-sample Median Absolute Percentage Error on cases and deaths, and successfully predicted the severity of epidemics in many areas (including the US, UK and Russia) months before they happened. Using the extracted government response, we find that mass gathering restrictions and school closings on average reduced infection rates the most, at 29.9 ± 6.9% and 17.3 ± 6.7%, respectively. The most stringent policy, stay-at-home, on average reduced the infection rate by 74.4 ± 3.7% from baseline across countries that implemented it. We further show that a reversal of stay-at-home policies in some countries, such as Brazil, could have disastrous results by the end of July. Interpretation: Our findings highlight that among the widely implemented policies around the world, mass gathering restrictions and school closings appear to be the most effective policies in reducing the infection rate. Given the continued spread of the epidemic in many countries, we recommend that these policies continue to the extent that they can be feasibly implemented.
Our results also show that under an assumption of R0 of 2.5-3 for COVID-19, stay-at-home policies appear to be the only effective policy that was widely implemented in reducing the R0 below 1. This implies that stay-at-home policies might be necessary, for at least the vulnerable population, if an uncontrolled second wave reemerges.


Introduction
Currently, the world is facing the deadliest pandemic in recent history: COVID-19. As of June 7th, there have been over 7.0 million confirmed cases of COVID-19 and the disease has taken over 400,000 lives. To stop the further spread of COVID-19, governments around the world have enacted some of the most wide-ranging non-pharmaceutical interventions in history. These interventions, especially the more severe ones, carry significant economic and humanitarian cost. Thus, it is critical to understand the effectiveness of such interventions in limiting disease spread.
However, there are many challenges in attempting to understand the effect of government interventions in a specific region or country. Different regions have implemented, often concurrently, a variety of different policies, and worse, even the same interventions could produce largely different effects in different societies, due to differences in factors such as demographics, population density, and culture. Thus, in order to provide a sensible analysis of the effect of policies across different countries, in early April we created a novel epidemiological model, DELPHI, to model the spread of COVID-19. DELPHI (Differential Equations Lead to Predictions of Hospitalizations and Infections) extends a classic SEIR model to include many realistic effects that are critical in this pandemic, including deaths and underdetection. Specifically, we included an explicit nonlinear multiplicative factor on the infection rate to model the spread as it happened in different regions.
Such explicit characterization of government intervention allows us to understand the effect of different non-pharmaceutical interventions as they have been implemented in various regions while accounting for regional population characteristics including baseline infection rate and mortality rate. Furthermore, we formulated DELPHI with data scarcity as a key consideration. The aforementioned innovations have allowed DELPHI to produce relatively accurate projections even during the early stages of the epidemic. A major hospital system in the United States planned its intensive care unit (ICU) capacity based on our forecasts. Our epidemiological predictions are used by a major pharmaceutical company to design a worldwide vaccine distribution strategy that can contain future phases of the pandemic. They have also been incorporated into the US Centers for Disease Control and Prevention's core ensemble forecast. [1] DELPHI has been applied to 167 geographic areas (countries/provinces/states) worldwide, covering all 6 populated continents. Its results and insights have also been available since early April on www.covidanalytics.io. In this paper, we document the statistical innovations, quantitative results, and insights extracted from the DELPHI model.

The DELPHI Model
The DELPHI model is a compartmental epidemiological model that extends the classic SEIR model into 11 states under the following 8 groups:
• Susceptible (S): People who have not been infected.
• Exposed (E): People currently infected, but not contagious and within the incubation period.
• Infected (I): People currently infected and contagious.
• Undetected (U_R) & (U_D): People infected and self-quarantined due to the effects of the disease, but not confirmed due to lack of testing. Some of these people recover (U_R) and some die (U_D).
• Detected, Hospitalized (DH_R) & (DH_D): People who are infected, confirmed, and hospitalized. Some of these people recover (DH_R) and some die (DH_D).
• Detected, Quarantine (DQ_R) & (DQ_D): People who are infected, confirmed, and home-quarantined rather than hospitalized. Some of these people recover (DQ_R) and some die (DQ_D).
• Recovered (R): People who have recovered from the disease (and assumed to be immune).
• Deceased (D): People who have died from the disease.
In addition to main functional states, we introduce auxiliary states to calculate a few useful quantities: Total Hospitalized (TH), Total Detected deaths (DD) and Total Detected Cases (DT).
The full mathematical formulation of the model with all the differential equations can be found in the attached Supplementary Materials. To limit the amount of data needed to train this model, only the parameters denoted with a tilde are fitted against historical data for each area (country/state/province); the others are largely biological parameters that are fixed using available clinical data from a meta-analysis of over 190 papers on COVID-19 available at the time of model creation. [2] A small selection of references for each parameter is given below.
• α is the baseline infection rate.
• γ(t) measures the effect of government response and is defined as:
$$\gamma(t) = \frac{2}{\pi} \arctan\left(-\frac{t - t_0}{k}\right) + 1,$$
where the parameters t_0 and k capture, respectively, the timing and the strength of the response. The effective infection rate in the model is αγ(t), which is time dependent.
• r_d is the rate of detection. This equals (log 2)/T_d, where T_d is the median time to detection (fixed to be 2 days). [3]
• β is the rate of infection leaving the incubation phase. This equals (log 2)/T_β, where T_β is the median time to leave incubation (fixed at 5 days). [4]
• σ is the rate of recovery of non-hospitalized patients. This equals (log 2)/T_σ, where T_σ is the median time to recovery of non-hospitalized patients (fixed at 10 days). [5,6]
• κ is the rate of recovery under hospitalization. This equals (log 2)/T_κ, where T_κ is the median time to recovery under hospitalization (fixed at 15 days). [7,8]
• τ is the rate of death. This captures the speed at which a dying patient dies, and is thus inversely proportional to how long a dying patient stays alive.
• µ is the mortality percentage. This is the percentage of people who die from the disease in a particular region. Note that this quantity is independent of the rate of death.
• p_d is the (constant) percentage of infectious cases detected. This is set to 20%. [3,9,10]
• p_h is the (constant) percentage of detected cases hospitalized. This is set to 15%. [11,12]
Therefore, we fit 5 parameters from the list above (α, µ, τ, t_0, k). In addition, we introduce two additional parameters k_1, k_2 to account for the unknown initial population in the infected (I) and exposed (E) states (see Supplementary Materials for details). We thus fit seven parameters per area.
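As a quick check of the fixed rates listed above, the following minimal sketch converts each median time into its rate via (log 2)/T. The function and dictionary names are illustrative and are not taken from the DELPHI code base.

```python
import math

def rate_from_median_time(median_days: float) -> float:
    """Rate of an exponential process whose median waiting time is median_days."""
    return math.log(2) / median_days

# Median times in days, as fixed in the text.
MEDIAN_DAYS = {
    "r_d": 2.0,     # detection
    "beta": 5.0,    # leaving incubation
    "sigma": 10.0,  # recovery, non-hospitalized
    "kappa": 15.0,  # recovery, hospitalized
}

RATES = {name: rate_from_median_time(days) for name, days in MEDIAN_DAYS.items()}

for name, rate in RATES.items():
    # After one median time, half of the population has left the state.
    remaining = math.exp(-rate * MEDIAN_DAYS[name])
    print(f"{name}: {rate:.4f} per day, fraction remaining at median = {remaining:.2f}")
```

With this convention, a shorter median time gives a proportionally larger rate, so detection (2 days) is the fastest of the four fixed transitions.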
The parameters are fitted by minimizing a weighted Mean Squared Error (MSE) metric with respect to the parameters. Define $\widehat{DT}(t)$ and $\widehat{DD}(t)$ as the number of reported total detected cases and detected deaths, respectively, on day t. Then, the loss function for a training period of T days is defined as:
$$\sum_{t=1}^{T} t \cdot \left(DT(t) - \widehat{DT}(t)\right)^2 + \lambda^2 \cdot \sum_{t=1}^{T} t \cdot \left(DD(t) - \widehat{DD}(t)\right)^2,$$
where DT(t) and DD(t) are respectively the total detected cases and deaths predicted by DELPHI. The factor t gives more prominence to more recent data, as recent errors are more likely to propagate into future errors. The factor $\lambda = \min\left\{\frac{\widehat{DT}(T)}{3 \cdot \widehat{DD}(T)}, 10\right\}$ balances the fitting between detected cases and deaths; this re-scaling coefficient was obtained experimentally. We only include historical data starting when the area recorded more than 100 cases; this allows us to exclude sporadic outbreaks that are not epidemics. Non-convex optimization methods, including trust-region methods [13] and the Nelder-Mead method [14], are used to carry out the minimization.
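The fitting objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the displayed loss is not legible in this copy, so the sketch assumes a sum of time-weighted squared errors in which the death term is scaled by λ squared (so that both terms are on the scale of squared case counts). The function name and the fallback for zero reported deaths are also hypothetical.

```python
def time_weighted_loss(pred_cases, pred_deaths, obs_cases, obs_deaths):
    """Time-weighted squared error on cumulative detected cases and deaths.

    Assumptions (not confirmed by the text): the death term is multiplied by
    lambda squared, and lambda falls back to its cap of 10 when no deaths
    have been reported.
    """
    last_cases, last_deaths = obs_cases[-1], obs_deaths[-1]
    lam = min(last_cases / (3 * last_deaths), 10) if last_deaths > 0 else 10

    # Day index t starts at 1, so later days receive larger weights.
    case_term = sum(
        t * (p - o) ** 2
        for t, (p, o) in enumerate(zip(pred_cases, obs_cases), start=1)
    )
    death_term = sum(
        t * (p - o) ** 2
        for t, (p, o) in enumerate(zip(pred_deaths, obs_deaths), start=1)
    )
    return case_term + lam ** 2 * death_term


# Toy data for three days (hypothetical numbers).
observed_cases = [100, 200, 300]
observed_deaths = [5, 10, 20]
predicted_cases = [110, 190, 300]
predicted_deaths = [5, 11, 18]

loss = time_weighted_loss(predicted_cases, predicted_deaths,
                          observed_cases, observed_deaths)
print(loss)
```

In this toy example λ = min(300 / 60, 10) = 5, so an error of two deaths on the last day weighs as much as an error of ten cases on the same day.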
In the following subsections, we will detail the three key characteristics of the DELPHI model compared to the standard SEIR formulation.

Accounting for Under-detection
In the COVID-19 crisis, one of the key modeling difficulties is the chronic underdetection of cases. This is due both to the lack of detection capabilities in the early stages of the pandemic and to the similarity between a mild case of COVID-19 and the common flu. Thus, to account for this significant effect, we explicitly included the U_R/U_D states to model people who actually contracted COVID-19 (and are infectious), but were not detected. In particular, we assume that only p_d of the total number of cases were detected, while 1 − p_d of the total cases flow to the U_R/U_D states. There are two methods to gain information on the detection rate: treating p_d as a parameter and fitting it to the historical data, or recovering p_d from serological evidence. However, both methods were impractical during the creation of this model. In an early to mid stage pandemic, a wide range of detection percentages are consistent with the data but lead to vastly different predictions (see e.g. [15]), so historical data could not provide strong evidence. Furthermore, at the time of writing, the serological data were largely limited to specific sub-areas such as cities and counties (see [16,17,18,19] for examples), while region-wide surveys were largely limited to a few European countries (see [20,21] for examples and discussion) and only very sparsely available around the world. Thus, we instead fix the detection percentage to be 20% based on various reports trying to understand the extent of underdetection in countries with earlier outbreaks [3,9,10]. More recently, an independent study [22] has corroborated our assumption in the United States.

Separation of Recovery and Deaths
A large focus in many governments' response to the COVID-19 pandemic is to minimize the number of deaths, and thus in DELPHI, we included a death state. In most epidemiological models that extend to include the death state (see e.g. [3,23] for COVID-19 modeling examples), the death state (D) is shown to flow from the same active infectious state as the recovery state (R), with a schematic shown in Figure 1b. However, this modeling approach would cause the mortality percentage μ to be dependent on the rates of recovery and death (details are available in the Supplementary Materials). Thus, to resolve this mismatch, we explicitly separated out the μ fraction of the infected population that would eventually die (I_D) from the 1 − μ fraction that would recover (I_R), as illustrated in Figure 1c.
This allows the mortality rate μ to be independent of the rates of death and recovery. The final DELPHI model further differentiates the I_R states into hospitalized (DH_R), quarantined (DQ_R), and undetected (U_R) states to account for the different treatments people received, and similarly for the I_D states.
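To illustrate why a shared infectious state ties the mortality percentage to the rates, consider the standard competing-rates argument: if individuals leave a single state by recovering at rate r and dying at rate d, the fraction that eventually dies is d / (r + d). The sketch below uses hypothetical rate values and a hypothetical function name; the authors' own derivation is in the Supplementary Materials.

```python
def death_fraction_shared_state(recovery_rate: float, death_rate: float) -> float:
    """Fraction that eventually dies when recovery and death leave the same state."""
    return death_rate / (recovery_rate + death_rate)


# Hypothetical rates (per day), chosen only for illustration.
slow = death_fraction_shared_state(recovery_rate=0.10, death_rate=0.01)
fast = death_fraction_shared_state(recovery_rate=0.10, death_rate=0.02)
print(f"shared state, death rate 0.01: fraction dying = {slow:.3f}")
print(f"shared state, death rate 0.02: fraction dying = {fast:.3f}")

# Split formulation: the fraction routed to the dying branch is set directly,
# so changing the death rate only changes how quickly deaths occur.
mu = 0.02  # hypothetical mortality percentage
for death_rate in (0.01, 0.02):
    print(f"split states, death rate {death_rate}: fraction dying = {mu:.3f}")
```

In the shared-state version, doubling the death rate nearly doubles the implied mortality; in the split version the mortality percentage stays at whatever value is fitted.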

Modeling Effect of Increasing Government Response
One of the key assumptions in the standard SEIR model is that the rate of infection α is constant throughout the epidemic. However, in real epidemics such as the COVID-19 crisis, the rate of infection starts decreasing as governments respond to the spread of the epidemic and induce behavior changes in societies. To account for this effect, we model the effect of government measures with a sigmoid-like function γ(t) (specifically the inverse tangent).
The concave-convex nature of an arctan curve models three phases: The early, concave part of the arctan models limited changes in behavior in response to early information, while most people continue business-as-usual activities. The transition from the concave to the convex part of the curve quantifies the sharp decline in infection rate as policies go into full force and the society experiences a shock event. The latter convex part of the curve models a flattening out of the response as the government measures reach saturation, representing the diminishing marginal returns in the decline of infection rate. An illustration of these three phases is included in the Supplementary Materials.
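The three phases can be seen numerically with a small sketch. It assumes the response curve has the form γ(t) = (2/π)·arctan(−(t − t_0)/k) + 1, which takes values between 0 and 2 and matches the concave-then-convex shape described above; the values of t_0 and k are hypothetical, not fitted values.

```python
import math

def gamma(t: float, t0: float, k: float) -> float:
    """Assumed arctan response curve with values strictly between 0 and 2."""
    return (2 / math.pi) * math.atan(-(t - t0) / k) + 1


T0, K = 30.0, 5.0  # hypothetical timing and strength parameters

values = [gamma(day, T0, K) for day in range(0, 61, 10)]
for day, value in zip(range(0, 61, 10), values):
    print(f"day {day:2d}: gamma = {value:.3f}")
```

Early on the curve stays close to 2, it drops fastest around t_0 (where it equals 1), and it flattens toward 0 as measures saturate. A smaller k makes the transition sharper.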
Parameters t_0 and k control the timing of such measures and the rapidity of their penetration. This formulation allows us to model, under the same framework, a wide variety of policies that different governments impose, including social distancing, stay-at-home policies, quarantines, etc. This modeling captures the increasing force of intervention in the early-mid stages of the epidemic. In Section 3.3, we show how this model can be extended to provide insights on the relaxation of measures.
Furthermore, Table 1 reports the median Mean Absolute Percentage Error (MAPE) on the observed total cases and deaths in each area of the world using parameters obtained on April 28th, and evaluated on the 15-day period up until May 12th. Overall, our model seems to predict the epidemic progression relatively well in most countries, with < 10% MAPE on reported cases and < 15% MAPE on reported deaths. Additionally, the areas with the highest errors are often those that have the fewest deaths. This stems from the fact that DELPHI, like all SEIR-based models, is not designed to perform well on areas with small populations and few interactions. The effect is further exacerbated by the choice of the metric, as MAPE inherently heavily penalizes errors on small numbers. Further detailed results for each country/region that we predict, and examples of areas with high MAPE, can be found in the Supplementary Materials.
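The sensitivity of MAPE to small counts can be illustrated with a short sketch. The data below are hypothetical and the helper name is illustrative; the point is only that the same absolute error yields a much larger percentage error when the observed numbers are small.

```python
def mape(predicted, observed):
    """Mean Absolute Percentage Error, in percent (observed values must be nonzero)."""
    return 100 * sum(abs(p - o) / o for p, o in zip(predicted, observed)) / len(observed)


# Same absolute error of 5 on every day, at two very different scales.
small_observed = [20, 25, 30]
large_observed = [2000, 2500, 3000]

small_mape = mape([o + 5 for o in small_observed], small_observed)
large_mape = mape([o + 5 for o in large_observed], large_observed)

print(f"small counts: MAPE = {small_mape:.2f}%")
print(f"large counts: MAPE = {large_mape:.2f}%")
```

Here the area with counts in the tens scores a MAPE one hundred times larger than the area with counts in the thousands, even though both forecasts miss by five each day.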

Effect of Government Interventions
A natural application of the DELPHI model is policy evaluation. For that, we can extract the normalized fitted government response curve γ(t) in each area, and utilize it to understand the impact of specific government policies that have been implemented. In particular, we aim to understand the average effect of each policy on γ(t) during the period of implementation. To this end, for all countries except the US, we collect historical data on government policies from the Oxford Coronavirus Government Response Tracker [24], for the period between January 1st 2020 and May 19th 2020. For the US, we collect the policy data from the Institute for Health Metrics and Evaluation [25] during the same period.

Table 2 shows the number of area-days that each policy was implemented around the world and its effect. We further report the standard deviation of each estimate, treating each geographical area as an independent sample. We see that each selected policy was enacted for at least hundreds of area-days worldwide, while the stringent stay-at-home policy was cumulatively implemented the most. In particular, we see that mass gathering restrictions generate a large reduction in infection rate: the incremental reduction from travel and work restrictions alone to mass gathering, travel, and work restrictions is 29.9 ± 6.9%. This is further supported by the large residual infection rate of 88.9 ± 4.5% when travel and work restrictions are implemented but mass gatherings are allowed. Additionally, we observe that closing schools also generates a large reduction in the infection rate, with an incremental effect of 17.3 ± 6.6% on top of mass gathering and other restrictions. Stay-at-home orders produced the strongest reduction in infection rate across the different countries, with a residual infection rate of just 25.6 ± 3.7% compared to when no measure was implemented.
If COVID-19 has an average basic reproductive number R0 of 2.5-3 [26,27], then on average, only the strongest measure (Stay-at-Home orders) is sufficient to control a COVID-19 epidemic by reducing R0 to less than 1.
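The arithmetic behind this statement can be sketched as follows, under two assumptions that are not spelled out in the text: the reproduction number under a policy is approximated by R0 times the residual infection rate, and the residual rates for the two intermediate policies are obtained by subtracting the reported incremental reductions (29.9 and 17.3 percentage points) from the 88.9% residual for travel and work restrictions. Uncertainty intervals are ignored.

```python
# Residual infection rates as fractions of the no-measure baseline.
# The two intermediate values are derived by subtraction (an assumption).
RESIDUAL = {
    "No measure": 1.000,
    "Travel and work": 0.889,
    "Mass gatherings, travel and work": 0.889 - 0.299,
    "Mass gatherings, schools, travel and work": 0.889 - 0.299 - 0.173,
    "Stay-at-Home": 0.256,
}


def policies_below_one(r0: float) -> set:
    """Policies whose approximate reproduction number r0 * residual is below 1."""
    return {name for name, residual in RESIDUAL.items() if r0 * residual < 1}


for r0 in (2.5, 3.0):
    for name, residual in RESIDUAL.items():
        print(f"R0 = {r0}: {name}: {r0 * residual:.2f}")
    print(f"R0 = {r0}: below 1 -> {sorted(policies_below_one(r0))}")
```

Under these assumptions, the second most stringent category gives roughly 1.04 at R0 = 2.5, just above the threshold, while stay-at-home gives about 0.64 to 0.77 across the assumed R0 range.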

Extension: Evaluating Reopening Strategies
The DELPHI model provides insights into the effect of government policies through the residual infection rates p_i under each policy.
A natural extension is to utilize the p_i in creating what-if scenarios on the effect of lifting restrictions in different countries by reverting the effect of each policy on γ(t) at the time of the hypothetical policy relaxation. Specifically, suppose that we are considering a policy easing from policy i to j at time t_c in some area. Then for all times t ≥ t_c, we correct the government response as follows:
$$\gamma'(t) = \gamma(t) + \min\left(\frac{2 - \gamma(t_c)}{1 - p_i}, \frac{\gamma(t_c)}{p_i}\right) \cdot \underbrace{(p_j - p_i)}_{\text{Differential in policy effect between policy } i \text{ and } j}, \quad \forall t \geq t_c.$$
Essentially, we apply a correction term that is proportional to the fractional difference in policy effect between policy i and j (which is p_j − p_i > 0 as it is an easing). The multiplicative factor in this correction is $\min\left(\frac{2 - \gamma(t_c)}{1 - p_i}, \frac{\gamma(t_c)}{p_i}\right)$.
We observe different levels of risk for the same re-opening strategies across different countries. For example, Figure 3c predicts that loosening measures in Brazil on June 16th would result in a second wave of infections with up to 6.8 million additional cases by July 15th, while even a stay-at-home order would lead to almost 1.9 million additional cases. Such alarming numbers can be understood through Figure 3d, where we compute a rolling average of the weekly incidence of cases per 100K people. We can see that Brazil is still on a steep ascending curve, and that any kind of loosening could be catastrophic. Such behaviour stands in sharp contrast with France's situation. Figure 3b demonstrates that the peak has long passed in France and the epidemic has mostly died out. Thus, as we can see in Figure 3a, loosening policies (as France has already started doing) is likely to only minimally affect the number of infections.
To further understand the disparate impact of the policies across countries, we made predictions for the situation around the world assuming a policy that involves mass gathering, travel, and work restrictions was universally implemented on June 16th. Figure 4a shows three clusters of countries for July 15th:
• Countries with a large number of cumulative cases, but that are in a late stage of the pandemic, with relatively few new cases, mainly in Western and Northern Europe (e.g. the United Kingdom, Italy, France and Finland).
• Countries where the pandemic has had a large impact with a large number of cumulative cases, and where the situation will still be worsening at an alarming rate. These include the United States, India and Brazil. A close-up of these countries is presented in Figure 4b, where we see that DELPHI predicts Brazil would be severely hit by July, with up to 8% of the entire population infected, if the hypothetical policy above is implemented.
This suggests that in these countries, such a hypothetical policy could be inadequate for controlling the epidemic, and a stronger policy (such as Stay-at-Home orders) is needed.

Limitations
One fundamental limitation of this analysis is its observational nature. Thus, despite the flexible parameters in DELPHI accounting for many state-dependent effects, there are many other potential confounders and second-order effects that could affect this analysis. For example, one effect that is not considered in DELPHI is a time-varying mortality rate caused by changing treatment procedures designed to best treat COVID-19. Including such an effect could sharpen the analysis further, though at the expense of increased fitting difficulty and data requirements.
This analysis also assumes, in analyzing government interventions, that the same nominal policy (e.g. mass gathering restrictions) can be compared across countries. In reality, different countries have implemented variants (though largely similar) of restrictions under the same name, and this could further impact the validity of the analysis.
In the reopening analysis, we have assumed that the effect of a government intervention imposed at the start of the epidemic is indicative of the effect when it is removed. This is potentially affected by a permanent change in social behavior during the epidemic. For example, if a significant portion of the population adopts social distancing measures even after the official restrictions are lifted, this could lead to a smaller resurgence of infections than what is predicted in the analysis.

Conclusions
We introduced DELPHI, a novel epidemiological model that extends SEIR to include many realistic effects critical in this pandemic. DELPHI was able to accurately predict the spread of COVID-19 in many countries, and to aid planning for many organizations worldwide. Furthermore, the explicit modeling of government intervention allowed us to understand the effect of government interventions, and to help inform how societies could reopen.

Figure 1a depicts a flow representation of the model, where each arrow represents how individuals can flow between different states. The underlying differential equations are governed by 11 explicit parameters which are shown on the appropriate arrows in Figure 1a.

3.1. Forecasting Results
DELPHI was created in early April and has been continuously updated to reflect new observed data. Figures 2a and 2b show our projections of the number of cases in Russia and the United Kingdom made on three different dates, and compare them against historical observations. They suggest that DELPHI achieves strong predictive performance, as the model has been consistently predicting, with high accuracy, the overall spread of the disease for several weeks. Notably, DELPHI was able to anticipate, as early as April 17th, the dynamics of the pandemic in the United Kingdom (resp. Russia) up to May 12th. At a time when 100-110K (resp. 30-35K) cases were reported, the model was predicting 220-230K (resp. 225-235K) cases by May 12th, a prediction that became accurate a month later.

Figure 2: Cumulative number of cases in the UK (a) and Russia (b) according to our projections made at different points in time, against actual observations. Note that the predicted curves largely overlap with the actual curve.
This multiplicative factor normalizes the difference so that the resulting γ'(t_c) is constrained within the initial range [0, 2]. We then replace γ(t) with γ'(t) in the DELPHI model to forecast the epidemic under the updated policy. Using this correction factor, we predict what would happen in different areas under various future policies. Figure 3 shows results for France and Brazil under a policy change implemented on June 16th (four weeks after the last historical value on May 19th). Further results for other countries are contained in the Supplementary Materials.
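The policy correction can be sketched in a few lines. The displayed formula is not fully legible in this copy, so the sketch assumes γ'(t) = γ(t) + min((2 − γ(t_c))/(1 − p_i), γ(t_c)/p_i)·(p_j − p_i), which is the form that keeps γ'(t_c) inside [0, 2] for any p_j between 0 and 1. The function name and all numeric inputs are hypothetical, and p_i is assumed to lie strictly between 0 and 1.

```python
def corrected_gamma(gamma_t: float, gamma_tc: float, p_i: float, p_j: float) -> float:
    """Shift gamma(t) for t >= t_c when moving from policy i to policy j.

    Assumes 0 < p_i < 1 so that both normalizing ratios are defined.
    """
    factor = min((2 - gamma_tc) / (1 - p_i), gamma_tc / p_i)
    return gamma_t + factor * (p_j - p_i)


# Hypothetical easing from a strict policy (p_i) to a looser one (p_j)
# at a time t_c when the fitted response curve equals 0.3.
GAMMA_TC = 0.3
P_STRICT, P_LOOSE = 0.256, 0.590

eased = corrected_gamma(GAMMA_TC, GAMMA_TC, P_STRICT, P_LOOSE)
print(f"gamma at t_c before easing: {GAMMA_TC:.3f}, after easing: {eased:.3f}")

# Extreme targets show the normalization at work: the corrected value at t_c
# stays within [0, 2] whether all measures are lifted (p_j = 1) or the
# residual rate is driven to zero (p_j = 0).
fully_lifted = corrected_gamma(GAMMA_TC, GAMMA_TC, P_STRICT, 1.0)
fully_suppressed = corrected_gamma(GAMMA_TC, GAMMA_TC, P_STRICT, 0.0)
print(f"p_j = 1: {fully_lifted:.3f}, p_j = 0: {fully_suppressed:.3f}")
```

Because the shift is a constant added for all t ≥ t_c, the shape of the fitted curve after t_c is preserved and only its level changes.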

Figure 3: Forecasts of total detected cases and weekly incidence per 100K for France and Brazil under various policies.

Figure 4: World Predictions for Early July under Mass Gathering, Travel and Work Restrictions. (a) Weekly incidence of cases (per 100K) in the first half of July against fraction of population infected for multiple countries. (b) Predictions for total cumulative cases (normalized by the population) vs. new cases (per 100K) for countries which are predicted to be highly impacted and still worsening at an alarming rate by July 15th.

Table 1: Median country-level Mean Absolute Percentage Error (MAPE) of the predicted number of cases and deaths in each continent (projections made using data up to 04/27 for the period from 04/28 to 05/12).
At each point in time, we categorize the government intervention data based on whether they restrict mass gatherings, schools, travel and work activities. We group travel restrictions and work restrictions together due to their tendency to be implemented simultaneously. From January 1st to May 19th, the 167 areas in total implemented 5 combinations of such interventions. Specifically, these are: (1) No measure; (2) Restrict travel and work only; (3) Restrict mass gatherings, travel and work; (4) Restrict mass gatherings, schools, travel and work; and (5) Stay-at-Home. The detailed correspondence between raw policy data and our categories is contained in the Supplementary Materials. Other potentially feasible combinations were not implemented by the countries. Then for each policy category i = 1, ..., 5, we extract the average value of γ(t), γ_i, across all time periods and areas for which policy i was implemented. Then we calculate the residual fraction of infection rate under policy i, p_i, compared to the baseline policy of no measure.
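The extraction step can be sketched as follows. This is an illustration under assumptions rather than the authors' procedure: it pools all area-days for a policy into one average, and it takes the residual fraction to be the ratio of that average to the average under "No measure" (the exact definition is cut off in this copy). The record layout, function name and values are hypothetical.

```python
from collections import defaultdict

def residual_fractions(records, baseline="No measure"):
    """Average gamma per policy category, divided by the baseline average.

    records: iterable of (policy_name, gamma_value) pairs, one per area-day.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for policy, gamma_value in records:
        totals[policy] += gamma_value
        counts[policy] += 1
    averages = {policy: totals[policy] / counts[policy] for policy in totals}
    return {policy: avg / averages[baseline] for policy, avg in averages.items()}


# Hypothetical area-day records of the fitted response curve.
area_days = [
    ("No measure", 1.6),
    ("No measure", 1.4),
    ("Stay-at-Home", 0.40),
    ("Stay-at-Home", 0.35),
]

fractions = residual_fractions(area_days)
for policy, fraction in fractions.items():
    print(f"{policy}: residual fraction = {fraction:.2f}")
```

By construction the baseline category has a residual fraction of 1, and a value of 0.25 for another policy would mean the infection rate under that policy is a quarter of the baseline rate.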

Table 2: Implementation length and effect of each policy category as implemented across the world.