Disease-dependent interaction policies to support health and economic outcomes during the COVID-19 epidemic

Lockdowns and stay-at-home orders have partially mitigated the spread of Covid-19. However, the indiscriminate nature of mitigation — applying to all individuals irrespective of disease status — has come with substantial socioeconomic costs. Here, we explore how to leverage the increasing reliability and scale of both molecular and serological tests to balance transmission risks with economic costs involved in responding to Covid-19 epidemics. First, we introduce an optimal control approach that identifies personalized interaction rates according to an individual’s test status; such that infected individuals isolate, recovered individuals can elevate their interactions, and activity of susceptible individuals varies over time. Critically, the extent to which susceptible individuals can return to work depends strongly on isolation efficiency. As we show, optimal control policies can yield mitigation policies with similar infection rates to total shutdown but lower socioeconomic costs. However, optimal control policies can be fragile given mis-specification of parameters or mis-estimation of the current disease state. Hence, we leverage insights from the optimal control solutions and propose a feedback control approach based on monitoring of the epidemic state. We utilize genetic algorithms to identify a ‘switching’ policy such that susceptible individuals (both PCR and serological test negative) return to work after lockdowns insofar as recovered fraction is much higher than the circulating infected prevalence. This feedback control policy exhibits similar performance results to optimal control, but with greater robustness to uncertainty. Overall, our analysis shows that test-driven improvements in isolation efficiency of infectious individuals can inform disease-dependent interaction policies that mitigate transmission while enhancing the return of individuals to pre-pandemic economic activity.


A. Model and parameters
The epidemic model analyzed here is an 'SEIR' model, including SEIR dynamics [1], susceptibles S, exposed E, infectious I, recovered R , with the total number of fatalities denoted by D. The force of infection is the contact rate of infectious individuals multiplied by the probability that the interaction is with a susceptible person multiplied by the probability that the event leads to an infection. Let c S , c E , c I and c R equal the contact rate of S, E, I, and R individuals respectively. The force of infection is where η I is the measure of disease transmission effectiveness from I class to S class [2]. Suppose all the contact rates are equal to a baseline contact rate c B . Assuming relatively few fatalities per-capita, then the total population is approximately constant at N , such that the force of infection can be approximated as F ≈ η I c B (I/N ), which recovers the standard SEIR model by defining β = ηc B as infection rate. In the generalized case, the model is given by the following system of nonlinear differential equations where F (S, E, I, R; c) is the force of infection given in Eq. 1, and T inf is the infectious period, T inc is the incubation period, µ is the case fatality ratio for infected individuals, see SEIR model schematic in Fig. S1.
FIG. S1: SEIR Model Schematic. The interactions between susceptible and infected individuals lead to newly exposed cases. These interactions are modeled by the force of infection F (S, E, I, R; c). The exposed individuals undergo an incubation period (Tinc) before the onset of infectiousness. Infectious individuals will either recover (and develop protective immunity) or die after an infectious period (Tinf ), see model equations in Eq. 2.
At the start of an outbreak (t = 0), the contact rate of each individual is c B . The basic reproduction number R 0 of this compartmental model is The assumed model parameters used in the model are shown in table S3.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https: //doi.org/10.1101//doi.org/10. /2020 Initial conditions and contact rate The model assumes an initial total population of 1, 000, 000 (one million) denoted as N 0 . An initial outbreak is seeded in this population given one infected individual. The simulation is run for two months (60 days) with fixed baseline contact rate (all c's are equal to c B ) -we use this time point (which we denote time t 0 in our simulations) as the time at which intervention policies might be applied. The individual behaviors (or activities) are quantified by contact rates, c S (t), c E (t), c I (t) and c R (t), thus the control of contact rates can be mapped to intervention strategies. For examples, when c S goes down that means shelter-in-place, when c E or c I goes down that can mean quarantine and isolation. When there is no intervention during the epidemic, the population dynamics of SEIR model is shown in Fig. S2. We denote c(t) = (c S (t), c E (t), c I (t), c R (t)) T as the contact rate vector at time t, where (·) T denotes a matrix transpose. The effective reproduction number R eff (t) is where

B. Optimal control formulation
Here, we aim to optimally deploy public health control strategies (via control of the contact rates) that minimizes deaths during the outbreak while keeping the socioeconomic costs low. The baseline average of contacts at time t, Q(t), equals c B N (t), where c B is the baseline contact rate and N (t) = S(t) + E(t) + I(t) + R(t) is the total number of alive individuals The socioeconomic costs may result from two consequences of contact rate interventions: (1) loss of essential connections; (2) shifting of roles relative to baseline, as quantified by variation in contact rates by testing status. Given the intervention policy c(t) at time t, we quantify the socioeconomic costs as E 1 + E 2 , where E 1 and E 2 may take form of . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 1, 2020. . https: //doi.org/10.1101//doi.org/10. /2020 where W 1 ≥ 0, W 2 = diag(w S , w E , w I , w R ) ≻ 0 are weight parameters, and K is a constant shape parameter for the exponential function. In the absence of any strategy, there are few deaths. This would mean that N (t) ≈ N 0 . For the sake of simplicity, E 1 is approximated as W 1 exp (K (c B N 0 − Q(t)) /c B N 0 ). For every t, we restrict the contact rate vector c(t) ∈ [c min , c max ] 4 , where c min and c max are the minimum contact rate and maximum contact rate. The set C = [c min , c max ] 4 is a convex and compact set in R 4 . We define the space of admissible controls, denoted by A, as the set of Lebesgue-measurable functions c : [t 0 , t f ] −→ C. The (dimensionless) cost functional is written in the following Bolza form, where W I ≥ 0 and W D ≥ 0 are weight regulators. The optimal control problem is The weight parameters used in Eq. 6 are shown in table S4, these weight parameters are set to balance the scale of different cost components in Eq. 6. However, frequent and continuous updates bring in practical issues in terms of (1) establishing how subtle changes in contact rates actually translate into applicable changes in individual behaviors (i.e., a continuous strategy can not be practically implemented), and (2) individual fatigue (and reduced adherence to proposed measures) due to frequent behavioral adjustments (i.e., frequent update is also impractical). To address these concerns, we need to modify the space of controls A.
To begin with, we fix the full time horizon of control as [t 0 , t f ]. For example, t 0 is fixed to 60 days after outbreak (see Sect. I A) and t f to 360 days after outbreak, where we set the timing of the outbreak to 0. We only consider an intervention policy which is updated a finite number of times in the interval t 0 , t f ]. These epochs are denoted by In most cases, the exposed individuals are unaware of their infection status, so it's impractical to control their behaviors separately from the susceptible individuals. Hence, we view them as susceptibles for the purpose of policy setting and set c E (t) = c S (t) ∀t ∈ [t 0 , t f ]. In doing so, the contact rate vector c(t) actually is a three dimensional vector consisting of c S (t), c I (t) and c R (t).

C. Optimization algorithm
We follow Polak [3] who developed a general framework for computational optimal control based on optimization on Hilbert spaces. We first discuss the conceptual algorithm in an abstract setting of infinite-dimensional spaces, and then provide approximation details in the sequel.
Let L n 2 [t 0 , t f ] denote the space of equivalence classes of square integrable functions from [t 0 , t f ] into R n (n = 3 in our case). We use the notation · and ·, · for the norm and scalar product in Euclidean space (e.g. R n ), and use . H and ., . H for the norm and scalar product in a Hilbert space H. For example, given u, v ∈ L n 2 [t 0 , t f ] (a Hilbert space), the norm . 2 and scalar product ., . 2 are: , v(t) dt. In the optimal control problem under consideration, we define the state variable of the system, x ∈ R 5 , by x = (S, E, I, R, D) T . We also define the control variable, c ∈ R 3 , by c = (c S , c I , c R ) T . The dynamic equation of the system, Eq. 2, can be rewritten in a compact form: where dot represents differentiation with respect to time t, and f (x(t), c(t)) is the right hand side (RHS) of equations 2. The cost functional defined by Eq. 6 can be written succintly as . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The principal elements of most gradient-descent algorithms (including the one described here) are the descent direction and the stepsize. For our problem, the descent direction is based on −∇J (c(t)); in fact, it is the projection of c(t) − ∇J (c(t)) into the constraint-set C = [c min , c max ] 3 as defined in Sect. I B. The stepsize that we use is the Armijo stepsize (see [3]) described in the sequel, which gives an approximate line minimization of J in the computed descent direction from the control c(t).
Consider first the conceptual algorithm. In its formal presentation by Algorithm 1, below, we use the term "compute" in a conceptual sense, since the "computations" refer to elements in the Hilbert space L n 2 [t 0 , t f ]. Also, we use the notation P A to mean the projection from L n 2 [t 0 , t f ] into the space A.
Given a control iteration c (i) ∈ A, compute from it the next iteration, c (i+1) ∈ A, as follows.
The main modification of Algorithm 1 towards implementation is in numerical solutions of the state equation and costate equation. Furthermore, the time-interval [t 0 , t f ] has to be discretized by a suitable grid in order to adequately represent various time-dependent functions such as c (i) (t) and ∇J (c(t)). All of this can be achieved by a common grid, Step 3 of Algorithm 1 can be computed as follows: For a given grid-point . This can be done by a simple co-ordinate projection since C is a box. Denote the result of this pointwise projection by P C (c (i) (t j ) − β ℓ ∇J (c (i) )(t j )) ∈ C, and observe that the function comprised of a zero-order or first-order interpolation of these points can serve to approximate the functional projection These modifications of the conceptual algorithm give an implementable version for it. The resulting implementable algorithm falls in Polak's framework of numerical optimal control [3] which includes results pertaining to asymptotic convergence.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint As noted, the control c(t) must be piecewise constant and maintain constant values for substantial periods. At the same time, the state trajectory typically is continuous and not piecewise constant. Therefore, it is natural to maintain the present framework of continuous-time optimal control and not to consider the problem in the setting of discrete-time systems. To do that, we have to consider a form of projection of the control functions, c(t), into the space of piecewise-constant functions. We next describe the specific way we did that in our simulations.
Consider an increasing set of points {t 0 = t 0 < t 1 < t 2 < ... < t M = t f }, and the corresponding partition of the time-interval [t 0 , t f ) by the subintervals [t m−1 , t m ), m = 1, . . . , M . Denote the interval [t m−1 , t m ), m = 1, . . . , M , by ∆ m ; right-close the last interval to be ∆ M = [t M −1 , t M ], and denote the partition by O. Let ψ m (t), m = 1, . . . , M denote the indicator function of the subinterval ψ m , namely ψ m (t) = 1 for t ∈ ψ m and ψ m (t) = 0 otherwise. A continuous-time signal can be approximated over an interval by a constant equal to its average value over that interval. Given a continuous-time policy c : [t 0 , t f ] → R 3 , the piecewise constant reconstruction is here θ m ∈ R 3 . Let V O denotes the operator comprising this piecewise-constant reconstruction procedure. Then Eq. 13 can be written as c = V O (c). Now we modify the algorithm by replacing the projection operator P A in Step 3 by the operator In summary, the implementable algorithm that we use consists of the following modification of Algorithm 1: 1. Compute x(t) and λ(t) by numerical integrations using the forward-Euler method.
2. Compute ∇J (c(t)) only at the points t j on the grid G 0 .
A flow chart of the algorithm is depicted in Fig. S3.

Simulation details
The model assumes an initial total population of 1, 000, 000 (one million) denoted as N 0 . An initial outbreak is seeded in this population given one infected individual. The simulation is run for two months (60 days) with fixed baseline contact rate (all c's are equal to c B ) -we use this time point (which we denote time t 0 in our simulations) as the time at which intervention policies might be applied: we have c(t) = c B ∀t ∈ [0, t 0 ], and we set the initial state to For the algorithm, we set α = 0.1 and β = 0.5, the grid's time increments to 0.05, ∆ m = 30 days, t 0 = 60 and t f = 360 days. For the uncontrolled part of the simulation we take c(t) = c B for all t ∈ [0, t 0 ]. The term Θ(i) := |J (c (i+1) ) − J (c (i) )|/J (c (i) ) acts as a convergence indicator for stopping the algorithm. The algorithm is terminated whenever Θ(i) ≤ 10 −6 .

D. Optimised time-dependent contact rate
One way of estimating the economic impact is measuring the fraction of working-days recovered, i.e., how many days are people working. We assume that if a person has a contact rate less than the base contact rate (c B ), it implies a proportional reduction of in-person contacts that have a direct economic benefit. For example, for a susceptible individual, the working fraction at day t can be computed as c * where c B is the baseline contacting rate. This ensures that contact rates above the baseline does not correspond to an increase in hours worked. For a given contacting rate policy c = (c S , c E , c I , c R ) T , the working days restored in the period [t 0 , t f ] is . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint FIG. S3: Flowchart of optimal contact rate computation. The contact rates are projected into the allowed subspace, and then fed to the system dynamics to generate the state and costate trajectory. Using the cost function provided and the state and costate trajectories, a projected gradient descent algorithm is employed to update the contact rates. If the termination condition is satisfied, these new contact rates are accepted as the optimal contact rates. Otherwise, the contact rates are replaced with the computed values and the loop is rerun.
where c * k (t) = min{1, c k (t)/c B }, k ∈ {S, E, I, R}, is the working fraction of an individual in subpopulation class k at day t. When there is no policy intervention, the contacting rate of each individual is c B for all time, the working days will be maximized, which can be written as Then, the fraction of working days restored can be written as Economic costs are proportional to 1 − Ψ(c); such that economic impacts of policies are minimized as Ψ(c) approaches 1 Varying the relative importance of the illness-related cost vs. socioeconomic costs The relative importance given to the socioeconomic impact of the epidemic and the cost associated with deaths and infection spread is critical to design intervention policies. From Eq. 6, we note that the cost function J can be decomposed into the two parts, socioeconomic costs (J E ) and costs of infectious diseases (J I ), J = J E + J I , where In order to explore how to manage and control disease outbreak with low socioeconomic costs, we need to balance the importance of these two seemingly conflicting objectives. To this end, we define a new cost function where the relative importance of J E and J I can be parameterized. With this parameter, optimal policy decisions can be made based on the relative importance assigned to the socioeconomic effect and public health impacts. Define the relative weight ratio as ξ and the altered cost function is J (c; ξ) = J E + ξJ I . As ξ increases, the relative importance of costs associated with death and spread of infection are increased vis-a-vis the socioeconomic impact and vice versa. The optimal intervention policies with different relative importance ratios (ξ = 10 −4 , 1, and 10 4 ) vary with isolation efficiencies (25%, 50%, 75%) are shown in Figs. S4, S5, S6.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10. 1101/2020 For example of balance case (ξ = 1) in the case of 50% isolation efficiency, Fig. S5 (middle) shows that the way of managing the epidemic while considering the socioeconomic impact is to isolate all the infected cases and enhance the interactions of recovered cases (i.e., shielding, [6]), the susceptible cases are stay home first and slowly go back to normal towards the end of intervention period. Using the optimized intervention policy (i.e., optimal contact rate), the reduction in final deaths (D(t f )) is substantial (∼ 75%). The fraction of working-days recovered is about ∼ 66%. The results suggest advantages of isolation of infected cases and shielding enhancement of recovered cases.

E. Inferring state-dependent heuristic policies from optimal control solutions
In the previous sections, an optimal policy based on contact rates of different sub-populations was derived. However, as is often the case with optimal control solutions, these policies are open-loop, that is, once computed, they are prescribed to the system without feedback. Therefore, open-loop policies can have large sensitivities to modeling errors and signal-delays in the loop, especially if the system is unstable, as is the case with dynamic models of epidemics near the disease-free equilibrium at the onset of an outbreak. To address this, we design a feedback law for optimizing a 'reasonable' performance metric which, though different from the one of the optimal control problem, gives similar respective measures of two important performance metrics: total mortality (D(t f )), and working fractions (Ψ). In Sect. I F we verify this point by comparative simulations of the optimal control solution and the feedback control system, and not surprisingly, the sensitivity of the former with respect to a one-cycle (30-days) delay is much larger than that of the latter.
The optimal control solutions reveal a general structure that guides us in the choice of the feedback control law. The patterns observed in the computed optimal-policy solutions (see Figs. S4, S5, S6) suggest that the contact rates for the infected cases should be the smallest possible while the recovered population should be used as shields. Further, they display switch overs between the low and high contact rates for the susceptible populations, and these tend to occur towards the end of the lock-down period in a way that depends on the numbers of infected and recovered individuals.
To find the state feedback policy, the I-R (Infected-Recovered) phase plane is divided into 2 parts -one where the susceptibles need to isolate and another where they can get back to work. If the number of infected cases is high, the susceptible population should isolate to reduce the infection spread. For simplicity, the I-R plane is divided by a line, though better classification boundaries may exist. On one side of the line, the contact rate for the susceptible population is set to the minimum while on the other side, it is equal to the baseline (corresponding to opening the system).
The optimal slope of the dividing line is found via a genetic algorithm [7] using matlab's built-in optimization function ga, with the maximum generation number (set to 30) serving as the stopping criterion [8]. The intercept of the line is fixed to a small negative number (10 −3 ) as the results are not affected by the intercept. The slope parameter is called θ, and the following cost function is minimized: where D(t f )/N 0 gives the normalized casualties, Ψ(c) is the fraction of working-days restored and w is a weight to bring the two terms on the same scale (see table S4). The output of the genetic algorithm is θ * , the slope of the optimal dividing line. We assume that the divided plane on the side of the line with the origin, i.e., zero infected and recovered cases, is the side where the susceptibles are free to work, while the divided plane on the other side corresponds to lockdown. The simulation is run for different isolation and shielding levels and the compiled results are shown in Fig. S9. The isolation efficiencies vary from 25% to 75% while the shielding ratio varies from 2 to 5. We note that as the efficiency of isolation increases, the susceptible population can be let out for a longer duration towards the end of the epidemic. This is because the reduced contact rate of the infected population reduces the spread of infection. If the isolation efficiency is 65% or more, there is no need for the susceptible population to be in lockdown. Similar observations can be made for the shielding ratio. While the effects of it are not as dramatic, an increase in shielding ratio decreases the force of infection, resulting in a a lower time necessary for lockdown.

F. Sensitivity of optimal control and feedback control to mis-timed implementation of policies
We compute an optimal control for a system with a delayed input. We do not assume any modeling uncertainty, and attribute the perturbation only to the delayed application of the computed control. Suppose that an optimal control is computed for a scenario where its application is slated to commence 30 days after the outbreak of an epidemic, but the computed input control is uniformly delayed by 30 days. Then, by simulating the epidemic with a delayed policy, . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020.  S1: Comparison between optimal control approach for contact policy with and without delay. The comparisons are made for isolation efficiencies of 25%, 50% and 75%, with performance metrics of total deaths and working fraction. The total death is significantly higher for the system with delay when the isolation efficiency is 25% and 50%, suggesting poor robustness of the computed optimal control to input-delay.  we can compare total deaths and working fraction for the delayed system with the results obtained without delay and with the same initial conditions. The results, comprising total death and working fraction, are presented in table S1 for various isolation efficiencies. These findings indicate higher death rates as well as higher working fractions for the delayed system except for the third, and highest efficiency of isolation, where the respective performance measures are similar. For the case of 50% isolation, the death count is more than doubled due to delay. This significant difference in performance metrics demonstrates the shortcomings of implementing a policy based on optimal control if the system is not ideal.
We also tested the sensitivities of the total death and working fraction to 30-day delays. The results, shown in table S2 below, indicate low sensitivity compared to the observed results for the optimal control solution. We compared these performance metrics obtained from applications of the optimal control vs. feedback control to the system without delay. The results, summarized in tables S1 and S2 respectively, indicate similar performance with respect to the performance metrics used. Thus, feedback control is nearly as good as optimal control if there is no delay; while having more robust features in the event that there is a delay in application.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020.    . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020: Population dynamics in a SEIR model with controlled contact rate (25% isolation efficiency). (Top) The relative importance ratio (ξ) is set as 10 −4 , the policy (i.e., contact rate) is deployed to primarily minimize socioeconomic costs (JE). The socioeconomic costs are maintained at low level for all the policy intervention period (see the right panel after 60 days). Final state of the system in terms of population fraction is: D(tf ) ≈ 1%, S(tf ) ≈ 4% and R(tf ) ≈ 95%, where tf = 360 days. The fraction of working-days recovered is about 100%. (Middle) The relative importance ratio (ξ) is set as 1, the policy (i.e., contact rate) is deployed to minimize infectious diseases costs (JI ) while keeping the socioeconomic impacts low. The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.6%, S(tf ) ≈ 38% and R(tf ) ≈ 61%, where tf = 360 days. The fraction of working-days recovered is about 91%. (Bottom) The relative importance ratio (ξ) is set as 10 4 , the policy (i.e., contact rate) is deployed to primarily minimize infectious diseases costs (JI ). The controlled contact rates of susceptible cases (cS) and infected cases (cI ) are overlapped at minimum contact rate boundary (cmin). The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.6%, S(tf ) ≈ 38% and R(tf ) ≈ 61%, where tf = 360 days. The fraction of working-days recovered is about 88%. Here, we set cmin = (3/4)cB (i.e., up to 25% isolation efficiency) and cmax = 2cB (i.e., up to twice enhanced interactions).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020: Population dynamics in a SEIR model with controlled contact rate (50% isolation efficiency). (Top) The relative importance ratio (ξ) is set as 10 −4 , the policy (i.e., contact rate) is deployed to primarily minimize socioeconomic costs (JE). The socioeconomic costs are maintained at low level for all the policy intervention period (see the right panel after 60 days). Final state of the system in terms of population fraction is: D(tf ) ≈ 1%, S(tf ) ≈ 3% and R(tf ) ≈ 96%, where tf = 360 days. The fraction of working-days recovered is about 100%. (Middle) The relative importance ratio (ξ) is set as 1, the policy (i.e., contact rate) is deployed to minimize infectious diseases costs (JI ) while keeping the socioeconomic impacts low. For the first half intervention period (from 60 days to 240 days), the controlled contact rates of susceptible cases (cS) and infected cases (cI ) are overlapped at minimum contact rate boundary (cmin). There are huge socioeconomic impacts at beginning of policy intervention period (due to the rapid contact rate reduction of susceptible cases and infected cases, e.g. isolation), the socioeconomic impacts become lower later because of shielding (i.e., contact rate of recovered cases is enhanced). Moreover, at the last final 3 months, the socioeconomic costs drop more since the susceptible cases are back to normal (contact rate cS is increasing). The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.25%, S(tf ) ≈ 75% and R(tf ) ≈ 25%, where tf = 360 days. The fraction of working-days recovered is about 66%. (Bottom) The relative importance ratio (ξ) is set as 10 4 , the policy (i.e., contact rate) is deployed to primarily minimize infectious diseases costs (JI ). The controlled contact rates of susceptible cases (cS) and infected cases (cI ) are overlapped at minimum contact rate boundary (cmin) for all time. Similar to the intermediate case (ξ = 1), large socioeconomic impacts are observed at beginning of policy intervention period. The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.25%, S(tf ) ≈ 75% and R(tf ) ≈ 25%, where tf = 360 days. The fraction of working-days recovered is about 59%. Here, we set cmin = cB/2 (i.e., up to 50% isolation efficiency) and cmax = 2cB (i.e., up to twice enhanced interactions).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint FIG. S6: Population dynamics in a SEIR model with controlled contact rate (75% isolation efficiency). (Top) The relative importance ratio (ξ) is set as 10 −4 , the policy (i.e., contact rate) is deployed to primarily minimize socioeconomic costs (JE). The socioeconomic costs are maintained at low level for all the policy intervention period (see the right panel after 60 days). Final state of the system in terms of population fraction is: D(tf ) ≈ 1%, S(tf ) ≈ 4% and R(tf ) ≈ 95%, where tf = 360 days. The fraction of working-days recovered is about 100%. (Middle) The relative importance ratio (ξ) is set as 1, the policy (i.e., contact rate) is deployed to minimize infectious diseases costs (JI ) while keeping the socioeconomic impacts low. Infected individuals are locked down at home for all intervention period while susceptible and recovered individuals are free to go out. The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.04%, S(tf ) ≈ 96% and R(tf ) ≈ 4%, where tf = 360 days. The fraction of working-days recovered is about 100%. (Bottom) The relative importance ratio (ξ) is set as 10 4 , the policy (i.e., contact rate) is deployed to primarily minimize infectious diseases costs (JI ). The controlled contact rates of susceptible cases (cS) and infected cases (cI ) are overlapped at minimum contact rate boundary (cmin) at beginning, then the susceptibles are free to move later. The Final state of the system in terms of population fraction is: D(tf ) ≈ 0.03%, S(tf ) ≈ 97% and R(tf ) ≈ 3%, where tf = 360 days. The fraction of working-days recovered is about 50%. Here, we set cmin = cB/4 (i.e., up to 75% isolation efficiency) and cmax = 2cB (i.e., up to twice enhanced interactions).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint FIG. S7: Population dynamics in a SEIR model with mis-timed control policy for various isolation efficiencies: (Top) 25% isolation efficiency; (Middle) 50% isolation efficiency and (Bottom) 75% isolation efficiency. The relative importance (ξ) is 1 for all the cases. The optimal control policy is computed for a system one month after an outbreak. Then, we enforce the optimal control policy 30 days later, i.e., at the end of 60 days after the start of the outbreak. The contact rate interventions start at 60 days. The total deaths and working fraction for the delayed system are presented in table S1 for various isolation efficiencies. The optimal control policies and their corresponding dynamics without mistiming are shown in the middle row of Figs. S4, S5, S6.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint FIG. S8: Flowchart for computing state dependent heuristic policy. The line dividing the I-R plane is parameterized and for a particular parameter set, the contact rate is determined solely from the current infected and recovered cases. Starting with an initial parameter guess, the contact rate is computed from the current I and R values, giving the system's state at the next time instant. After the full state trajectory is computed, the associated cost is calculated and a genetic update step is taken which identifies the next parameters. If the termination condition is satisfied, these parameters are passed on as the optimal parameters which can be used to compute the optimal contact policy. If the termination condition is not satisfied, the line parameters are updated and the loop is run again.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint FIG. S9: Heuristic state feedback intervention policies varying with isolation efficiency and shielding level on I − R phase plane. The isolation efficiency for the first row is 25% and each row increases the efficiency by 5%, which gives the efficiency of the last row as 75%. Next, the 4 columns correspond to shielding levels of 2, 3, 4 and 5 from left to right. In each plot, the grey region specifies the area of the I − R plane where a lockdown policy is enforced for the susceptible population and the white region gives the area where the lockdown is lifted i.e. the system is 'open'. The phase trajectory (I(t), R(t)), i.e., the black line, crosses from the lockdown to open region when it intersects the dividing line and this crossover happens at most once. The final state of the system is marked by a green diamond.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted September 1, 2020. . https://doi.org/10.1101/2020.08.24.20180752 doi: medRxiv preprint