## Abstract

An important task during the current Covid-19 pandemic is to predict the remainder of the epidemic, both without and with preventive measures. In the current paper we address this question using a simple estimation-prediction method. The input is the observed initial doubling time and a known value of *R*_{0}. The simple General epidemic model is then fitted. Time calibration to calendar time uses the observed number of case fatalities, together with estimates of the time from infection to death and of the infection fatality risk. Finally, predictions are made assuming no change of behaviour, as well as for the situation where preventive measures are put in place at one specific time point. The overall effect of the preventive measures is assumed to be known, or else estimated from the observed increase in doubling time after the preventive measures are put in place. The predictions are highly sensitive to the doubling times without and with preventive measures, and sensitive to *R*_{0}. They are less sensitive to the estimates used for time calibration: the observed number of case fatalities, the typical time between infection and death, and the infection fatality risk. The method is applied to the urban area of Stockholm, and predictions show that the peak of infections appears in mid-April and that infections start settling in May.

## Introduction

Covid-19 is currently spreading at a rapid pace in most countries worldwide, resulting in alarming numbers of case fatalities and overwhelmed healthcare systems. In the present paper we present a simple method for predicting the progress of the epidemic in a community which is fairly well mixed, an urban region being a typical example.

We focus on predicting the main phase of the epidemic, not the very beginning or end (when very few are infectious, so that randomness plays a crucial role). For this reason we use a deterministic epidemic model. More specifically, we use the General epidemic model [4]. Compared with generation-time models such as the Reed-Frost model [4], it allows for more heterogeneity both in how many infectious contacts different infected individuals have and in when these contacts happen. The model hence allows for heterogeneity in infectiousness, but all individuals are equally susceptible and mix homogeneously.

As input to our prediction model we use the observed doubling time *d* during the initial (random) phase of the epidemic and the basic reproduction number *R*_{0}. Estimation of *d* is straightforward (e.g. [1]), and many estimates of *R*_{0} for covid-19 can be found in the literature, most of them lying in the range 2.2-2.8 (e.g. [9], [8]). We start by describing the different steps of the methodology, including how to calibrate the model to calendar time, and then apply our method to predict the Covid-19 outbreak in the Stockholm region.

### Methods

We now present our estimation-prediction procedure. We start by predicting the behaviour of the outbreak based on knowing the basic reproduction number *R*_{0} and the doubling time *d* in the initial phase of the epidemic. We then time-calibrate the model, in the sense of estimating where in the epidemic outbreak we are on a given date *t*_{1}, by using the cumulative number of case fatalities and knowledge of the typical time *s*_{D} between infection and death. Finally, we predict the time-calibrated epidemic outbreak under the assumption that a set of preventive measures is put in place on a given date *t*_{p} during the early phase of the outbreak. This is done either assuming that the overall reduction in spreading is known, or else assuming that the increased doubling time after prevention, *d*_{E}, is observed.

### Prediction based on *R*_{0} and initial doubling time *d*

As input for our prediction we use the observed doubling time *d* during the initial growth phase and the basic reproduction number *R*_{0}, both valid before any preventive measures were put in place. The doubling time can for example be estimated from the empirical doubling time of case fatalities, as described in [1]. The doubling time relates to the exponential growth rate *r* of the epidemic by the relation *e*^{rd} = 2, so *r* = ln(2)/*d*. The basic reproduction number can be estimated using other data sources. Contact tracing, for example, gives information about the (random) generation time *G* and its mean *g* = *E*(*G*) (this is however not trivial, cf. [2], [12]). Here we assume the initial doubling time *d* and the basic reproduction number *R*_{0} to be known.

We assume the epidemic progresses according to the General epidemic model (GEM) [4]. For this model, the mean generation time relates to *r* and *R*_{0} by the relation [11]

$$R_0 = 1 + r g.$$

We hence have that the mean generation time equals *g* = (*R*_{0} − 1)/*r*. The GEM has two model parameters: the rate *λ* of infectious contacts that infectious individuals have, and the recovery rate *γ* (e.g. [4]). The basic reproduction number for the GEM equals *λ*/*γ* (the rate of infectious contacts multiplied by the mean duration 1/*γ* of the infectious period). The rate of infecting someone *s* time units after infection is *λe*^{−γs}. Since the mean number of infectious contacts equals *R*_{0} = *λ*/*γ*, it follows that *G* has density *f*_{G}(*s*) = *λe*^{−γs}/(*λ*/*γ*) = *γe*^{−γs}. Hence *G* is exponentially distributed with parameter *γ* and mean *E*(*G*) = 1/*γ*.
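
One way to see where the relation *R*_{0} = 1 + *rg* comes from is the Euler-Lotka equation 1/*R*_{0} = *E*(*e*^{−rG}). For the exponential generation-time density *f*_{G}(*s*) = *γe*^{−γs} derived above, this gives 1/*R*_{0} = *γ*/(*γ* + *r*), i.e. *R*_{0} = 1 + *r*/*γ* = 1 + *rg*. The Python sketch below checks this numerically with a trapezoidal rule. The values of *r* and *γ* are illustrative assumptions, and the function name is ours.

```python
import math

def expected_exp_minus_rG(r, gamma, upper=200.0, steps=200_000):
    """Trapezoidal approximation of E[exp(-r*G)] for G ~ Exp(gamma)."""
    h = upper / steps
    total = 0.0
    for k in range(steps + 1):
        s = k * h
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * math.exp(-r * s) * gamma * math.exp(-gamma * s)
    return total * h

# Illustrative (assumed) values per day: recovery rate gamma, growth rate r
gamma, r = 0.132, 0.198
R0_euler_lotka = 1 / expected_exp_minus_rG(r, gamma)   # 1/R0 = E[exp(-rG)]
R0_closed_form = 1 + r / gamma                          # R0 = 1 + r*g, g = 1/gamma
print(f"{R0_euler_lotka:.4f} {R0_closed_form:.4f}")
```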

From our input data *d* and *R*_{0} we hence conclude that *g* = (*R*_{0} − 1)*d*/ln(2), and hence that *γ* = ln(2)/(*d*(*R*_{0} − 1)).
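
The calibration from (*d*, *R*_{0}) to the GEM parameters can be written as a small Python function. The name `calibrate_gem` is our own. The inputs *d* = 3.5 days and *R*_{0} = 2.5 are the Stockholm values used later in the paper.

```python
import math

def calibrate_gem(d, R0):
    """Return (r, g, gamma, lam) for the General epidemic model.

    d  : initial doubling time in days
    R0 : basic reproduction number (must exceed 1)
    """
    if R0 <= 1:
        raise ValueError("R0 must exceed 1 for exponential growth")
    r = math.log(2) / d      # growth rate, from e^{r d} = 2
    g = (R0 - 1) / r         # mean generation time, from R0 = 1 + r g
    gamma = 1 / g            # recovery rate; G ~ Exp(gamma)
    lam = R0 * gamma         # contact rate, from R0 = lam / gamma
    return r, g, gamma, lam

r, g, gamma, lam = calibrate_gem(d=3.5, R0=2.5)
print(f"r={r:.3f}, g={g:.2f} days, gamma={gamma:.3f}, lambda={lam:.3f}")
# prints: r=0.198, g=7.57 days, gamma=0.132, lambda=0.330
```

Note that *λ* − *γ* = *r* holds automatically, which is the early exponential growth rate of the GEM.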

The contact rate *λ* can also be obtained from the input values *R*_{0} and *d*, using the fact that *R*_{0} = *λ*/*γ* for the GEM [4]. We immediately have that *λ* = *R*_{0}*γ* = (*R*_{0}/(*R*_{0} − 1))(ln(2)/*d*). Once the parameters *λ* and *γ* have been calibrated to *R*_{0} and the observed initial doubling time *d*, we simply use the GEM to predict the epidemic (assuming no preventive measures are put in place). Suppose the community size is *N* (assumed large). Suppose also that we start with a small fraction (but a fairly large number) infected and the rest susceptible, for example *i*_{0} = 50 and *s*_{0} = *N* − 50. Here *s*_{t} and *i*_{t} denote the number of susceptible and the number of infectious individuals respectively. The index *t* is hence relative to the start of the epidemic, not calendar time. The transitions of the GEM are given by

$$s_{t+1} = s_t - \lambda \frac{s_t}{N} i_t, \qquad i_{t+1} = i_t + \lambda \frac{s_t}{N} i_t - \gamma i_t.$$

The interpretation is that each infectious individual at time *t* has on average *λ* infectious contacts per day, and each such contact results in infection with probability *s*_{t}/*N*. Those susceptibles who get infected move to the infectious state. The other transition is for an infectious individual to stop being infectious, recover and become immune (a few also die). The number of individuals who have recovered (including the few who die) equals *N* − *s*_{t} − *i*_{t}. This system can be iterated forward until the first time *T* when the number of infectious individuals drops below 1, *i*_{T} ≤ 1 (it never reaches exactly 0 but tends to 0). The number of individuals who have been infected by time *t* equals *N* − *s*_{t}.

By solving the iterative system it is easy to plot the number of infectives *i*_{t} and the number of infected *N* − *s*_{t} over time, as well as the daily incidence *s*_{t−1} − *s*_{t}.
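
A minimal sketch of this iteration in Python, assuming a daily time step, *i*_{0} = 50, and the Stockholm inputs *N* = 2 million, *d* = 3.5, *R*_{0} = 2.5 used later. The function name `simulate_gem` and the safety cap `max_days` are our own additions.

```python
import math

def simulate_gem(N, lam, gamma, i0=50, max_days=5000):
    """Iterate the deterministic General epidemic model day by day.

    Returns the susceptible and infectious trajectories s[t], i[t] on the
    relative time scale (t = 0 is the start of the epidemic).
    """
    s, i = [float(N - i0)], [float(i0)]
    while i[-1] > 1 and len(i) < max_days:      # stop at first T with i_T <= 1
        new_inf = lam * s[-1] * i[-1] / N       # lam contacts each, prob s/N of infecting
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - gamma * i[-1])
    return s, i

d, R0, N = 3.5, 2.5, 2_000_000
gamma = math.log(2) / (d * (R0 - 1))
lam = R0 * gamma
s, i = simulate_gem(N, lam, gamma)
incidence = [s[t - 1] - s[t] for t in range(1, len(s))]   # new infections on day t
peak_day = 1 + max(range(len(incidence)), key=incidence.__getitem__)
print(f"T={len(i) - 1}, peak day={peak_day}, "
      f"final fraction infected={(N - s[-1]) / N:.3f}")
```

The lists `s` and `i` are exactly what is needed to plot *i*_{t}, *N* − *s*_{t} and the daily incidence.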

### Time calibration based on initial case fatality data

In the previous subsection an epidemic model was fitted to the observed initial doubling time *d* and the known basic reproduction number *R*_{0}. An important remaining task is to identify where in this outbreak the epidemic is at calendar time *t*_{1} ("today") (cf. [1]). We hence want to know which relative time *t* since the start of the epidemic corresponds to calendar time *t*_{1}. To do so we use the observed total number of case fatalities up to calendar time *t*_{1}, denoted Λ(*t*_{1}). We further need approximate knowledge of two quantities. The first is the typical duration *s*_{D} between getting infected and dying (for those who die). The second is the infection fatality risk *f*, defined as the probability that an individual who gets infected dies. For covid-19, *s*_{D} ≈ 21 days, but estimates of *f* vary in the range 0.2% to 1% (e.g. [13], [12]). Fortunately, it turns out that the time calibration is not too sensitive to these numbers.

Given Λ(*t*_{1}), *s*_{D} and *f*, we know that the Λ(*t*_{1}) individuals who have died by *t*_{1} were all infected by *t*_{1} − *s*_{D}. But these "to-die" infected individuals make up only a fraction *f* of all who were infected by *t*_{1} − *s*_{D}. The number of individuals infected by calendar time *t*_{1} − *s*_{D} therefore equals Λ(*t*_{1})/*f*. This means that we calibrate calendar time to time relative to the start of the epidemic by equating *t*_{1} − *s*_{D} to the relative time

$$t^* = \arg\min_{t} \left| (N - s_t) - \frac{\Lambda(t_1)}{f} \right|,$$

i.e. the relative day at which the cumulative number infected is closest to Λ(*t*_{1})/*f*.

Under this time calibration, calendar time *t*_{1} − *s*_{D} corresponds to relative time *t*^{*}, so the present calendar time *t*_{1} corresponds to relative time *t*^{*} + *s*_{D}. The estimated number of infected people at present hence equals $N - s_{t^*+s_D}$. For example, the predicted number of infectious individuals three weeks later (calendar time *t*_{1} + 21) equals $i_{t^*+s_D+21}$.
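
The calibration step can be sketched in Python as follows. The inputs (*N* = 1 million, *d* = 4, Λ(*t*_{1}) = 100, *f* = 0.5%, *t*_{1} = 1 April 2020) are hypothetical and chosen only for illustration. The helper names `calibrate` and `to_relative` are ours. The GEM iteration is repeated so that the block runs on its own.

```python
import math
from datetime import date, timedelta

def simulate_gem(N, lam, gamma, i0=50, max_days=5000):
    """Deterministic GEM iteration; returns s[t], i[t] on the relative time scale."""
    s, i = [float(N - i0)], [float(i0)]
    while i[-1] > 1 and len(i) < max_days:
        new_inf = lam * s[-1] * i[-1] / N
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - gamma * i[-1])
    return s, i

def calibrate(s, N, deaths, f):
    """Relative day t* at which cumulative infected N - s[t] is closest to deaths / f."""
    target = deaths / f
    return min(range(len(s)), key=lambda t: abs((N - s[t]) - target))

def to_relative(day, t1, s_D, t_star):
    """Relative day of calendar date `day`, given that t1 - s_D corresponds to t_star."""
    return t_star + (day - (t1 - timedelta(days=s_D))).days

# Hypothetical inputs, chosen only for illustration
N, d, R0 = 1_000_000, 4.0, 2.5
t1, s_D, deaths, f = date(2020, 4, 1), 21, 100, 0.005
gamma = math.log(2) / (d * (R0 - 1))
s, i = simulate_gem(N, R0 * gamma, gamma)

t_star = calibrate(s, N, deaths, f)
now = to_relative(t1, t1, s_D, t_star)                         # = t_star + s_D
later = to_relative(t1 + timedelta(days=21), t1, s_D, t_star)  # calendar t1 + 21
print(f"t* = {t_star}; infected by t1: {N - s[now]:,.0f}; "
      f"infectious at t1 + 21: {i[later]:,.0f}")
```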

### Prediction with preventive measures put in place

Suppose that a set of preventive measures is put in place at some calendar time *t*_{p}, still assumed to be in the early phase of the epidemic. Here we assume that this affects the rate of infectious contacts but not the (mean) generation time *g* = 1/*γ*. Most preventive measures agree with this: school closure, self-isolation, and closing (or reducing the activities of) restaurants, bars and cinemas. Some preventive measures, such as contact tracing followed by isolation, instead aim at reducing *g*, but here we restrict ourselves to measures reducing *λ*. We assume that the new preventive measures have the overall effect of reducing *λ* by a factor *ρ*. The new *effective* contact rate then equals *λ*_{E} = *λ*(1 − *ρ*), and the new effective reproduction number equals *R*_{E} = (1 − *ρ*)*R*_{0}. For covid-19 there is currently no available vaccine. In situations where one exists, it is also possible to include vaccination of a fraction of the community as a preventive measure, in which case the factor *ρ* also includes effects from vaccination. If for example a fraction *v* is vaccinated with a vaccine giving perfect immunity, and this is the only preventive measure, then *ρ* = *v*.

In applications it is close to impossible to know the overall magnitude *ρ* of the preventive measures when a set of preventive measures is put in place jointly. In [1] it is shown how to estimate *ρ* by observing the change in doubling time once the preventive measures have influenced e.g. fatality rates. Applying Equation (3) in [1] it follows that

$$\rho = 1 - \frac{R_E}{R_0} = 1 - \frac{1 + r_E g}{1 + r g} = 1 - \frac{1 + g\ln(2)/d_E}{1 + g\ln(2)/d}.$$

Above *r* is the exponential growth rate before preventive measures and *r*_{E} is the growth rate after preventive measures have affected case fatality rates, and similarly for the doubling times *d* and *d*_{E}.
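
With the mean generation time *g* unchanged, we have *R*_{0} = 1 + *rg* and *R*_{E} = 1 + *r*_{E}*g*. Equivalently, in GEM rates, *λ*_{E}/*λ* = (*r*_{E} + *γ*)/(*r* + *γ*). The overall reduction *ρ* = 1 − *R*_{E}/*R*_{0} can therefore be computed directly from the two doubling times. The Python sketch below does this both ways as a consistency check. The function names are ours.

```python
import math

def rho_from_doubling_times(d, d_E, R0):
    """Overall reduction rho implied by the doubling time changing from d to d_E,
    assuming the mean generation time g is unchanged."""
    r, r_E = math.log(2) / d, math.log(2) / d_E   # growth rates before/after
    g = (R0 - 1) / r                              # mean generation time, R0 = 1 + r*g
    R_E = 1 + r_E * g                             # new reproduction number
    return 1 - R_E / R0

def rho_from_rates(d, d_E, R0):
    """Same quantity as 1 - lam_E / lam, with gamma = 1/g held fixed."""
    r, r_E = math.log(2) / d, math.log(2) / d_E
    gamma = r / (R0 - 1)
    return 1 - (r_E + gamma) / (r + gamma)

print(round(rho_from_doubling_times(3.5, 9, 2.5), 3))   # prints 0.367
```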

The preventive measures hence induce a lower growth rate *r*_{E} = ln(2)/*d*_{E}. We assume the same mean generation time *g* = 1/*γ*, and since *r*_{E} = *λ*_{E} − *γ*, it follows that the new contact rate equals *λ*_{E} = *r*_{E} + *γ*. The new reproduction number equals *R*_{E} = 1 + *r*_{E}*g* = 1 + *r*_{E}/*γ*. To sum up, suppose *R*_{0} and *r* = ln(2)/*d* are known from before preventive measures, and preventive measures put in place on day *t*_{p} result in a new, longer doubling time *d*_{E}. From day *t*_{p} onwards, the time-calibrated prediction model should then change from *λ* = (ln(2)/*d*)*R*_{0}/(*R*_{0} − 1) to *λ*_{E} = ln(2)/*d*_{E} + *γ*.

### Predicting the outbreak in Stockholm

We illustrate our methods on the Stockholm region in Sweden. The Greater Stockholm urban area has around *N* = 2 million inhabitants, and the initial doubling time of cumulative case fatalities was around *d* = 3.5 days before preventive measures were put in place [7]. We assume that *R*_{0} = 2.5, this being a common estimate [9]. A number of (mainly but not exclusively) preventive measures were put in place around the date *t*_{p} = March 16. We assume the typical time between infection and death (for those who die from covid-19) equals *s*_{D} = 21 days. These preventive measures should hence start affecting fatality rates at *t*_{p} + *s*_{D} = April 6. We are now (April 14) one week later, so the effect is still very (!) uncertain. Nevertheless, *d*_{E} = 9 days seems to be a reasonable estimate, based (only) on case fatalities between April 4 and April 13, during which the total number almost doubled [7]. To calibrate relative time to calendar time, we finally assume that the cumulative number of case fatalities on March 31, before any effect of preventive measures, equals Λ(March 31) = 200. We emphasize that these quantities are by no means precisely estimated, so the results contain a lot of uncertainty.

To start, the evolution of the epidemic outbreak is simulated using the system defined above with *λ* = (ln(2)/3.5) · 2.5/1.5 = 0.330. The recovery rate equals *γ* = *λ* − ln(2)/*d* = 0.132. Finally, the new contact rate induced by the new, longer doubling time *d*_{E} = 9 equals *λ*_{E} = ln(2)/*d*_{E} + *γ* = 0.209. As a consequence, in the iterative prediction model *λ* = 0.330 should be replaced by *λ*_{E} = 0.209 from March 16 onwards. As a side remark, this change of doubling time from 3.5 days to 9 days corresponds to changing *R*_{0} = 2.5 to *R*_{E} = 1 + (ln(2)/*d*_{E})/*γ* = 1.58. The magnitude of the preventive effects is hence *ρ* = 1 − *R*_{E}/*R*_{0} = 0.37, i.e. a 37% overall reduction in contact rates. The estimate *d*_{E} = 9 days of the new doubling time after preventive measures is highly uncertain. We therefore do similar calculations assuming that the new doubling time instead equals 6 and 14 days respectively. We also consider the situation where the preventive measures reduce the reproduction number to *R*_{E} = 0.8, implying that the epidemic immediately starts to decay.
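
The arithmetic for the four prevention scenarios (*d*_{E} = 6, 9, 14 days and *R*_{E} = 0.8) can be tabulated with a short Python sketch. It assumes, as in the text, that *γ* is unchanged by the measures. The function name `scenario_rates` is our own.

```python
import math

def scenario_rates(d, R0, d_E=None, R_E=None):
    """Post-intervention contact rate lam_E, reproduction number R_E and reduction rho.

    Give either the new doubling time d_E or the new reproduction number R_E.
    """
    r = math.log(2) / d
    gamma = r / (R0 - 1)                    # recovery rate, unchanged by the measures
    lam = R0 * gamma
    if d_E is not None:
        lam_E = math.log(2) / d_E + gamma   # from r_E = lam_E - gamma
    else:
        lam_E = R_E * gamma                 # from R_E = lam_E / gamma
    return lam_E, lam_E / gamma, 1 - lam_E / lam

d, R0 = 3.5, 2.5
scenarios = {"d_E = 6": dict(d_E=6), "d_E = 9": dict(d_E=9),
             "d_E = 14": dict(d_E=14), "R_E = 0.8": dict(R_E=0.8)}
for name, kwargs in scenarios.items():
    lam_E, R_E, rho = scenario_rates(d, R0, **kwargs)
    print(f"{name:>9}: lam_E={lam_E:.3f}  R_E={R_E:.2f}  rho={rho:.2f}")
```

For the main scenario this reproduces *λ*_{E} = 0.209, *R*_{E} = 1.58 and *ρ* = 0.37 quoted above.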

Finally, the time calibration. For this we set the infection fatality risk to *f* = 0.3% as a guess. As mentioned above, the time calibration changes by no more than a week if the true fatality risk is 0.1% or 1%. The 200 case fatalities by March 31 would hence imply that the number infected three weeks earlier, by March 10, equals 200/0.003 ≈ 67 000. We therefore calibrate March 10 to the relative day *t* at which the cumulative number of infected equals 67 000 (or is as close to it as possible). This turns out to be day *t* = 31 of the epidemic.
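
The sensitivity of the calibration to *f* can be explored with the Python sketch below. It calibrates March 10 for *f* = 0.1%, 0.3% and 1%, using Λ(March 31) = 200 and the pre-prevention GEM. This is valid because March 10 precedes *t*_{p} = March 16. The exact relative day returned depends on modelling details assumed here (*i*_{0} = 50, a daily time step), so it need not equal day 31. What the sketch illustrates is how little the calibrated day moves as *f* varies.

```python
import math

def simulate_gem(N, lam, gamma, i0=50, max_days=5000):
    """Deterministic GEM iteration; returns s[t], i[t] on the relative time scale."""
    s, i = [float(N - i0)], [float(i0)]
    while i[-1] > 1 and len(i) < max_days:
        new_inf = lam * s[-1] * i[-1] / N
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - gamma * i[-1])
    return s, i

def calibrate(s, N, deaths, f):
    """Relative day whose cumulative infected N - s[t] is closest to deaths / f."""
    target = deaths / f
    return min(range(len(s)), key=lambda t: abs((N - s[t]) - target))

N, d, R0 = 2_000_000, 3.5, 2.5
gamma = math.log(2) / (d * (R0 - 1))
s, _ = simulate_gem(N, R0 * gamma, gamma)   # no prevention: March 10 precedes t_p
days = {f: calibrate(s, N, deaths=200, f=f) for f in (0.001, 0.003, 0.01)}
for f, t in days.items():
    print(f"f = {f:.1%}: March 10 <-> relative day {t}")
```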

Once the calibration is performed, it is possible to predict relevant quantities of the outbreak, both with and without preventive measures of different magnitudes. The results are summarized in Figure 1 and Figure 2 below. Figure 1 reports the daily incidence of new infections for the different scenarios. Consider, for example, our main prediction curve, which assumes the new doubling time equals *d*_{E} = 9 days (blue curve). It shows that the peak day, when most individuals get infected, is April 3, when as many as 27 000 (1.3%) get infected. Further, 75% of all infections have happened by April 19 and 90% by May 5. Comparing the scenarios without and with preventive measures of different magnitudes, the stronger the preventive measures, the lower the peak, which is also slightly shifted forward in time. Had the preventive measures been put in place earlier in relation to the outbreak, as was for example the case in other parts of Sweden, the peaks would have been lower and shifted further forward in time.
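
The whole pipeline for the main scenario (calibrate, switch *λ* to *λ*_{E} on March 16, read off the peak and the 75% and 90% dates) can be sketched in Python as below. It uses assumptions of our own: *i*_{0} = 50, a daily time step, and a fixed 400-day horizon. Because of these choices, the dates it prints are not guaranteed to coincide with those read off Figure 1.

```python
import math
from datetime import date, timedelta

def simulate(N, lam, gamma, i0=50, lam_E=None, t_switch=None, days=400):
    """Deterministic GEM over a fixed horizon; the contact rate switches from lam
    to lam_E on relative day t_switch (no switch if lam_E is None)."""
    s, i = [float(N - i0)], [float(i0)]
    for t in range(days):
        rate = lam_E if (lam_E is not None and t >= t_switch) else lam
        new_inf = rate * s[-1] * i[-1] / N
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - gamma * i[-1])
    return s, i

def summary(s, N, day0):
    """Peak date and height of daily incidence, and the dates by which 75% and 90%
    of all infections (over the horizon) have occurred. day0 <-> relative day 0."""
    inc = [s[t - 1] - s[t] for t in range(1, len(s))]   # inc[k] = incidence on day k + 1
    peak_t = 1 + max(range(len(inc)), key=inc.__getitem__)
    total = N - s[-1]

    def first_day_reaching(p):
        return next(t for t in range(len(s)) if N - s[t] >= p * total)

    return (day0 + timedelta(days=peak_t), max(inc),
            day0 + timedelta(days=first_day_reaching(0.75)),
            day0 + timedelta(days=first_day_reaching(0.90)))

N, d, R0, d_E = 2_000_000, 3.5, 2.5, 9
gamma = math.log(2) / (d * (R0 - 1))
lam, lam_E = R0 * gamma, math.log(2) / d_E + gamma
s0, _ = simulate(N, lam, gamma)                                # no prevention
target = 200 / 0.003                                           # Lambda(March 31) / f
t_star = min(range(len(s0)), key=lambda t: abs((N - s0[t]) - target))
day0 = date(2020, 3, 10) - timedelta(days=t_star)              # calendar date of day 0
t_p = (date(2020, 3, 16) - day0).days                          # relative day of March 16
s1, _ = simulate(N, lam, gamma, lam_E=lam_E, t_switch=t_p)
for name, s in (("no prevention", s0), ("d_E = 9", s1)):
    peak, height, q75, q90 = summary(s, N, day0)
    print(f"{name}: peak {peak} ({height:,.0f}/day), 75% by {q75}, "
          f"90% by {q90}, final {(N - s[-1]) / N:.0%}")
```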

Figure 2 shows the cumulative number of infected (with overall percentages to the right) for the different scenarios. Without preventive measures slightly more than 1.8 million get infected, corresponding to 90%, and our main prediction curve reduces this number to 1.32 million, corresponding to 66%.

Another important observation is that in the first three scenarios the final fraction infected exceeds the critical immunity level *v*_{C} = 1 − 1/*R*_{0} = 0.6 [4]. This implies that in these three situations the community has reached herd immunity and is hence protected from additional outbreaks when preventive measures are relaxed (assuming infection induces complete immunity!). The two latter scenarios are a new doubling time of *d*_{E} = 14 days and the situation where preventive measures give *R*_{E} = 0.8. In both, the final fraction infected is even smaller, which of course is positive. However, since the corresponding fractions lie below the herd immunity level, the community is at risk of additional outbreaks if all preventive measures are relaxed. This is particularly the case for the last scenario, where less than 20% are infected during the first outbreak.
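
As a rough cross-check against *v*_{C}, the classical deterministic SIR final-size equation 1 − *z* = exp(−*Rz*) can be solved for each scenario's reproduction number. This assumes that *R* applies from the very start. That is a simplification: it ignores infections before the measures, and hence understates the final size under prevention. The *R*_{E} values 1.875, 1.583 and 1.375 correspond to *d*_{E} = 6, 9 and 14 days via *R*_{E} = 1 + *r*_{E}/*γ*. The function name `final_size` is ours.

```python
import math

def final_size(R, tol=1e-12, max_iter=100_000):
    """Positive root z of 1 - z = exp(-R z): the deterministic SIR final size when
    reproduction number R applies from the start (0 if R <= 1)."""
    if R <= 1:
        return 0.0
    z = 0.5
    for _ in range(max_iter):            # fixed-point iteration, contracts near the root
        z_new = 1 - math.exp(-R * z)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

R0 = 2.5
v_C = 1 - 1 / R0                         # critical immunity level
for R in (2.5, 1.875, 1.583, 1.375, 0.8):
    z = final_size(R)
    print(f"R = {R}: final size {z:.2f}, above v_C = {v_C:.1f}? {z > v_C}")
```

Even under this simplification, the scenarios *d*_{E} = 6 and 9 end above *v*_{C} = 0.6 and *d*_{E} = 14 ends below it, matching the qualitative pattern described above.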

Needless to say, there are several uncertainties in the presented predictions. One is the time calibration, which is affected by the choice *s*_{D} = 21 (typical time between infection and death) and the infection fatality risk *f* = 0.3%. However, changing *s*_{D} by, say, 4 days up or down only shifts the time calibration by the same number of days, and changing *f* to 0.1% or 1% only moves the time calibration by less than a week forward or backward.

The initial doubling time *d* = 3.5, and even more so the new doubling time after prevention, play a more significant role than a 10% change in *R*_{0} = 2.5 up or down. Had the initial doubling time instead been set to *d* = 4, the entire epidemic would have been slowed down by about a week. The large effect of different doubling times after preventive measures are put in place, corresponding to different magnitudes of preventive measures, is illustrated by the alternative prediction curves.

Finally, our model only allows for heterogeneity in infectiousness. If heterogeneity in susceptibility as well as in mixing patterns were included (making the model more realistic but also more complex), the outbreak would typically be slightly slower and less peaked. To investigate the exact effect, simulations from realistic models are needed (and recommended!). A rough estimate of the effect of heterogeneity is that the peak height is reduced by 25%, the left side of the peak is shifted about 1 week forward in time and the right side about 2 weeks forward in time. This is of course a very crude correction. However, several assumptions in this as well as in other, more complicated models are very crude, so there is a lot of uncertainty anyway. The purpose of the current method was rather to compare the effects of various preventive measures on the predictions.

Specific predictions for the Stockholm region have not been published elsewhere as far as we know. Predictions for Sweden have, however, been performed in [6] and [10]. In [6] predictions are made only for a short period and not to the end of the outbreak. The main comparison to be made is with their estimate that by March 28 the fraction infected was 3.1% of the Swedish population (with credibility bounds 0.85%-8.4%). Our best prediction (as described above) would be to look one week earlier, i.e. at March 21, on the blue curve of Figure 2, which then has 13.3% infected. Our prediction is for Stockholm, which had the vast majority of all infections at the beginning of the outbreak. Since the Stockholm region makes up 20% of the country's population, 13.3% infected in Stockholm could very well agree with 3.1% (or slightly more) in all of Sweden.

Comparing our predictions with those of [10] is hard, because their focus is on hospitalization and case fatalities for the whole of Sweden rather than Stockholm. Hospitalization and case fatalities shift the curves forward in time by about 2 and 3 weeks respectively. The fact that they consider Sweden and we consider Stockholm will make their curve more spread out in time, with a peak shifted forward in time (since covid-19 hit Stockholm first). Given these qualitative differences, our result agrees quite well with [10]: their peak for intensive care is around June 1, with most of the hospital burden between mid-April and mid-July.

## Conclusions and Discussion

We have demonstrated a simple method to estimate and predict an ongoing epidemic outbreak both with and without preventive measures. As input data we use the basic reproduction number *R*_{0}, the doubling time during the early stage of the epidemic, and the new doubling time after preventive measures are put in place. To time-calibrate the model to calendar time, the method also uses the reported cumulative number of deaths Λ(*t*_{1}) at a given time, the typical time *s*_{D} between infection and death, and the infection fatality risk *f*. The method is most sensitive to the doubling times, and to some extent to *R*_{0}. Fortunately, it is less sensitive to the latter three quantities, which are often associated with high uncertainty.

The model used was the simple SIR General epidemic model. This model only allows for heterogeneity in infectiousness (when and how many to infect), but not in susceptibility or mixing patterns. Including such heterogeneities is of course important and often done (e.g. [5], [3]). The purpose of the present paper is, however, to keep things simple enough to make the procedures transparent. Another feature that would make the model more realistic is a latent state before becoming infectious, obtained by instead using SEIR models [4].

When analysing the effects of prevention, it was observed that the total number getting infected was reduced, as expected. An even bigger difference was, however, the reduction in the size of the peak, and infections being more spread out in time. Healthcare systems are directly affected by incidence, and peak incidence (with some delay for severe symptoms to develop) is the most problematic time for hospitals, so this effect is very important. Quantifying the positive effect of prevention is hence not straightforward: should it be in terms of how many fewer ultimately get infected, how much the peak is reduced, or something else? We have no clear answer to this question. One comparison that can be misleading, however, is to compare the number of infections for two different scenarios at a given time point [6]. As an illustration, our main prediction curve (the blue curve) is clearly favourable compared to no preventive measures (the black curve). The peak size is reduced by 65%, and the final number infected is reduced from 1.80 million to 1.32 million. If we instead compare the cumulative number of infections on e.g. April 10, there are 1.59 million infected with no preventive measures and 0.79 million according to our main prediction. However, to say that these preventive measures have reduced the number of infections by 0.8 million is misleading. The curve for the preventive scenario is shifted to the right, and at the end of the outbreak the number of saved infections is 0.48 million rather than 0.80 million (of course still a substantial reduction!). The same reasoning applies when comparing the number of infected requiring intensive care, or case fatalities: comparing on a specific date during an outbreak can be misleading [6]. In the current paper the focus has been on predicting the number of infected over time. Clearly, the burden on the health care system, measured by hospitalized patients or case fatalities, is a more relevant quantity.
When predicting these quantities, an age-structured model is advantageous, since the risk of severe symptoms and death increases with age. However, there is also high uncertainty in what fraction of all infected will require health care at different levels, as well as in the infection fatality risk *f*. For example, current estimates of *f* (not to be confused with the case fatality risk, cfr) vary from 0.2% up to 1% (e.g. [13], [12]). Clearly, any prediction of the number of fatalities will come with very large uncertainty due to the uncertainty in *f*, let alone all other uncertainties. This large uncertainty is most often not acknowledged, since typically just one published estimate of *f* is picked.
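
The point above about comparing scenarios at a fixed date, rather than at the end of the outbreak, can be illustrated with a small Python sketch. It computes the gap between the cumulative infection curves without and with prevention on every day. The switch day `t_switch = 43` is a hypothetical choice, as are *i*_{0} = 50, the daily time step and the 500-day horizon. The gap during the outbreak is larger than the final number of saved infections.

```python
import math

def cumulative_infected(N, lam, gamma, i0=50, lam_E=None, t_switch=None, days=500):
    """Cumulative infected N - s_t over a fixed horizon for the deterministic GEM,
    with an optional switch of the contact rate to lam_E from day t_switch."""
    s, i, cum = N - i0, float(i0), [float(i0)]
    for t in range(days):
        rate = lam if lam_E is None or t < t_switch else lam_E
        new_inf = rate * s * i / N
        s, i = s - new_inf, i + new_inf - gamma * i   # simultaneous update
        cum.append(N - s)
    return cum

N, d, R0, d_E = 2_000_000, 3.5, 2.5, 9
gamma = math.log(2) / (d * (R0 - 1))
lam, lam_E = R0 * gamma, math.log(2) / d_E + gamma
t_switch = 43                                  # hypothetical start day of the measures
no_prev = cumulative_infected(N, lam, gamma)
prev = cumulative_infected(N, lam, gamma, lam_E=lam_E, t_switch=t_switch)
gap = [a - b for a, b in zip(no_prev, prev)]   # "saved" infections by each day
t_max = max(range(len(gap)), key=gap.__getitem__)
print(f"largest gap {gap[t_max]:,.0f} on day {t_max}; gap at the end {gap[-1]:,.0f}")
```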

There are several papers doing more advanced and realistic modelling and prediction (e.g. [3], [6], [10]). However, our estimation-prediction methodology is much simpler and straightforward to implement, and we feel it is a useful complement to the more advanced methods referred to. Because of its simplicity, it is much more transparent how the few model assumptions affect the results, and it is easy to vary the few parameters to see their effect on predictions. We hope the method will increase understanding of which parameter uncertainties have the biggest impact on predictions, and which are less influential. Finally, we expect this simple method to give predictions quite similar to those of the more complicated models. They should be even more similar if the peak size is reduced and the epidemic curve shifted forward in time, as described in the Stockholm prediction. If they are not, there are strong reasons to investigate why.

There are of course also obvious advantages with more realistic models containing e.g. age structure, households, workplaces, symptom response and different preventive measures. Spatial aspects are less important since we consider a single urban area. More advanced models will give a better fit if correct parameter estimates are used. More questions can also be addressed, such as the infection risk in different age groups, which routes of spread are most common, and the effectiveness of different preventive measures, as well as case fatalities and hospitalization. The general effect of making a model more realistic, with more heterogeneities, is that slightly fewer will get infected and that the peak height is slightly lowered and shifted a week or two later.

## Data Availability

I have only used publicly available data.

## Acknowledgements

I am grateful to the Swedish Research Council (grant 2015-0501) for financial support, and to Dongni Zhang for help in producing the figures.