Guiding Austria through the COVID-19 Epidemics with a Forecast-Based Early Warning System

Background. The corona crisis hit Austria at the end of February 2020 with one of the first European superspreading events. In response, the governmental crisis unit commissioned a forecast consortium to provide regular projections of case numbers and demand for hospital beds. Methods. We consolidated the output of three independent epidemiological models (ranging from agent-based micro simulation to parsimonious compartmental models) and published weekly short-term forecasts for the number of confirmed cases as well as estimates and upper bounds for the required hospital beds. Findings. Here, we report on four key contributions by which our forecasting and reporting system has helped shape Austria's policy to navigate the crisis and re-open the country step-wise, namely (i) when and where case numbers were expected to peak during the first wave, (ii) how to safely re-open the country after passing this peak, (iii) how to evaluate the effects of non-pharmaceutical interventions and (iv) how to provide hospital managers with guidance to plan health-care capacities. Interpretation. Complex mathematical epidemiological models play an important role in guiding governmental responses during pandemic crises, provided they are used as a monitoring system to detect epidemiological change points. For policy-makers, the media and the public, it can be difficult to distinguish short-term forecasts from worst-case scenarios with undefined levels of certainty, creating distrust in the legitimacy and accuracy of such models. However, when used as a short-term forecast-based monitoring system, the models can inform decisions to ease or strengthen governmental responses.


I. INTRODUCTION
The first known COVID-19 cases in Austria appeared at the end of February 2020 together with one of the first European superspreading events in the Tyrolean tourist region of Ischgl, visited by travellers from all over the globe [1]. In the first half of March 2020, a nationwide spread of the virus occurred with an exponential sources. The overarching policy goal at this stage was to navigate the crisis without overburdening the Austrian healthcare system.
At an earlier stage than other middle European countries, Austria took a series of non-pharmaceutical interventions (NPIs) in response to the crisis [5]. Next to a ramping up of healthcare and public health capacities, airport restrictions and landing bans intensified in the first week of March, also targeting other countries than China. Gatherings were limited to 500 persons, cultural and other events started to be cancelled on March 10. On March 16, Austria went into a full lockdown with schools, bars, restaurants, and shops being closed, as well as a transitioning into home office for all non-essential employees [5]. Together with other, earlier measures, these NPIs effectively led to a rapid reduction of daily infection numbers. The number of new cases per day reached a first peak on March 26 with 1,065 cases [6]. COVID-19 related hospitalisations peaked on March 31 with 912 regular beds, whereas the ICU utilization peaked on April 8 with 267 beds being occupied by COVID-19 patients. Daily new cases decreased over April after which they fluctuated at values below one hundred until July [7]. Austrian ICU capacities, estimated to be around 1,000 beds that could have been used for COVID-19 patients while maintaining enough capacity for non-COVID-19 emergencies, have never been in danger of being exceeded in the considered time period [8].
The Austrian COVID-19 forecast consortium provided short-term forecasts for case numbers and required hospital beds. Our consortium consisted of three independent modelling teams with experience in the use and development of sophisticated mathematical and computational models to address epidemiological and public health challenges [9][10][11][12][13][14]. The consortium was complemented with experts from the Ministry of Health, the Austrian Agency for Health and Food Safety, as well as external public health experts in weekly meetings. A plethora of epidemiological models to forecast the spread of COVID-19 has been proposed recently [15][16][17][18][19][20]. Here, we consolidated the output of three models into a single forecast of case numbers for 8 days and used these case numbers to predict the numbers of required hospital and ICU beds for 14 days, both for the country as a whole and for each of its nine federal states. In addition to these point estimates, we also provided upper and lower bounds for these numbers at various levels of uncertainty. The upper bounds of the hospital bed forecasts served as a guidance system for the regional hospital managers, allowing them to estimate how many beds should be reserved for COVID-19 patients if they were willing to accept a given level of risk. These forecasts have been published each week on the homepage of the Ministry of Health [21].
At the very beginning of our work as a consortium we decided that short-term forecasts have to be clearly separated from long-term scenarios. Due to the multiplicative growth of uncertainties in epidemiological models, accurate forecasts are typically only possible over a time horizon of several days [22][23][24][25]. For longer-term scenarios that span several weeks, months or even years, there is no meaningful way to estimate their uncertainty. For policy-makers and non-technical experts, it would therefore not be immediately clear whether a certain projection is a prognosis with a defined level of certainty, or a hypothetical what-if experiment. For this reason, we decided to publish only short-term forecasts.
In this work we present the forecast and reporting system we developed based on the three independent forecasting models. After a brief summary of the individual models, we describe three different strategies we used to combine their outputs, to forecast healthcare demand based on the combined output and to communicate the joint forecast. To evaluate the impact of certain policies (e.g. the lockdown), we report numerical experiments that show how the epidemic would have progressed if measures had been taken later or not at all. We discuss how our results were received by policy-makers, stakeholders in the healthcare system, and the public. We claim that our approach offered valuable contributions to chart a safe path to re-open the country after the lockdown using a strategy that can be described as "driving on sight". The aim of this work is to communicate the methods applied and developed, which allowed three independently working modelling and simulation research units to cooperate in a joint task force producing a consolidated forecast, as well as the benefits and shortcomings of the process and the political impact of the achieved results. We conclude that epidemiological models can be useful as the basis for short-term forecast-based monitoring systems to detect epidemiological change points, but become problematic when used to produce long-term worst-case scenarios due to their undefined levels of certainty.

II. METHODS
We used three conceptually different epidemiological COVID-19 models, developed and operated individually by three research institutions, namely a modified SIR-X differential equation model (Medical University of Vienna / Complexity Science Hub), an Agent-Based simulation model (TU Wien / dwh GmbH), and a simplified state space model (Austrian National Public Health Institute).

A. Data
Although the three models use different parameters and parametrization routines, they are calibrated using the same data to generate weekly forecasts. Consequently, differences between the model forecasts are a result of different model structure and calibration, but not a result of different data sources. The models also used different nowcasting approaches to correct for late reporting of positive test results. We used data from the official Austrian COVID-19 disease reporting system (EMS, [7]). The system is operated by the Austrian Ministry of Health, the federal administrations, and the Austrian Agency for Health and Food Safety.
For every person who tested positive in Austria, it contains information on the date of the test, date of recovery or death, age, sex and place of residence. Furthermore, hospital occupancy of COVID-19 patients in ICU and normal wards is available from daily reports collected by the Ministry for Internal Affairs.

medRxiv preprint, doi: https://doi.org/10.1101/2020.10.18.20214767; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

B. Extended SIR-X Model
One of our models is an extension of the recently introduced SIR-X model [15]. The original SIR-X model introduced a parsimonious way to extend the classic SIR dynamics with the impact of NPIs. In particular, two classes of NPIs are considered. First, there are NPIs that lead to a contact reduction of all individuals (susceptible and infected ones). Such NPIs include social distancing and other lockdown measures. Second, the model also represents NPIs that reduce the effective duration of infectiousness for infected individuals. Contact-tracing and quarantine belong to this category.
The original SIR-X model does not offer a way to model the return-to-normal, i.e., the taking back of NPIs. We extended the model by introducing a mechanism by which susceptible but quarantined individuals increase their number of contacts again; we dub this the XSIR-X model, see SI Appendix A. Further, we structured the population according to age, introduced multiple calibration phases to model behavioural changes in the population over time, and used mobility data to identify such turning points [26]. Forecast errors are estimated by recalibrating the model to perturbed data points that are displaced proportionally to the empirical deviation between model and data; see SI Appendix A.

C. Agent-Based SEIR Model
The second model is an Agent-Based SEIR type model [12]. It is stochastic, population-dynamic and depicts every inhabitant of Austria as one model agent. It uses sampling methods to generate an initial agent population with statistically representative demographic properties and makes use of a partially event-based, partially timestep (1 day)-based update strategy to advance in time.
It is based on a validated population model of Austria including demographic processes like death, birth, and migration [14]. Contacts between agents are responsible for disease transmission and are sampled via locations in which agents meet: schools, workplaces, households and leisure-time. After being infected, agents go through a detailed disease and/or patient pathway that depicts the different states of the disease and the treatment of the patient.
The model input consists of a time-line of modelled NPIs; parameters are calibrated using a modified bisection method. Results are gathered via Monte-Carlo simulations as the point-wise sample mean of multiple simulation runs. Due to the large number of agents in the model, 8 simulation runs are used which are sufficient to have the sample mean approximate the real unknown mean with an error of less than 1% with 95% confidence (estimated by the Gaussian stopping introduced in [27]).
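The sufficiency check for the number of Monte Carlo runs can be illustrated with a generic normal-approximation criterion. This is a minimal sketch and not the Gaussian stopping rule of [27], whose exact form is not given here; the function name `runs_sufficient` and the run outputs are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def runs_sufficient(samples, rel_error=0.01, confidence=0.95):
    """Check whether the sample mean is within rel_error of the true mean.

    Uses a normal approximation for the confidence interval of the mean
    (an assumption; the criterion in [27] may differ in detail).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return half_width <= rel_error * abs(mean(samples))

# Hypothetical cumulative case counts from 8 simulation runs
runs = [15230, 15310, 15180, 15275, 15240, 15295, 15205, 15260]
enough = runs_sufficient(runs)
```

With millions of agents, run-to-run variation is small relative to the mean, which is why a handful of runs can already satisfy such a criterion.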
The model considers uncertainty with respect to the stochastic perturbations in the model by tracing the standard deviation of the Monte Carlo simulations. Parameter uncertainty is considered in form of manually defined best and worst case scenarios. Parameter values of the model are continuously improved and available online [28].

D. Epidemiological Clockwork Model
The third model is a simplified state space model that traces individuals through the stages infected, latency period, infectious, reported, contained and immunized/deceased; see also SI Appendix B. Based on the ratio between infected and infectious, the true infection rate is calculated and extrapolated into the future using exponential smoothing or a moving average. The model distinguishes between detected and undetected cases and accounts for the import of infected cases (who did not acquire the infection from the calculated number of infectious individuals).
The underlying detection rate, immigration, and the effectiveness of contact-tracing are time-varying parameters that also reflect qualitative information such as spikes in the number of reported new infected cases that can be explained by mass test screenings.
Uncertainty can be modelled by varying underlying model parameters and also results from parameter uncertainty in the infection rate extrapolation.
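The two extrapolation options named for the infection rate, exponential smoothing and moving average, can be sketched as follows. This is an illustration under assumptions: the smoothing factor, the window length and the daily rates are hypothetical, and the actual model (SI Appendix B) may use different variants.

```python
def exponential_smoothing(series, alpha=0.3):
    """Simple exponential smoothing; returns the last smoothed level,
    which serves as a flat forecast for the coming days."""
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def moving_average(series, window=7):
    """Mean of the most recent `window` values, used as a flat forecast."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical daily infection rates (new infections per infectious person)
rates = [0.30, 0.28, 0.25, 0.22, 0.20, 0.19, 0.18, 0.18]
rate_smoothed = exponential_smoothing(rates)
rate_averaged = moving_average(rates, window=7)
```

Both estimators lag behind a falling rate; a larger smoothing factor or a shorter window reacts faster but is more sensitive to reporting noise.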

E. Model harmonization
In order to harmonize the models and generate a single consolidated forecast for the number of accumulated positive COVID-19 tests, each model was set up to generate its output in a common data format for each of the nine federal states of Austria individually. Our forecasts consisted of time series that start with the number of positive tests at the day of the prognosis committee meeting at 11:59 pm and continued in daily time intervals. The length of the time series, the forecasting scope, varied between 8 and 14 days.
Three averaging procedures were considered to generate the joint forecast. These included (1) the point-wise arithmetic mean and two dynamic weighting procedures wherein the time series for each model contributes with a (2) continuous or (3) discrete weighting function with values proportional to the accuracy of its most recent forecasts; see SI Appendix C.
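The three averaging procedures can be sketched as below. This is a minimal illustration and not the consortium's implementation, which is specified in SI Appendix C: the inverse-RMSE weights for the continuous variant, the winner-takes-all rule for the discrete variant, all function names and all numbers are assumptions.

```python
import math

def rmse(forecast, observed):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed))

def combine(forecasts, weights):
    """Point-wise weighted mean of several forecast time series."""
    total = sum(weights)
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) / total
            for t in range(len(forecasts[0]))]

def continuous_weights(past_errors):
    """Assumed continuous rule: weight inversely proportional to recent RMSE."""
    return [1.0 / max(e, 1e-9) for e in past_errors]

def discrete_weights(past_errors):
    """Assumed discrete rule: only the most accurate recent model contributes."""
    best = min(range(len(past_errors)), key=lambda i: past_errors[i])
    return [1.0 if i == best else 0.0 for i in range(len(past_errors))]

# Hypothetical forecasts of three models for last week, and what was observed
last_week_forecasts = [[100, 110, 120], [100, 115, 135], [100, 105, 110]]
last_week_observed = [100, 112, 126]
errors = [rmse(f, last_week_observed) for f in last_week_forecasts]

# Hypothetical new forecasts of the same three models
new_forecasts = [[130, 140], [140, 155], [120, 128]]
mean_forecast = combine(new_forecasts, [1, 1, 1])
weighted_forecast = combine(new_forecasts, continuous_weights(errors))
best_model_forecast = combine(new_forecasts, discrete_weights(errors))
```

When the models err in the same direction, as happens when all of them follow the same recent trend, reweighting can only move the combined forecast within the range spanned by the individual models.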
Confidence intervals (CIs) for the harmonized model are derived from the empirical forecast error. Until the end of September, CIs were derived from the SIR-X model, before the method was refined to use the empirical forecast error of the harmonized model. Concretely, we retrospectively evaluate the ratio of the consolidated forecast and the actual total number of new cases since the start of the forecast horizon. This ratio follows a log-normal distribution. The upper and lower limits of the CI are derived from the corresponding percentiles of the empirical distribution of this forecast error.
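The interval construction can be illustrated as follows. This is a sketch under assumptions: it fits a log-normal distribution to past forecast/actual ratios and reads off its quantiles, whereas the text speaks of percentiles of the empirical distribution, so the actual procedure may differ; the function name and the ratios are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def forecast_interval(point_forecast, past_ratios, level=0.95):
    """Interval for the number of new cases from past ratios forecast / actual.

    Assumes the ratios are log-normal, so their logarithms are fitted with a
    normal distribution whose quantiles are mapped back to the ratio scale.
    """
    logs = [math.log(r) for r in past_ratios]
    fitted = NormalDist(mean(logs), stdev(logs))
    tail = (1.0 - level) / 2.0
    ratio_low = math.exp(fitted.inv_cdf(tail))
    ratio_high = math.exp(fitted.inv_cdf(1.0 - tail))
    # actual = forecast / ratio, so the high ratio yields the lower bound
    return point_forecast / ratio_high, point_forecast / ratio_low

# Hypothetical ratios from earlier weekly forecasts
past_ratios = [0.85, 1.10, 0.95, 1.30, 0.90, 1.05, 0.80, 1.20]
lower68, upper68 = forecast_interval(500, past_ratios, level=0.68)
lower95, upper95 = forecast_interval(500, past_ratios, level=0.95)
```

Working with ratios rather than absolute errors keeps the interval width proportional to the forecast, which suits case numbers that vary over orders of magnitude.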

F. Hospital bed usage model
Hospital occupancy is modeled in a stock-flow approach. Inflow (admission to ICU and normal wards) is calculated as a ratio of the time-delayed number of reported or projected new infected. Outflow (discharge) happens after a fixed amount of days for fixed ratios of patients (e.g. 20% of ICU patients are discharged after eight days). Admission rates are calculated separately for sex and age groups (0-39, 40-59, 60-79, and 80 years and above, respectively) and are scaled in order to fit the current occupancy in all federal states. The scaling parameter (one for each federal state) can thus be interpreted as the hospitalisation rate.
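The stock-flow logic can be sketched for a single ward type and a single population group. This is an illustration only: the admission rate, the delay and the discharge schedule are hypothetical (apart from the "20% after eight days" example taken from the text), and the age and sex stratification is omitted.

```python
def simulate_occupancy(new_cases, admission_rate, delay, discharge_schedule):
    """Daily bed occupancy from a stock-flow model.

    new_cases          -- daily reported or projected new infections
    admission_rate     -- share of cases admitted (the fitted scaling parameter)
    delay              -- days between case report and admission
    discharge_schedule -- {length of stay in days: share of patients}, shares sum to 1
    """
    days = len(new_cases)
    admissions = [admission_rate * new_cases[t - delay] if t >= delay else 0.0
                  for t in range(days)]
    occupancy = []
    for t in range(days):
        in_beds = 0.0
        for stay, share in discharge_schedule.items():
            # patients admitted during the last `stay` days still occupy a bed
            in_beds += share * sum(admissions[max(0, t - stay + 1): t + 1])
        occupancy.append(in_beds)
    return occupancy

cases = [100] * 30  # hypothetical constant inflow of 100 new cases per day
icu = simulate_occupancy(cases, admission_rate=0.02, delay=5,
                         discharge_schedule={8: 0.2, 14: 0.5, 21: 0.3})
```

With a constant inflow, occupancy settles at admissions per day times the mean length of stay (here 2 x 14.9 = 29.8 beds), which provides a quick plausibility check for fitted parameters.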
Confidence intervals for the occupancy forecast are calculated from increments of the forecast error. A technical description is given in the SI, Appendix D.
Model parameters were initially extracted from literature [29] and subsequently calibrated to actual data to better fit the observed time series. A subsequent analysis based on March-June inpatient data revealed that the calibrated model parameters correspond with observed average length of stay. Refer to the supporting information for a full list of model parameters.

III. RESULTS

A. Forecasting positive test numbers
We show the results for our rolling forecasts compared with the actual case numbers in Figure 1. For the time period from April 4 to September 25, 2020, we performed and harmonized weekly forecasts that are visible as bundles of lines in Figure 1. The first published forecasts coincided closely with the peak of the first epidemic wave in Austria. This can be seen by a gradual flattening of the curve of cumulative case numbers over April. From May until July, the curve showed a linear growth pattern.
While the models showed a clearly discernible divergence for the first prognosis day, the agreement increased over time. The starting points for the early weekly forecasts occasionally lie below the actual cases due to a substantial amount of very late reporting of cases in these early periods of the epidemic. In the weeks thereafter, agreement amongst the three models is typically stronger than the agreement with the data, meaning that if one model over- or underestimated the actual trend, so did the other models. We investigated the performance of different averaging procedures that weight models according to their past performance in terms of their root mean square error (RMSE), see Methods. The results are summarized in Table I. Performance weighting procedures yielded only a marginal improvement over simple averaging in terms of forecast accuracy.

B. Forecasting bed usage
In Figure 2 we show our rolling forecasts for the number of (A) intensive and (B) normal care beds currently in use for COVID-19 patients. While the case numbers and normal ward occupancy peaked in late March, ICU bed usage peaked about two weeks later in April.
The first forecast for normal care occupancy overshot the observed values. With more available data and better calibration, the model adequately captured the trends for both normal and intensive care beds. When case numbers started to rise again in late August/September, the model correctly anticipated the corresponding rise in bed usage. In general, ICU occupancy forecasts featured higher accuracy than normal ward occupancy forecasts.

FIG. 1. Rolling combined and consolidated out-of-sample forecasts for the number of confirmed cases in Austria. We show the weekly predictions from the three different models, their arithmetic average with its corresponding CI, and the actual case numbers. In the first two weeks, no CI was given.

[...]ing equal, this lockdown was applied (A) one week later or (B) not at all. By the end of April, instead of the approx. 16,000 actual cases, we would have expected 32,000 (95%CI 18,000-51,000) cases had the lockdown been implemented one week later and 58,000 (95%CI 26,000-114,000) cases in the complete absence of a full lockdown. Assuming a constant case fatality rate, on April 30 these cases would have translated into (A) 1,100 (95%CI 650-1,800) or (B) 2,100 (95%CI 930-4,100) deaths instead of the recorded 549 deaths. With no lockdown taken on March 16, the size of the outbreak in Austria would therefore have been comparable with the dynamics observed in Sweden at the time.

E. Reporting of the forecasts
We developed a standard reporting template used to communicate our forecasts to other stakeholders and decision-makers, see Figure 5. These visual reports showed our forecasts for cases and beds, as well as information on the effective reproduction number and the daily increments in positive tests. The visual reports are complemented by a brief synopsis of the researchers' appraisal of the current situation and particularities of the most recent forecast. We highlight what drives our results and illustrate the nature of the underlying uncertainty, such as the role of tourists returning from risk areas in late August. Furthermore, the researchers are available for any questions that members of the health minister's office or the regional crisis management units may have.
The first panel in Figure 5 provided an outlook for the expected developments in cases. The trends in these developments, i.e. whether the growth in case numbers accelerates or decelerates, are shown with the time series for the daily increments and the effective reproduction number. We communicated the bed forecasts by reporting the prognosis with two CIs, giving the current capacity in an inset. We defined the upper limit of the 95%CI as the "capacity provision", i.e. the upper limit for the number of needed beds at the given level of certainty.

IV. DISCUSSION
Considering the impact of COVID-19 policy measures on economic and social life, any related decision support needs to be done with caution. Our approach considers the high impact of COVID-19 forecasts by (1) focusing on monitoring rather than long term prognosis and (2) consolidation of three different "model opinions" which not only improved the quality of the short term forecast, but also distributed the responsibility of the decision support.
Our forecasts provided sound evidence for the expected number of total and hospitalised cases, including appraisals of uncertainty via forecast intervals. This is in contrast to what has come to be known as "worst case coronavirus science", i.e. the communication of worst case scenarios as baselines in the public pandemic management strategy. For instance, the UK policy change toward adopting more drastic NPIs on March 23 was informed by worst case scenarios created by the Imperial College COVID-19 Response Team, which projected that 250,000 deaths were to be expected under the policy regime in place at the time.

FIG. 5. Example for a reporting template of our out-of-sample forecasts. The visual reports consist of five panels. First, we report the harmonized forecast, its 68% and 95% CI, and the historic case numbers, stratified according to their status of being dead, recovered, or active. Four additional panels show the daily increments in cases, the effective reproduction number, and the forecasts for intensive and normal care beds in use for COVID-19 patients.

In a press conference in April, the Austrian chancellor publicly stated that soon "everyone will know someone who died because of COVID-19", based on an external SIR-model-based worst case scenario that contained a death toll of 100,000 people (1.1% of the Austrian population) [30]. Such scenarios are problematic due to their undefined level of certainty and contributed to a public perception of unreliable epidemiological models. While scenario analysis and numerical experiments in "what if" scenarios have their merits in evaluating the effectiveness of certain NPIs, they reduce the trust in government policy once it becomes obvious that the worst case is not going to materialize. If it is the fear of hundreds of thousands of victims in the UK or Austria that should compel us to wear masks, why continue wearing them once it is clear that this scenario has been avoided?
Based on our results, we argue that the main benefit of epidemiological models comes from their use as short-term monitoring systems. The models are typically calibrated to the infection dynamics of the last couple of days or weeks and project this dynamic forward based on epidemiological parameters often assumed to be fixed. If a short-term forecast is accurate, this means that infection numbers have continued as expected, based on a recent trend. If, however, the short-term forecast severely over- or underestimates the observed dynamics, one should inquire more closely what might have caused this change. This is in line with our observation that if one of our models considerably underestimated future growth, so typically did the other models that expected a similar trend. Inaccurate short-term forecasts therefore signal a change in the epidemiological situation that needs to be explained. Inaccurate predictions can thus be highly informative from a practical point of view.
Next to its role in monitoring, our approach informed both the strategy to re-open the country and to re-instate measures during the second infection wave. The main insight was that even if a second wave were to start the next day, experiences from March told us that we would have several weeks for appropriate reactions (namely reinstating NPIs) before there would be any concern for an overburdening of the healthcare system. For those weeks we could additionally provide estimates for the number of beds hospitals would need to free for COVID-19 patients, the "capacity provision" for ICU beds. First, it gave hospital managers an estimate for providing a certain number of beds at a given level of risk, thereby freeing up capacities for the treatment of non-COVID patients and minimizing health-related collateral damage. Second, and maybe more importantly for the re-opening, these forecasts clearly revealed that an overcrowding of ICU beds was extremely unlikely from April onward, even in the most pessimistic scenarios (upper limits of the confidence level). Further, the average length of stay for non-COVID-19 patients in ICUs in Austria is less than a week. By postponing non-essential treatments it would therefore be possible to free up additional beds in time should the capacity provision exceed the number of available beds. Based on such insights, the Austrian government decided to gradually ease the NPIs in intervals of 14 days.
By continually benchmarking the actual case numbers against our trend-following short-term predictions, we could offer an unambiguous signal for whether potential rises in case numbers would be reason for concern or not. The capacity provision served as a weekly checkpoint for the epidemiological situation in this sense: as long as it was not exceeded, there were no significant changes in the infection dynamics and the opening could continue.
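The weekly checkpoint logic can be sketched as a comparison of observed values against the forecast interval and of the capacity provision against available beds. This is an illustrative sketch only; the function names, labels and numbers are hypothetical and not part of the consortium's reporting system.

```python
def check_forecast(observed, lower, upper):
    """Classify an observation against the forecast interval for the same day."""
    if observed > upper:
        return "above interval"   # growth faster than the recent trend
    if observed < lower:
        return "below interval"   # growth slower than the recent trend
    return "within interval"      # dynamics continue as expected

def capacity_exceeded(capacity_provision, available_beds):
    """True if the upper forecast bound for beds exceeds the available beds."""
    return capacity_provision > available_beds

# Hypothetical values for one reporting week
status = check_forecast(observed=640, lower=480, upper=610)
alert = capacity_exceeded(capacity_provision=180, available_beds=1000)
```

An observation outside the interval does not by itself say what changed; it marks the week in which the underlying dynamics deserve a closer look.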
Our first ICU forecast that severely underestimated the actual development occurred in September. During the summer, infection numbers increased from around 20 confirmed cases per day to about 200 cases, mostly driven by patients aged below 40 years. Consequently, the number of severe COVID-19 cases remained low and the effective rate to require intensive care dropped to one percent and below. The situation changed qualitatively in September, when not only case numbers started to soar again, but also bed usage increased much more strongly than projected. Our analysis revealed that the driver behind above-forecast ICU occupancy was above-forecast total case numbers, while age-specific ICU rates remained constant. In other words, the bed usage forecasts were inaccurate because of the infection number forecast, but not because the characteristics of the detected cases changed (e.g., more symptomatic or severe cases). This result reassured crisis managers that an uncontrolled spread of the disease, characterised by an increase in undetected cases and, subsequently, an increase in ICU rates, was not happening. What drove this change was a shift of the age distribution of cases from younger patients toward a demographic more representative of the Austrian population. This development was amplified by several unpredictable infections in nursing care homes. The fact that our forecast severely underestimated the actual developments, combined with similar signals from other indicators, contributed to a shift in Austria's policy from re-opening to re-instating NPIs.
Our forecast-based decision support comes with limitations. First of all, the weekly prognosis is based on shared data from the Ministry of Health and the Ministry of Internal Affairs, which comes with quality and reporting bias limitations. Moreover, even though the consortium has access to the most accurate and up-to-date data about the epidemic in Austria, a lot of information required for valid epidemiological forecasting is available only with considerable delay, or is unavailable because it is not and cannot be measured. The fraction of undetected persons due to asymptomatic disease progression is one example of such a variable. Further, our forecast is based on simulation models, which are generally subject to errors that come from abstraction and simplification of the real system. Through the harmonized handling of three models with entirely different approaches we attempted to reduce such structural uncertainties. Finally, our decision support framework is mostly limited by its political and public visibility. In our experience, our forecast was of special public and policy interest in periods of rapid movements, but also had a confirmatory effect with respect to the policy measures taken in times of decreasing case numbers or slow growth.
In conclusion, we argue that worst-case coronavirus science may be interesting as an academic exercise, but modellers need to be more cautious and responsible in communicating the strongly speculative nature of their results to politicians and the public. Even if their limitations are adequately discussed in lengthy reports, what often surfaces in the public discourse are tweets and PowerPoint slides showing horrific future case numbers and death tolls. Instead, we argue that short-term epidemiological models can be valuable ingredients of a comprehensive monitoring and reporting system to detect epidemiological change points and thereby inform decisions to ease or strengthen governmental responses.

V. ACKNOWLEDGMENTS
We thank Reinhild Strauss, Gabriela El Belazi, Lukas Richter, Daniela Schmid, and Uwe Siebert for helpful and stimulating discussions.

VI. FUNDING
Our work as COVID-19 Forecast Consortium was financially supported by the Federal Ministry for Social Affairs, Health, Care and Consumer Protection. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

SUPPORTING INFORMATION

Appendix A: Details on the Extended SIR-X Model
The SIR-X model [15] was originally devised to model the transition from exponential to sub-exponential growth due to the implementation of non-pharmaceutical interventions (NPIs). With the entry into containment 2.0, i.e. the taking back of NPIs, a couple of modifications to the original model are therefore needed. In particular, the process of "quarantining" susceptibles has to be added. In the following we give a description of how we extended the baseline SIR-X model in order to implement such processes. We also discuss additional extensions to the model, such as introducing age-structured populations and multiple calibration periods. Note that this document only discusses aspects of the model implementation that are not described in, or are treated differently than in, the original model description.

Compartments for quarantined infected and susceptibles
Here we give the extended SIR-X model to account for scaling back NPIs. The baseline model includes two different types of NPI. First, there are NPIs that act on the susceptible population (social distancing, home office, etc.). Second, there are NPIs that act on the infected population, in particular an accelerated detection of cases (e.g., testing and contact tracing). Clearly, scaling back of the NPIs affects primarily the first type of NPIs, while it might be reasonable to expect that NPIs targeting the infected population might even increase in effectiveness.
The baseline SIR-X model is given in [15]. We now introduce two extensions, namely (i) having two compartments of locked-down individuals (susceptibles, X_S, and infecteds, X_I) and (ii) introducing a scaling back of NPIs affecting susceptibles, encapsulated in the rate κ_1 ≥ 0.

The extended SIR-X model differs from the baseline in two notable ways. First, the compartment X_I is the cumulative number of confirmed cases and is used to calibrate the model. It is imperative to note that the model was explicitly designed to make statements concerning X_I. A few further straightforward extensions could be introduced to model, for instance, recovery within the active X_I compartment. Such extensions would add to the model complexity while providing no value for modelling the development of X_I.

Secondly, the compartment X_S is now an explicit model representation of locked-down or socially distanced susceptibles. The parameter κ_0 gives the inflow to this compartment from the susceptibles (the strength of the corresponding NPIs), and κ_1 gives the outflow (how fast people increase their levels of social contact back to normal).
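For orientation, the baseline SIR-X system of [15] and a set of extended equations consistent with the description above can be written as follows. The extended system is a sketch under stated assumptions, not a verbatim statement of the model: it assumes that released individuals flow from X_S back to S at rate κ_1, that the κ_0 S term enters X_S instead of R, and that the infected compartment is unchanged.

```latex
% Baseline SIR-X model [15]
\begin{align}
\partial_t S &= -\alpha S I - \kappa_0 S, \\
\partial_t I &= \alpha S I - \beta I - \kappa_0 I - \kappa I, \\
\partial_t R &= \beta I + \kappa_0 S, \\
\partial_t X &= (\kappa + \kappa_0) I .
\end{align}

% Extended model (assumed form, reconstructed from the verbal description)
\begin{align}
\partial_t S   &= -\alpha S I - \kappa_0 S + \kappa_1 X_S, \\
\partial_t I   &= \alpha S I - \beta I - \kappa_0 I - \kappa I, \\
\partial_t R   &= \beta I, \\
\partial_t X_S &= \kappa_0 S - \kappa_1 X_S, \\
\partial_t X_I &= (\kappa + \kappa_0) I .
\end{align}
```

In this reading, setting κ_1 = 0 makes X_S absorbing, so that S, I, and X_I evolve exactly as in the baseline model.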

Age structure
We include an age structure in the model in the usual way. All compartments (S, I, R, X_I, X_S) become vector-valued, as do the rates κ_0, κ_1, κ, and β. The parameter α becomes a matrix α_ij giving the likelihood that a susceptible of age group j will be infected by an infected individual from age group i. Entries of α have been calibrated using mobile phone data, where we assume that they are proportional to the probability that a call takes place between individuals of age groups i and j. The spectral radius of α is chosen in accordance with [15].
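The construction of α can be illustrated with a minimal sketch: call counts between age groups are normalised to shares, and the resulting matrix is rescaled so that its spectral radius matches a target value. The function names, the call counts, and the target value 0.775 are hypothetical placeholders; the actual value follows [15] and is not restated here. The spectral radius is estimated by plain power iteration, which assumes a non-negative matrix with a unique dominant eigenvalue.

```python
def spectral_radius(matrix, iterations=500):
    """Estimate the dominant eigenvalue of a non-negative square matrix
    by power iteration with max-norm normalisation."""
    n = len(matrix)
    vec = [1.0] * n
    radius = 0.0
    for _ in range(iterations):
        nxt = [sum(matrix[r][c] * vec[c] for c in range(n)) for r in range(n)]
        radius = max(abs(v) for v in nxt)
        if radius == 0.0:
            return 0.0
        vec = [v / radius for v in nxt]
    return radius


def contact_matrix_from_calls(call_counts, target_radius):
    """Scale call shares between age groups to a given spectral radius."""
    total = sum(sum(row) for row in call_counts)
    # Entries proportional to the probability of a call between groups i and j
    shares = [[count / total for count in row] for row in call_counts]
    scale = target_radius / spectral_radius(shares)
    return [[scale * share for share in row] for row in shares]


# Hypothetical call counts between three age groups
calls = [[50, 20, 5], [20, 60, 10], [5, 10, 30]]
alpha = contact_matrix_from_calls(calls, target_radius=0.775)  # placeholder value
```

Rescaling leaves the relative mixing pattern between age groups untouched and only fixes the overall transmission strength.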

Calibration
As already mentioned, the model is calibrated via the time series of the cumulative number of confirmed cases X_I in each federal state of Austria. The basic calibration procedure follows [15], i.e. we fit the parameters κ and κ_0, as well as the initial condition I(t = 0), via a trust-region-reflective algorithm (MATLAB's lsqnonlin function). Calibration takes place in different time windows that roughly represent the different phases of the epidemic in Austria. The first phase lasts from t = 0 to the end of March and encompasses the "first wave". At the beginning of April, Austria moved into a containment phase characterized by fewer than one hundred new confirmed cases per day. The second calibration phase ends in mid-June, when daily cases started to increase again, with most days showing more than one hundred new cases. The third calibration phase lasts until the end of August, when infection numbers started to increase again, after which the fourth calibration phase commenced.
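The calibration step can be sketched in Python, where SciPy's least_squares with method="trf" provides a trust-region-reflective solver comparable to lsqnonlin. This is a minimal illustration under stated assumptions, not the code used for the forecasts: it fits the baseline (not the extended) model for a single region, uses illustrative fixed values for α and β, integrates with a fixed-step Runge-Kutta scheme, and the function names and bounds are hypothetical.

```python
import numpy as np
from scipy.optimize import least_squares

ALPHA = 0.775  # assumed infection rate per day (illustrative value)
BETA = 0.125   # assumed recovery rate per day (illustrative value)


def simulate_confirmed(kappa, kappa0, i0, n_days, steps_per_day=4):
    """Return cumulative confirmed cases X (population fraction) for days 0..n_days."""

    def rhs(s, i):
        ds = -ALPHA * s * i - kappa0 * s
        di = ALPHA * s * i - (BETA + kappa0 + kappa) * i
        dx = (kappa + kappa0) * i
        return ds, di, dx

    h = 1.0 / steps_per_day
    s, i, x = 1.0 - i0, i0, 0.0
    daily = [x]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            # Classical fourth-order Runge-Kutta step
            k1 = rhs(s, i)
            k2 = rhs(s + 0.5 * h * k1[0], i + 0.5 * h * k1[1])
            k3 = rhs(s + 0.5 * h * k2[0], i + 0.5 * h * k2[1])
            k4 = rhs(s + h * k3[0], i + h * k3[1])
            s += h / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            i += h / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
            x += h / 6.0 * (k1[2] + 2 * k2[2] + 2 * k3[2] + k4[2])
        daily.append(x)
    return np.array(daily)


def calibrate(x_obs, guess=(0.05, 0.05, 1e-5)):
    """Fit kappa, kappa0 and I(t=0) to a daily series of cumulative cases."""
    x_obs = np.asarray(x_obs, dtype=float)
    n_days = len(x_obs) - 1
    scale = x_obs.max()

    def residuals(params):
        kappa, kappa0, i0 = params
        return (simulate_confirmed(kappa, kappa0, i0, n_days) - x_obs) / scale

    return least_squares(residuals, guess, method="trf",
                         bounds=([0.0, 0.0, 1e-9], [1.0, 1.0, 1e-2]),
                         x_scale="jac")
```

Calling calibrate on the case series of one calibration window returns a result object whose attribute x holds the fitted (κ, κ_0, I(0)); repeating the call per window and per federal state mirrors the phase-wise procedure described above.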

Appendix B: Details on the Epidemiological Clockwork Model
The epidemiological stage model tracks individuals through the stages "infected", "infectious", "reported", "isolated", and "immunized". It aims to isolate the true infection rate by augmenting reported case numbers with time-constant epidemiological parameters and time-varying assumptions on the detection rate, the isolation rate, and the number of imported cases. This infection rate is then extrapolated using exponential smoothing models in order to forecast future case numbers.

Data preparation
In a first step, assumptions on detection and isolation rates are made based on the cluster analysis. For example, if a large share of newly reported cases is attributed to a single workplace setting where the whole staff was tested, we assume that contact isolation and detection rates are high on the day these cases were reported, and were lower in the days before the mass testing. We furthermore subtract sporadically imported cases as reported in the cluster analysis. This serves to isolate domestically acquired infections for the calculation of the infection rate.
In a second step, case numbers are cleaned in a nowcasting procedure, where expected delays and weekend effects are accounted for based on historic values of the relevant federal states. We use exponential smoothing to identify the seasonality, error, and level of the case numbers for each federal state.
In a third step, assumptions on the risk of importing infectious cases are made, i.e. people who have not acquired the infection domestically but increase the number of infectious individuals. This figure is parameterised to account for the size of the federal states and adjusted to reflect travel patterns in risk areas. For example, this value was increased to account for a spike in reported cases that were traced back to travellers from Croatia.

Fixed parameters
• Duration infected before infectious: d_1 = 1 day
• Duration infectious before reported: d_2 = 1 day
• Duration infectious for non-severe cases: d_3 = 6 days

The latency period is therefore 2 days. This value was initially set to 3 days but was reduced owing to evidence that the average latency period is shorter. Note that transmission relies on both pre-symptomatic detected cases and non-detected cases, who may or may not have mild symptoms.

Variable parameters (default values)
• Background infection risk_t = population/100,000
• Detection rate: new reported cases_t / new total cases_t = r_1 = 1/5
• Effectiveness of contact isolation: new isolated cases_t / new reported cases_t = r_2 = 2/5

The values given are standard parameterisations that are adjusted to account for specific circumstances. In the standard case, there would be four additional undetected cases for every reported case, and on average two infected people would be quarantined for every reported case.
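The arithmetic behind the standard case can be checked with a short sketch. The function name is hypothetical, and the sketch assumes that r_2 is applied to new total cases, which is the reading that reproduces the figure of two quarantined per reported case; applied to reported cases it would give 2/5 instead.

```python
from fractions import Fraction


def cases_per_reported(detection_rate, isolation_share_of_total):
    """Return (total, undetected, isolated) infections per reported case."""
    total = 1 / detection_rate                   # all infections per reported case
    undetected = total - 1                       # infections that go unreported
    isolated = isolation_share_of_total * total  # assumption: r_2 acts on total cases
    return total, undetected, isolated


total, undetected, isolated = cases_per_reported(Fraction(1, 5), Fraction(2, 5))
```

With the default values r_1 = 1/5 and r_2 = 2/5 this gives five infections per reported case, four of them undetected, and two isolated.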

Mathematical model
The model calculates the number of individuals in each stage according to the following difference equations,

\begin{align}
\text{new infectious cases}_t &= \text{new total cases}_{t+1} - \text{new isolated cases}_t - \text{new immunized cases}_t, \tag{B1} \\
\text{new immunized cases}_t &= \text{new total cases}_{t-6} - \text{new isolated cases}_{t-6}, \tag{B2} \\
\text{infection rate}_t &= \frac{\text{new total cases}_{t+2} - \text{sporadically imported cases}_{t+2}}{\text{total infectious cases}_t}, \tag{B3}
\end{align}

where

\begin{equation}
\text{total infectious cases}_t = \sum_{\tau=0}^{t} \text{new infectious cases}_\tau + \text{background infection risk}_t. \tag{B4}
\end{equation}
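The difference equations (B1) to (B4) can be sketched as a short routine. This is a minimal illustration, not the production implementation: the function and argument names are hypothetical, the background infection risk is taken as constant over time, no cases are assumed to be immunized during the first six days of the series, and the infection rate is only computed for days on which the two-day look-ahead in (B3) is available.

```python
def infection_rates(new_total, new_isolated, imported, background_risk, lag=6):
    """Compute the daily infection rate from daily case series (lists of equal length)."""
    rates = []
    cumulative_infectious = 0.0
    for t in range(len(new_total) - 2):
        # (B2): cases from `lag` days ago that were not isolated become immunized
        if t >= lag:
            immunized = new_total[t - lag] - new_isolated[t - lag]
        else:
            immunized = 0.0  # assumption: series starts before anyone is immunized
        # (B1): net inflow into the infectious stage
        new_infectious = new_total[t + 1] - new_isolated[t] - immunized
        cumulative_infectious += new_infectious
        # (B4): currently infectious individuals plus background infection risk
        total_infectious = cumulative_infectious + background_risk
        # (B3): domestically acquired new cases two days later per infectious individual
        rates.append((new_total[t + 2] - imported[t + 2]) / total_infectious)
    return rates
```

For a constant series of 10 new total cases and 4 isolated cases per day with a background risk of 1, the infectious pool grows by 6 per day for six days and then stabilises at 37, so the infection rate settles at 10/37.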

Trend extrapolation
In a last step, the isolated true infection rate is forecast with an exponential smoothing model (R, package 'forecast'). Owing to the tendency of the disease to progress linearly rather than exponentially [11], all trends are damped.
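The effect of damping can be illustrated with a hand-written damped-trend (Holt) smoother. This sketch is not the 'forecast' package: the smoothing parameters are fixed by hand rather than estimated, the initialisation is a simple assumption, and the function name is hypothetical.

```python
def damped_holt_forecast(series, horizon, alpha=0.5, beta=0.3, phi=0.9):
    """Forecast `horizon` steps ahead with additive damped-trend exponential smoothing."""
    level = series[0]
    trend = series[1] - series[0]  # assumed initial trend: first difference
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + phi * trend)
        trend = beta * (level - previous_level) + (1 - beta) * phi * trend
    forecasts = []
    damp_sum = 0.0
    for h in range(1, horizon + 1):
        damp_sum += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damp_sum * trend)
    return forecasts
```

With a damping parameter phi below 1, each additional forecast step adds a smaller increment, so the extrapolated infection rate flattens instead of continuing along a straight line; phi = 1 recovers the undamped linear trend.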

Strengths and Limitations
In the Epidemiological Stage Model, the isolation of the true infection rate is to a large degree driven by researcher assumptions on the key time-varying variables of imported cases and detection rates. This follows the understanding that reported case numbers should be assessed with available additional information before being further processed by mathematical models. Known sources of error, such as non-detection of cases, travel activity of infected individuals, or non-randomised testing, will affect the results unless these sources of bias are time-constant and the relevant sample size is large. The otherwise simple mechanics of the model facilitate the attribution of case numbers to such causes, since the time of reporting determines the day of infection. Of course, researchers may err, just as a model may, when trying to attribute observed patterns to causes or parameters. The Epidemiological Clockwork Model will perform better than more data-driven models if known particularities in the time series of reported cases play a substantial role in the course of transmission. A limitation of the model is that, because the information used in assessing the time series of reported cases is often qualitative in nature, the data cleaning process is not transparent. Ongoing improvements of the model will address these shortcomings as the pandemic progresses.

Appendix C: Combining the three models into a consolidated forecast
In summary, three strategies have been evaluated in terms of their forecast error. Let F_i^j(t) denote the forecasts for the total number of COVID-19 cases on day t for model j for runs made in week i, the harmonized forecast, and R_i the corresponding reported number, then the following three strategies have been investigated to calculate the weights a_i^j, j = 1, ..., 3. Note that, here and in the following, j is an index and not an exponent.
• Naive average. This strategy describes a static arithmetic average of the forecasts, a_i^j = 1/3 ∀ j. (C2)
• Continuously weighted dynamic mean. This strategy describes a dynamically weighted average. The weights are determined from the forecasting errors of the previous three weeks. (C3)
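The two strategies listed above can be illustrated with a short sketch. The naive average uses equal weights for the three models. For the dynamic mean, the sketch uses weights inversely proportional to each model's mean absolute error over the previous weeks; this inverse-error rule is a hypothetical stand-in chosen for illustration and should not be read as the weighting formula (C3). All function names are assumptions.

```python
def naive_weights(n_models=3):
    """Equal weights: a static arithmetic average of the model forecasts."""
    return [1.0 / n_models] * n_models


def inverse_error_weights(past_errors):
    """Hypothetical dynamic weights from past absolute forecast errors.

    past_errors[j] holds the absolute errors of model j over the previous weeks.
    """
    mean_errors = [sum(errors) / len(errors) for errors in past_errors]
    inverse = [1.0 / max(err, 1e-12) for err in mean_errors]  # guard against zero error
    total = sum(inverse)
    return [value / total for value in inverse]


def combine(forecasts, weights):
    """Weighted combination of the model forecasts for one day."""
    return sum(w * f for w, f in zip(weights, forecasts))
```

A model that was twice as accurate over the previous three weeks receives twice the weight in this scheme, while the naive average ignores past performance entirely.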
medRxiv preprint doi: https://doi.org/10.1101/2020.10.18.20214767; this version posted October 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.