## Abstract

Following the spread of the COVID-19 pandemic and pending the establishment of vaccination campaigns, several non pharmaceutical interventions such as partial and full lockdown, quarantine and measures of physical distancing have been imposed in order to reduce the spread of the disease and to lift the pressure on healthcare system. Mathematical models are important tools for estimating the impact of these interventions, for monitoring the current evolution of the epidemic at a national level and for estimating the potential long-term consequences of relaxation of measures. In this paper, we model the evolution of the COVID-19 epidemic in Belgium with a deterministic age-structured extended compartmental model. Our model takes special consideration for nursing homes which are modelled as separate entities from the general population in order to capture the specific delay and dynamics within these entities. The model integrates social contact data and is fitted on hospitalisations data (admission and discharge), on the daily number of COVID-19 deaths (with a distinction between general population and nursing home related deaths) and results from serological studies, with a sensitivity analysis based on a Bayesian approach. We present the situation as in November 2020 with the estimation of some characteristics of the COVID-19 deduced from the model. We also present several mid-term and long-term projections based on scenarios of reinforcement or relaxation of social contacts for different general sectors, with a lot of uncertainties remaining.

## 1. Introduction

While there are many models addressing the COVID-19 pandemic, it is important to have models representing each specific country since the evolution of the outbreak as well as the mandated control measures and their efficacies are different. Compartmental SEIR-type epidemic models [1] — where the population is divided into some compartments such as Susceptible, Exposed, Infectious and Recovered — are very suitable for long term projections due to their potential computational speed of running different scenarios, in comparison to e.g. individual-based models. Moreover, SEIR-QD variants — with additional compartments concerning hospitalisations (Q because hospitalisation status is quite similar to quarantine) and deaths (D) — are particularly well suited for COVID-19 pandemic due to the lack of unbiased information on the real prevalence [2, 3].

We present one of the very few existing extended SEIR-QD model adapted and calibrated on Belgium situation and data. Two similar approaches have been developed by the SIMID COVID-19 team (UHasselt-UAntwerp) [4] and the BIOMATH team (UGent) [5]. Those independently developed models have their own characteristics and are complementary since it is difficult at this time to exactly know how to model COVID-19 in the best way. The main goal of those three models is to inform policymakers in Belgium about the projections of potential future decisions as well as informing hospitals, institutions and the scientific community on the estimated effects of non pharmaceutical interventions (NPI). Alternative approaches have also been developed as an individual-based model [6] and a meta-population model [7].

The three Belgian compartmental models have common characteristics as a calibration on hospitalisations, deaths and serological studies (but not considering cases data), a separation in several age classes with different characteristics, a distinction between asymptomatic, presymptomatic and symptomatic people with a different infectiousness, the use of social contact data [8] to monitor the transmission of the virus at different places (home, work, school, leisure) and a Bayesian sensitivity analysis using Markov Chain Monte Carlo (MCMC) methods. However, the model presented in this paper provides several improvements. The main one is the fact that nursing homes are modelled as isolated entities in order to account for differences in timing of spread of the coronavirus compared to the general population and for a proportion of non-COVID-19 related deaths in Belgian nursing homes collected data. Our model has no informed parameter (except social contact data) in order to recover different characteristics of COVID-19 and is calibrated on different stages of the hospitalisation path (admission, discharge and death) to get a good view on length of disease and hospital stay. There is also a specific estimation of potential reimportations coming from travellers during the holiday period to avoid an overestimation of the national transmission.

The paper is organised as follows. In Section 2, we present a technical description of the model. The main characteristics are presented in Subsection 2.1, equations in Subsection 2.2, precisions on the data in Subsection 2.3 and explanations of the calibration method and sensitivity analysis in Subsection 2.4. Additional details including the timeline used and the full set of estimated parameters of the model are given in Appendix A. The Results and Discussion Section 3 starts with a presentation of the current estimation from the model in Subsection 3.1 with different indicators including reproduction number and infection fatality rate at different periods as well as some characteristics of the COVID-19 disease. Then we present a test on the validity of the model in Subsection 3.2 with the confrontation of more recent data with previous calibrations. In Subsection 3.3, we analyse a mid-term projections based on estimations of new policy measures applied in October and November in Belgium concerning hospitalisations and deaths together with an extrapolation on prevalence and seroprevalence within each age group. Then some scenarios-based long-term projections are presented in Subsection 3.4 visualising potential impacts of various exit strategies during the first semester of 2021. Finally, in Subsection 3.5 we provide a conclusion with strengths and limitations concerning the presented model.

## 2. Materials and Methods

### 2.1. General description of the model

The continuous deterministic compartmental model is divided into the following 8 compartments in order to take account of the different possible stages of the disease as well as the separation between asymptomatic and symptomatic people with different infectiousness: Susceptible *S*, Exposed *E*, Asymptomatic Infectious *I*^{A}, Presymptomatic Infectious *I*^{P}, Symptomatic Infectious *I*^{S}, Hospitalised *Q*, Deceased *D* and Recovered *R*. A more precise description is presented in Table A.5 of Appendix A. All those compartments exist for every age class. We do not consider in this model any subdivision inside the hospital compartment. A schematic view of the compartments with their relations is presented in Figure 1.

In addition, 2000 isolated nursing homes [9] of similar average size are considered with all those compartments, also presented in Figure 1, and modelled as isolated entities in order to take account of the different spread timing of the coronavirus compared to the general population. The transmission of infection from the general population to those nursing homes is modelled by a discrete random infection process which is detailed in Subsection 2.2.2.

We consider the following age classes among the population: 0-24, 25-44, 45-64, 65-74 and 75+. Those classes correspond to public available data [10]. We assume that the classes up to 74 are only present among the general population, while the remaining is divided between a general 75+ and a specific class of nursing homes residents. The transmission of the coronavirus between all classes of the general population is computed using social contact data at different places [8].

Some additional estimated parameters are considered in order to capture specific effects. A probability parameter is capturing the fact that only a part of the reported deaths from nursing homes are due to the COVID-19 [11]. A corrective coefficient is used to correct the new hospitalisations data since patients initially hospitalised for another reason or with no valid PCR test are not officially considered in the admissions data [12]. Recovery and death rates from hospitals are considered variable in time in order to take the continuous improvement of care methods into account [13]. A variable hospitalisation policy is considered for nursing homes during the first wave (period March-June) since residents are less likely to be hospitalised when the hospital load is important (more than half of the hospitals had admission criteria and specific agreements with nursing homes during the first wave [14]). All those specificities are detailed in Subsection 2.2. This model takes into consideration potential reimportations of COVID-19 from abroad during the holidays period based on travel trends data which are detailed in Subsection 2.3.

Policy changes, according to Belgian epidemic’ schedule, are modelled using different coefficients for the social contact matrices [8]. Social contacts are divided into 4 categories: home (household and nearby family), work (with transport), school and leisure (with other places). All contacts are considered at 100% during the period up to March 14, 2020. Then reduced percentages are estimated by the model for the different periods of lockdown and phases of lift of measures. These reduced percentages are the effect at the same time of mobility restrictions, social distancing, prevention measures, testing and contact tracing, while it is mathematically impossible to determine the exact part of those effects. Hence new parameters for some or all social contact types are estimated each time there is an important policy change. The timeline of control measures in Belgium and the way those measures are modelled are described in Appendix A and Table A.7. Long-term scenarios-based projections are constructed assuming a constant policy and compliance to measures during the future with different realistic possibilities of percentage of social contacts for still unknown policy effects, but otherwise estimated impacts of previous control measures are assumed to remain the same in future.

This model does not take into consideration effects like seasonality or cross-immunity. The population is only age-structured and not spatially structured. A spatial refinement of such a model would be really important, but currently the public data officially provided are not detailed enough to properly fit a complex spatial-structured model.

### 2.2. Equations of the model

#### 2.2.1. Equations of the model for the general population part

Equations of the model for the general population are the following ones, with *i* = 0-24, 25-44, 45-64, 65-74, 75+ depending on the age class:
Compartments are Susceptible (*S*), Exposed (*E*), Infectious asymptomatic , presymptomatic or symptomatic , Hospitalised (*Q*), Deceased (*D*) and Recovered (*R*). The main part of the model is continuous (time unit is days). Elements in italic are additional discrete actions performed each day. Infections during vacation travels are modelled as follows: during the holiday period (July-September 2020), some individuals are removed each day from the *S*_{i} classes and added to the corresponding *E*_{i} classes (for age classes below 75) according to estimated travellers and estimated prevalence in the visited countries (as explained in Subsection 2.3) with a global coefficient *C*_{reimp}.

Parameters are listed and explained in Table A.6 of Appendix A. Some specific parameters are time-dependent and their dependence are computed using a logistic sigmoid function in order to model a smooth transition between two states with a minimal number of estimated parameters in order to minimise overfitting. For the general population, such a logistic function (called “recovery” function) monitors the continuous care improvement at hospitals over time [13] (with parameters estimated from the data):
The structure of the population is *N*_{0−24} = 3250000, *N*_{25−44} = 3000000, *N*_{45−64} = 3080000, *N*_{65−74} = 1150000 and *N*_{75+} = 870000 outside nursing homes (with an additional *N*_{h} = 150000 inside nursing homes) for a total population of *N* = 11500000 (including death compartments) which is assumed constant. Those numbers are round numbers coming from the structure of the Belgian population as provided by the Belgian Federal Government on April 2020 [15]. The estimated initial prevalence *p*_{0} is proportionally distributed between the *E*_{i} on day 1 among the general population (corresponding to March 1 reported situation = February 29 real situation). Nursing homes are assumed not initially infected.

Transmission is governed by the so-called social contact hypothesis [16]. Social contact matrices *M*_{ij} (representing the average number of contacts per day of age class *i* from an individual of age classe *j*) are based on social contact data from Flanders (Belgium main region) collected in 2010 [17] and computed separately for home, work, school and leisure using the SOCRATES online tool [8]. Work and transport categories are merged as well as leisure and other places. Scaling coefficients *C*_{*} are used to represent the differential effect of introduced lockdown/policy measures on each of these types of social contact. Those coefficients capture at the same time the transmission reduction coming from a global diminution of the contact rate (lockdown, closures) as well as from sanitary measures like social distancing or mask wearing. Hence the complete contact matrices are (for each given constant policy period detailed in Appendix A):
In addition to the contact rate, there are two coefficients *λ*_{a} and *λ*_{s} representing the transmission probability for asymptomatic/presymptomatic and symptomatic individuals, capturing susceptibility and infectiousness. There are assumed age-class independent, while the heterogeneity in infectiousness is introduced by a distinct probability of being asymptomatic.

The basic reproduction number for the general population is estimated by the leading eigenvalue of the next-generation matrix [18, 19] (with *i, j* indexing the age-classes of the general population):
The effective reproduction number is estimated by . Those reproduction numbers only capture the epidemic within the general population, while the situation within nursing homes is considered as a separated system (for which cases can arise due to contact with the general population through visits, but which does not itself affect the general population).

#### 2.2.2. Equations of the model for the nursing homes part

Equations for the population in nursing homes are nearly similar to those of the general population:
Parameters for disease dynamics in nursing homes are distinct from those in the general population (except age-class independent ones, cf. Table A.6). There are 2000 nursing homes [9] considered as separated entities, with a constant population of 75 inside each one, for a total of *N*_{h} = 150000 residents (rounded up from official 2018 statistics [20]). New arrivals are considered in order to maintain each nursing home’s population and are removed from the 75+ susceptible class (while deaths originated from nursing home are considered as belonging to the general population, hence the nursing homes population remains constant). Those transfers from the general population *S*_{75+} to nursing homes *S*_{h} are taken into consideration since, according to the small population inside each nursing home, new arrivals can have a non-negligible effect on the proportion of susceptible residents.

Transmission inside each nursing home follows a usual SEIR-type transmission with a specific contact rate *m*_{h}. Transmissions coming from the general population is computed in a particular way using a daily probability of infection: each day, for each nursing home, one additional (integer) infected resident is moved from the *S*_{h} compartment to the *E*_{h} compartment with probability , where the coefficient distinguishes between the initial phase *P*_{th} and lockdown and subsequent phases . Note that this particular process is stochastic, as opposed to the rest of the model which is deterministic. Starting from lockdown, transmissions are only considered from the 25-65 population (i.e. with *j* = 25-44 and 45-64) since transmissions are mainly from nursing homes’ workers. Potential reverse transmissions are however not monitored here i.e. nursing home residents do not infect the general population because their impact is more negligible due to the huge size of the general population infected compartments.

Deaths from nursing homes through hospitalisation are counted within the 75+ class for consistency with reported data. Additional COVID-19 reported deaths from nursing homes (without hospitalisation) are monitored using an additional death rate . Since the officially reported data combine both confirmed COVID-19 deaths and suspected COVID-19 deaths [11], there is an unknown overreporting percentage within the data. This overreporting is captured by a constant probability *P*_{cor} that deaths are COVID-19 related. Hence only are removed from the symptomatic compartment while the remaining non-COVID-19 related deaths are assumed occurring in the susceptible class or in the recovered class if the first one is empty. For the first wave only (March 1 to June 30) a variable hospitalisation policy is computed (to represent the fact that the probability of hospitalisations of COVID-19 patients from nursing homes varied over time [14]) using variable parameters *δ*_{h}(*t*) (proportion of hospitalised) and (proportion of direct deaths) of constant sum , this proportion being monitored over time by an estimated logistic function (called “hosp” function) depending on hospitals load with an additional delay:
This variable hospitalisation policy was not applied in the second wave since most of nursing home residents were hospitalised. Hence from July 1, those parameters are considered with the value *Q* = 0.

### 2.3. Data used

We consider the following data for the calibration of the model coming from Sciensano’s (national public health institute of Belgium) public raw data [10] (October 31, 2020 release), all in daily incidence: new hospitalisations, discharged and deaths from hospital, age-class specific deaths and deaths directly coming from nursing homes. Concerning new hospitalisations, an additional corrective estimated parameter SUPP_{hosp} is added which estimates the percentage of missing COVID-19 patients at the time of admission (hence catching supplementary patients not initially hospitalised for COVID-19 or with no valid PCR test [12]). This correction is directly applied to the data. Deaths reported with a specific date are considered on that specific date while situations reported by hospitals are considered to occur up to 24h before the hospital report hence 2 days before the official data communication. Note that graphics are plotted using the dates of Sciensano’s communications (1 day delay).

Additional constraints are considered coming from Sciensano’s epidemiological reports [10]. Those constraints determine the set of admissibles parameters. Serological studies on blood donors during the first wave are considered to provide strong constraints on the prevalence. However, those serological data are biased since there are strict conditions to be blood donors: being between 18 and 75 years old and having not developed any COVID-19 symptoms during the previous weeks. This bias is naturally integrated into the model by considering for the fit the ratio between immune people coming directly from the asymptomatic compartment (hence the total number in ∑_{i}*R*_{i} coming from compartments, denote by and the total asymptomatic population who has not developed a symptomatic COVID-19 disease for the classes *i* = 25-44, 45-64 and 65-74. This ratio should be respectively between 0.5% and 2.8% 7 days before March 30 and between 3.5% and 6.2% 7 days before April 14, April 27 and May 11 (the 7 day delay is here to take the needed time to build a detectable immunity into account). There are also trivial constraints on parameters as e.g. *δ*_{0−24} < *δ*_{25−44} < … in order to reproduce the increase severity of the COVID-19 for older people as well as trivial constraints to avoid negative or out-of-bound parameters.

Additional constraints are imposed on nursing homes coming from the result of massive PCR test on April-May: the average percentage of infected people should be 8% ± 3% during the period April 15-30 and less than 2% ± 2% during the period May 15-31. Those percentages are estimated from Sciensano’s epidemiological reports using a calculated incidence between each week. Additionally, the average percentage of asymptomatic residents (including presymptomatic ones) among infected should be 75% ± 10%.

This model takes into consideration potential reimportations of COVID-19 from abroad during the holiday period. No reimportation is assumed in June since borders where barely opened. Reimportations are estimated during the period July to September using the following method: According to 2019 travel trends [21], we consider a proportionality of travellers of 36% in July, 26% in August and 21% in September. There is no data available concerning the inhomogeneous repartition inside each month, but we assume a homogeneous one for July and August while a 2 to 1 ratio between the first half of September and the second half. Only the top five countries of destination are considered with the following proportion: France 23%, Spain 11%, Italy 9% and The Netherlands 7% (Belgium is discarded). Then for each of those countries we consider the daily ECDC statistics on cumulative numbers for the previous 14 days of COVID-19 cases per 100000 [22]. The reimportations are added using an estimated global coefficient *C*_{reimp} and injected proportionally in the exposed compartment of the classes 0-24, 25-44, 45-64 and 65-74 and removed from the corresponding susceptible compartments. The estimated reimportations per day are presented in Appendix A Table A.9.

### 2.4. Calibration method

Except social contact data, all of the 65 parameters of the model are estimated using a Markov Chain Monte Carlo (MCMC) method [23], hence there is no assumption coming from studies in other countries. We assume that each daily incidence data follows a Poisson distribution which is appropriate when dealing with count data [24]. The log-likelihood function, representing the probability that observed data correspond to model’s projections, is computed as:
where *y*_{i} represent the observed incidences and *Y*_{i} the expected incidences as given by the model for a given set of parameters. Note that the sum is done over all incidence data (new hospitalisations, discharged and deaths) presented in Subsection 2.3 for each day and that a constant log(*y*_{i}!) is ignored.

The fitting procedure is performed in two steps:

Best-fit mode: An initial calibration step is performed using the maximum likelihood method with an optimised first-choice hill climbing algorithm performed half of the steps on one parameter at a time (i.e. one neighbour = variation of one parameter) and the other half on all parameters (i.e. one neighbour = variation of all parameters), with a quick best fit search performed on accepted descent directions to speed up the process. For all parameters, wide normal prior distributions are used (Table A.10). This initial calibration is highly computationally demanding due to the presence of a very high number of estimated parameters and the presence of local minima. It is initially performed during 5000000 iterations with a special trick to increase the rapidity of the algorithm: instead of 2000 different nursing homes, only 100 nursing homes are considered with each time 20 copies of each. This approximation is suitable as long as the algorithm is still far from the best-fit. In a second time, the best-fit search is pursued for 20000 iterations using the complete 2000 different nursing homes in order to further refine selected parameters. All this procedure is repeated at least 1000 times using parallel computing and 250 parameter sets with best scoring model runs are retained (the others 75% are discarded in order to avoid unwanted local minima).

MCMC mode: A classic Random-Walk Metropolis (RWM) algorithm [23, 25] is performed in order to provide Bayesian inference using the Poisson log-likelihood assumption with the algorithm initiated from the 250 parameter sets obtained from the best-fit mode. For each parameter set, a 20000 burn-in period is performed followed by 200000 iterations retaining every 20000th iteration, which provide 2500 final samples coming from potentially different local minima zones in order to avoid a too high autocorrelation of the results.

The program is written is C language. The code source is publicly available [26]. The full ODEs are solved by numerical integration using the GNU gsl odeiv2 librairy and a Runge-Kutta-Fehlberg45 integrator. The computation was performed on the HPC cluster Hercules2 ^{1}.

## 3. Results and Discussion

### 3.1. Current estimations

We present in this section the result of the calibration of the model as on November 1, 2020, with considered data up to October 31, 2020. Results are presented in the figures with medians, 5% and 95% percentiles, hence with a 90% confidence interval. The comparison between the model and some hospitalisations and deaths incidence data are presented in Figure 2 for the general incidence data in hospitalisations and deaths and in Figure 3 for incidence data in deaths with age class repartition (for the classes which have a significative amount of deaths). Data in Figure 2 are plotted with the correction coming from the underreporting on new hospitalisations to account for the discrepancies in COVID-19 hospital admissions, discharges and deaths. We can notice that, since this model is deterministic, it only captures the average evolution with an uncertainty concerning this average evolution, hence the uncertainty does not capture stochastic realisations around this average.

Figure 4 shows a general representation of the evolution of the epidemic in Belgium with hospitalisations, people discharged from hospitals and deaths coming from hospitals and from nursing homes, all in prevalence or cumulative numbers. We can see that the model calibration fits the hospitalisation prevalence and cumulative deaths data with a good exactitude (excluding of course data noises) despite that fact that the calibration is entirely done on incidence data. The interest in modelling the epidemic within nursing homes separately from the general population can clearly be seen on this figure. Indeed, the form of the death curve for nursing homes is really different from the ones for the general population since the epidemic started later in nursing homes but was more significant.

Initially the model overestimated the number of deaths from the end of the first wave. It was not possible to calibrate constant death rates throughout all phases of the epidemic. This may be a consequence of either or both care improvement in hospitals and lower aggressiveness of the virus. Hence death and recovered rates within each age class are modified by a logistic function depending on time (cf. Subsection 2.2 for details). The current improvement (in comparison to the very beginning of the epidemic) is estimated as 58.2% [49.3% ; 64.4%], hence 58.2% of the patients which should have died in March are now recovering from hospitals. We remark that it is impossible to know which part is due to care improvement (which was confirmed [13]) and which part is potentially due to lower aggressiveness of the virus (if there is any) and that the death rate seems to rise again in October. However, Figure 3 presents a slightly larger increase than expected for the deaths within the class 75+. This could be indicative of a small decrease in the quality of care during the second wave due to the huge load of the hospitals (but still better than during the first wave).

Deaths coming directly from nursing homes are not all due to COVID-19 since many PCR tests are lacking. The model estimates that only 83.1% [66.9% ; 89.4%] of direct nursing home deaths are really due to COVID-19. The ratio between deaths coming directly from nursing homes and deceased patients in hospitals coming initially from nursing homes seems to be not constant, and it was necessary to introduce a variable hospitalisation policy [14]. The best answer found was to monitor hospitalisations from nursing homes through a logistic function depending on general hospital load but with a specific delay (cf. description of the logistic function in Subsection 2.2.2). Hence, when hospital load starts to become too high, less people from nursing homes are hospitalised and the reverse effect occurs when hospital load gets lower, but with a delay parameter estimated at 10.6 days [8.4 ; 12.9].

The basic reproduction number *R*_{0}, representing the average number of cases directly generated by one infectious case in a population which is assumed totally susceptible, is estimated in average for each period (we consider this number dependent on lockdown measures) and computed as the leading eigenvalue of the next-generation matrix (cf. Subsection 2.2 for details). The effective reproduction number *R*_{t} represents the average number of cases directly generated by one infectious case taking account of the already immune population, hence varying over a period. Estimations for the general population are presented in Table 1.

The reproduction number of the pre-lockdown period is a bit overestimated compared to other Belgian models ([4],[6], but in accordance with [5]). This is probably due to the fact that the model does not take explicitly account of infections coming from foreign travel at this particular time and this results in an estimated *R*_{0} slightly above 4. For the period May 4-June 7 phases 1A-1B-2 (cf. Table A.7), since there were policy changes almost every week, we only provide here the estimated *R*_{0} at the end of this period. The second wave *R*_{0} does not take account of the new measures applied in October 19 whose effects should only be visible on November.

The infection fatality rate (IFR) can be estimated using the total set of recovered people according to the model (hence including untested and asymptomatic people). Due to variable death rates over time, the IFR in the early period of the epidemic is higher than in the later months. Estimations are presented in Table 2. The mean and last period are limited to September since October data need some consolidation regarding the number of deaths.

Table 3 presents some model estimates characterising disease progression. Durations are derived according to some specific rate parameters related to the model. The model cannot really detect the exact time when symptoms appear, hence the end of the incubation period merely corresponds to the estimated time when the infectiousness becomes more important. The total disease duration for symptomatic people concerns only not hospitalised individuals (not directly recovering from the compartment), while the duration is longer for the others. The hospitalisation duration is the average until discharged or deceased (no distinction is provided, hence according to the average rate of exit of the *Q*_{i} compartment) at the beginning of the epidemic, hence before care improvement. The duration for asymptomatic nursing homes’ residents cannot really be estimated by the model (the confidence interval is very wide). Indeed, once a single nursing home is completely infected, asymptomatic infected residents can remain a very long time inside the compartment without infecting any new resident, hence there is no constraint within the model on this duration coming from the available data. This excessive duration must be considered as an outlier.

### 3.2. Confrontation of previous calibrations

One way to test the robustness of a model is to assess ability of the model to predict the evolution of the epidemic beyond the time period for which data is used to fit the model. This model can provide projections or scenarios in two different ways. When new policy interventions are expected or a specific behaviour change is planned due to the calendar, it is possible to extrapolate the future transmission of the COVID-19 (monitored here by the number of contacts) using relative percentage of transmission in comparison to the pre-lockdown phase. This percentage can only be a vague estimate of what could be the real transmission and it is sometimes necessary to look at several different scenarios. On the other hand, when no new policy intervention is expected for a certain time, it is possible to have a more precise projection based on current estimated contacts (what we call the current behaviour) but which is only valid up to the next policy intervention.

We present two old projections from the model. The first one in Figure 5a is a 2.5 month projection based on the specific scenario that the transmission at school from September 1 would be at a level of 75% in comparison to the pre-lockdown period due to sanitary measures like masks wearing. The second one presented in Figure 5b is a 1 month projection based on the current behaviour and the estimation from the model of the percentage of transmission at schools compared to the pre-pandemic situation (coefficient *C*_{school}), which was estimated at that time to be 69.7% [44.2% ; 88.6%]. Comparison of these old projections with observations highlights the fact that the uncertainty must be taken into consideration.

### 3.3. Mid-term scenario-based projection

Every scenario is hypothetical. New measures that have not been tested cannot really be estimated on the level of their impact and it is impossible to predict evolution in compliance to them from the population as well as future policy changes. This is why any realistic projection must rely on the assumption of a perfect continuity of measures and compliance for elements which are a priori not suspected to change soon and on different hypothetical scenarios for untested modifications of measures.

In response to the large second wave in Belgium, authorities decided to enforce new measures on October 19 such as closing bars and restaurants, reducing the allowed social contact (known as bubble) to one person, promoting teleworking and establishing a curfew night time. On November 2, a soft lockdown was put into place, with closure of non-essential shops, teleworking mandatory, leisure mostly reduced and social contacts even more reduced. Schools are closed during 2 weeks and then reopen with a 5/6 attendance (except for universities).

While it is impossible to know with precision the impact from those measures, we estimate that the effect from the soft lockdown could be comparable to the effect of the first lockdown, since the small remaining liberties could be balanced by generalised sanitary measures like mask wearing. The effects from October 19 measures are more uncertain but should be situated in terms of efficacy between September behaviour and lockdown behaviour. Hence the most realistic mid-term scenario is to consider a medium situation from October 19, with a full reduction applied from November 2 until the December 13 planned deadline. Schools are considered at 0% transmission from November 2 to November 15 and at 5/6 thereafter. Every contacts are assumed to be restored at September level after December 13 (except for usual school closures). Social contact matrix coefficients concerning this scenario are detailed in Table A.8.

In Figure 6a, we present the estimated effect on hospital load from those measures. We note that, according to those measures and to the model, the theoretical maximum capacity of 10000 hospital beds in Belgium should be almost reached but not exceeded. Figure 6b presents the expected mortality in case of the new measures scenario. This expected mortality relies on a quality of care that may not be maintained.

We must remark that the uncertainty shown in the different projections (Figures 5, 6, 7) is the result of uncertainty on posterior parameter estimates, but those parameters include both disease characteristics and previous reductions on specific contact rates (coefficients *C*_{*}). For example, the current behaviour scenario in Figure 6a is represented using the uncertainty coming from COVID-19 estimated characteristics as well as the uncertainty coming from the estimation of the current behaviour, while the new measures scenario also uses the uncertainty coming from the estimation of first lockdown contact rates.

We can also extrapolate the evolution of the prevalence. In Figure 6c, we present the estimated percentage of infected people over time for each age class. We can clearly see the effect of mid-March lockdown measures on children and working people. The effect of lockdown measures on older people (especially 75+) is less important since the curve is broken in a less effective manner. Concerning the second wave, we can see that the virus is really present among the very young population due to two complete months of school opening. Prevalence in this age groups is drastically reduced by the two weeks closure and brought at a lower level than other age classes.

In Figure 6d, we present the estimated percentage of recovered people, hence the estimated percentage of immunity acquired within each age class if we make the assumption that a non-waning immunity is granted to recovered people. Such a lasting immunity is not guaranteed for the moment, but recent studies show that antibodies are present after several months for a large majority of the population [27]. The seroprevalence is calibrated using blood donor tests results (around 1.3% on March 30 and 4.7% on April 14) [10]. Since those tests where only performed on an (almost) asymptomatic population which have not developed COVID-19 symptoms from the past 4 weeks, the model also takes into account immunity coming from the symptomatic population and from nursing homes. Note that we allow a 7 days delay in our model after recovering to be sure of the detectability of the antibodies. Table 4 presents the detail of some seroprevalence estimation.

### 3.4. Long-term scenarios-based projections

The model allows to construct long-term scenarios which are very suitable to study the potential impact from a specific measure. The possibilities are numerous but we present in this section a simple study of the potential impact of an increase in contacts at a specific place (school, family, work and leisure). The increase is perform from January 4, 2021 up to June 30, 2021, when the risk of an emerging third wave is present. We work here with the assumption that there is no modification on the set of susceptible people except from natural infection, hence with the assumption that a non-waning immunity is granted to recovered people. This hypothesis could be modified negatively in the future if the probability of a reinfection is important or positively if the immunity is artificially increased by the arrival of a vaccine. We must emphasise that those scenarios are not real forecasts but only projections under some assumptions. In particular, these projections do not take into account any potential variant of concern with significantly different characteristics.

The baseline scenario is the restart of all activities on January 4 with similar transmissions/contacts as in September. Those estimated contacts percentage are *C*_{school}=88.2% [40.5 %; 99.0%] for school contacts, *C*_{home}=51.4% [46.9 %; 54.4%] for family contacts, *C*_{work}=9.3% [6.0 %; 14.5%] for work contacts and *C*_{leisure}=31.3% [21.2 %; 55.6%] for leisure contacts. We recall that those percentages do not correspond to the exact number of contacts as determined by the attendance, but to the reduced transmission in comparison to the pre-lockdown period as the result of decrease of contacts but also of sanitary measures. These numbers reflect that transmission is estimated at a very low level at work since sanitary measures and social distancing are more respected than during leisures or among family. The high transmission percentage at school does not necessarily mean that schools are the engine of the virus transmission since most of the student are asymptomatic with a reduced infectiousness, and the uncertainty concerning this parameter is very high.

The baseline scenario is presented in Figure 7a together with the potential impact of full transmission at school *C*_{school} = 100%, hence a transmission without any sanitary measure as well as without any quarantine imposed by the testing and tracing process. We can see that the baseline scenario itself provides a non-zero probability of a third wave but still low. The full contacts at school scenario increases a bit this probability to a reasonable extent.

Increases in family contacts, work contacts and leisure contacts are presented in Figures 7b, 7c and 7d with each time a hypothetical increase of 10% or 20% on respectively *C*_{home}, *C*_{work} and *C*_{leisure}. Those increase must be understood as a non-proportional increase (e.g. a work increase of 10% corresponds to *C*_{work} = 9.3% + 10% = 19.3%). We can clearly see that an increase in leisure contact has the most important effect on the evolution of the epidemic and could lead to a potentially problematic third wave. Full transmission scenarios for family, work or leisure cannot be taken as realistic since they would provide a complete explosion in the absence of vaccine.

### 3.5. Conclusion

We have presented an age-structured SEIR-QD type model with a number of improvements compared to others models as a specific consideration for nursing homes, variable parameters and reimportation from travellers. Those improvements were important in order to catch the specificity of the epidemic in Belgium.

The model allows to have a good study of the current behaviour of the epidemic, with an estimation of hidden elements like the real prevalence of the virus and the potential evolution of the immunity. More important, the model allows to construct scenarios-based projections in order to estimate the potential impact from new policy measures and can explicitly serve to complement others models for policy makers.

However, the model suffers from several limitations which would be important to try to solve in order to better catch the evolution of the epidemic. In particular, the model is only capturing an average behaviour resulting in a kind of underestimation of the uncertainty on the real data. This is mainly due to the deterministic nature of the model not capturing daily stochastic realisations. This model captures some key elements as heterogeneity concerning the age structure and the particular role of nursing home but misses other heterogeneous aspects. For example, the lack of spatial consideration is a huge approximation of the reality, even if the Belgian country is small and very connected. Also, the compartmental distinction is limited to asymptomatic and symptomatic while there are several variations of the severity and hospitals are considered as a unique homogeneous element. Furthermore, the lack of refinement inside age classes is a brake on the study of interesting scenarios, as e.g. studying the separated impact from transmission at primary school, secondary school or university. We must remark however that such a distinction is impossible without sufficiently refined data, and those are not publicly released in Belgium, which is very problematic for quality scientific research.

## Data Availability

All used data are available on the website of the Belgian Scientific Institute for Public Health, Sciensano. The code source is publicly available.

## Acknowledgement

The author wants to acknowledge the different members of the Walloon consortium on mathematical model of the COVID-19 epidemic for the numerous discussions, especially Sebastien Clesse, Annick Sartenaer, Alexandre Mauroy, Timoteo Carletti as well as Germain van Bever for statistical discussions. The author wants also to acknowledge the members of the Flemish consortium for the very useful exchanges, models’ comparisons and helps on improvement, especially the members of the SIMID-COVID-19 consortium (UHasselt-UAntwerp) and the BIOMATH team (UGent).

This work was supported by the Namur Institute for Complex Systems (naXys) and the Department of Mathematics of the University of Namur, Belgium. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Computational resources have been provided by the Consortium des Équipements de Calcul Intensif (CÉCI), funded by the Fonds de la Recherche Scientifique de Belgique (F.R.S.-FNRS) under Grant No. 2.5020.11 and by the Walloon Region.

## Appendix A. Supplementary material: model details, timeline and estimated parameters

In this Appendix, we provide some technical details concerning the construction of the model.

### Appendix A.1. Terminology and parameters description

A description of the terminology used for the compartmental model is presented in Table A.5 and a description of the model parameters in Table A.6. Parameters without age class index *i* are assumed similar for all classes while those with age class index *i* are class-dependent.

### Appendix A.2. Timeline and intervention measures

According to the start of the pandemic in March 2020, the Belgian government began to apply several non pharmaceutical interventions (NPI) in order to reduce the transmission of the virus and to protect health care capacities. In this subsection, we detail how those interventions have been taken into consideration within our model. Those NPI are modelled by the four generic coefficients of the contact matrices: *C*_{home} for transmission within the family (household and nearby family), *C*_{work} for transmission during work and travels, *C*_{school} for transmission at school and *C*_{leisure} for transmission during leisures or other activities. All those coefficients are considered at maximal value 1 during the pre-lockdown period. Then there are assumed to be at 0 if the sector is completely closed (which only happens which school) or estimated by the model according to specific parameters (whose estimated values are detailed in Table A.10).

The lockdown during the first wave took place in two steps. From March 14, the government imposed the closure of schools, shops and of all leisures activities, which is modelled by *C*_{school} = 0 and *C*_{leisure} = *C*_{leisurelock}, an estimated specific parameter catching the reduction of the transmission for leisures and others activities. Then from March 18 midday (assumed March 19 here), additional measures were taken imposing stricter measures of physical distancing, with teleworking mandatory whenever possible and all travels restricted to specific essential tasks. From this period, the new transmission at work is modelled by *C*_{work} = *C*_{worklock} and between family members by *C*_{home} = *C*_{homelock} to catch the reduction of contacts with non-household members.

The lockdown release was planned with several phases 1A-B, 2, 3 and 4 during May to July 2020 (cf. Table A.7 for specific dates). Concerning family and closed contacts, the concept of social bubble [6] has been implemented, with initially only few contacts (bubble of additional 2 people) and a bubble of maximum 10 people on phase 3. This is estimated by *C*_{home} = *C*_{homeunlock} on phase 3 with an intermediate state *C*_{home} = (*C*_{homelock} + *C*_{homeunlock})*/*2 on previous phases. A further extension of the bubble on phase 4 is modelled by *C*_{home} = 2*C*_{homeunlock} − *C*_{homelock}. Concerning works, a progressive reopening took place on phase 1A-B (industries and professional services on 1A and all shops on 1B) modelled by the estimation *C*_{work} = *C*_{workunlock} on phase 1B and the intermediate point on phase 1A. Schools have also a progressive partial opening on phase 2 and 3, modelled by *C*_{school} = *C*_{schoolunlock} on the maximal point on phase 3 and a progressive 20%-40%-60% before. Leisures remained closed until phase 3 in June with limited people and activities modelled by *C*_{leisure} = *C*_{leisurejune} and in phase 4 in July with more people and activities modelled by *C*_{leisure} = *C*_{leisurejuly}.

Due to a restarting epidemic at the end of July, the government decided to backtrack on some measures, restricting again closed and outside contacts. This is implemented by a backtrack to *C*_{home} = *C*_{homeunlock} and a new estimation of non-work related activities *C*_{leisure} = *C*_{leisureaug}. Restarting activities in September are modelled by *C*_{leisure} = *C*_{leisuresept} (since more leisure activities occur during non-holidays periods) and by an estimation of the full opening of schools *C*_{school} = *C*_{schoolsept}. The social contact bubble is once more relaxed *C*_{home} = 2*C*_{homeunlock} − *C*_{homelock} with a backtrack end of September *C*_{home} = *C*_{homeunlock}.

A summary of the Belgian policy timeline with all corresponding social contact matrices coefficients is presented in Table A.7. The Table A.8 contains the social contact matrices coefficients used for the second lockdown scenario described in Subsection 3.3.

### Appendix A.3. Estimated parameters

Table A.9 shows the estimated number of reimportations of COVID-19 per day during the holidays period. The complete list of estimated parameters from the calibration on October 31, 2020 data is given in Table A.10.

## Footnotes

Minor revision.

↵

^{1}“Plateforme Technologique de calcul Intensif” (PTCI) located at the University of Namur, Belgium.