COVIDHunter: An Accurate, Flexible, and Environment-Aware Open-Source COVID-19 Outbreak Simulation Model

Background: Early detection and isolation of COVID-19 patients are essential for successful implementation of mitigation strategies and eventually curbing the disease spread. With a limited number of daily COVID-19 tests performed in every country, simulating the COVID-19 spread along with the potential effect of each mitigation strategy currently remains one of the most effective ways of managing the healthcare system and guiding policy-makers. Methods: We introduce COVIDHunter, a flexible and accurate COVID-19 outbreak simulation model that evaluates the current mitigation measures that are applied to a region and provides suggestions on what strength the upcoming mitigation measure should be. The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by simulating the average number of new infections caused by an infected person considering the effect of external factors, such as environmental conditions (e.g., climate, temperature, humidity) and mitigation measures. Results: Using Switzerland as a case study, COVIDHunter estimates that if the policy-makers relax the mitigation measures by 50% for 30 days, then both the daily capacity need for hospital beds and the daily number of deaths increase exponentially by an average of 5.1×, and the additional patients may occupy ICU beds and ventilators for a period of time. Unlike existing models, the COVIDHunter model accurately monitors and predicts the daily number of cases, hospitalizations, and deaths due to COVID-19. Our model is flexible to configure and simple to modify for modeling different scenarios under different environmental conditions and mitigation measures. Availability: We release the source code of the COVIDHunter implementation at https://github.com/CMU-SAFARI/COVIDHunter and show how to flexibly configure our model for any scenario and easily extend it for measures and conditions other than those we account for.


Introduction
Coronavirus disease 2019 (COVID-19) is caused by the SARS-CoV-2 virus, which was first detected in Wuhan, the capital city of Hubei Province in China, in early December 2019 (Du Toit, 2020). Since then, it has rapidly spread to nearly every corner of the globe and was declared a pandemic in March 2020 by the World Health Organization (WHO). As of January 2021, COVID-19 has resulted in more than 96 million laboratory-confirmed cases around the world, and has killed nearly 2.2% of the infected population. As there are currently no anti-SARS-CoV-2-specific drugs or effective vaccines widely available to everyone, early detection and isolation of COVID-19 patients remain essential for effectively curbing the disease spread. As a result, many countries across the world have implemented unprecedented lockdown and social distancing measures, affecting millions of people. Regardless of the availability and affordability of COVID-19 testing, it is still extremely challenging to detect and isolate COVID-19 infections at early stages due to three key issues. 1) It is very difficult to accurately identify the initial contraction time of COVID-19 for a patient. This is because COVID-19 patients can develop symptoms 2 to 14 days (or longer in a few cases) after exposure to the new coronavirus (Lauer et al., 2020; Li et al., 2020). This variable delay is referred to as the virus' incubation period.
2) The coronavirus genome can exhibit rapid genetic changes in its nucleotide sequence, which may occur during viral cell replication, within the host body, or during transmission between hosts (Andersen et al., 2020). This genetic diversity affects the virus virulence, infectivity, transmissibility, and evasion of the host immune responses (Phan, 2020;Pachetti et al., 2020;Toyoshima et al., 2020). 3) The situation becomes even worse as the coronavirus can survive and therefore remain infectious outside the host, on common surfaces such as metal, glass, and banknotes (both paper and polymer) at room temperature for up to 28 days (Kampf et al., 2020;Riddell et al., 2020).
Simulating the spread of COVID-19 has the potential to mitigate the effects of the three key issues, help to better manage the healthcare system, and provide guidance to policy-makers on the effectiveness of various (current, planned, or discussed) social distancing and mitigation measures. To this end, many COVID-19 simulation models have been proposed (e.g., (Tradigo et al., 2020; Russell et al., 2020; Ashcroft et al., 2020)), some of which are announced to assist in decision-making for policy-makers in countries such as the United Kingdom (ICL (Flaxman et al., 2020)), United States (IHME (Reiner et al., 2020)), and Switzerland (IBZ). These models tend to follow one of two key approaches. (1) Evaluating the current actual epidemiological situation by accounting for reporting delays and under-reporting due to inefficiencies such as a low number of COVID-19 tests. (2) Evaluating the current and future epidemiological situation by simulating the COVID-19 outbreak without relying on the observed (laboratory-confirmed) number of cases in simulation.
The first approach, taken by the IBZ, LSHTM (Russell et al., 2020), and (Ashcroft et al., 2020) models, is not mainly used for prediction purposes as it reflects the epidemiological situation with about two weeks of time delay (due to its dependence on observed COVID-19 reports). The IBZ model estimates the daily reproduction number, R, of SARS-CoV-2 from observed COVID-19 incidence time series data after accounting for reporting delays and under-reporting using the numbers of confirmed hospitalizations and deaths. The R number describes how a pathogen spreads in a particular population by quantifying the average number of new infections caused by each infected person at a given point in time. The LSHTM model (Russell et al., 2020) adjusts the daily number of observed COVID-19 cases by accounting for under-reporting (uncertainty) using deaths-to-cases ratio estimates and by correcting for delays between case confirmation (i.e., laboratory-confirmed infection) and death.
The second approach, taken by the ICL (Flaxman et al., 2020) and IHME (Reiner et al., 2020) models, usually requires a large number of various input parameters and assumptions. The IHME model (Reiner et al., 2020) requires input parameters such as testing rates, mobility, social distancing policies, population density, altitude, smoking rates, self-reported contacts, and mask use. This model makes two key assumptions: 1) the infection fatality rate (IFR), which indicates the rate of people that die from the infection, is taken using data from the Diamond Princess cruise ship and New Zealand, and 2) the decreasing fatality rate is reflective of increased testing rates (identifying higher rates of asymptomatic cases). The ICL model (Flaxman et al., 2020) requires input parameters such as the daily number of confirmed deaths, IFR, mobility rates from Google, age- and country-specific data on demographics, patterns of social contact, and hospital availability. This model makes three key assumptions: 1) age-specific IFRs observed in China and Europe are the same across every country, 2) the number of confirmed deaths is equal to the true number of COVID-19 deaths, and 3) the change in transmission rates is a function of average mobility trends.

Table 1 (partially recovered). Outputs available from each model's open-source implementation#:
IBZ: only R
LSHTM (Russell et al., 2020): only cases
ICL (Flaxman et al., 2020): R, cases, hospitalizations, and deaths
IHME (Reiner et al., 2020)*: cases, hospitalizations, and deaths
# Based on each model's GitHub page (all models are available on GitHub). * The available packages are configured only for the IHME infrastructure.
To our knowledge, there is currently no model capable of accurately monitoring the current epidemiological situation and predicting future scenarios while considering a reasonably low number of parameters and accounting for the effects of environmental conditions, as we summarize in Table 1. The low number of parameters provides four key advantages: 1) allowing flexible (easy-to-adjust) configuration of the model input parameters for different scenarios and different geographical regions, 2) enabling short simulation execution time and simpler modeling, 3) enabling easy validation/correction of the model prediction outcomes by adjusting fewer variables, and 4) being extremely useful and powerful especially during the early stages of a pandemic as many of the parameters are unknown. Simulation models need to consider the fact that the environmental conditions (e.g., air temperature) affect pathogen infectivity (Fares, 2013;Kampf et al., 2020;Riddell et al., 2020;Xu et al., 2020) and simulating this effect helps to provide accurate estimation of the epidemiological situation.
Our goal in this work is to develop such a COVID-19 outbreak simulation model. To this end, we introduce COVIDHunter, a simulation model that evaluates the current mitigation measures (i.e., non-pharmaceutical intervention or NPI) that are applied to a region and provides insight into what strength the upcoming mitigation measure should be and for how long it should be applied, while considering the potential effect of environmental conditions.
Our model accurately forecasts the numbers of infected and hospitalized patients, and deaths for a given day, as validated on historical COVID-19 data (after accounting for under-reporting). The key idea of COVIDHunter is to quantify the spread of COVID-19 in a geographical region by calculating the daily reproduction number, R, of COVID-19 and scaling the reproduction number based on changes in both mitigation measures and environmental conditions. The R number changes during the course of the pandemic due to the change in the ability of a pathogen to establish an infection during a season and mitigation measures that lead to a lower number of susceptible individuals. COVIDHunter simulates the entire population of a region and assigns each individual in the population to a stage of the COVID-19 infection (e.g., from being healthy to being short-term immune to COVID-19) based on the scaled R number. Our model is flexible to configure and simple to modify for modeling different scenarios as it uses only three input parameters, two of which are time-varying parameters, to calculate the R number. Whenever applicable, we compare the simulation output of our model to that of four state-of-the-art models currently used to inform policy-makers: IBZ, LSHTM (Russell et al., 2020), ICL (Flaxman et al., 2020), and IHME (Reiner et al., 2020).
The contributions of this paper are as follows:
• We introduce COVIDHunter, a flexible and validated simulation model that evaluates the current and future epidemiological situation by simulating the COVID-19 outbreak. COVIDHunter accurately forecasts for a given day 1) the reproduction number, 2) the number of infected people, 3) the number of hospitalized people, 4) the number of deaths, and 5) the number of individuals at each stage of the COVID-19 infection. COVIDHunter evaluates the effect of different current and future mitigation measures on these five numbers.
• As a case study, we statistically analyze the relationship between temperature and number of COVID-19 cases in Switzerland. We find that for each 1 °C rise in daytime temperature, there is a 3.67% decrease in the daily number of confirmed cases. We demonstrate how considering the effect of climate (e.g., daytime temperature) on COVID-19 spread significantly improves the prediction accuracy.
• Compared to IBZ, LSHTM, ICL, and IHME models, COVIDHunter achieves more accurate estimation, provides no prediction delay, and provides ease of use and high flexibility due to the simple modeling approach that uses a small number of parameters.
• Using COVIDHunter, we demonstrate that the spread of COVID-19 in Switzerland is still active (i.e., R > 1.0) and curbing this spread requires maintaining the same or greater strength of the currently applied mitigation measures for at least another 30 days.
• We release the well-documented source code of COVIDHunter and show how easy it is to flexibly configure for any scenario and extend for different measures and conditions than we account for.

Overview
The primary purpose of our COVIDHunter model is to monitor and predict the spread of COVID-19 in a flexibly-configurable and easy-to-use way, while accounting for changes in mitigation measures and environmental conditions over time. We employ a three-stage approach to develop and deploy this

How does the COVIDHunter Model Work?
The COVIDHunter model predicts the dynamic value of R for a population at a given day while considering three key factors: 1) the transmissibility of an infection into a susceptible host population, 2) mitigation measures (e.g., lockdown, social distancing, and isolating infected people), and 3) environmental conditions (e.g., air temperature). Our model calculates the time-varying R number using Equation 1 as follows:

$$R(t) = R_0 \times (1 - M(t)) \times C_e(t) \quad (1)$$

The R number for a given day, t, is calculated by multiplying three terms: 1) the base reproduction number (R0) for the subject virus, 2) one minus the mitigation coefficient (M) for the given day t, and 3) the environmental coefficient (Ce) for the given day t.
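As a minimal sketch of Equation 1, the product of R0, one minus the mitigation coefficient, and the environmental coefficient can be written as a small Python function. The function name and the input values are illustrative assumptions, not part of the released implementation; the range checks follow the coefficient ranges stated in this section (M between 0 and 1, Ce between 0 and 2).

```python
def reproduction_number(r0: float, mitigation: float, env_coeff: float) -> float:
    """Equation 1 as described in the text: R(t) = R0 * (1 - M(t)) * Ce(t)."""
    if not 0.0 <= mitigation <= 1.0:
        raise ValueError("mitigation coefficient M(t) must lie in [0, 1]")
    if not 0.0 <= env_coeff <= 2.0:
        raise ValueError("environmental coefficient Ce(t) must lie in [0, 2]")
    return r0 * (1.0 - mitigation) * env_coeff


# Illustrative inputs only: R0 = 2.7, M(t) = 0.5, Ce(t) = 1.0.
r_t = reproduction_number(2.7, 0.5, 1.0)
print(round(r_t, 3))
```

With full mitigation (M = 1) the function returns 0, and with no mitigation and a neutral environment (M = 0, Ce = 1) it returns R0 itself.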
The R0 number quantifies the transmissibility of an infection into a susceptible host population by calculating the expected average number of new infections caused by an infected person in a population with no prior immunity to a specific virus (as a pandemic virus is by definition novel to all populations). Hence, the R0 number represents the transmissibility of an infection at only the beginning of the outbreak assuming the population is not protected via vaccination. Unlike the R number, R0 number is a fixed value and it does not depend on time. The R number is a time-dependent variable that accounts for the population's reduced susceptibility. The R0 number for the COVID-19 virus can be obtained from several existing studies (such as in (Anastassopoulou et al., 2020;Hilton and Keeling, 2020;Chang et al., 2020;Shi et al., 2020;de Souza et al., 2020;Rahman et al., 2020)) that estimate it by modeling contact patterns during the first wave of the pandemic.
The mitigation coefficient (M) applied to the population is a time-dependent variable and it has a value between 0 and 1, where 1 represents the strongest mitigation measure and 0 represents no mitigation measure applied. In different countries, mitigation measures take different forms, such as social distancing, self-isolation, school closure, banning public events, and complete lockdown. These measures exhibit significant heterogeneity and differ in timing and intensity across countries (Hale et al., 2020; Davies et al., 2020). Quantifying the mitigation measures on a scale from 0 to 1 across different countries is challenging. The Oxford Stringency Index (Hale et al., 2020) is updated twice weekly and takes values from 0 to 100, representing the severity of nine mitigation measures that are applied by more than 160 countries. Another study (Brauner et al., 2020) estimates the effect of only seven mitigation measures on the R number in 41 countries. We can directly leverage such studies for calculating the mitigation coefficient on a given day after changing the scale from 0:100 to 0:1 by dividing each value of, for example, the Oxford Stringency Index by 100.
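The rescaling from the 0:100 scale to the 0:1 scale can be sketched as follows. The function name and the sample index readings are hypothetical and are not real Oxford Stringency Index data.

```python
def mitigation_from_stringency(index_values):
    """Rescale 0-100 stringency index values to mitigation coefficients in [0, 1]."""
    coeffs = []
    for value in index_values:
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"stringency value out of range: {value}")
        coeffs.append(value / 100.0)
    return coeffs


# Hypothetical index readings for four days (not real data).
m_t = mitigation_from_stringency([0, 35, 70, 100])
print(m_t)
```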
The environmental coefficient (Ce) is a time-dependent variable representing the effect of external environmental factors on the spread of COVID-19 and it has a value between 0 and 2. Several related viruses, such as the influenza virus, human coronavirus, and human respiratory syncytial virus, already show notable seasonality (with peak incidences during only the winter (or summer) months) (Moriyama et al., 2020; Fisman, 2012). The seasonal changes in temperature, humidity, and ultraviolet light affect the pathogen infectiousness outside the host (Fares, 2013; Kampf et al., 2020; Riddell et al., 2020; Xu et al., 2020). However, the indoor environmental conditions are usually well-controlled throughout the year, where human behavior and number of households can be the major contributor to the spread of COVID-19 (Moriyama et al., 2020). There are currently several studies that demonstrate the strong dependence of the transmission of the SARS-CoV-2 virus on one or more environmental conditions, even after controlling (isolating) the impact of mitigation measures and behavioral changes that reduce contacts. Several studies have demonstrated that infectiousness increases by a country-dependent fixed rate with each 1 °C fall in daytime temperature (Prata et al., 2020). Another study supports the same temperature-infectiousness relationship, but it also finds that before applying any mitigation measures, a one degree drop in relative humidity shows increased infectiousness by a rate lower (2.94× less) than that of temperature. Another study follows a simple way of modeling the effect of seasonality on COVID-19 transmission using a sinusoidal function with an annual period (Noll et al., 2020).
One of the most comprehensive studies that spans more than 3700 locations around the world is HARVARD CRW. It finds the statistical correlation between the relative changes in the R number and both weather (temperature, ultraviolet index, humidity, air pressure, and precipitation) and air pollution (SO2 and Ozone) after controlling the impact of mitigation measures. The study provides a CRW Index that has a value from 0.5 to 1.5. The percentage difference between any two consecutive values provided by the CRW Index represents the effect that both weather and air pollutants have on the R number. For example, a drop in the CRW Index by 10% in a given location points to a 10% reduction in the R number due to weather changes and air pollutants. Our model enables applying any of these studies by adjusting our environmental coefficient on a given day, as we experimentally demonstrate in Section 3. For example, if the COVIDHunter user chooses to consider the HARVARD CRW study, and the CRW Index shows, for example, a 10% drop compared to its immediately preceding data point, then the environmental coefficient of COVIDHunter should be 0.9 so that the R value also decreases by 10%. Next, we explain how our model forecasts the number of COVID-19 cases based on Equation 1.

days before symptom onset (Wei et al., 2020; Slifka and Gao, 2020). COVID-19 patients can develop symptoms mostly after an incubation period of 1 to 14 days (the median incubation period is estimated to be 4.5 to 5.8 days) (Lauer et al., 2020; Li et al., 2020). We calculate the number of days of being contagious after being infected as a random number with a Gaussian distribution that has user-defined lowest and highest values. Each contagious person may infect N other persons depending on mobility, population density, number of households, and several other factors (Ferguson et al., 2020).
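Returning to the CRW Index example given earlier in this section, one reading of that description is that the environmental coefficient on a given day equals the ratio of the current CRW Index value to its immediately preceding value, so that a 10% drop yields a coefficient of 0.9. The sketch below follows that reading. The function name, the sample values, and the choice of 1.0 for the first day (which has no preceding data point) are assumptions.

```python
def env_coeff_from_crw(crw_index):
    """Environmental coefficient as the ratio of each CRW Index value to the previous one."""
    if not crw_index:
        return []
    coeffs = [1.0]  # assumption: no environmental adjustment on the first day
    for prev, curr in zip(crw_index, crw_index[1:]):
        if prev <= 0:
            raise ValueError("CRW Index values must be positive")
        coeffs.append(curr / prev)
    return coeffs


# Hypothetical CRW Index series: a 10% drop on day 2, then no change on day 3.
c_e = env_coeff_from_crw([1.0, 0.9, 0.9])
print(c_e)
```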

Predicting the Number of COVID-19 Cases
We calculate the value of N to be a random number with a Gaussian distribution that has the lowest value of 0 and the highest value determined by the user. If N is greater than the R number (i.e., the target number of infections for that day has been reached), further infections are curtailed and the contagious person infects only R persons, which prevents overestimation of N. Once the contagious person infects the desired number of susceptible persons, the status of the contagious person becomes immune (IMMUNE). The immune status indicates that the person has immunity to reinfection due to either vaccination or being recently infected (Lumley et al., 2020; Jagannathan and Wang, 2021).
Our model also simulates the effect of infected travelers (e.g., daily cross-border commuters within the European Union) on the value of R. These travelers can initiate the infection(s) at the beginning of the pandemic. If such infected travelers are absent (due to, for example, emergency lockdown) from the target population, the virus would die out once the value of R decreases below 1 for a sufficient period of time. Both the number and percentage of infected travelers entering a region are configurable in our model. The percentage of incoming infected travelers is not affected by the changes in the local mitigation measures, as these travelers were infected abroad.
Our model predicts the daily number of COVID-19 cases for a given day t, as follows:

$$\text{Daily\_Cases}(t) = T_{INF} + \sum_{i=1}^{U_{CON}} N() \quad (2)$$

where T_INF is the daily number of infected travelers that is a user-defined variable, N() is a function that calculates the number of persons to be infected by a given person as a random number with a Gaussian distribution (evaluated once for each contagious person), and U_CON is the daily number of contagious persons calculated by our model.
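A rough sketch of this calculation is given below, assuming that Equation 2 adds the daily number of infected travelers to the infections caused by each contagious person. The text does not specify the mean or spread of the Gaussian, so the sketch assumes a mean at the midpoint of the allowed range and a standard deviation of one sixth of the range, clipped to the range and then capped at R as described above. All function and parameter names are hypothetical.

```python
import random


def sample_infections(r_t, n_max, rng):
    """Draw N from a Gaussian clipped to [0, n_max], then cap it at R(t)."""
    # Assumption: the range [0, n_max] spans roughly +/- 3 standard deviations.
    n = rng.gauss(n_max / 2.0, n_max / 6.0)
    n = min(max(n, 0.0), n_max)
    # If N exceeds R(t), only R(t) persons are infected.
    return min(n, r_t)


def daily_cases(t_inf, u_con, r_t, n_max, rng):
    """Infected travelers plus the infections caused by each contagious person."""
    return t_inf + sum(sample_infections(r_t, n_max, rng) for _ in range(u_con))


rng = random.Random(0)  # fixed seed so that repeated runs give the same result
cases = daily_cases(t_inf=15, u_con=1000, r_t=1.1, n_max=25, rng=rng)
print(cases)
```

Because each contagious person contributes at most R(t) infections, the result is bounded above by T_INF + U_CON × R(t).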

Predicting the Number of COVID-19 Hospitalizations and Deaths
There are currently two key approaches for calculating the estimated number of both hospitalizations and deaths due to COVID-19: 1) using historical statistical probabilities, each of which is unique to each age group in a population (Bhatia and Klausner, 2020;Bi et al., 2020) and 2) using historical COVID-19 hospitalizations-to-cases and deaths-to-cases ratios (Kobayashi et al., 2020). We choose to follow a modified version of the second approach as it does not require 1) clustering the population into age-groups and 2) calculating the risk of each individual using the given probability, which both affect the complexity of the model and the simulation time.
The number of COVID-19 hospitalizations for a given day, t, can be calculated as follows:

$$\text{Daily\_Hospitalizations}(t) = \text{Daily\_Cases}(t) \times X \times C_X \quad (3)$$

where Daily_Cases(t) is calculated using Equation 2 and X is the hospitalizations-to-cases ratio that is calculated as the average of daily ratios of the number of COVID-19 hospitalizations to the laboratory-confirmed number of COVID-19 cases. As the true number of cases is unknown due to lack of population-scale testing, it is extremely difficult to make accurate estimates of the true number of COVID-19 hospitalizations (Petropoulos and Makridakis, 2020). As such, we assume a fixed multiplicative relationship between the number of laboratory-confirmed cases and the true number of cases. We use the user-defined correction coefficient, C_X, of the hospitalizations-to-cases ratio to account for such a multiplicative relationship.
The number of COVID-19 deaths for a given day t can be calculated as follows:

$$\text{Daily\_Deaths}(t) = \text{Daily\_Cases}(t) \times Y \times C_Y \quad (4)$$

where Daily_Cases(t) is calculated using Equation 2 and Y is the deaths-to-cases ratio, which is calculated as the average of daily ratios of the number of COVID-19 deaths to the number of COVID-19 laboratory-confirmed cases. The observed number of COVID-19 deaths can still be less than the true number of COVID-19 deaths due to, for example, under-reporting. We use the user-defined correction coefficient, C_Y, to account for the under-reporting.
One way to find the true number of COVID-19 deaths is to calculate the number of excess deaths. The number of excess deaths is the difference between the observed number of deaths during a time period and the expected (based on historical data) number of deaths during the same time period. For this reason, C_Y may not necessarily be equal to C_X.
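The two calculations in this section can be sketched as follows, assuming that each daily count is the product of the predicted daily cases, the corresponding ratio (X or Y), and its user-defined correction coefficient (C_X or C_Y). The function names are hypothetical; the ratios used in the example are the second-wave values for Switzerland reported in this paper, and the correction coefficients of 1.0 and the case count are placeholders.

```python
def daily_hospitalizations(daily_cases, x, c_x):
    """Predicted hospitalizations: daily cases scaled by X and its correction C_X."""
    return daily_cases * x * c_x


def daily_deaths(daily_cases, y, c_y):
    """Predicted deaths: daily cases scaled by Y and its correction C_Y."""
    return daily_cases * y * c_y


# X = 3.75% and Y = 2.441%; 1000 cases and C_X = C_Y = 1.0 are placeholders.
hospitalized = daily_hospitalizations(1000, 0.0375, 1.0)
deaths = daily_deaths(1000, 0.02441, 1.0)
print(hospitalized, deaths)
```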

Model Validation
We can validate our model using two key approaches. 1) Comparing the daily R number predicted by our model (using Equation 1) with the daily reported official R number for the same region. 2) Comparing the daily number of COVID-19 cases predicted by our model (using Equation 2) with the daily number of laboratory-confirmed COVID-19 cases. As of 2021, we have already witnessed more than one year of the pandemic, which provides us several observations and lessons. The most obvious source of uncertainty, affecting all models, is that the true number of persons that are previously infected or currently infected is unknown (Wilke and Bergstrom, 2020). This affects the accuracy of the reported R number since it is calculated as, for example, the ratio of the number of cases for a week (7-day rolling average) to the number of cases for the preceding week. Adjusting the parameters of our model to fit the curve of the number of confirmed cases is likely to be highly uncertain. The publicly-available number of COVID-19 hospitalizations and deaths can provide more reliable data.
For these reasons, we decide to use a combination of reported numbers of cases, hospitalizations, and deaths for validating our model using three key steps. 1) We leverage the more reliable data of reported number of hospitalizations (or deaths) to estimate the true number of COVID-19 cases using the ratio of the number of laboratory-confirmed hospitalizations (or deaths) to the number of laboratory-confirmed cases during the second wave of the COVID-19 pandemic. We assume that the COVID-19 statistics during the second wave are more accurate than those during the first wave because generally more testing is performed in the second wave. 2) We consider a multiplicative relationship between the true number of COVID-19 cases and that estimated in step 1. In our experimental evaluation (Section 3), we use the true number of COVID-19 cases calculated using different multiplicative factor values (we refer to them as certainty rate levels) as a ground-truth for validating our model. A certainty rate of, for example, 50% means that the true number of COVID-19 cases is actually double that calculated in step 1. 3) We use our model to calculate both the daily R number (using Equation 1) and the number of COVID-19 cases (using Equation 2). We fix two terms of Equation 1, R0 and Ce, using publicly-available data for a given region and change the third term, M, until we fit the curve of the number of cases predicted by our model to the ground-truth plot calculated in step 2. We use the same methodology to validate our predicted numbers of hospitalizations and deaths with different certainty rate levels as we show in Section 3 and Supplementary Excel File (SimulationResultsForSwitzerland.xlsx).
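For reference, the week-over-week ratio mentioned earlier in this section as one example of how a reported R number can be computed may be sketched as follows. This is only the simple proxy described in the text, not the method of any official authority; the function name and the sample counts are hypothetical.

```python
def weekly_ratio_r(daily_cases):
    """Ratio of each 7-day case total to the total of the preceding 7 days."""
    ratios = []
    for end in range(14, len(daily_cases) + 1):
        current = sum(daily_cases[end - 7:end])
        previous = sum(daily_cases[end - 14:end - 7])
        # A week with zero cases gives an undefined ratio.
        ratios.append(current / previous if previous > 0 else float("nan"))
    return ratios


# Hypothetical counts: 100 cases per day for one week, then 120 per day.
ratios = weekly_ratio_r([100] * 7 + [120] * 7)
print(ratios)
```

Since the ratio of 7-day totals equals the ratio of 7-day averages, the sketch uses totals directly. Any under-reporting that changes from one week to the next distorts this ratio, which is the source of uncertainty discussed above.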

Flexibility and Extensibility of the COVIDHunter Model
We specifically build the COVIDHunter model to be flexible to configure and easy to extend for representing any existing or future scenario using different values of the three terms of Equation 1, 1) R0, 2) M(t), 3) Ce(t), in addition to several other parameters such as the population, number of travelers, percentage of expected infected travelers to the total number of travelers, and hospitalizations- or deaths-to-cases ratios. Our modeling approach acts across the overall population without assuming any specific age structure for transmission dynamics. It is still possible to consider each age group separately using individual runs of the COVIDHunter model simulation, each of which has its own parameter values adjusted for the target age group. The COVIDHunter model considers each location independently of other locations, but it also accounts for potential movement between locations by adjusting the corresponding parameters for travelers. By allowing most of the parameters to vary in time, t, the COVIDHunter model is capable of accounting for any change in transmission intensity due to changes in environmental conditions and mitigation measures over time. As we explain in Section 2.

Determining the Value of Each Variable in the Equations
We use Switzerland as a use-case for all the experiments. However, our model is not limited to any specific region as the parameters it uses are completely configurable. To predict the R number, we use Equation 1 that requires three key variables. We set the base reproduction number, R0, for SARS-CoV-2 in Switzerland as 2.7, as shown in (Hilton and Keeling, 2020; Anastassopoulou et al., 2020). We choose two main approaches for setting the value of to the number of confirmed cases with two certainty rate levels of 100% and 50%, as we explain in detail in Section 2.5. This helps us to take into account uncertainty in the observed number of COVID-19 cases, hospitalizations, and deaths. We set the minimum and maximum incubation time for SARS-CoV-2 as 1 and 5 days, respectively, as the 5-day period represents the median incubation period worldwide (Lauer et al., 2020; Li et al., 2020). We set the population to 8,654,622. We empirically choose the values of N, the number of travelers, and the ratio of the number of infected travelers to the total number of travelers to be 25, 100, and 15%, respectively.
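The parameter values listed above can be gathered into a single configuration object, as in the sketch below. It is written in Python for illustration only: the class and field names are hypothetical and do not correspond to identifiers in the released COVIDHunter source code. The sketch also assumes that the value 25 chosen for N is the user-defined upper bound of the Gaussian described in Section 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationConfig:
    """Switzerland parameter values stated in the text; field names are hypothetical."""
    r0: float = 2.7                         # base reproduction number
    population: int = 8_654_622
    min_incubation_days: int = 1
    max_incubation_days: int = 5
    n_max: int = 25                         # assumed to be the upper bound for N
    travelers_per_day: int = 100
    infected_traveler_ratio: float = 0.15   # 15% of travelers

    @property
    def infected_travelers_per_day(self) -> float:
        """Daily number of infected travelers implied by the two traveler settings."""
        return self.travelers_per_day * self.infected_traveler_ratio


config = SimulationConfig()
print(config.infected_travelers_per_day)
```

Modeling another region would then amount to constructing the object with different values, for example `SimulationConfig(r0=3.0, population=1_000_000)`, where both numbers are placeholders.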

Evaluating the Expected Number of COVID-19 Cases for Model Validation
As the exact true number of COVID-19 cases remains unknown (due to, for example, lack of population-scale COVID-19 testing), we expect the true number of COVID-19 cases in Switzerland to be higher than the observed (laboratory-confirmed) number of cases. We calculate the expected true number of cases based on both numbers of deaths and hospitalizations, as we explain in Section 2.5. To account for possible missing COVID-19 deaths, we consider the excess deaths instead of observed deaths. We calculate the excess deaths as the difference between the 5-year average of weekly deaths and the observed weekly number of deaths in both 2020 and 2021. We find X (hospitalizations-to-cases ratio) and Y (deaths-to-cases ratio, using excess death data) to be 3.75% and 2.441%, respectively, during the second wave of the pandemic in Switzerland. We choose the second wave to calculate the values of X and Y as Switzerland has increased the daily number of COVID-19 tests by 5.31× (21641/4074) on average compared to the first wave.
We calculate the expected number of cases on a given day t with certainty rate levels of 100% and 50% based on hospitalizations by dividing the number of hospitalizations at t by X and X/2, respectively, as we show in Figure 1. We apply the same approach to calculate the expected number of cases on a given day t with certainty rate levels of 100% and 50% based on deaths using Y and Y /2, respectively.
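The division described above can be sketched as follows: dividing by X corresponds to a certainty rate of 100%, and dividing by X/2 corresponds to a certainty rate of 50%. The function name and the hospitalization counts are hypothetical; X is the second-wave value reported above.

```python
def expected_cases(observed, ratio, certainty):
    """Expected true cases: observed hospitalizations (or deaths) / (ratio * certainty)."""
    if not 0.0 < certainty <= 1.0:
        raise ValueError("certainty rate must lie in (0, 1]")
    return [value / (ratio * certainty) for value in observed]


X = 0.0375  # hospitalizations-to-cases ratio (3.75%)
hospitalizations = [30, 60, 90]  # hypothetical daily counts, not real data

cases_100 = expected_cases(hospitalizations, X, 1.0)  # divide by X
cases_50 = expected_cases(hospitalizations, X, 0.5)   # divide by X/2
print(cases_100, cases_50)
```

The same function applies to deaths by passing the daily number of deaths and Y in place of the hospitalizations and X.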
Based on Figure 1, we make two key observations. 1) The plot for the expected number of cases calculated based on the number of deaths is shifted forward by 10-20 days (15 days on average) from that for the expected number of cases calculated based on the number of hospitalizations. This is due to the fact that each hospitalized patient usually spends some number of days in hospital before dying of COVID-19. We do not observe a significant time shift between the plot of the expected number of cases calculated based on the number of hospitalizations and the plot of observed (laboratory-confirmed) cases.
2) The expected number of cases calculated based on the number of hospitalizations is on average 2.7× higher than the expected number of cases calculated based on the number of deaths (after accounting for the 15-day shift) for the same certainty rate. This is expected as not all hospitalized patients die.
We conclude that both numbers of hospitalizations and deaths can be used for estimating the true number of COVID-19 cases after accounting for the time-shift effect.

Fig. 1.
Expected number of COVID-19 cases calculated using the hospitalizations-to-cases and deaths-to-cases ratios for the second wave. We assume two certainty rate levels of 50% and 100%.

Observed and Predicted R number of SARS-CoV-2
We calculate the predicted R number using our model (Equation 1) and compare it to the observed official R number and the R number of two state-of-the-art models, ICL and IBZ, for the two years of 2020 and 2021. We configure COVIDHunter using the following configurations: 1) CTC as the environmental condition approach, 2) certainty rate levels of 50% and 100%, and 3) mitigation coefficient values of 0.35 and 0.7. All our scripts are provided on our GitHub page. We consider the mean R number provided by the ICL model. We consider the median R number calculated by the IBZ model based on the observed number of hospitalized patients. IBZ provides the predicted (after 9 April 2021) R number as the mean of the estimates from the last 7 days.

Fig. 2.
Observed and predicted reproduction number, R(t), for the two years of 2020 and 2021. We use the CTC environmental condition approach, certainty rate levels of 50% and 100%, and mitigation coefficient values of 0.35 and 0.7 for COVIDHunter. We compare COVIDHunter's predicted R number to the observed R number and two state-of-the-art models, ICL and IBZ. The horizontal dashed line represents R(t) = 1.0.
Based on Figure 2, we make three key observations. 1) COVIDHunter predicts the changes in the R number 4-13 days earlier than the ICL model does, which leads to a more accurate prediction. The R number calculated by COVIDHunter (with a certainty rate level of 50%) before 19 April 2021 is on average 1.1× higher than that provided by the ICL model, the IBZ model, and the observed official R number. Using a certainty rate level of 100%, COVIDHunter predicts the R number to be close in value to the observed R number. The R numbers calculated by the IBZ model and the official authority (observed) are normally not provided for the last two weeks (as we discuss in Section 1). 2) Our model predicts that the current R number is still higher than 1 (1.215 and 1.099 using certainty rate levels of 50% and 100%, respectively) during April 2021. This indicates that the spread of the [...]
We conclude that COVIDHunter's estimation of the R number is more accurate than that calculated by the ICL and IBZ models, as validated by the currently observed R number.

Evaluating the Mitigation Measures
We evaluate the mitigation coefficient, M(t), which represents the mitigation measures applied (or to be applied) in Switzerland from January 2020 to June 2021. We use two different environmental condition approaches, CRW and CTC. We assume two certainty rate levels of 50% and 100% to account for uncertainty in the observed number of cases. We use five mitigation coefficient, M(t), values of 0.35, 0.4, 0.5, 0.6, and 0.7 for each configuration of COVIDHunter during 19 April to 19 May 2021. We compare the evaluated mitigation measures to those evaluated by the Oxford Stringency Index (Hale et al., 2020), as we provide in Figure 3. We also evaluate the mitigation coefficient when we ignore the effect of environmental changes (i.e., by setting Ce=1 in Equation 1).
Based on Figure 3, we make four key observations. 1) Excluding the effect of environmental changes from the COVIDHunter model, by setting Ce=1 in Equation 1, leads to an inaccurate evaluation of the mitigation measures. For example, during the summer of 2020 (between the two major waves of 2020), COVIDHunter (WithoutCTC_50%) evaluates the mitigation coefficient to be as high as 0.6. This means that the mitigation measures (only mandatory mask wearing on public transport) applied during the summer of 2020 are only 14% more relaxed compared to the mitigation measures (e.g., closure of schools, restaurants, and borders, ban on small and large events) applied during the first wave, which is implausible. This highlights the importance of considering the effect of external environmental changes on simulating the spread of COVID-19. Unfortunately, environmental change effects are not considered by any of the IBZ, LSHTM, ICL, and IHME models, which we believe is a serious shortcoming of these prior models.
2) A drop of 3-30% (as we observe in mid-November 2020 and at the end of August 2020, respectively) in the strength of the mitigation measures for a certain period of time (10 to 20 days) is enough to double the predicted number of COVID-19 cases. 3) We evaluate the strength of the mitigation measures applied in Switzerland to be usually (65% of the time) up to 80% to 131% higher than that provided by the Oxford Stringency Index. 4) The strength of the mitigation measures has changed 11 times and 2 times during the years of 2020 and 2021, respectively, and each strength level is maintained for at least 9 days and at most 66 days (32 days on average).
We conclude that considering the effect of environmental changes (e.g., daytime temperature) on the spread of COVID-19 improves simulation outcomes and provides accurate evaluation of the strength of the past and current mitigation measures.
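The "14% more relaxed" figure in the first observation is a relative difference between two mitigation coefficient values. The sketch below shows one way to reproduce that arithmetic. The first-wave value of 0.7 is an assumption made here because it yields the stated 14% when compared with 0.6; the text does not state the first-wave value explicitly, and the function name `relative_relaxation` is hypothetical.

```python
def relative_relaxation(reference_m, evaluated_m):
    """Fraction by which evaluated_m is weaker than reference_m."""
    if reference_m <= 0:
        raise ValueError("reference_m must be positive")
    return (reference_m - evaluated_m) / reference_m


first_wave_m = 0.7  # assumed reference strength, not stated in the text
summer_2020_m = 0.6  # value evaluated without the environmental effect

relaxation = relative_relaxation(first_wave_m, summer_2020_m)
relaxation_percent = round(relaxation * 100)
```

Under this assumption the summer measures would be rated only about one seventh weaker than a near-lockdown, which is why the evaluation without the environmental coefficient is considered implausible.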

Evaluating the Predicted Number of COVID-19 Cases
We evaluate COVIDHunter's predicted daily number of COVID-19 cases in Switzerland. We compare the numbers predicted by our model to the observed numbers and those provided by three state-of-the-art models (ICL, IHME, and LSHTM), as shown in Figure 4. We calculate the observed number of cases as the expected number of cases with a certainty rate level of 100% (as we discuss in Section 3.2). We use three default configurations for the prediction of the ICL model: 1) strengthening mitigation measures by 50%, 2) maintaining the same mitigation measures, and 3) relaxing mitigation measures by 50%, which we refer to as ICL+50%, ICL, and ICL-50%, respectively, in Figures 4, 5, and 6. We use the mean numbers reported by the IHME model for the scenario that represents the most relaxed mitigation measures, called "no vaccine" by the IHME model. We use the median numbers reported by the LSHTM model.
Based on Figure 4, we make four key observations. 1) Our model predicts that the number of COVID-19 cases reduces significantly (to less than 50 daily cases) within May 2021 if the mitigation measures that are applied nationwide in Switzerland are tightened (M(t) increases from 0.55 to 0.7) for at least 30 days. If the authority decides to relax the mitigation measures to the lowest strength that has been applied during the year of 2020 (i.e., M(t) = 0.35), then the daily expected number of cases increases by an average of 5.1× and 4.13× (up to 17,892 daily cases) using the CRW and CTC environmental approaches, respectively. We provide a comprehensive evaluation of the effect of different mitigation coefficient values on the number of cases in the Supplementary Materials, Section 2. 2) COVIDHunter (CTC_100%_M(t)=0.7) predicts the number of COVID-19 cases to be equivalent to that predicted by the IHME model during the second wave with a certainty rate level of 100%. However, during the first wave, the prediction of the IHME model is 3.8× less than the expected number of cases using a certainty rate level of 100%. This means that, unlike our model, the IHME model treats the laboratory-confirmed cases during the first wave as if the tests were done at a population scale, which is very likely incorrect. This is in line with a recent study (Ioannidis et al., 2020) that demonstrates the high inaccuracy of the IHME model. 3) Overall, our model predicts a number of COVID-19 cases that is up to 7.9× and 6.4× (on average 1.9× and 2.1×) smaller than that predicted by the ICL model using the CTC and CRW approaches, respectively, and a certainty rate of 50%. This suggests that the multiplicative relationship between the confirmed number of cases and the true number of cases can be represented by a certainty rate of 22% to 33%, which our model can easily account for. 4) The number of COVID-19 cases estimated by the LSHTM model during the first wave is a) on average 24% less than that estimated by COVIDHunter and b) 10 days later than that predicted by COVIDHunter, IHME, and ICL. The prediction of the LSHTM model during the second wave is not available in the model's pre-computed projections.
We conclude that COVIDHunter provides more accurate estimation of the number of COVID-19 cases, compared to IHME (which provides inaccurate estimation during the first wave) and ICL (which provides over-estimation), with a complete control over the certainty rate level, mitigation measures, and environmental conditions. Unlike LSHTM, COVIDHunter also ensures no prediction delay.

Evaluating the Predicted Number of COVID-19 Hospitalizations
We evaluate COVIDHunter's predicted daily number of COVID-19 hospitalizations in Figure 5. We use the observed official number of hospitalizations as is. Using the number of cases calculated with Equation 2, we find X (hospitalizations-to-cases ratio) to be 4.288% and 2.780%, using CRW and CTC, respectively, during the second wave.

Fig. 5.
Number of COVID-19 hospitalizations (y-axis) by date (x-axis).
We make five key observations based on Figure 5. 1) COVIDHunter (CRW_50%_M(t)=0.7) with a certainty rate level of 50% predicts on average 5.33× smaller number of COVID-19 hospitalizations than that calculated by the IHME model.
2) The ICL model predicts the number of hospitalizations to be similar to that predicted by COVIDHunter (CTC_50%_M(t)=0.7) during the first and the second waves. This suggests that both the ICL model [...] becoming as high as the peak of the second wave (up to 767 daily hospitalized patients), using the CRW and CTC environmental approaches, respectively.
The ICL model predicts the situation to be worse, showing 2× and 3.74× higher numbers of hospitalizations than COVIDHunter CRW_50%_M(t)=0.35 and CTC_50%_M(t)=0.35, respectively, when the ICL model is configured for a 50% relaxation in the mitigation measures. We provide a comprehensive evaluation of the effect of different mitigation coefficient values on the number of hospitalizations in the Supplementary Materials, Section 2. 5) The use of the CTC approach for determining the environmental coefficient value yields a slightly different number (on average 1.7× less) of hospitalizations compared to that provided by the use of the CRW approach. This is expected as the CTC approach considers only the monthly average change in temperature, whereas the CRW approach considers the daily change in several environmental conditions.
We conclude that 1) unlike the IBZ and LSHTM models, COVIDHunter is able to predict the number of hospitalizations and 2) COVIDHunter provides more accurate estimation of the number of hospitalizations compared to that calculated by ICL (which provides overestimation) and IHME (which provides late estimation). COVIDHunter predicts the number of COVID-19 hospitalizations in a simple, convenient, and flexible way that requires calculating only the daily number of cases and the hospitalization-to-cases ratio, C_X.
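As a rough illustration of how little is needed for this step, the sketch below scales a daily case series by the hospitalizations-to-cases ratio. The ratio values are the second-wave figures reported in this section (4.288% for CRW and 2.780% for CTC); the function name, the dictionary, and the daily case counts are hypothetical and do not come from the COVIDHunter source code.

```python
# Second-wave hospitalizations-to-cases ratios reported in this section.
X_SECOND_WAVE = {"CRW": 0.04288, "CTC": 0.02780}


def predicted_hospitalizations(daily_cases, approach):
    """Scale each day's simulated cases by the ratio for the given approach."""
    ratio = X_SECOND_WAVE[approach]
    return [cases * ratio for cases in daily_cases]


# Hypothetical simulated daily cases, for illustration only.
simulated_cases = [1000.0, 2000.0, 4000.0]

hosp_crw = predicted_hospitalizations(simulated_cases, "CRW")
hosp_ctc = predicted_hospitalizations(simulated_cases, "CTC")
```

Because the ratio is a single multiplier, any change in the simulated case curve (for example, from a different mitigation coefficient) carries over proportionally to the hospitalization curve.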

Evaluating the Predicted Number of COVID-19 Deaths
We evaluate COVIDHunter's predicted daily number of COVID-19 deaths in Figure 6 after accounting for the 15-day shift (as we discuss in Section 3.2).
We calculate the observed number of deaths as the number of excess deaths (Section 2.4) to account for uncertainty in reporting COVID-19 deaths. Using the number of cases calculated using Equation 2, we find Y (deaths-to-cases ratio, using excess death data) to be 2.730% and 1.739%, using CRW and CTC, respectively, during the second wave.
We make three key observations based on Figure 6. 1) COVIDHunter with a certainty rate of 100% predicts the number of deaths to perfectly fit the three curves of the observed number of excess deaths, ICL deaths, and IHME deaths, reaching up to 144 deaths a day. During the second wave, the ICL curve is shifted (late prediction) by 5-10 days from that of the other models.
We conclude that 1) unlike the IBZ and LSHTM models, COVIDHunter is able to predict the number of deaths; 2) COVIDHunter predicts the number of deaths to be similar to that predicted by the ICL and IHME models, yet it provides more accurate estimation of other COVID-19 statistics (R, number of cases, and hospitalizations) compared to ICL and IHME, as we comprehensively evaluate in the previous sections; and 3) COVIDHunter requires calculating only the daily number of cases and the deaths-to-cases ratio, C_Y, to predict the daily number of deaths.
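The death prediction combines two ingredients described in this section: the deaths-to-cases ratio and the 15-day shift. A minimal sketch follows. It assumes that deaths on day t are the ratio times the cases on day t - 15, which is one plausible reading of the shift and not necessarily how COVIDHunter applies it. The ratios are the second-wave values reported above; the function name and the case series are hypothetical.

```python
# Second-wave deaths-to-cases ratios (excess death data) reported above.
Y_SECOND_WAVE = {"CRW": 0.02730, "CTC": 0.01739}
DEATH_SHIFT_DAYS = 15  # average delay between the case and death curves


def predicted_deaths(daily_cases, approach, shift_days=DEATH_SHIFT_DAYS):
    """Deaths on day t are taken as ratio * cases on day t - shift_days.

    The first shift_days entries are None because no earlier case
    counts are available for them.
    """
    ratio = Y_SECOND_WAVE[approach]
    usable_days = max(len(daily_cases) - shift_days, 0)
    padding = [None] * min(shift_days, len(daily_cases))
    return padding + [cases * ratio for cases in daily_cases[:usable_days]]


# Hypothetical simulated daily cases, for illustration only.
simulated_cases = [1000.0] * 20
deaths_crw = predicted_deaths(simulated_cases, "CRW")
```

The output has the same length as the input, so it can be plotted against the same date axis as the case and hospitalization curves.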

Summary and Future Work
We demonstrate that we can monitor and predict the spread of COVID-19 in an easy-to-use, flexible, and validated way using our new simulation model, COVIDHunter. We benchmark our model against major alternative models of the COVID-19 pandemic that are used to assist governments. Compared to these models, [...] We provide insights on the effect of each change in the strength of the applied mitigation measure on the number of daily cases, hospitalizations, and deaths. We make all the data, statistical analyses, and a well-documented model implementation publicly and freely available to enable full reproducibility and help society and decision-makers to accurately and openly review the current situation and estimate the future impact of decisions.
We suggest and plan at least five main directions/additions to further improve the predictive power and benefits of our COVIDHunter model. 1) Clustering the population based on age groups. This potentially has different effects on, for example, population, environmental conditions, and mitigation measures (Bhatia and Klausner, 2020; Bi et al., 2020). 2) Considering vaccinated persons as another new category of persons in a population. 3) Considering reinfection after immunity (Lumley et al., 2020).

Statistical Relationship Between Temperature and Number of COVID-19 Cases
The purpose of this study is to explore the relationship between the daily new confirmed COVID-19 case counts or death counts and temperature in Switzerland. We obtain the daily number of confirmed COVID-19 cases and deaths in Switzerland from official reports of the Federal Office of Public Health (FOPH) in Switzerland [1] from March 2020 until December 2020. We obtain the air temperature data from the Federal Office of Meteorology and Climatology (MeteoSwiss) in Switzerland [2]. We calculate the daily average air temperature during the same time period (March 2020 to December 2020) for all the 26 cantons in Switzerland.
To evaluate the correlation between the temperature data and the number of daily confirmed COVID-19 cases or the daily counts of death, we use a generalized additive model (GAM). GAM is commonly used to model linear and non-linear relationships between meteorological factors (e.g., temperature, humidity) and COVID-19 infection and transmission [3,4,5]. Our analyses are performed with R software version 4.0.3, where a p-value < 0.05 is considered statistically significant. Our model attempts to represent the linear behavior of the growth curve of the counts of the new confirmed cases or deaths in Switzerland. Therefore, we can test the hypothesis of whether there is a significant negative correlation between the COVID-19 confirmed daily case or death counts and temperature.
The results demonstrate a significant negative correlation between temperature and COVID-19 daily case and death counts. Specifically, the relationship is linear for the average temperature in the range from 1-26 °C. Based on Figure S1, we make two key observations. 1) For each 1 °C rise in temperature, there is a 3.67% (t-value = -3.244 and p-value = 0.0013) decrease in the daily number of COVID-19 confirmed cases (Figure S1(a)). 2) For each 1 °C rise in temperature, there is a 23.8% decrease in the daily number of COVID-19 deaths (t-value = -9.312 and p-value = 0.0), as shown in Figure S1.
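To illustrate how a "percent decrease per 1 °C" can be read off a fitted slope, the sketch below fits a simple log-linear regression in plain Python. This is a swapped-in technique for illustration only: it is not the GAM fitted in R for this study, it assumes the effect is multiplicative (so that a slope b on the log scale corresponds to a change of (exp(b) - 1) × 100 percent per degree), and the data are synthetic values constructed to decay by 3.67% per degree.

```python
import math


def fit_log_linear(temperatures, counts):
    """Ordinary least squares of ln(count) on temperature.

    Returns (intercept, slope) of ln(count) = intercept + slope * T.
    """
    logs = [math.log(c) for c in counts]
    n = len(temperatures)
    mean_t = sum(temperatures) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in temperatures)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(temperatures, logs))
    slope = sxy / sxx
    return mean_y - slope * mean_t, slope


def percent_change_per_degree(slope):
    """Convert a log-scale slope into a percent change per 1 degree C."""
    return (math.exp(slope) - 1.0) * 100.0


# Synthetic data over the 1-26 degree C range, built with a 3.67% decay.
temps = list(range(1, 27))
counts = [5000.0 * (1.0 - 0.0367) ** t for t in temps]

_, slope = fit_log_linear(temps, counts)
change = percent_change_per_degree(slope)
```

On real data, the daily counts are noisy and may depend on other factors, which is one reason a GAM with smooth terms is used in the study instead of a single straight-line fit.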

Evaluated Datasets
Our experimental evaluation uses a large number of different real datasets, including 1) daily R number values, 2) observed daily number of COVID-19 cases, 3) observed daily number of COVID-19 hospitalizations, 4) observed daily number of COVID-19 deaths, 5) number of excess deaths, 6) the estimated strength of mitigation measures as calculated by the Oxford Stringency Index, and 7) estimates of COVID-19 statistics as calculated by existing state-of-the-art simulation models, ICL, IHME, LSHTM, and IBZ, from seven different sources. The raw datasets are provided in the Supplementary Excel File 1 and can also be obtained from the original sources as we list below: •