Implementation of An Online COVID-19 Epidemic Calculator for Tracking the Spread of the Coronavirus in Singapore and Other Countries

This paper describes the methods underlying the development of an online COVID-19 Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available infection case data, the calculator is used to estimate the effective reproduction number, doubling time, final epidemic size, and death toll. As a case study, we analyzed the results for Singapore during the “Circuit breaker” period from April 7, 2020 to the end of May 2020. The calculator shows that the stringent measures imposed have an immediate effect of rapidly slowing down the spread of the coronavirus. After about two weeks, the effective reproduction number reduced to 1.0. Since then, the number has been fluctuating around 1.0.The COVID-19 Epidemic Calculator is available in the form of an online Google Sheet and the results are presented as Tableau Public dashboards at www.cv19.one. By making the calculator readily accessible online, the public can have a tool to meaningfully assess the effectiveness of measures to control the pandemic.


Introduction
As countries around the world take drastic measures to contain the COVID-19 pandemic, people need to understand the effectiveness of such interventions. Making sense of epidemiological data can be challenging given confusing and overlapping terminology. Raw data and statistics on infection numbers (e.g. Figure 1) do not directly help answer the following questions: (1) Are social distancing measures working? (2) How much longer does it take to flatten the curve? (3) What will be the final death toll? Figure 1. Total Confirmed Cases in Singapore [1]. This paper describes the methods underlying the online COVID-19 Epidemic Calculator for tracking and estimating COVID-19 growth parameters, including reproduction number, doubling time, final epidemic size, and death toll. These methods are illustrated using the case example of Singapore. We demonstrate how the calculator can reveal the effect of imposing strict social distancing measures ("Circuit breaker") from April 7, 2020 that is not apparent from just looking at infection numbers.
While our methodology is similar to several freely available software packages and programming codes for calculating the effective reproduction number (e.g. [21,23,24,33]), we have implemented the calculations in the widely used Excel spreadsheet. The execution time is a matter of seconds, even for calculating a 4-month, 100-country data set. The calculator is also available as an online Google Sheet to facilitate sharing and collaboration. The input data are obtained from publicly available sources [2,3,31] and are updated daily.
The first section of this paper will introduce and define basic terminology for understanding infectious disease transmission given that publicly available data are related to these variables. As the reproduction number is used as an indicator of the effectiveness of interventions, we will demonstrate how to estimate this number using different methods. First, we use the SEIR model while accounting for changing reproduction number and undetected infections. Next, we estimate the effective reproduction number, a more accurate index, using a non-parametric approach and a Bayesian approach. We also show how to derive estimates of dates of actual symptom onset and dates of being exposed which are important for our estimation of the effective reproduction number. To answer the question about the time needed to flatten the curve, we will estimate the doubling time, defined as the time for the total number of infections to double at the current rate. Finally, we describe a method to forecast the final total number of cases and deaths. Singapore. Figure 2 clarifies the different overlapping terminology used in epidemiology and illustrates the timeline for the various stages of infection. Exposed is the state at which an individual first becomes infected but is not yet contagious. The latent period is the time from being infected (exposed) to becoming contagious. An infected person can be contagious even before the onset of symptoms. Data suggests that some people could have infected others 1 to 3 days before they developed symptoms [10,11].

Introduction to terminology
The incubation period is the time from exposed to the onset of symptoms. The mean incubation period for COVID-19 is estimated to be 5 days [4,29]. The infectious period is the time between becoming contagious to the time of removal or recovery. Hence, it is the difference between the time of removal and the latent period (Tremoved -Tlatent).

Secondary infection
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint In Singapore, the 14-day average time from the onset of symptoms to removal ranges from 1.5 to 6 days after the start of the Circuit breaker on April 7, 2020 ( Figure 3).
The serial interval is the time when a secondary infection is generated. For COVID-19 in Singapore, the serial interval between transmission pairs ranges between 3 days and 8 days [5]. Other researchers have reported serial intervals within the same range [6,7,8,29].

Method for estimating the basic reproduction number using the SEIR model
One of the important numbers in epidemiology is the basic reproduction number, R0, defined as the expected number of infections directly generated by one case in a population in which all individuals are susceptible to infection [15,28]. R0 can be estimated by fitting the data on reported infections to a Susceptible-Exposed-Infectious-Removed (SEIR) model. S Susceptible

E Exposed
CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint Figure 4 is a graphical representation of the classic deterministic SEIR model which shows people passing through four infection states -Susceptible (S), Exposed (E), Infected (I), and Removed (R) [12,13,14,27]. The model is governed by four differential equations. To estimate the model parameters (S, E, I, R, and β), we use the maximum likelihood method [15]. We assume that the number of reported cases, xi, at time, ti, follows the Poisson distribution and has a mean of , where is R, the calculated number of removed individuals at time, ti.
Then, the log-likelihood function to be maximized is . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint We choose the parameter values for I(t = 0) and β that would maximize the log-likelihood function. This can be done by using the Solver function in Excel. From the SEIR model, the basic reproduction number can then be calculated according to the formula 0 = ⁄ .

Accounting for changing reproduction number
With intervention, specifically social distancing measures, the transmission rate, β, and reproduction number will change over time. For a good fit and to obtain a moving average, parameter estimation is best done over a relatively short window of time, say between 7 and 14 days. Figure 5 shows an example where the data is fitted over a 14-day period. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint

Accounting for undetected infection
It is often the case that not all infected cases are reported. To account for undetected infection, one strategy is to estimate the ratio of reported cases (R) to total cases (C), p, and compare this to known values of local case morbidity rate of the disease, ρ (adjusted for age distribution and clinical care) [16].
where D(t) is the reported number of cases that have required critical care or have died. R(t) is the reported number of cases in any period. Figure 6 provides a visual representation of this concept. If we can assume ρ to be constant across regions, then the smallest value of ρ across such different regional data may provide an upper bound for p. Data from studies have suggested about 5 per cent of COVID-19 patients require critical care [1].

Figure 6. A diagram to illustrate the proportion of different groups at various stages of infection
In a test sample of the Singapore population, 39,000 preschool staff were screened and 13 tested positive using the polymerase chain reaction (PCR) test [1]. This represents about 0.035 per cent of those tested. Hence, the number of undetected infections in Singapore is currently considered small.

Method for deriving symptom onset dates from confirmation dates
The daily number of reported cases is partly dependent on the number of tests conducted, which may be variable due to factors such as testing capacity and the day of week. To account for this CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint variation, we perform a running 3-day average of test cases. Other methods of applying a smoothing filter to the time series may be used if appropriate.
Another issue is the delay between the onset of symptoms and case confirmation (removal or isolation). Case onset dates can be derived if records of onset-to-confirmation dates are available for every individual (e.g. see Fig. 2). Otherwise, case onset dates can be estimated by using the following procedure.

i)
For each date, distribute case counts back in time according to a Poisson distribution with a mean of 3 days (symptom onset to removal) as illustrated in Figure 7. ii) Sum the back distributed case counts for each date to derive the onset curve as shown in Figure 8. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint iii) Distributing reported cases back in time and recreating the onset curve result in a "right-censored" time series. This means that there are onset cases close to the present date that are yet to be reported. We correct this by estimating the percentage of onset cases on Day (t-a) that have not yet been reported by today (Day t). We can use the cumulative distribution function of the Poisson "onset-to-removed" distribution to adjust for the number of onset cases, thus removing right censoring.
Adjusted onset = Onset (Delay ≤ Days from present date) Consider an example illustrated in Figure 9. Three days ago, there were 470 reported onset cases. This represents the fraction of the actual number reported over the next 3 days. This fraction is equal to the value of the cumulative distribution function of our Poisson distribution at Day 3, which is 65%. Hence, the current count of onset on that day represents 65% of the actual total. After adjustment, the actual total is estimated to be (1/0.65) of 470, which is 723. Figure 10 shows the adjusted onset curve.    is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Method for deriving infection (exposed) dates from onset dates
A similar procedure as in Section 4 can be applied to the onset counts to derive the infection (exposed) time series. Figure 11 shows the adjusted exposed time series where the incubation period (from exposed to symptom onset) follows a Poisson distribution with a mean of 5 days.

Method for estimating the effective reproduction number, R(t)
The basic reproduction number, R0, is the expected number of infections directly generated by one case given that all individuals are equally susceptible. As the infection spreads, the susceptibility of the population decreases. The effective reproduction number, R(t), is related to the basic reproduction number, R0, by ( ) = 0 ( ), where S(t) is the average susceptibility of the population. R(t) is often used as an indicator of the effectiveness of interventions, such as social distancing measures, to contain the spread of a virus. If R(t) is greater than 1.0, the infection is growing at an exponential rate. If R(t) is at 1.0, the spread is sustained at a linear rate. If R(t) is less than 1.0, the infection is spreading at a slower pace and will eventually die out.
Although R(t) cannot be measured directly, it can be estimated via two methods. We describe these two methods that can be implemented in a spreadsheet without any programming codes.

Non-parametric approach
Wallinga and Lipsitch [17,18] have developed a non-parametric method to derive the reproduction number from the exponential growth rate, Adjusted Onset Adjusted Exposed . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. where c(t) is the rate of new infections at time t, and w(a) is the probability density function of the serial interval (see Figure 2).
The serial interval is the time from being infected to generating a secondary infection. We assume a gamma distribution with a mean serial interval of 7 days and a peak (most infectious) at Day 4 ( Figure 12). This accounts for a latent period where the exposed individual is not yet infectious. Figure 12. A gamma distribution calculated using the GAMMA.DIST function in Excel.
Equation (6) can be evaluated in a spreadsheet using the infection data derived above and a numerical integration scheme. Figure 13 shows a plot of Rt against time for Singapore during the Circuit breaker period. Circuit breaker . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint settled to around 1.0 after about two weeks. Since then, the infection rate has remained sustained. Given that dormitory residents make up the majority of the infected individuals, it can be concluded that individuals continue to infect others with a reproductive ratio of approximately 1 to 1 in that setting during that time period as depicted in Figure 13.
To estimate the confidence intervals of the R(t) estimate, we first discretize equation (6), Note that the expected number of secondary infections generated on Day t by individuals infected on Day (t-a) is where c(t-a) is the number of new infections a days before the present day, and w(a) is the probability that the serial interval is a. Not all the c(t-a) cases will generate new infections on Day t. In general, the probability that k out of n cases will generate secondary infections with a serial interval of a is binomially distributed, This is a binomial distribution with a mean of ( ) and a variance of ( )�1 − ( )�. Table  2 shows the formulas for the probability distributions for N days before Day t (the present date).

No. of infected individuals on Day (t-a)
Probability of having k individuals with serial interval a The probability for the total number of secondary infections on Day t, = ∑ =1 , is the sum of N binomial distributions. Thus, using the binomial sum variance inequality [19,20], the upper bound of the variance of Z can be calculated.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. .
Equation (10) allows us to estimate the confidence intervals of the R(t) estimates.

Bayesian approach
The Bayesian approach allows us to continuously update our estimate of a set of parameters, Θ, as more data becomes available.
(Θ), the prior distribution, represents our prior estimates about the true value of Θ.
( |Θ) is the likelihood distribution. It is also often written as ℒ(Θ| ) which means the probability of observing the data given Θ. For the method to work, it is necessary to calculate the likelihood distribution for all possible values of Θ.
( ) is the model evidence and it is the same for all possible hypotheses (values of Θ) being considered.
(Θ| ) is the posterior distribution and represents our updated estimate of the value of Θ given the observed data.
The main objective of Bayesian inference is to calculate the posterior distribution of our parameters using our prior beliefs updated with our likelihood. From the posterior distribution, we can determine the most likely values of Θ given the observed data. Since we are usually only interested in relative probabilities of different hypotheses, ( ) can be left out of the calculation and we write the model form of Bayes' theorem as where ∝ means "proportional to". For estimating Rt, the Bayes' theorem that we use is where the data, kt, is the daily number of cases, and the parameter, Rt, is the effective reproduction number.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint Equation (13) is updated every day by using yesterday's posterior, ( −1 | −1 ), to be today's prior ( ). On day two, the equation becomes So generally, Assuming a uniform starting prior ( 1 ), this reduces to: Note that the posterior on any given day is equally influenced by the distant past as much as the recent day. This is fine if we are estimating a static parameter that does not change with time. However, the value of Rt is dynamic and is more closely related to recent values than older ones.
To address this issue, we can adopt Systrom's approach [21] of only incorporating the last m days of the likelihood function:

Bettencourt & Ribeiro's likelihood function
To calculate the likelihood function ℒ( | ) = ( | ), we first assume that the number of new infections on any given day can be described by a Poisson probability distribution with a mean of λ. The probability of seeing k new cases is Bettencourt & Ribeiro [22] has derived an equation relating Rt to λ.
where γ is the reciprocal of the serial interval (see Figure 1). Figure 14 shows the variation of λ with Rt for some values of kt-1.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Equations (18) and (19) allow us to reformulate the likelihood function as a Poisson distribution, parameterized by fixing k and varying Rt. Figure 15 shows that as k increases, the peak value of the likelihood function ℒ( | ) increases and the distribution becomes less spread out about. This means that as the number of infections increases the confidence of our Rt estimate should improve. Likelihood k=999 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint In evaluating the posteriors, it is more convenient to use the logarithm of the likelihood function.
To perform the Bayesian update, we can do a sum of the log-likelihoods over the last m days and then exponentiate to get the likelihood. From equations (17) and (21), From the posterior distribution ( Figure 16) we can also obtain the confidence interval for Rt. For example, Figure 17 shows the most likely values of Rt and the confidence interval over time for Singapore. We can see that Rt changes with time and the confidence interval narrows with more data.
The results are in agreement with those calculated using the non-parametric approach ( Figure 13) and the EpiEstim code ( Figure 18) [23,24]. Day 50 . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Method for estimating doubling time
Another metric often used to measure infection rate in a population is the doubling time, defined as the time for cumulative cases to double based on the current growth rate [25]. Circuit breaker . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint where 0 is the initial number of cases, Td is the doubling time, and r is the exponential growth rate. Taking the logarithm on both sides of the equation, ln� ( )� = ln( 0 ) + ln� ( + )� = ln( 0 ) + ( + ) = ln(2)+ln( 0 ) + = ln(2) As the growth rate slows, the doubling time increases accordingly. Note that the gradient of the ln� ( )� curve is equal to r. Time-varying estimates of the doubling time can be made with a 7day sliding window by iteratively fitting a linear regression model to ln� ( )�. Figure 19 shows the log plot of the accumulated cases and the doubling time calculated using the Excel LINEST function.
Again, we can directly see the positive effect of the Circuit breaker measures that started on April 7, 2020. From a low point of about 5 days the doubling time has increased to about 4 weeks in slightly more than a month. Figure 19. It is easier to see the pronounced effect of the Singapore Circuit breaker on a plot of the doubling time (right) than the accumulated cases plot (left).

Forecasting the final total number of cases and deaths
When the growth rate is slowing down, we can project the final total cases and death counts by fitting publicly available data to the logistic model. The logistic model is often used to describe the shape of the cumulative epidemic curve (Figure 20) where the number of infected cases grow exponentially at first, then slows down, and finally flattens to a maximum limit. The final epidemic size can be estimated based on this slowing growth. For our application, the total number of cases at time t can be approximated by ( [15]) where r is the exponential growth rate, 0 and are the initial and final numbers, respectively. To find the best fit, we use the maximum likelihood method to estimate (see equations (2) and (3)). The parameter is estimated over a relatively short rolling window of, say 14 days, to obtain a moving update. See Figure 21.  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 4, 2020. .
With an estimate of the final number of cases, we can combine it with the data on case fatality rate to further estimate the final death counts. The case fatality rate varies from region to region but in many countries with a high number of cases they tend to stabilize over time ( Figure 22). For example, the case fatality rate for Singapore has remained quite stable at slightly below 1 per 1000 infected cases. Therefore, the final death count is projected to be around 35, assuming a final case number of 43,000. According to the data on deaths per 100 cases in Table 3, Singapore has one of the lowest COVID-19 case fatality rates in the world. Table 3. The 20 countries with the lowest case fatality rate.    Table 4 shows the forecasts for a few countries including Singapore and how they compare with the projections by the Institute for Health Metrics and Evaluation (IHME) [26] at University of Washington Medicine. The projections by the IHME are based on more complex analytics and take into account factors such as changes in social distancing measures, diagnostic capability, and hospital capacity. Given that we did not directly account for these factors, our forecasts of the total number of cases and deaths may be considered indicative only. On the other hand, our estimation of reproduction numbers and doubling times based on the current data can be considered reliable and valid. Assuming current prevailing conditions in the populations, results from the COVID-19 Epidemic Calculator are likely to be realistic estimates.

Conclusions
This paper describes the methods underlying the online COVID-19 Epidemic Calculator for tracking COVID-19 growth parameters. From publicly available data, the calculator is used to estimate the distributions at time of symptom-onset and infection, effective reproduction number, doubling time, final epidemic size, and death toll for Singapore and other countries.
The calculator and the associated graphs clearly show that the Circuit breaker measures imposed from April 7, 2020 in Singapore had an immediate effect of rapidly slowing down the spread of the COVID-19. Additionally, the results also reveal that the effective reproduction number has CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint settled to around 1.0 after about two weeks. Since then, it has remained at that level. This indicates that the infection rate among the dormitory residents is sustained and not likely to be reduced until this group become less susceptible.
The COVID-19 Epidemic Calculator is available in the form of an online Google Sheet [30] that imports daily infection data from the European Centre for Disease Prevention and Control [31]. The results are presented online as dashboards on Tableau Public [32] (Figure 23). It has the advantage of fast execution time without the need for any specialized software package or programming script. Users can also interact with the models by changing the parameters. Comparing with other similar work, our parameter estimates are found to be in good agreement with those estimated using different models and software. By making the COVID-19 Epidemic Calculator readily accessible online, it is hoped that the public and interested learners have the tool to meaningfully assess our effort in fighting COVID-19. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 4, 2020. . https://doi.org/10.1101/2020.06.02.20120188 doi: medRxiv preprint