A poor-man's approach to the effective reproduction number: the COVID-19 case

It is shown that estimates of the effective reproduction number Rt for COVID-19 using standard packages such as EPIESTIM can be reproduced very accurately using the expression Rt = c(t) /c(t + τ) , where c(t) is the incidence at time t and τ the mean value of the series interval

The effective reproduction number -a fundamental epidemiological parameter that characterizes the temporal dynamics of an infectious disease-is notoriously difficult to determine without detailed modeling [1][2][3][4][5][6]. During the ongoing COVID-19 pandemic, daily tallies of the incidence (new cases on day ), recovered (individuals who are declared cured on day ) and deceased (individuals who die on day ) have become available for many countries and regions [7]. With this information, one might naively expect that can be determined as , (1) with . This definition is exactly equivalent to the expression for the basic reproduction number in a 'Susceptible-Infectious-Removed' (SIR) framework, but can be obviously applied to the observed data without any reference to a specific model. Calculations with Eq. (1) for COVID-19, however, disagree with values of computed from standard packages such as EPIESTIM [4], as seen in Fig.  1.
Part of the reason for the discrepancy is the lack of reliable data. For some countries, such as the United Kingdom, the published cumulative number of recovered people is less than the cumulative number of deceased individuals, a very unlikely scenario. Even globally, the sum total of deceased individuals is a substantial fraction of cured individuals, which is also highly implausible. Furthermore, there is a relative time shift between and , since the former includes the incubation time plus the time to develop symptoms serious enough to be reported. However, attempting to correct for this relative delay does not improve the agreement in Fig. 1 in any significant way.
Standard packages do not suffer from the above problems because they rely only on data, and therefore are not affected by systematic errors in . For these calculations one requires the infectivity profile of the disease, which is approximated as the distribution of the standard interval [4]. Calculations based on EPIESTIM [4] using a series interval  4.7 days and standard deviation 2.9 days, from Ref. [8], are updated daily at [9]. A more recent study finds 5.8 days, with 44% presymptomatic transmission [10]. The recovery time, on the other hand, is much longer, and this implies that individuals at advanced stages of the disease have a very low infectivity. Under these conditions Eq. (1) cannot be valid, since its denominator should not be but the number of people who became infected approximately at time . The latter is greater than the former during the ramp-up phase of the disease, and this explains why the curve calculated from Eq. (1) shows a higher during this time. Conversely, the number of people who became infected at time is less than at later stages, and this explains why the curve calculated from Eq. (1) gives a lower value of the number of at these times.
The above considerations suggest an alternative definition of as , . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 27, 2020. proportional to the square of the second derivative . The relative weight of the two terms is controlled by a parameter that is defined as in Eq. 3 of Ref. [11], with the matrices and taken as the identity matrix. The chosen value was =100 day 4 . A comparison between and is shown in the inset of Fig. 2 for the case of South Korea. The regularization method has the advantage that it distorts the shape of a curve less than a typical sliding average, while being superior to Savitzky-Golay [12] smoothing in terms of noise reduction. This was important in the context of the comparison made here between Eq. (2) and the EPIESTIM package, because artificial shape distortions are eliminated. For practical applications of Eq. (2), however, a simple 7day running average should suffice.
Two calculations with EPIESTIM were performed, both with 5.8 days and a standard deviation of 2.9 days. The first calculation uses a time window w = 1 day, and the second calculation a time window w =7 days. In the latter case, the value that appears in the figures at a particular time corresponds to the window for which is the middle point. The w = 1 day results are very noisy, as expected, but both EPIESTIM curves are in very good agreement with the calculations from Eq. (2). It is important to point out that the incidence data have a significant delay with respect to the actual time of infection, so that all figures should be interpreted at representing the situation between one and two weeks before the dates in the horizontal axes. We made no attempt to correct for reporting delays as in Ref. [9].
In summary, an extremely simple algorithm has been introduced that yields effective reproduction numbers very similar to those obtained from more elaborate approaches as exemplified by the EPIESTIM package. The latter should produce more accurate results if the full epidemiological characteristics of the disease are well known, but for calculations at the present time when little is known about COVID-19, the method proposed here is much easier to use and provides comparable accuracy.  The numerical simulations shown here were carried out on IGOR PRO 8.0 (Wavemetrics, Inc). The code is available upon request.