medRxiv

Removing weekly administrative noise in the daily count of COVID-19 new cases. Application to the computation of Rt

Luis Alvarez, Miguel Colom, Jean-Michel Morel
doi: https://doi.org/10.1101/2020.11.16.20232405
Luis Alvarez
CTIM, Departamento de Informática y Sistemas, Universidad de Las Palmas de Gran Canaria, Spain
For correspondence: lalvarez.mat@gmail.com
Miguel Colom
Université Paris-Saclay, ENS Paris-Saclay, France
Jean-Michel Morel
Université Paris-Saclay, ENS Paris-Saclay, France

Abstract

The way each country counts and reports incident cases of SARS-CoV-2 infection is strongly affected by the “weekend effect”. During the weekend, fewer tests are carried out and the registration of cases is delayed. This introduces an “administrative noise” that can strongly disturb the calculation of trend estimators such as the effective reproduction number R(t). In this work we propose a procedure to correct the incidence curve and obtain a better fit between the number of infected and the number expected from the renewal equation. The classic way to deal with the administrative noise is to invoke its weekly period and filter the incidence curve with a seven-day sliding mean. This has three drawbacks. The first is a loss of resolution. The second is that a 7-day mean filter hinders the estimate of the effective reproduction number R(t) in the last three days before the present. The third is that a mean filter implicitly assumes the administrative noise to be additive and time-invariant. The present study supports the idea that the administrative noise is better treated as being both periodic and multiplicative. The simple method derived from these assumptions amounts to multiplying the number of infected by a correcting factor that depends on the day of the week. This correcting factor is estimated from the incidence curve itself. The validity of the method is demonstrated by its positive impact on the accuracy of the estimates of R(t). To illustrate the advantages of the multiplicative periodic correction, we apply it to Sweden, Germany, France and Spain. We observe that the estimated administrative noise is country-dependent, and that the proposed strategy reduces it considerably.
An implementation of this technique is available at www.ipol.im/ern, where it can be tested on the daily incidence curves of an extensive list of states and geographic areas provided by the European Centre for Disease Prevention and Control.

The effective reproduction number R(t) is one of the most important epidemiological characteristics of the COVID-19 pandemic. It is constantly invoked by politicians and their scientific advisers to steer social distancing measures. R(t) represents the expected number of secondary cases produced by a primary case at time t. It can be computed from the incidence curve i(t) and the serial interval Φ, which is the empirical probability distribution of the time between the onset of symptoms in a primary case and the onset of symptoms in its secondary cases. Several strategies exist to estimate R(t) from i(t) and Φ. EpiEstim, the method proposed in [4], is widely used; in the online interface available at www.ipol.im/ern, we compare our estimate of R(t) with that of EpiEstim. In [11], a technique to compute R(t) that separates local transmission from imported cases is proposed. In [6], the authors present a detailed comparison of the methods proposed in [4], [2] and [13]. A systematic review and analysis of the serial interval is presented in [12]. In [10], [5] and [7], different statistical distributions of the serial interval Φ for SARS-CoV-2 are proposed.

The so-called renewal equation (see [9]) is a key epidemiological model linking the daily count of newly detected cases, i(t), with the reproductive power A(t, s) with which an individual of infection-age s generates secondary cases at time t. A(t, s) depends on R(t) and Φ(s), and the renewal equation can be expressed as

$$i(t) = F(i, R, \Phi, t), \qquad t \le t_c, \tag{1}$$

where $t_c$ represents the current time (the last time at which i(t) is available). For instance, in [4] and [3] the following formulation of the renewal equation is used to compute R(t) from i(t) and Φ(s):

$$F_1(i, R, \Phi, t) = R(t) \int_{0}^{\infty} \Phi(s)\, i(t-s)\, ds. \tag{2}$$

This model is a simplification of the Nishiura equation [8]:

$$F_2(i, R, \Phi, t) = \int_{-\infty}^{+\infty} R(t-s)\, \Phi(s)\, i(t-s)\, ds. \tag{3}$$

In the original formulation of this model it was assumed that Φ(s) = 0 for s ≤ 0. However, in the case of SARS-CoV-2, a patient can show symptoms before the person who infected them does. This means that Φ(s) can actually be positive for s < 0. In [1], the above model is used without any restriction on the support of Φ(s). Since measurements are generally made daily, a discrete formulation of model (3) is appropriate. It can be expressed as

$$i_t = F_2(i, R, \Phi, t) = \sum_{k=f_0}^{f_1} R_{t-k}\, i_{t-k}\, \Phi_k, \tag{4}$$

where $[f_0, f_1]$ is the support of the discrete serial interval $\Phi_k$, and $f_0$ is, in general, negative.
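To make the two discrete models concrete, here is a minimal Python sketch of the expected-count formulas: a Cori-type version (the discrete analogue of $F_1$, with $R_t$ outside the sum) and the discrete Nishiura-type version (4), whose lag support may start at a negative $f_0$. The array layout (`phi[j]` holding $\Phi_{f_0+j}$) and the choice to skip days outside the observed record are assumptions made for illustration, not details taken from [1].

```python
def expected_cori(i, R, phi, t):
    """Cori-type expected count at day t: R_t * sum_{k>=1} Phi_k * i_{t-k}.

    phi[k-1] holds Phi_k for lags k = 1, 2, ...; lags reaching before day 0 are skipped.
    """
    pressure = sum(phi[k - 1] * i[t - k] for k in range(1, len(phi) + 1) if t - k >= 0)
    return R[t] * pressure


def expected_nishiura(i, R, phi, f0, t):
    """Discrete Nishiura-type expected count: sum_k R_{t-k} * i_{t-k} * Phi_k.

    phi[j] holds Phi_k for k = f0 + j. When f0 < 0 the sum also uses days
    after t (the infectee shows symptoms first). Days outside the record are skipped.
    """
    total = 0.0
    for j, weight in enumerate(phi):
        s = t - (f0 + j)          # day of the primary case, s = t - k
        if 0 <= s < len(i):
            total += R[s] * i[s] * weight
    return total
```

With `f0 < 0`, the sum evaluated at the last observed day is truncated, because it would need counts that do not exist yet; how [1] handles that boundary is not described in this section.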

In the technique we propose in this work, either of the above expressions of the renewal equation can be used. In the experiments presented here, we use the model $F_2(i, R, \Phi, t)$ and the method developed in [1] to estimate $R_t$ from $i_t$ and $\Phi_s$. This method is based on the minimization of the following energy:

$$E(R) = \sum_{t} \frac{\bigl(i_t - F_2(i, R, \Phi, t)\bigr)^2}{p_{90}(i)} + \sum_{t} w_t\,(R_{t+1} - R_t)^2 + \sum_{m} \bigl(R_{t_m} - R^0_{t_m}\bigr)^2, \tag{5}$$

where $p_{90}(i)$ is the 90th percentile of $\{i_t\}$, used to normalize the energy with respect to the size of $i_t$. The first term of E is a data-fidelity term that forces the renewal equation (4) to be satisfied as closely as possible. The second term forces $R_t$ to be a smooth curve; $w_t \ge 0$ is the weight of the regularization at time t, and the higher $w_t$, the smoother $R_t$. The last term of E forces $R_{t_m}$ to stay close to an initial estimate $R^0_{t_m}$ at some particular times $t_m$. Roughly speaking, minimizing E yields an $R_t$ that approximately satisfies the renewal equation (4), is reasonably smooth and, optionally, takes prescribed initial values at the times $t_m$.

The daily number of newly detected cases, $i_t$, is strongly affected by the “weekend effect”. During the weekend, fewer tests are carried out and the registration of cases is delayed. As a result, the actual number of cases is systematically underestimated on some days of the week and overestimated on others. The usual way to deal with this weekly administrative noise is to use a 7-day moving average of $i_t$, but this degrades the accuracy of the point estimate of the trend of $i_t$ near the current date and forces the estimate to stop three days before the present, since a centered 7-day window around day t requires the three days that follow it.
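The three-day gap can be seen directly in a centered 7-day mean: the window around day t needs the three days after t, so the last three values are undefined. A minimal NumPy sketch (the NaN padding is a presentational choice):

```python
import numpy as np


def centered_weekly_mean(i):
    """Centered 7-day mean of daily counts; NaN where the window does not fit.

    The value at day t averages days t-3..t+3, so the first and the last three
    days (including the three most recent ones) cannot be computed.
    """
    i = np.asarray(i, dtype=float)
    if len(i) < 7:
        raise ValueError("need at least 7 days of data")
    out = np.full(i.shape, np.nan)
    out[3:len(i) - 3] = np.convolve(i, np.ones(7) / 7.0, mode="valid")
    return out
```

A trailing window (the mean of days t−6..t) avoids the gap, but it equals the centered mean shifted by three days, so it lags behind the trend.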

The main assumption of this paper is that a significant part of the discrepancy between $i_t$ and its expected value $F(i, R, \Phi, t)$ is due to the weekend effect. We therefore assume that the quotient

$$q(i, R, \Phi, t) = \frac{F(i, R, \Phi, t)}{i_t}$$

follows a 7-day periodic dynamic, that is, $q(i, R, \Phi, t) \approx q(i, R, \Phi, t-7)$. In other words, the ratio between $F(i, R, \Phi, t)$ and $i_t$ depends mainly on the day of the week. In the experiments shown in this article, we observe that $q(i, R, \Phi, t)$ does follow this periodic pattern accurately in several countries¹. For example, in Sweden the registered value $i_t$ is systematically underestimated every Monday, while the opposite occurs on Saturdays. We therefore propose to approximate $q(i, R, \Phi, t)$ by a 7-day periodic function given by the vector $\bar q = (\bar q_0, \ldots, \bar q_6)$:

$$q(i, R, \Phi, t) \approx \bar q_{t \,\%\, 7}, \tag{6}$$

where the symbol % denotes the modulo operation, that is, the remainder of the integer division.
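A short diagnostic sketch of the periodicity assumption: compute the daily quotient $F_t / i_t$ from observed and expected counts, and expand a 7-vector into the periodic sequence indexed by `t % 7`. The expected counts `F` are assumed to come from some renewal-equation fit; here they are plain inputs.

```python
import numpy as np


def daily_quotient(i, F):
    """q_t = F_t / i_t; days with i_t == 0 are left as NaN."""
    i = np.asarray(i, dtype=float)
    F = np.asarray(F, dtype=float)
    q = np.full(i.shape, np.nan)
    np.divide(F, i, out=q, where=(i != 0))
    return q


def periodic_approx(q_bar, n_days):
    """Periodic sequence q_bar[t % 7] for t = 0, ..., n_days - 1."""
    return np.asarray(q_bar, dtype=float)[np.arange(n_days) % 7]
```

Plotting `daily_quotient(i, F)` against `periodic_approx(q_bar, len(i))` gives the kind of comparison shown in the upper panels of Figs. 3-6.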

To compute $\bar q$ we use a least-squares estimate

$$\bar q = \arg\min_{q \in \mathbb{R}^7} E(i, R, \Phi, q), \qquad E(i, R, \Phi, q) = \sum_{t=t_c-T+1}^{t_c} \bigl(q_{t\%7}\, i_t - F(i, R, \Phi, t)\bigr)^2, \tag{7}$$

where T is the number of days used in the estimation (in our experiments T = 56, that is, 8 weeks). The value $\hat i_t = \bar q_{t\%7}\, i_t$ can be considered as an update of $i_t$ from which the weekly administrative noise has been removed. To preserve the number of accumulated cases over the estimation period, we add to the minimization problem (7) the constraint

$$\sum_{t=t_c-T+1}^{t_c} \bar q_{t\%7}\, i_t = \sum_{t=t_c-T+1}^{t_c} i_t. \tag{8}$$
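Because each weekday factor multiplies a disjoint set of days, problem (7) decouples by weekday, and the conservation constraint can be enforced exactly with a single Lagrange multiplier. The sketch below takes that route, which is an alternative to the weighted-penalty approach described in the appendix, not the authors' code. It assumes the window starts at index 0 ($t = 0$) and that every weekday has at least one nonzero count.

```python
import numpy as np


def estimate_q_bar(i, F):
    """Weekday factors minimizing sum_t (q[t % 7] * i_t - F_t)^2
    subject to sum_t q[t % 7] * i_t == sum_t i_t.

    i, F: counts and renewal-equation expected counts over the window,
    with index 0 taken as t = 0. Every weekday must have a nonzero count.
    """
    i = np.asarray(i, dtype=float)
    F = np.asarray(F, dtype=float)
    day = np.arange(len(i)) % 7
    D = np.bincount(day, weights=i * i, minlength=7)  # sum of i_t^2 per weekday
    g = np.bincount(day, weights=i * F, minlength=7)  # sum of i_t * F_t per weekday
    c = np.bincount(day, weights=i, minlength=7)      # sum of i_t per weekday
    d = i.sum()
    # Stationarity: D_j q_j - g_j + mult * c_j = 0, so q_j = (g_j - mult * c_j) / D_j;
    # substituting into sum_j c_j q_j = d gives the multiplier in closed form.
    mult = (np.sum(c * g / D) - d) / np.sum(c * c / D)
    return (g - mult * c) / D
```

Setting the multiplier to zero recovers the unconstrained per-weekday ratio $\sum i_t F_t / \sum i_t^2$.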

In this way, multiplying $i_t$ by the factor $\bar q_{t\%7}$ redistributes the cases within the estimation period without changing their total. Note that weights could be assigned to particular days in the expression of $E(i, R, \Phi, q)$ in (7). For example, if a holiday falls in the middle of the week, one might reduce the weight of that day and of the following ones in $E(i, R, \Phi, q)$.

Once $i_t$ has been updated to $\hat i_t = \bar q_{t\%7}\, i_t$, we can use $\hat i_t$ to recompute $R_t$ with the renewal equation (1). We denote by $\hat R_t$ the updated version of $R_t$. This procedure is repeated, updating $\hat i_t$ and $\hat R_t$ iteratively until convergence. The final values of $\hat i_t$ and $\hat R_t$ give a more realistic trend of the evolution of $i_t$ and an improved estimate of $R_t$. The final vector $\bar q$ is the set of multiplicative factors used to remove the weekly administrative noise. Fig. 1 presents a flowchart of the whole procedure, and the appendix gives technical details.
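The alternating scheme can be sketched end to end in Python. The estimator of [1] is not reproduced here; in its place, `estimate_R` is a **hypothetical stand-in** (a Cori-style ratio of 7-day sums, with expected counts $F_t = R_t \Lambda_t$). The weekday fit is an exact constrained least-squares solution, and the loop stops as soon as the energy $\sum_t (\hat i_t - F_t)^2$ stops decreasing, which is equivalent to the efficiency measure not improving when that measure is a ratio against the fixed initial energy. Treat it as a sketch of the control flow, not as the authors' implementation.

```python
import numpy as np


def estimate_R(i, phi):
    """HYPOTHETICAL stand-in for the estimator of [1] (not the authors' method).

    Lambda_t = sum_{k>=1} phi[k-1] * i_{t-k};  R_t = (sum of i) / (sum of Lambda)
    over the last 7 days. Returns R_t and expected counts F_t = R_t * Lambda_t.
    """
    n = len(i)
    lam = np.array([sum(phi[k - 1] * i[t - k] for k in range(1, len(phi) + 1) if t >= k)
                    for t in range(n)])
    R = np.ones(n)
    for t in range(n):
        win = slice(max(0, t - 6), t + 1)
        if lam[win].sum() > 0:
            R[t] = i[win].sum() / lam[win].sum()
    return R, R * lam


def fit_q_bar(i, F):
    """Weekday factors minimizing sum (q[t%7] i_t - F_t)^2 with sum q[t%7] i_t = sum i_t."""
    day = np.arange(len(i)) % 7
    D = np.bincount(day, weights=i * i, minlength=7)
    g = np.bincount(day, weights=i * F, minlength=7)
    c = np.bincount(day, weights=i, minlength=7)
    mult = (np.sum(c * g / D) - i.sum()) / np.sum(c * c / D)
    return (g - mult * c) / D


def remove_weekly_noise(i, phi, max_iter=20):
    """Alternate the weekday fit and the R re-estimation; return the best iterate."""
    i = np.asarray(i, dtype=float)
    R, F = estimate_R(i, phi)
    best, best_E = (i, R, np.ones(7)), np.sum((i - F) ** 2)
    for _ in range(max_iter):
        q_bar = fit_q_bar(i, F)                     # fit q_bar[t % 7] * i_t ~ F_t
        i_hat = q_bar[np.arange(len(i)) % 7] * i    # corrected counts, same total
        R_hat, F = estimate_R(i_hat, phi)           # re-estimate from corrected counts
        E = np.sum((i_hat - F) ** 2)                # energy with all factors equal to 1
        if E >= best_E:                             # no improvement: stop
            break
        best, best_E = (i_hat, R_hat, q_bar), E
    return best
```

The weekday factors are always fitted against the original counts $i_t$, so the returned $\hat i_t$ is a redistribution of the observed cases with the same total.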

Figure 1:

Flowchart of the estimation of $\hat i_t$ and $\hat R_t$. First, $R_t$ is computed from $i_t$ and $\Phi_s$ using the method proposed in [1], and $\bar q$ is initialized. Next, $\bar q$ is obtained by minimizing (7) subject to the constraint (8). Then $i_t$ is updated as $\hat i_t = \bar q_{t\%7}\, i_t$. The iteration stops when the efficiency measure ℐ defined in (9) does not improve in the current iteration. Otherwise, $\hat R_t$ is updated by the method proposed in [1] from $\hat i_t$ and $\Phi_s$, and the iteration continues.

RESULTS

All the experiments presented here can be reproduced with the online interface available at www.ipol.im/ern. The appendix includes some details about the use of this online demo. To measure how much the removal of the weekly administrative noise improves the explanation of $i_t$ by the renewal equation $F(i, R, \Phi, t)$, we use the following efficiency measure:

$$\mathcal{I} = \sqrt{\frac{E(\hat i, \hat R, \Phi, \mathbf{1})}{E(i, R, \Phi, \mathbf{1})}}, \tag{9}$$

where E(·) is defined in (7) and $\mathbf{1} = (1, \ldots, 1)$. ℐ measures how much the average distance between $i_t$ and $F(i, R, \Phi, t)$ is reduced by the removal of the weekly administrative noise: the smaller ℐ, the more efficient the noise reduction. The value of ℐ can therefore be used to assess whether it is worth applying the proposed method to a given country and time interval.
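A small helper for the efficiency measure, assuming ℐ is the square root of the ratio between the energy after correction and the energy before it, both evaluated with all weekday factors equal to 1 (so each energy is just $\sum_t (i_t - F_t)^2$ over the estimation window). Values below 1 mean that the corrected counts sit closer to the renewal-equation prediction.

```python
import numpy as np


def efficiency(i, F, i_hat, F_hat):
    """Assumed form: I = sqrt(E_after / E_before), where each energy is the sum of
    squared gaps between counts and renewal-equation expected counts (factors = 1).

    i, F: original counts and their expected values; i_hat, F_hat: after correction.
    """
    e_before = np.sum((np.asarray(i, dtype=float) - np.asarray(F, dtype=float)) ** 2)
    e_after = np.sum((np.asarray(i_hat, dtype=float) - np.asarray(F_hat, dtype=float)) ** 2)
    return float(np.sqrt(e_after / e_before))
```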

In Fig. 2 we plot, for Sweden, Germany, France and Spain, the values of $\bar q_k$ obtained for each day of the week. In Sweden and Germany, Monday is the day on which $i_t$ is most underestimated (the higher $\bar q_k$, the more $i_t$ is underestimated on day k). In France, however, that day is Tuesday. This suggests that France has an additional one-day delay due to the way it records and reports new cases. In these three countries, the effect of the weekly administrative noise is mainly concentrated on Monday, Tuesday and Wednesday (with a one-day lag in the case of France). Spain is a special case because it does not report data on weekends. For Spain we therefore assume that the number of cases on Saturday, Sunday and Monday is constant and equal to the total reported for those three days divided by 3. Moreover, some Spanish regions occasionally do not report data for one or several days, which produces additional administrative noise unrelated to the weekend effect. The more orderly the daily count of new cases, the better the result obtained by our method.
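The weekend preprocessing used for Spain can be sketched as follows. The input layout is an assumption: one value per day, with the weekend cases reported on Monday and zeros on Saturday and Sunday; `first_saturday` is a hypothetical parameter giving the index of the first Saturday in the series.

```python
import numpy as np


def spread_weekend_reports(i, first_saturday):
    """Replace each Saturday/Sunday/Monday triple by its mean (total / 3).

    first_saturday: index of the first Saturday in i (hypothetical parameter).
    Assumes weekend cases are reported on Monday and Saturday/Sunday hold 0.
    A trailing weekend whose Monday is not yet in the series is left unchanged.
    """
    out = np.asarray(i, dtype=float).copy()
    for s in range(first_saturday, len(out) - 2, 7):
        out[s:s + 3] = out[s:s + 3].sum() / 3.0
    return out
```

The total number of cases is unchanged; only its distribution over the three days is modified.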

Figure 2:

Plot of $\bar q_k$ for different countries.

In Table 1 we present the numerical values of the vector $\bar q$, the efficiency measure ℐ, and the effective reproduction number at the current time before and after the removal of the weekly administrative noise, given by $R_{t_c}$ and $\hat R_{t_c}$ respectively. Notably, with the exception of Spain, the efficiency value ℐ shows a large reduction of the administrative noise. The study period runs from September 9 to October 28, and at its end there is a strong expansion of the virus. This period ends on a Wednesday, and the counts of the preceding days were underestimated. It follows that the estimated effective reproduction number becomes considerably higher after the weekly administrative noise is removed.

Table 1:

Values of $\bar q_k$ for each day of the week, the efficiency measure ℐ, and the effective reproduction number at the current time before and after the administrative noise removal ($R_{t_c}$ and $\hat R_{t_c}$, respectively), for Sweden, Germany, France and Spain.

In Figs. 3-6 we plot, on the one hand, the quotient $F(\hat i, \hat R, \Phi, t)/i_t$ and its periodic approximation $\bar q_{t\%7}$, and on the other hand the values of $i_t$, its update $\hat i_t$ after the removal of the weekly administrative noise, and the expected value computed with $\hat R_t$. For Sweden, Germany and France we observe quite good agreement between the quotient and its periodic approximation, which supports the validity of the proposed method and our assumption that the quotient follows a 7-day periodic dynamic. For these countries, $\hat i_t$, the number of new cases after the removal of the administrative noise, oscillates less than $i_t$ and is very close to its expected value under the renewal equation, $F(\hat i, \hat R, \Phi, t)$. In the case of Spain, the improvement is minor. On the one hand, Spain does not report data during the weekend, which introduces an additional disturbance in the data and hinders the application of the method. On the other hand, in two of the weeks of the study period, the quotient diverges significantly from its periodic approximation.

Figure 3:

In the case of Sweden, we plot, between September 9 and October 28: (top) the quotient $F(\hat i, \hat R, \Phi, t)/i_t$ (dotted line) and its periodic approximation $\bar q_{t\%7}$ (solid line); (bottom) $i_t$ (solid line), its update $\hat i_t$ after removal of the administrative noise (dashed line), and the expected value computed with $\hat R_t$ (dotted line).

Figure 4:

In the case of Germany, we plot, between September 9 and October 28: (top) the quotient $F(\hat i, \hat R, \Phi, t)/i_t$ (dotted line) and its periodic approximation $\bar q_{t\%7}$ (solid line); (bottom) $i_t$ (solid line), its update $\hat i_t$ after removal of the administrative noise (dashed line), and the expected value computed with $\hat R_t$ (dotted line).

Figure 5:

In the case of France, we plot, between September 9 and October 28: (top) the quotient $F(\hat i, \hat R, \Phi, t)/i_t$ (dotted line) and its periodic approximation $\bar q_{t\%7}$ (solid line); (bottom) $i_t$ (solid line), its update $\hat i_t$ after removal of the administrative noise (dashed line), and the expected value computed with $\hat R_t$ (dotted line).

Figure 6:

In the case of Spain, we plot, between September 9 and October 28: (top) the quotient $F(\hat i, \hat R, \Phi, t)/i_t$ (dotted line) and its periodic approximation $\bar q_{t\%7}$ (solid line); (bottom) $i_t$ (solid line), its update $\hat i_t$ after removal of the administrative noise (dashed line), and the expected value computed with $\hat R_t$ (dotted line).

CONCLUSION

We proposed a technique to remove the weekly administrative noise from COVID-19 incidence curves, based on improving the agreement between $i_t$ and its expected value under the renewal equation, $F(i, R, \Phi, t)$. The method boils down to multiplying $i_t$ by a factor $\bar q_{t\%7}$ that depends on the day of the week. The main assumption supporting this approach is that the quotient of $F(i, R, \Phi, t)$ and $i_t$ approximately follows a 7-day periodic dynamic. We verified this assumption for Sweden, Germany and France. For Spain the method brings less improvement, because Spain does not report data on weekends. The proposed method iteratively updates $i_t$ and the effective reproduction number $R_t$. After the removal of the weekly administrative noise, the number of cases $\hat i_t$ oscillates much less than $i_t$ and is very close to its expected value under the renewal equation used. From $\hat i_t$ we obtain a more accurate value of the effective reproduction number $R_t$. An implementation of this technique is available at www.ipol.im/ern. In this online interface we compare the final estimate of $R_t$ with the estimate obtained by EpiEstim. As shown in [3], in practice EpiEstim uses a 7-day moving average to remove the weekly administrative noise. As can be observed with the online interface, this 7-day moving average introduces a shift towards the past that is not observed in our estimate. Our method therefore appears to provide a more up-to-date estimate of $R_t$ than EpiEstim.

Data Availability

All data referred to in the manuscript are publicly available at the European Centre for Disease Prevention and Control.

https://www.ipol.im/ern

Appendix

Technical issues

Here we give some technical details of the proposed technique. First, note that, in general, the only data we use are $i_t$ and $\Phi_s$, so initially we do not know the actual day of the week corresponding to each datum $i_t$. For the computation of $\bar q$, we assume without loss of generality that the first day of the estimation period, $t_c - T + 1$, corresponds to $t = 0$. This means that we initially assume this day is a Monday; in fact, this is not relevant, because the values of $\bar q$ can be reorganized at any time to match the actual days of the week.
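Reorganizing $\bar q$ to match the actual weekdays is a cyclic shift. A minimal sketch, assuming weekdays numbered 0 = Monday to 6 = Sunday (the convention of Python's `datetime.date.weekday()`) and a hypothetical `first_weekday` argument giving the weekday of the first day of the window:

```python
import datetime

import numpy as np


def q_bar_by_weekday(q_bar, first_weekday):
    """Entry w of the result is the factor for calendar weekday w (0 = Monday).

    q_bar is indexed by t % 7 with t = 0 on the first day of the window, so
    weekday w corresponds to t % 7 == (w - first_weekday) % 7; np.roll does that shift.
    """
    return np.roll(np.asarray(q_bar, dtype=float), first_weekday)


# Usage: a window starting on Thursday 3 September 2020 (weekday() == 3)
by_weekday = q_bar_by_weekday([1.0] * 7, datetime.date(2020, 9, 3).weekday())
```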

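The minimization problem (7), given by the quadratic energy

$$E(i, R, \Phi, q) = \sum_{k=0}^{T-1} \bigl(q_{k\%7}\, i_k - F(i, R, \Phi, k)\bigr)^2, \tag{10}$$

can be expressed in matrix form as

$$\bar q = \arg\min_{q \in \mathbb{R}^7} \|A q - b\|^2, \tag{11}$$

where ‖·‖ is the usual Euclidean norm and $A \in \mathbb{R}^{T \times 7}$ is defined by

$$A_{k,j} = \begin{cases} i_k & \text{if } k \,\%\, 7 = j, \\ 0 & \text{otherwise}, \end{cases} \qquad k = 0, \ldots, T-1, \quad j = 0, \ldots, 6,$$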

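and $b \in \mathbb{R}^T$ is defined by $b_k = F(i, R, \Phi, k)$. It is well known that this minimization problem has the closed-form solution

$$\bar q = (A^{\mathsf T} A)^{-1} A^{\mathsf T} b.$$

Since each row of A has a single nonzero entry, $A^{\mathsf T} A$ is diagonal, and the solution reduces to $\bar q_j = \sum_{k\%7=j} i_k\, F(i, R, \Phi, k) \big/ \sum_{k\%7=j} i_k^2$.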

The constraint (8), $\sum_{k=0}^{T-1} \bar q_{k\%7}\, i_k = \sum_{k=0}^{T-1} i_k$, can be expressed as the additional linear equation

$$c^{\mathsf T} \bar q = d, \qquad \text{where } c_j = \sum_{k\%7=j} i_k \text{ and } d = \sum_{k=0}^{T-1} i_k.$$

This constraint could be included in the minimization by eliminating one of the unknowns with the above equation. However, we use another approach, which consists of adding the above equation, multiplied by a weight w, to the expression (11), that is, appending the row $w\, c^{\mathsf T}$ to A and the entry $w\, d$ to b. In this way we can control the weight assigned to the constraint: if w is large, the constraint is enforced, and if w = 0 it is removed. In the experiments performed in this work we use $w = 10^5$, so that the constraint is satisfied.
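A NumPy sketch of the weighted-constraint approach: build A and b, append the constraint row scaled by `w`, and solve with `numpy.linalg.lstsq`. With `w = 0` the extra row vanishes and the result is the unconstrained closed form; with `w = 1e5` the constraint holds up to rounding. The indexing (row k has its single nonzero entry in column `k % 7`) follows the convention that the first day of the window is $t = 0$; the function name is illustrative.

```python
import numpy as np


def fit_q_bar_penalized(i, F, w=1e5):
    """Least squares for the weekday factors with the conservation constraint
    appended as one extra equation scaled by w (w = 0 drops the constraint).

    Rows 0..T-1: A[k, k % 7] = i_k and b_k = F_k (window starts at t = 0).
    Row T: w * c and w * d, with c_j = sum of i_k over weekday j and d = sum of i_k.
    """
    i = np.asarray(i, dtype=float)
    F = np.asarray(F, dtype=float)
    T = len(i)
    day = np.arange(T) % 7
    A = np.zeros((T + 1, 7))
    A[np.arange(T), day] = i                                 # one nonzero per row
    A[T] = w * np.bincount(day, weights=i, minlength=7)      # w * c
    b = np.append(F, w * i.sum())                            # [F_0, ..., F_{T-1}, w * d]
    q_bar, *_ = np.linalg.lstsq(A, b, rcond=None)
    return q_bar
```

Because the penalty enters squared, the constraint residual shrinks roughly like $1/w^2$, while very large weights make the system less well conditioned; `lstsq` (SVD-based) copes well with the moderate values used here.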

As explained in Fig. 1, the proposed method iteratively updates $i_t$, $R_t$ and $\bar q$. Let us denote by $\hat i^n_t$, $\hat R^n_t$ and $\bar q^n$ these updates at iteration n, starting at n = 0. Following the flowchart of Fig. 1, these updates are computed with the following algorithm:

Algorithm 1:

Algorithm to estimate $\hat i_t$, $\hat R_t$ and $\bar q$. MaxIter is the maximum number of iterations allowed.


Using the online interface

In the online interface, available at www.ipol.im/ern, one can test the method proposed in [1] to estimate $R_t$ as well as the method proposed here to remove the weekly administrative noise. When the demo is run with the option “remove weekly administrative noise” activated, the method proposed in this paper is used to compute $\hat i_t$, $\hat R_t$ and $\bar q$. A comparison with the EpiEstim method is shown. We also show (in the plot on the right) the original curve $i_t$ as well as the estimate given by the renewal equation, $F(\hat i, \hat R, \Phi, t)$. The first row of the output file “Rn.csv” contains the final value of the efficiency measure ℐ (see (9)) and the vector $\bar q$. The elements of the vector are ordered so that the last one, $\bar q_6$, is the multiplicative factor of the last element of $i_t$, that is, $\hat i_{t_c} = \bar q_6\, i_{t_c}$ and, in general, $\hat i_{t_c-k} = \bar q_{(6-k)\%7}\, i_{t_c-k}$ (taking the non-negative remainder). Moreover, the first column of “Rn.csv” contains the values of $\hat R_t$, the second column $\hat i_t$, the third column the original number of cases $i_t$, and the fourth column a measure of the variability in the estimation of $\hat R_t$, as explained in [1].
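The end-aligned ordering of $\bar q$ in the output file can be applied as below. Reading “Rn.csv” is not shown, since its exact layout beyond the description above is not assumed; the sketch only takes the seven factors and a series of counts whose last element is $i_{t_c}$. The function name is hypothetical.

```python
import numpy as np


def apply_end_aligned_factors(i, q_bar):
    """Correct counts with factors ordered so that q_bar[6] multiplies the last day.

    For the day k = t_c - t before the last one, the factor is q_bar[(6 - k) % 7];
    NumPy's % returns a non-negative remainder for a positive divisor.
    """
    i = np.asarray(i, dtype=float)
    k = np.arange(len(i))[::-1]          # k = 0 for the last day, len(i) - 1 for the first
    return np.asarray(q_bar, dtype=float)[(6 - k) % 7] * i
```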

Footnotes

  • ¹ To test this assumption on many more countries, the reader is invited to use the online demo at www.ipol.im/ern.

Abbreviations

EpiEstim
Software to compute the effective reproduction number, proposed by Cori et al. in the paper “A new framework and software to estimate time-varying reproduction numbers during epidemics”, published in the American Journal of Epidemiology.
R(t)
Effective reproduction number. To distinguish between the continuous and discrete cases, we use the notation R(t) in the continuous case and $R_t$ in the discrete case.
i(t)
Incidence curve: the number of positive tests registered daily. To distinguish between the continuous and discrete cases, we use the notation i(t) in the continuous case and $i_t$ in the discrete case.
Φ
Serial interval.

References

  [1] L. Alvarez, M. Colom, and J.-M. Morel, A variational model for computing the effective reproduction number of SARS-CoV-2, medRxiv, (2020).
  [2] L. Bettencourt and R. Ribeiro, Real time Bayesian estimation of the epidemic potential of emerging infectious diseases, PLoS ONE, 3 (2008).
  [3] T. Z. Boulmezaoud, L. Alvarez, M. Colom, and J.-M. Morel, A daily measure of the SARS-CoV-2 daily reproduction number for all countries, IPOL Journal. Image Processing On Line, submitted, (2020).
  [4] A. Cori, N. M. Ferguson, C. Fraser, and S. Cauchemez, A new framework and software to estimate time-varying reproduction numbers during epidemics, American Journal of Epidemiology, 178 (2013), pp. 1505–1512.
  [5] Z. Du, X. Xu, Y. Wu, L. Wang, B. J. Cowling, and L. A. Meyers, The serial interval of COVID-19 from publicly reported confirmed cases, medRxiv, (2020).
  [6] K. Gostic, L. McGough, E. Baskerville, S. Abbott, K. Joshi, C. Tedijanto, R. Kahn, R. Niehus, J. Hay, P. De Salazar, J. Hellewell, S. Meakin, J. Munday, N. Bosse, K. Sherratt, R. Thompson, L. White, J. Huisman, J. Scire, S. Bonhoeffer, T. Stadler, J. Wallinga, S. Funk, M. Lipsitch, and S. Cobey, Practical considerations for measuring the effective reproductive number, Rt, medRxiv, (2020).
  [7] S. Ma, J. Zhang, M. Zeng, Q. Yun, W. Guo, Y. Zheng, S. Zhao, M. H. Wang, and Z. Yang, Epidemiological parameters of coronavirus disease 2019: a pooled analysis of publicly reported individual data of 1155 cases from seven countries, medRxiv, (2020).
  [8] H. Nishiura, Time variations in the transmissibility of pandemic influenza in Prussia, Germany, from 1918–19, Theoretical Biology and Medical Modelling, 4 (2007), p. 20.
  [9] H. Nishiura and G. Chowell, The Effective Reproduction Number as a Prelude to Statistical Estimation of Time-Dependent Epidemic Trends, Springer Netherlands, Dordrecht, 2009, pp. 103–121.
  [10] H. Nishiura, N. M. Linton, and A. R. Akhmetzhanov, Serial interval of novel coronavirus (COVID-19) infections, International Journal of Infectious Diseases, (2020).
  [11] R. Thompson, J. Stockwin, R. D. van Gaalen, J. Polonsky, Z. Kamvar, P. Demarsh, E. Dahlqwist, S. Li, E. Miguel, T. Jombart, et al., Improved inference of time-varying reproduction numbers during infectious disease outbreaks, Epidemics, 29 (2019), p. 100356.
  [12] M. A. Vink, M. C. J. Bootsma, and J. Wallinga, Serial intervals of respiratory infectious diseases: a systematic review and analysis, American Journal of Epidemiology, 180 (2014), pp. 865–875.
  [13] J. Wallinga and P. Teunis, Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures, American Journal of Epidemiology, 160 (2004), pp. 509–516.
Posted November 18, 2020.