## Abstract

The coronavirus disease 2019 (COVID-19) has grown up to be a pandemic within a short span of time. The quantification of COVID-19 transmissibility is desired for purposes of assessing the potential for a place to start an outbreak and the extent of transmission in the absence of control measures. It is well known that the transmissibility can be measured by reproduction number. For this reason, the large amount of research focuses on the estimations of reproduction number of COVID-19. However, these previous results are controversial and even misleading. To alleviate this problem, Liu et al advised to use averaging technique. Unfortunately, the fluctuant consequence principally arises from data error or model limitations rather than stochastic noise, where the averaging technique doesn’t work well. The most likely estimation in USA and Wuhan is about 8.21 and 7.9. However, no enough evidence demonstrates the transmissibility increase of infectious agent of COVID-19 throughout the world.

## Background

The coronavirus disease 2019 (COVID-19) has grown up to be a pandemic within a short span of time. The quantification of COVID-19 transmissibility is desired for purposes of assessing the potential for a place to start an outbreak and the extent of transmission in the absence of control measures[1]. It is well known that the transmissibility can be measured by reproduction number. For this reason, the large amount of research focuses on the estimations of reproduction number of COVID-19. However, these previous results are controversial and even misleading. To alleviate this problem, Liu et al[2] advised to use averaging technique. Unfortunately, the fluctuant consequence principally arises from data error or model limitations rather than stochastic noise, where the averaging technique doesn’t work well. This has inspired a collection of studies on reproduction number of COVID-19.

## Objective

To find out the reason for estimation change and then the most reliable estimation of base reproduction number.

## Definitions

Reproduction number can be subdivided into basic reproduction number and effective reproductive number. In this paper, our discussion will be confined to basic reproduction number since it is harder to calculate than the other[3]. Even so, the definitions of both are presented here for comparison.

Basic reproduction number (BRN)

*R*_{0}is defined as the expected number of secondary infectious cases generated by an average infectious case in an otherwise uninfected population[1, 4].Effective reproductive number (ERN),

*R*_{e}, is the number of secondary cases generated by an infectious case once an epidemic is underway[1].

BRN *R*_{0} can be expressed as *R*_{0}=*kpd*, where *k* is the number of contacts each infectious individual has per unit time, *p* is the probability of transmission per contact between an infectious case and a susceptible individual, and *d* is the mean duration of infectiousness[1]. In the absence of control measures, *R*_{e}=*rR*_{0}, where *r* is the proportion of the population susceptible [1]. Here, *r* < 1.

## Data

Google Scholar and Science Citation Index were used to search for eligible studies about BRN of COVID-19. The estimations of BRN range from 0.3[5] to 8.213[6]. An extensive and resulting description is given in Table 1.

## Discussion

### BRN increases with time

In order to prove the BRN varies with time, the data is divided into two groups: (1) Group I consists of the first 22 estimations of BRN in Table 1; (2) Group II is composed of the last 22 estimations. A statistical hypothesis testing is carried out here. The null hypothesis is that the mean of Group I is no less than Group II. The alternative hypothesis is that the mean of Group II is more than Group I.

Suppose the estimations follows a normal distribution, if the null hypothesis is really true, the test statistic *t* which follows by the Student’s t-distribution should be more than -1.732 with probability 95%, while it is -2.256 according to the data. Therefore, the null hypothesis is rejected and the conclusion is that BRN increases with time.

### Difference between methods

For more than two categories, some of which have small numbers, statistic analysis tends to pool some of the categories together[35]. For this reason, the estimation methods of BRN are divided into three broad categories: (1) methods based on exponent-growth-like models, (2) methods based on SIR epidemic models, and (3) others. For convenience, we will hereafter use ‘EX method’ to denote the method based on exponent-growth-like models and ‘SIR method’ to indicate method based on susceptible–infectious–removed (SIR) model.

At first, we suspect there is a significant difference between the groups in variance. This is confirmed by using Levene’s test (on the means) since the Levene test statistic 12.28 is far more than 3.23, which is the upper critical value of the F distribution with 2 and 40 degrees of freedom at a significance level of 0.05.

For the reason that the variances and sample sizes are unequal across groups, Welch’s Test is used to perform an ANOVA analysis. Here, the null hypothesis is that the means of the three groups are equal. The alternative hypothesis is that more than one group is different from others. From the data, the Welch’s test statistic is 3.25, which more than the upper critical value F(2, 14.165) at a significance level of 0.1. As a result, the conclusion can be drawn that more than one group is different from others.

Further, the results of three methods are shown by points in Figure 1. Here, the estimation (3.2-3.6) provided by Davies [28] is substituted by 3.4 for the convenience of pictorial display, and 2.76-3.25 by 3.005 for the same reason. Regression analysis is implemented separately for each method, as the lines shown in Figure 1. They all show that BRN increases with time. Note that BRN 0.3 is rejected since an epidemic can occur if and only if.

The mean and standard deviation of BRN calculated by different classes of method are tabulated in Table 2. It can be observed that the output of SIR methods is higher but more divergent than others. The reason for this is complicated.

At the beginning of the break of COVID-19, the data reported by Wuhan in China is less than the ground truth[15, 36, 37]. Note that EX methods are only competent to calculate BRN from early transmission data[8, 38]. In contrast, SIR methods can be applicable to the whole development process of an epidemic and hence more data can be available. For this reason, the EX methods suffer more ill effects of under-reporting than the SIR methods.

On the other hand, for parameter estimation of epidemic models, the SIR methods usually involve nonlinear optimization, which can easily get stuck into improper local minima[39]. Worse, the parameters are frequently estimated at the cost of model simplification. It is hard to obtain a balance between simplicity and practicality. For this reason, the results of SIR methods are relatively unstable. This may provide an explanation of why the output of SIR methods is more divergent than others.

### Transmissibility increases with time?

From Figure 1, it is a consensus that the estimations of BRN increase slightly with time. Does the transmissibility of COVID-19 increase as well? In fact, the possible causes of BRN increase are threefold: (1) behavior changes of population, (2) increase of transmissibility of infectious agent and (3) computational errors. Among them, behavioral changes such as wearing mask typically reduce the probability of transmission and ultimately lower rather than enhance BRN. So, if the third cause is impossible either, it is reasonable that the transmissibility of infectious agent of COVID-19 increases with time.

To explore the reason, we carried out a further investigation into the estimation process of BRN. Among them, we found that up to 27 estimates are derived directly from original data reported by Wuhan authority, which is less than the ground truth due to delays in diagnosis and laboratory confirmation in the early stage [15, 36, 37]. Therefore, under-reporting results in an underestimation, which can give a reasonable explanation for BRN increase. As a consequence, no enough evidence demonstrates the transmissibility increase of infectious agent of COVID-19 throughout the world.

### Magnitude of BRN

According to the aforementioned definition, because *k* and *p* are related to numerous biological, sociobehavioral, and environmental factors in the special geographical location and historical period, BRN *R*_{0} is not a biological constant for a pathogen, or a measure of disease severity[3]. Nevertheless, under the same or similar sociobehavioral and environmental conditions, BRN, as an epidemiologic metric, should be higher for the stronger infectious agents. For this reason, we compared the outbreak of COVID-19 with SARS, as tabulated in Table 3. Obviously, the outbreak size of COVID-19 has already far exceeded SARS. In this sense, the BRN of COVID-19 should be far more than that of SARS, i.e., 2.3-3.6[1, 40].

BRN *R*_{0} is originally designed to reflect the characteristic of an infectious agent and hence it is required to be measured from the data in the early stage and in the absence of control measures. In this sense, the data of Wuhan, in which COVID-19 began[42], should be the optimal choice to measure the BRN. On the other hand, BRN is more likely to be closer to the estimation from the data of countries with fewer control measures, such as USA. For this reason, here we focus on the BRN of COVID-19 in USA and Wuhan.

The BRN of COVID-19 in USA is estimated to be 8.21 by Xu et al [6]. In fact, after hearing the dangerousness of COVID-19 from Wuhan, American masses may change their behaviors more or less. Therefore, if COVID-19 spreads from USA, the BRN should be more than the estimation 8.21.

With regard to the BRN of COVID-19 in Wuhan, it is a more complex problem due to under-reporting. Despite a large amount of underestimation appearing in the literature, the several estimations are still similar to the BRN in USA. How to do so? We investigated the data and method, as given in Table 4. It can be observed that most of them utilized compartment models and all of them did not use the raw number of confirmed cases reported by Wuhan authority. Therefore, it is possible to avoid misestimating.

Unlike EX methods, SIR methods can be tested by the predicted results. And, Zhao et al[16] and Zhu et al[27] made a prediction about the end time of COVID-19 in Wuhan, as presented in Table 4. The fact is that, on April 15, the last support medical team was evacuated from Wuhan[43] and Wuhan Thunder Mountain Hospital was officially closed on the same day[44]. For this reason, we believe that the model presented by Zhu et al[27] is more reasonable. In addition, the result computed by Tang[45] demonstrates that the BRN in Wuhan should be more than 6.47, since BRN is typically more than control reproduction number. Given these points, there is strong possibility that the BRN of COVID-19 in Wuhan is 7.9.

In summary, the most likely estimation of BRN of COVID-19 in USA and Wuhan is about 8.21 and 7.9. Arguably,it is timely hospitalization that is the reason why the BRN in Wuhan is lower than USA. It is worth noting that limited evidence supports the applicability of BRN outside the region where the value was calculated[47]. For this reason, BRN just reflects the transmissibility of an infection in the special location and period and no BRN is fit for everywhere. To compare with other infections, an alternative term, which is independent of population density, social organization, seasonality and folkway, is desired but still open.

### Findings

The early data is crucial to estimate BRN. Underreporting tends to make an underestimation. In terms of BRN estimation of COVID-19, the SIR-like models are more useful than exponential growth model. The most likely estimation in USA and Wuhan is about 8.21 and 7.9. However, no enough evidence demonstrates the transmissibility increase of infectious agent of COVID-19 throughout the world. Note that it is dangerous to use BRN outside the region where the value was calculated. To compare with other infections, an alternative term, which is independent of population density, social organization, seasonality and folkway, is desired but still open.

## Data Availability

No

## Acknowledgment

This work is supported by National social science foundation of China (16BXW005), Chongqing Basic Science and Frontier Research Project, China (No. cstc2017jcyjAX0007, cstc2017jcyjAX0386), Project Foundation of Chongqing Municipal Education Committee, China (Grant No. 17SKG050).