## Abstract

We estimated the fraction and timing of presymptomatic transmissions of COVID19 with mathematical models combining the available data of the incubation period and serial interval. We found that up to 79.7% transmissions could be presymptomatic among the imported cases in China outside Wuhan. The average timing of presymptomatic transmissions is 3.8 days (SD = 6.1) before the symptom onset, which is much earlier than previously assumed.

## Introduction

The pandemic of COVID19 is ongoing. Till April 2 2020, COVID19 has caused more than 965 thousand infections and 49 thousand deaths world widely. Current evidences indicate COVID19 can be transmitted presymptomatically [1-2]. But several crucial features of COVID19 transmission behavior remain uncertain: what was the proportion of transmissions happened before symptom onset, and when did the infected persons become infectious? The answers to these questions may significantly impact the disease control strategy [3].

We tried to answer these questions by estimating the infection time distribution of COVID19. The infection time is the time of the secondary case infection relative to the symptom onset of the index case. The fraction and timing of presymptomatic transmissions can be straightforwardly obtained from the distribution of infection time.

As illustrated in Figure 1, the infection time is related to the other two important epidemic features as

where negative value of infection time stands for the presymptomatic transmission. Given this relationship, the infection time distribution could be estimated using statistical methods combining the up to date data of the incubation period and the serial interval of COVID19.

### Estimate the mean and standard deviation of COVID19 infection time

There have been several reports estimating the incubation period or serial interval of COVID19 [4-8]. Based on these reports and using method of moments (see Appendix 1) we estimated the mean and standard deviation of COVID19 infection time in two scenarios as listed in Table 1: Scenario 1, early transmission in Wuhan, China before Jan. 22, 2020 [4]; Scenario 2, imported transmission in China outside Wuhan after Jan. 20, 2020 [5-6] (the Wuhan lockdown was initialed on Jan. 23, 2020).

We estimated that the infection time was 2.3±2.7 days in Scenario 1, and was **-** 2.4±4.2 days in Scenario 2, which implied that the majority of transmissions were post-symptomatic in Scenario 1 but were presymptomatic in Scenario 2. The dramatic change of infection time distribution between these two scenarios may due to effective case isolation and quarantine of people with Wuhan travel history which could significantly reduce transmissions after symptom onset. The method of moments estimations do not rely on any specific assumption of the distribution forms of the incubation period or serial interval.

### Modeling the timing of presymptomatic transmission

We tried to model the timing of COVID19 presymptomatic transmission. We assumed the probability for patients being infectious is given by (more details are provided in Appendix 2)
where *α* is a parameter shaping the transmission behavior. As showed in Figure 2, the patients’ probability of being infectious increases from zero at their first day ofinfection to one at the day of symptom onset. *α* = 1 stands for the assumption that the onsets of infectiousness uniformly distributed in the incubation period, therefore the probability of being infectious increases linearly with time. When *α* > 1, more patients would only become infectious shortly before their symptom onsets, which was previously assumed for COVID19. On the other hand, when *α* < 1, more patients would become infectious shortly after their infections and much earlier than their symptom onsets.

Then serial interval distribution could be calculated using convolution (see Appendix 3).

Where *h*(*S*), *f*(*I*), *g*(*T*) are the probability density functions of serial interval, infection and incubation period respectively.

A distinctive feature of our presymptomatic transmission model is to naturally reproduce the “negative serial intervals”, a situation that the symptom onset of the secondary case is earlier than the index case. Widely used statistical models that describing serial interval distributions, including the Gamma, Weibull, log-normal distributions etc., ignore the possibility of negative serial intervals. However, Du et al. reported 12.6% negative serial intervals in the 468 pairs of cases of imported transmissions in China outside Wuhan, which indicated that our model was more proper for COVID19 modeling.

As demonstrated in Figure 3, when *α* = 1/4, 1 and 4, we estimated that the incidence of negative serial intervals were 15.5%, 9.7% and 3.2% respectively among presymptomatic transmissions. Sensitivity analysis using various incubation period distributions was given in Appendix Table S1. Comparing with the observed and model predicted negative serial incidence, it was more likely that *α* < 1, which contradicted with current assumption of COVID19 transmission behavior.

### Estimate the fraction of presymptomatic transmission

Two approaches were applied to estimate the presymptomatic transmission fraction. A raw estimation was based on the mean and standard deviation of infection time estimated previously with normal distribution assumption. The more sophisticated estimation is based on our presymptomatic transmission model and use the maximum convolution likelihood method [9] (see Appendix 4) to estimate the model parameters.

We estimated the distribution of infection time and serial interval in Scenario 2 (as mentioned before) via two approaches as shown in Figure 4. The fraction of presymptomatic transmissions was estimated 79.7% with our model and 67.5% with normal distribution assumption. The *α* parameter was estimated as 0.078, which implied that the COVID19 infected persons tended to become infectious shortly after infection. The average timing of presymptomatic transmission was estimated 3.8 days (SD = 6.1) before symptom onsets.

Because of the unavailability of data, we could not apply the maximum convolution likelihood estimation on Scenario 1, but the estimation with normal distribution assumption showed 19.7% transmissions were presymptomatic.

## Conclusion and Discussion

In conclusion, we estimated the fraction and timing of COVID19 presymptomatic transmissions. Through several different approaches (method of moments estimation, expected negative serial interval incidence, and maximum convolution likelihood estimation), we achieve largely consistent results: a large portion of COVID19 transmissions may happen presymptomatically. The percentage is roughly 19.7% when there is no effective control, and can go up to 79.7% when control measures effectively reducing transmissions after symptom onsets are performed. The timing of presymptomatic transmission is on average 3.8 days (SD=6.1) before symptom onset. Patients is likely to become infectious in the early stage of their infections instead of just before their symptom onsets.

To our knowledge, our model combining incubation period and serial interval data is a novel approach to estimate the presymptomatic transmissions of infectious disease. The distinctive feature of this approach is that it naturally predict the occurrence of negative serial intervals. The high incidence (12.6%) of negative serial intervals in the observed data of COVID19 (much higher than other diseases like SARS, MERS, etc.) implies that our model is particularly important to correctly understand the COVID19 transmission behavior.

The infection time distribution investigated in this study is closely related to the biologically infectious period of patients during the course of disease. Both provide insights of the transmissibility of COVID19, but from two different aspects. Our findings are largely consistent with the currently available data of viral shedding studies in the close contacts and the confirmed cases [10-11]. However, a difference should be noticed that the infection time distribution is more likely to be affected by the sociological factors and non-pharmaceutical interventions performed by local public health authorities.

Our findings should be interpreted with caution. The epidemic data of COVID19 patients that describing the time and tracing details is very limited. Therefore both estimations of the incubation period and the serial interval of COVID19 have considerable uncertainty. Important biases in our data source include the detection bias(i.e. cases with severe symptoms are more likely to be detected), and reporting bias(i.e. cases with clearly reported tracing details are more likely from areas with plenty of public health resources, and cannot represent the situation in those poorly controlled areas).

Another important concern of COVID19 is the transmissions induced by the “true asymptomatic carrier”, which refers to the infected person who keep asymptomatic during the entire course of disease [12]. Our study is based on data of confirmed cases which did not include the “true asymptomatic carrier”. However either presymptomatic transmission or asymptomatic carrier suggest that the isolations based on symptom surveillance alone may be not enough. Aggressive testing, isolation of close contacts and social distancing to prevent presymptomatic or asymptomatic transmissions are crucial to combat COVID19.

## Data Availability

All data and code to reproduce this paper are available online

https://github.com/witjump/COVID19-presymptomatic-transmission

## Acknowledgement

We appreciate Zhanwei Du et al. collected and cleaned the serial interval data of confirmed cases reported in China, and made the data open for public.

## Appendix Mathematical methods

### 1. Method of moments estimation of infection time

We have

Where *I*_{0} is the infection time of index case, *T*_{1} is the incubation period of secondary case, and *S*_{0,1} is the serial interval. All these three are random variables. It is reasonable to assume *I*_{0} and *T*_{1} are independent. Therefore, we have following equations:

Hence we can estimate the mean and standard deviation of infection time *I*_{0} by solving Eq. (2). Note that, the method of moments estimation does not rely on any assumption of specific distribution forms of the *I*_{0}, *T*_{1} and *S*_{0,1}, so the conclusion should be robust. But the estimation of infection time can be affected by the bias in previous estimation of incubation period and serial interval.

### 2. Presymptomatic transmission model

For presymptomatic transmission, the infection time of an index case *I*_{0} is restricted by its own incubation period *T*_{0} by *I*_{0} > −*T*_{0}. So we model *F*_{i} (*I*_{i} | *T*_{i}), the cumulative probability for individual *i* becoming infectious given condition of its incubation period *T*_{i}, with formula:

In this model, the probability of infectiousness for an index case increases monotonically from 0 at the time of initial infection (*I*_{i} = −*T*_{i}), to 1 at the time of symptom onset (*I*_{i} = 0).

To better understand this infectiousness behavior, let us see a specific case: if an index case develops infectiousness with equal probability at any time during its incubation period, then its probability of being infectious should increase linearly with time. That is the case when *α* = 1. If *α* > 1, the index case is more likely to develop infectiousness shortly before symptom onset, which is the commonly assumed behavior of COVID19 in previous literature. On the contrary, if*α* < 1, the index case is more likely to develop infectiousness shortly after being infected despite the symptom onset might be much later.

The population averaged infection time *f* (*I*) is given by
where A is a normalization coefficient. The following Fig. S1 (A) demonstrates the cumulative probability of infectiousness with variety of *α* parameters. The x-axis of Fig. S1(A) is the “infected time over incubation period” (i.e., which is an alternative representation of *F*_{i} (*I*_{i} | *T*_{i}). Fig. S1(B) demonstrates the population averaged infection time *f* (*I*).

### Part 3. Estimating the serial interval distribution and the incidence of negative serial intervals

Denote the distribution density functions of infection time (population averaged), incubation period and serial interval by *f* (*I*), *g*(*T*), and *h*(*S*) respectively. According to equation (1), we have
or simply *h* = *f* * *g* with convolution notation. The serial interval distribution *h*(*S*) is calculated using Eq. (5). The expected negative serial fraction is estimated by

Here we use the discrete sum instead of integration in order to match the empirical data in which the serial time is discrete in days. Besides, we think the negative serial interval less than one day is not observable in realistic setting.

### Part 4: Maximum convolution likelihood estimation

Now we use the empirical serial interval data of COVID19 in China to estimate the infection time distribution.

We construct the whole transmission model with both presymptomatic and post-symptomatic transmission. We use negative binomial distribution to model the infection after symptom onset. Hence the combined infection time distribution is given by
with *α, r, p* and *A* / *B* as undetermined parameters.

Combining Eq. (3),(7), we use maximum likelihood method to estimate the parameters in (7) by maximizing the log-likelihood function
where the *s*_{i} (*i* = 1…*N*) are the serial intervals of imported COVID19 cases obtained from China outside Hubei.