## Abstract

To test whether acute infection with B.1.1.7 is associated with higher or more sustained nasopharyngeal viral concentrations, we assessed longitudinal PCR tests performed in a cohort of 65 individuals infected with SARS-CoV-2 undergoing daily surveillance testing, including seven infected with B.1.1.7. For individuals infected with B.1.1.7, the mean duration of the proliferation phase was 5.3 days (90% credible interval [2.7, 7.8]), the mean duration of the clearance phase was 8.0 days [6.1, 9.9], and the mean overall duration of infection (proliferation plus clearance) was 13.3 days [10.1, 16.5]. These compare to a mean proliferation phase of 2.0 days [0.7, 3.3], a mean clearance phase of 6.2 days [5.1, 7.1], and a mean duration of infection of 8.2 days [6.5, 9.7] for non-B.1.1.7 virus. The peak viral concentration for B.1.1.7 was 19.0 Ct [15.8, 22.0] compared to 20.2 Ct [19.0, 21.4] for non-B.1.1.7. This converts to 8.5 log_{10} RNA copies/ml [7.6, 9.4] for B.1.1.7 and 8.2 log_{10} RNA copies/ml [7.8, 8.5] for non-B.1.1.7. These data offer evidence that SARS-CoV-2 variant B.1.1.7 may cause longer infections with similar peak viral concentration compared to non-B.1.1.7 SARS-CoV-2. This extended duration may contribute to B.1.1.7 SARS-CoV-2’s increased transmissibility.

## Main text

The reasons for the enhanced transmissibility of SARS-CoV-2 variant B.1.1.7 are unclear. B.1.1.7 features multiple mutations in the spike protein receptor binding domain^{1} that may enhance ACE-2 binding^{2}, thus increasing the efficiency of virus transmission. A higher or more persistent viral burden in the nasopharynx could also increase transmissibility. To test whether acute infection with B.1.1.7 is associated with higher or more sustained nasopharyngeal viral concentrations, we assessed longitudinal PCR tests performed in a cohort of 65 individuals infected with SARS-CoV-2 undergoing daily surveillance testing, including seven infected with B.1.1.7, as confirmed by whole genome sequencing.

We estimated (1) the time from first detectable virus to peak viral concentration (proliferation time), (2) the time from peak viral concentration to initial return to the limit of detection (clearance time), and (3) the peak viral concentration for each individual (**Supplementary Appendix**).^{3} We estimated the means of these quantities separately for individuals infected with B.1.1.7 and non-B.1.1.7 SARS-CoV-2 (**Figure 1**). For individuals infected with B.1.1.7, the mean duration of the proliferation phase was 5.3 days (90% credible interval [2.7, 7.8]), the mean duration of the clearance phase was 8.0 days [6.1, 9.9], and the mean overall duration of infection (proliferation plus clearance) was 13.3 days [10.1, 16.5]. These compare to a mean proliferation phase of 2.0 days [0.7, 3.3], a mean clearance phase of 6.2 days [5.1, 7.1], and a mean duration of infection of 8.2 days [6.5, 9.7] for non-B.1.1.7 virus. The peak viral concentration for B.1.1.7 was 19.0 Ct [15.8, 22.0] compared to 20.2 Ct [19.0, 21.4] for non-B.1.1.7. This converts to 8.5 log_{10} RNA copies/ml [7.6, 9.4] for B.1.1.7 and 8.2 log_{10} RNA copies/ml [7.8, 8.5] for non-B.1.1.7. Data and code are available online.^{4}

These data offer evidence that SARS-CoV-2 variant B.1.1.7 may cause longer infections with similar peak viral concentration compared to non-B.1.1.7 SARS-CoV-2, and this extended duration may contribute to B.1.1.7 SARS-CoV-2’s increased transmissibility. The findings are preliminary, as they are based on seven B.1.1.7 cases. However, if borne out by additional data, a longer isolation period than the currently recommended 10 days after symptom onset^{5} may be needed to effectively interrupt secondary infections by this variant. Collection of longitudinal PCR and test positivity data in larger and more diverse cohorts is needed to clarify the viral trajectory of variant B.1.1.7. Similar analyses should be performed for other SARS-CoV-2 variants such as B.1.351 and P.1.

## Data Availability

Data and code are available online at https://github.com/skissler/CtTrajectories_B117

## Supplementary Appendix

### Ethics

Residual de-identified viral transport media from anterior nares and oropharyngeal swabs collected from players, staff, vendors, and associated household members from a professional sports league were obtained from BioReference Laboratories. In accordance with the guidelines of the Yale Human Investigations Committee, this work with de-identified samples was approved for research not involving human subjects by the Yale Institutional Review Board (HIC protocol # 2000028599). This project was designated exempt by the Harvard IRB (IRB20-1407).

### Study population

The data reported here represent a convenience sample including team staff, players, arena staff, and other vendors (e.g., transportation, facilities maintenance, and food preparation) affiliated with a professional sports league. Clinical samples were obtained by combined swabs of the anterior nares and oropharynx administered by a trained provider. Viral concentration was measured using the cycle threshold (Ct) according to the Roche cobas target 1 assay. For an initial pool of 298 participants who first tested positive for SARS-CoV-2 infection during the study period (between November 28^{th}, 2020 and January 20^{th}, 2021), a diagnosis of “novel” or “persistent” infection was recorded. “Novel” denoted a likely new infection while “persistent” indicated the presence of virus in a clinically recovered individual. A total of 65 individuals (90% male) had novel infections that met our inclusion criteria: at least five positive PCR tests (Ct < 40) and at least one test with Ct < 35. Seven of these individuals were infected with the B.1.1.7 variant as confirmed by genomic sequencing.
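As an illustration of the inclusion criteria stated above, the following minimal sketch checks one individual's series of Ct values. The function name and the example series are hypothetical and are not taken from the study data or code; the sketch also assumes that a negative test is recorded as Ct = 40.

```python
from typing import Sequence

LOD_CT = 40.0            # tests with Ct below this value count as positive
MIN_POSITIVE_TESTS = 5   # at least five positive PCR tests
LOW_CT_THRESHOLD = 35.0  # at least one test with Ct below this value


def meets_inclusion_criteria(ct_values: Sequence[float]) -> bool:
    """Return True if a Ct series satisfies both inclusion criteria."""
    positives = [ct for ct in ct_values if ct < LOD_CT]
    has_enough_positives = len(positives) >= MIN_POSITIVE_TESTS
    has_low_ct = any(ct < LOW_CT_THRESHOLD for ct in positives)
    return has_enough_positives and has_low_ct


# Hypothetical series (not study data).
included = meets_inclusion_criteria([40, 33.1, 24.5, 21.0, 26.7, 31.2, 40])
too_few_positives = meets_inclusion_criteria([40, 30.2, 28.4, 36.0, 40])
no_low_ct = meets_inclusion_criteria([38.0, 37.2, 36.5, 36.1, 39.0])
```

The second series fails because it has only three positive tests; the third fails because none of its five positive tests has Ct < 35.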

### Genome sequencing and lineage assignments

RNA was extracted from remnant nasopharyngeal diagnostic specimens and used as input for SARS-CoV-2 genomic sequencing as previously described.^{6} Samples were sequenced on the Oxford Nanopore MinION. Consensus sequences were generated using the ARTIC Network analysis pipeline^{7} and samples with >80% genome coverage were included in analysis. Individual SARS-CoV-2 genomes were assigned to PANGO lineages using Pangolin v.2.1.8.^{8} All viral genomes assigned to the B.1.1.7 lineage were manually examined for representative mutations.^{9}

### Converting Ct values to viral genome equivalents

To convert Ct values to viral genome equivalents, we first converted the Roche cobas target 1 Ct values to equivalent Ct values on a multiplexed version of the RT-qPCR assay from the US Centers for Disease Control and Prevention.^{10} We did this following our previously described methods.^{3} Briefly, we adjusted the Ct values using the best-fit linear regression between previously collected Roche cobas target 1 Ct values and CDC multiplex Ct values using the following regression equation:

$$y_{i} = \beta_{0} + \beta_{1} x_{i} + \varepsilon_{i}$$

Here, *y*_{i} denotes the *i*^{th} Ct value from the CDC multiplex assay, *x*_{i} denotes the *i*^{th} Ct value from the Roche cobas target 1 test, and *ε*_{i} is an error term with mean 0 and constant variance across all samples. The coefficient values are *β*_{0} = –6.25 and *β*_{1} = 1.34.
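The adjustment can be sketched as follows, assuming the regression takes the usual linear form (intercept plus slope times the Roche Ct value) with the coefficients reported above. The function name and the input value are hypothetical.

```python
BETA_0 = -6.25  # intercept reported in the text
BETA_1 = 1.34   # slope reported in the text


def roche_to_cdc_ct(roche_ct: float) -> float:
    """Expected CDC multiplex Ct for a given Roche cobas target 1 Ct."""
    return BETA_0 + BETA_1 * roche_ct


# Hypothetical input: a Roche Ct of 25.0 maps to a CDC-scale Ct of about 27.25.
cdc_ct = roche_to_cdc_ct(25.0)
```

Because the slope exceeds 1, the two scales diverge as Ct increases, which is why the adjustment is applied before the standard curve is used.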

Ct values were fitted to a standard curve in order to convert Ct value data to RNA copies. Synthetic T7 RNA transcripts corresponding to a 1,363 bp segment of the SARS-CoV-2 nucleocapsid gene were serially diluted from 10^{6} to 10^{0} RNA copies/μl in duplicate to generate a standard curve^{11} (**Supplementary Table 1**). The average Ct value for each dilution was used to calculate the slope (−3.60971) and intercept (40.93733) of the linear regression of Ct on log_{10}-transformed standard RNA concentration, and Ct values from subsequent RT-qPCR runs were converted to RNA copies using the following equation:

$$\log_{10}([\mathrm{RNA}]) = \frac{Ct - 40.93733}{-3.60971} + \log_{10}(250)$$

Here, [RNA] represents the RNA copies/ml. The log_{10}(250) term accounts for the extraction (300 μl) and elution (75 μl) volumes associated with processing the clinical samples as well as the 1,000 μl/ml unit conversion.
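A minimal sketch of this conversion is given below. It assumes that the input Ct is already on the CDC multiplex scale and that the standard curve is inverted in the usual way (Ct minus intercept, divided by slope); the function name and the example input are hypothetical.

```python
import math

SLOPE = -3.60971       # standard-curve slope (Ct per log10 copies/ul)
INTERCEPT = 40.93733   # standard-curve intercept (Ct at 1 copy/ul)
VOLUME_FACTOR = 250.0  # 1,000 ul/ml * (75 ul elution / 300 ul extraction)


def ct_to_log10_copies_per_ml(cdc_ct: float) -> float:
    """Convert a CDC-scale Ct value to log10 RNA copies/ml."""
    log10_copies_per_ul = (cdc_ct - INTERCEPT) / SLOPE
    return log10_copies_per_ul + math.log10(VOLUME_FACTOR)


# Hypothetical input: a CDC-scale Ct of 19.0 gives roughly 8.5 log10 copies/ml.
example = ct_to_log10_copies_per_ml(19.0)
```

A Ct equal to the intercept corresponds to 1 copy/μl in the reaction, so the function returns log_{10}(250) ≈ 2.4 at that point.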

### Model fitting

For the statistical analysis, we removed any sequences of 3 or more consecutive negative tests to avoid overfitting to these trivial values. Following our previously described methods,^{3} we assumed that the viral concentration trajectories consisted of a proliferation phase, with exponential growth in viral RNA concentration, followed by a clearance phase characterized by exponential decay in viral RNA concentration.^{12} Since Ct values are roughly proportional to the negative logarithm of viral concentration^{13}, this corresponds to a linear decrease in Ct followed by a linear increase. We therefore constructed a piecewise-linear regression model to estimate the peak Ct value, the time from infection onset to peak (*i.e.*, the duration of the proliferation stage), and the time from peak to infection resolution (*i.e.*, the duration of the clearance stage). The trajectory may be represented by the equation

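$$
E[Ct(t)] =
\begin{cases}
\text{l.o.d.} & t \le t_{o} \\
\text{l.o.d.} - \dfrac{\delta}{t_{p} - t_{o}}\,(t - t_{o}) & t_{o} < t \le t_{p} \\
\text{l.o.d.} - \delta + \dfrac{\delta}{t_{r} - t_{p}}\,(t - t_{p}) & t_{p} < t \le t_{r} \\
\text{l.o.d.} & t > t_{r}
\end{cases}
$$

Here, E[*Ct(t)*] represents the expected value of the Ct at time *t*, “l.o.d.” represents the RT-qPCR limit of detection, *δ* is the absolute difference in Ct between the limit of detection and the peak (lowest) Ct, and *t*_{o}, *t*_{p}, and *t*_{r} are the onset, peak, and recovery times, respectively.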
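The piecewise-linear trajectory described above (flat at the limit of detection, a linear decrease in Ct to the peak, then a linear increase back to the limit of detection) can be sketched as a short function. This is an illustration under the definitions given in the text, not the authors' implementation; the function name and the parameter values in the example are hypothetical.

```python
def expected_ct(t: float, t_o: float, t_p: float, t_r: float,
                delta: float, lod: float = 40.0) -> float:
    """Expected Ct at time t for onset t_o, peak t_p, and recovery t_r."""
    if t <= t_o or t >= t_r:
        return lod  # before onset or after recovery
    if t <= t_p:
        # proliferation: Ct falls linearly from lod to lod - delta
        return lod - delta * (t - t_o) / (t_p - t_o)
    # clearance: Ct rises linearly from lod - delta back to lod
    return lod - delta + delta * (t - t_p) / (t_r - t_p)


# Hypothetical trajectory: onset at day -2, peak at day 0, recovery at day 6,
# and a peak 20 Ct units below the limit of detection.
peak_ct = expected_ct(0.0, t_o=-2.0, t_p=0.0, t_r=6.0, delta=20.0)
mid_clearance_ct = expected_ct(3.0, t_o=-2.0, t_p=0.0, t_r=6.0, delta=20.0)
```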

Before fitting, we re-parametrized the model using the following definitions:

- Δ*Ct(t)* = l.o.d. – *Ct(t)* is the difference between the limit of detection and the observed Ct value at time *t*.
- *ω*_{p} = *t*_{p} – *t*_{o} is the duration of the proliferation stage.
- *ω*_{r} = *t*_{r} – *t*_{p} is the duration of the clearance stage.

We constrained 0.25 ≤ *ω*_{p} ≤ 14 days and 2 ≤ *ω*_{r} ≤ 30 days to prevent inferring unrealistically small or large values for these parameters for trajectories that were missing data prior to the peak and after the peak, respectively. We also constrained 0 ≤ *δ* ≤ 40 as Ct values can only take values between 0 and the limit of detection (40).

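We next assumed that the observed Δ*Ct(t)* could be described by the following mixture model:

$$\Delta Ct(t) \sim \Big[\lambda\,\mathrm{Normal}\big(E[\Delta Ct(t)],\ \sigma(t)\big) + (1 - \lambda)\,\mathrm{Exponential}\big(\log(10)\big)\Big]_{0}^{\text{l.o.d.}}$$
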
where E[Δ*Ct(t)*] = l.o.d. - E[*Ct(t)*] and *λ* is the sensitivity of the q-PCR test, which we fixed at 0.99. The bracket term on the right-hand side of the equation denotes that the distribution was truncated to ensure Ct values between 0 and the limit of detection. This model captures the scenario where most observed Ct values are normally distributed around the expected trajectory with standard deviation *σ(t)*, yet there is a small (1%) probability of an exponentially distributed false negative near the limit of detection. The log(10) rate of the exponential distribution was chosen so that 90% of the mass of the distribution sat below 1 Ct unit and 99% of the distribution sat below 2 Ct units, ensuring that the distribution captures values distributed at or near the limit of detection. We did not estimate values for *λ* or the exponential rate because they were not of interest in this study; we simply needed to include them to account for some small probability mass that persisted near the limit of detection to allow for the possibility of false negatives.
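The stated properties of the exponential component can be checked directly, and the mixture density can be sketched as below. The sketch assumes that the truncation to [0, l.o.d.] applies to the mixture as a whole and that the log(10) rate is a natural logarithm; the function names and the example parameter values are hypothetical.

```python
import math

LOD = 40.0               # limit of detection (Ct)
SENSITIVITY = 0.99       # lambda, fixed in the text
EXP_RATE = math.log(10)  # rate of the exponential component


def exponential_cdf(x: float, rate: float = EXP_RATE) -> float:
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mean: float, sd: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_density(delta_ct: float, expected_delta_ct: float, sd: float) -> float:
    """Density of the normal/exponential mixture truncated to [0, LOD]."""
    if not 0.0 <= delta_ct <= LOD:
        return 0.0
    unnormalized = (
        SENSITIVITY * normal_pdf(delta_ct, expected_delta_ct, sd)
        + (1.0 - SENSITIVITY) * EXP_RATE * math.exp(-EXP_RATE * delta_ct)
    )
    mass_in_bounds = (
        SENSITIVITY * (normal_cdf(LOD, expected_delta_ct, sd)
                       - normal_cdf(0.0, expected_delta_ct, sd))
        + (1.0 - SENSITIVITY) * exponential_cdf(LOD)
    )
    return unnormalized / mass_in_bounds


# Share of the exponential component within 1 and 2 Ct units of the l.o.d.
mass_below_1 = exponential_cdf(1.0)
mass_below_2 = exponential_cdf(2.0)
```

With rate log(10), the exponential component places a fraction 1 – 10^{–x} of its mass within *x* Ct units of the limit of detection, which gives the 90% and 99% figures quoted above.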

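We used a hierarchical structure to describe the distributions of *ω*_{p}, *ω*_{r}, and *δ* for each individual based on their respective population means *μ*_{ωp}, *μ*_{ωr}, and *μ*_{δ} and population standard deviations *σ*_{ωp}, *σ*_{ωr}, and *σ*_{δ} such that

$$\omega_{p} \sim \mathrm{Normal}(\mu_{\omega p}, \sigma_{\omega p}), \qquad \omega_{r} \sim \mathrm{Normal}(\mu_{\omega r}, \sigma_{\omega r}), \qquad \delta \sim \mathrm{Normal}(\mu_{\delta}, \sigma_{\delta})$$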

We inferred separate population means (*μ*_{•}) for B.1.1.7- and non-B.1.1.7-infected individuals. We used a Hamiltonian Monte Carlo fitting procedure implemented in Stan (version 2.24)^{14} and R (version 3.6.2)^{15} to estimate the individual-level parameters *ω*_{p}, *ω*_{r}, *δ*, and *t*_{p} as well as the population-level parameters *σ**, *μ*_{ωp}, *μ*_{ωr}, *μ*_{δ}, *σ*_{ωp}, *σ*_{ωr}, and *σ*_{δ}. We used the following priors:

*Hyperparameters:*

*Individual-level parameters:*

The values in square brackets denote truncation bounds for the distributions. We chose a vague half-Cauchy prior with scale 5 for the observation variance *σ**. The priors for the population mean values (*μ*_{•}) are normal distributions spanning the range of allowable values for that parameter; this prior is vague but expresses a mild preference for values near the center of the allowable range. The priors for the population standard deviations (*σ*_{•}) are half-Cauchy distributions with scale chosen so that 90% of the distribution sits below the maximum value for that parameter; this prior is vague but expresses a mild preference for standard deviations close to 0.
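The scale of a half-Cauchy distribution that places 90% of its mass below a given maximum follows from the half-Cauchy cumulative distribution function, F(x) = (2/π) arctan(x / s). The sketch below solves F(max) = 0.9 for the scale s. It assumes that the "maximum value" for each parameter is the upper constraint stated earlier (14 days, 30 days, and 40 Ct units); the function names are hypothetical.

```python
import math


def half_cauchy_cdf(x: float, scale: float) -> float:
    """CDF of a half-Cauchy distribution with location 0."""
    return (2.0 / math.pi) * math.atan(x / scale)


def half_cauchy_scale(upper: float, mass: float = 0.9) -> float:
    """Scale s such that a fraction `mass` of the distribution lies below `upper`."""
    return upper / math.tan(mass * math.pi / 2.0)


# Assumed maxima, taken from the parameter constraints stated in the text.
upper_bounds = {"omega_p": 14.0, "omega_r": 30.0, "delta": 40.0}
scales = {name: half_cauchy_scale(upper) for name, upper in upper_bounds.items()}
```

Under these assumptions the scale for the proliferation-time standard deviation is about 2.2 days, so most of the prior mass sits well below the 14-day bound while the heavy tail still allows larger values.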

We ran four MCMC chains for 1,000 iterations each with a target average proposal acceptance probability of 0.8. The first half of each chain was discarded as the warm-up. The Gelman R-hat statistic was less than 1.1 for all parameters, indicating good overall mixing of the chains. There were no divergent iterations, indicating good exploration of the parameter space. The posterior distributions for *μ*_{δ}, *μ*_{ωp}, and *μ*_{ωr} were estimated separately for individuals infected with B.1.1.7 and non-B.1.1.7. These are depicted in **Figure 1** (main text). Draws from the individual posterior viral trajectory distributions are depicted in **Supplementary Figure 1**. The mean posterior viral trajectories for each individual are depicted in **Supplementary Figure 2**.
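For readers unfamiliar with the R-hat diagnostic, the sketch below computes the classic Gelman-Rubin potential scale reduction factor from several chains of draws for a single parameter. It is a simplified illustration: the diagnostic reported by Stan differs in detail (for example, it splits each chain in half), and the simulated chains here are hypothetical stand-ins for the 500 post-warm-up draws per chain.

```python
import math
import random
from statistics import mean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split) R-hat for equal-length chains of one parameter."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean(variance(chain) for chain in chains)  # W: mean within-chain variance
    between = n * variance(chain_means)                 # B: between-chain variance
    pooled = (n - 1) / n * within + between / n         # pooled variance estimate
    return math.sqrt(pooled / within)


rng = random.Random(0)

# Four well-mixed chains drawing from the same distribution.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(4)]
# Four chains stuck around different values (poor mixing).
stuck = [[rng.gauss(shift, 1.0) for _ in range(500)] for shift in (0.0, 0.0, 5.0, 5.0)]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

Values close to 1 indicate that the between-chain and within-chain variances agree; the poorly mixed example produces a value far above the 1.1 threshold used above.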

### Checking for influential outliers

To examine whether the posterior distributions for the B.1.1.7-infected individuals reflected the influence of a single outlier, we re-fit the model seven times, omitting one of the B.1.1.7 trajectories each time. The inferred parameter values were fairly consistent, though omitting either of two of the B.1.1.7 cases (cases 5 and 6 in **Supplementary Table 2**) yields a 90% credible interval for infection duration that overlaps with the corresponding interval for non-B.1.1.7 infections.
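The structure of this leave-one-out check can be sketched as follows. The model re-fit itself is not shown; the sketch only enumerates the seven reduced case sets and provides the interval-overlap test. The case labels 1 to 7 and the function name are hypothetical, and the intervals in the example are the full-data 90% credible intervals for infection duration reported in the main text, not the leave-one-out results.

```python
from typing import List, Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical labels for the seven B.1.1.7 cases.
b117_cases: List[int] = [1, 2, 3, 4, 5, 6, 7]

# One reduced case set per re-fit, each omitting a single trajectory.
leave_one_out_sets = [
    [case for case in b117_cases if case != omitted] for omitted in b117_cases
]

# Full-data 90% credible intervals for infection duration (days).
b117_duration: Interval = (10.1, 16.5)
non_b117_duration: Interval = (6.5, 9.7)
full_data_overlap = intervals_overlap(b117_duration, non_b117_duration)
```

With all seven cases included, the two intervals do not overlap; the sensitivity analysis asks whether that separation survives the removal of any single B.1.1.7 trajectory.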

## Footnotes

† denotes co-senior authorship