ABSTRACT
Background Information on COVID-19 in representative community surveillance is limited, particularly regarding cycle threshold (Ct) values (a proxy for SARS-CoV-2 viral load) and symptoms.
Methods We included all positive nose and throat swabs between 26 April-11 October 2020 from the UK’s national COVID-19 Infection Survey, tested by RT-PCR for the N, S and ORF1ab genes. We investigated predictors of median Ct value using quantile regression.
Results 1892 (0.22%) of 843,851 results were positive: 1362 (72%), 185 (10%) and 345 (18%) for 3, 2 or 1 genes respectively. Ct values for different genes were strongly correlated (rho=0.99) with overall median Ct 26.2 (IQR 19.7-31.1; range 10.3-37.6), corresponding to ∼2,500 dC/ml (IQR 80-240,000). Ct values were independently lower in those reporting symptoms, with more genes detected, and in first (vs. subsequent) positives per-participant, with no evidence of independent effects of sex, ethnicity, age, deprivation or other test characteristics (p>0.20). Whilst single-gene positives without reported symptoms almost invariably had Ct>30, triple-gene positives without reported symptoms had widely varying Ct. Incorporating pre-test probability and Ct values, 1547 (82%) and 112 (6%) positives had “higher” or “lower” supporting evidence for genuine infection respectively. Ct values, symptomatic percentages and supporting evidence changed over time. With lower positivity in the summer, there were proportionally more “lower” evidence positives, and “higher” evidence positives had higher Ct values (p<0.0001), suggesting lower viral burden. Declines in mean/median Ct values were apparent throughout August and preceded increases in positivity rates.
Conclusions Community SARS-CoV-2 infections show marked variation in viral load. Ct values could be a useful epidemiological early-warning indicator.
SUMMARY Cycle threshold (Ct) values in community SARS-CoV-2 infections from national surveillance vary markedly, including over time and by symptoms (1892 (0.22%) positive nose and throat swabs, 843,851 tested). Ct values could be a useful epidemiological early warning indicator.
INTRODUCTION
After initial reductions in SARS-CoV-2 cases in mid-2020, following release of large-scale lockdowns[1], infection rates are re-surging in many countries worldwide. Proposed control strategies include new local or national lockdowns of varying intensity and mass testing, but have major economic and practical limitations. In particular, mass testing of large numbers without symptoms[2], and hence low pre-test probability of positivity, can mean most positives are false-positives depending on test specificity. For example, with 0.1% true prevalence, testing 100,000 individuals with a 99.9% specific test with perfect sensitivity gives 100 true-positives, but also 100 false-positives (positive predictive value (PPV) 50%), whereas specificity of 99.5% increases false-positives to 500 (PPV=17%), and of 99.0% to 999 (PPV=9%), with even lower PPV with imperfect sensitivity[3].
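The positive predictive value arithmetic in the example above can be checked with a short sketch. The function name `ppv` and its parameters are illustrative only; the inputs (0.1% prevalence, 100,000 tested, perfect sensitivity, and the three specificities) are taken from the text.

```python
def ppv(prevalence, specificity, sensitivity=1.0, n_tested=100_000):
    """Return (true_positives, false_positives, PPV) for a testing programme."""
    infected = n_tested * prevalence
    uninfected = n_tested - infected
    true_pos = infected * sensitivity            # infected people testing positive
    false_pos = uninfected * (1.0 - specificity)  # uninfected people testing positive
    return true_pos, false_pos, true_pos / (true_pos + false_pos)


# Scenarios from the text: 0.1% true prevalence, perfect sensitivity.
for spec in (0.999, 0.995, 0.990):
    tp, fp, value = ppv(prevalence=0.001, specificity=spec)
    print(f"specificity {spec:.1%}: TP={tp:.1f}, FP={fp:.1f}, PPV={value:.1%}")
```

The PPVs come out at roughly 50%, 17% and 9%, matching the figures quoted; passing `sensitivity` below 1.0 lowers each PPV further, as the text notes.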
Mathematical models are powerful tools for evaluating the potential effectiveness of different control strategies, but rely on population-level estimates of infectivity and other parameters. However, there are few unbiased community-based surveillance studies that include individuals both with and without symptoms. Estimates of asymptomatic infection rates vary, being only 17-20% overall in recent reviews[4, 5], but these included many studies of contacts of confirmed cases. Higher prevalence of asymptomatic infection has been reported in screening of defined populations (30%[4]) and community surveillance (e.g. 42%[6], 72%[7]). Studies have generally indicated lower rates of transmission from asymptomatic infection[4, 5]; absence of symptoms may be a proxy for lower SARS-CoV-2 viral load, a key determinant of transmission. Finally, most studies rely on “average” estimates of the asymptomatic infection percentage, independent of characteristics and viral load, and have not quantified temporal variation in these key parameters for mathematical models across the community.
Here we therefore characterise variation in SARS-CoV-2 positive tests in the first five months of the UK’s national COVID-19 Infection Survey (CIS), which is based on a representative sample of households with longitudinal follow-up. We estimate predictors of RT-PCR cycle threshold (Ct) values (as a proxy for viral load), propose a classification for the strength of evidence supporting positive RT-PCR test results in the community, and demonstrate how this has changed over time.
METHODS
This study included all positive SARS-CoV-2 RT-PCR results between 26 April and 11 October 2020 from nose and throat swabs taken from participants in the Office for National Statistics (ONS) CIS (ISRCTN21086382). The survey randomly selects private households on a continuous basis from address lists and from previous surveys to provide a representative UK sample. If anyone aged 2 years or older currently resident in an invited household agreed verbally to participate, a study worker visited the household to take written informed consent, which was obtained from parents/carers for those 2-15 years; those aged 10-15 years provided written assent. The study protocol is available at https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/protocol-and-information-sheets.
Individuals were asked about demographics, symptoms, contacts and relevant behaviours (https://www.ndm.ox.ac.uk/covid-19/covid-19-infection-survey/case-record-forms). To reduce transmission risks, self-taken nose and throat swabs were obtained following study worker instructions. Parents/carers took swabs from children under 12 years. At the first visit, participants were asked for (optional) consent for follow-up visits every week for the next month, then monthly for 12 months from enrolment. In a random 10-20% of households, those 16 years or older were invited to provide blood monthly for assays of anti-trimeric spike protein IgG using an immunoassay developed by the University of Oxford[8]. The study received ethical approval from the South Central Berkshire B Research Ethics Committee (20/SC/0195).
Swabs were analysed at the UK’s national Lighthouse Laboratories at Milton Keynes (National Biocentre) (from 26 April) and Glasgow (from 16 August) using identical methodology, with swabs from specific regions sent consistently to one laboratory. RT-PCR for three SARS-CoV-2 genes (N protein, S protein and ORF1ab) used the Thermo Fisher TaqPath RT-PCR COVID-19 Kit, analysed using UgenTec Fast Finder 3.300.5 (TaqMan 2019-nCoV Assay Kit V2 UK NHS ABI 7500 v2.1). The Assay Plugin contains an assay-specific algorithm and decision mechanism that allows conversion of the qualitative amplification assay PCR raw data from the ABI 7500 Fast into test results with minimal manual intervention. Samples are called positive if the N gene and/or ORF1ab are detected, with or without the S gene (giving 1, 2 or 3 gene positives). Detection of the S gene alone is not considered a reliable single-gene positive (as of mid-May 2020).
Twelve specific symptoms were elicited at each visit (cough, fever, myalgia, fatigue, sore throat, shortness of breath, headache, nausea, abdominal pain, diarrhoea, loss of taste, loss of smell), as was whether participants thought they had (unspecified) symptoms compatible with COVID-19. From 26 April through 22 July, questions referred to current symptoms, and from 23 July to the preceding 7 days. Any positive response to any symptom question at the swab-positive visit defined the case as symptomatic “at” the test; we also separately defined any positive response at the swab-positive visit or visits either side (regardless of time between visits) as symptomatic “around” the test.
To investigate the potential increasing contribution of false-positives as population prevalence declines, from 2 August we arbitrarily classified in real-time each positive as:
“Higher” evidence: two or three genes detected (irrespective of Ct).
“Moderate” evidence: single-gene detected and (i) Ct below the 97.5th percentile of “higher” evidence positives (<34; supporting this threshold, whole genome sequences had been obtained from three single-gene positives with Ct 30.8-33.1 by 2 August) or (ii) higher pre-test probability of infection, defined as any symptoms at/around the test or reporting working in patient-facing healthcare or a care/residential home.
“Lower” evidence: all other positives; by definition single-gene detected at Ct≥34 in individuals not reporting symptoms/working in relevant roles.
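The three-level classification above can be expressed as a small decision function. This is a minimal sketch of the rules as stated, not the survey's production code; the `PositiveSwab` fields and the function name are hypothetical.

```python
from dataclasses import dataclass

# Ct threshold from the text: 97.5th percentile of "higher" evidence positives.
CT_THRESHOLD = 34.0


@dataclass
class PositiveSwab:
    genes_detected: int                 # 1, 2 or 3 genes detected
    ct: float                           # Ct value for the swab
    symptoms_around_test: bool          # any symptoms at/around the test
    patient_facing_or_care_role: bool   # patient-facing healthcare or care/residential home work


def classify_evidence(swab: PositiveSwab) -> str:
    """Return 'higher', 'moderate' or 'lower' following the rules in the text."""
    if swab.genes_detected >= 2:
        return "higher"  # irrespective of Ct
    higher_pretest = swab.symptoms_around_test or swab.patient_facing_or_care_role
    if swab.ct < CT_THRESHOLD or higher_pretest:
        return "moderate"
    return "lower"  # single gene, Ct >= 34, no symptoms or relevant role


print(classify_evidence(PositiveSwab(1, 35.2, False, False)))
```

Because the first rule ignores Ct, a two-gene positive at Ct 36 is still “higher” evidence, whereas a single-gene positive at the same Ct is “moderate” only if symptoms or a relevant occupation were reported.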
We assessed independent predictors of Ct values using median (quantile) regression (details in Supplementary Methods).
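As background on what median regression estimates, the sketch below shows the check (pinball) loss that quantile regression minimises and the simplest case of a single binary predictor, where the fitted coefficient is the difference between group medians. The Ct values are invented for illustration and are not study data; the study's actual models are multivariable and are described in the Supplementary Methods.

```python
from statistics import median


def pinball_loss(residual, tau=0.5):
    """Check loss for quantile tau; tau=0.5 is half the absolute error."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def total_loss(values, estimate, tau=0.5):
    """Sum of check losses when every value is predicted by `estimate`."""
    return sum(pinball_loss(v - estimate, tau) for v in values)


# Hypothetical Ct values, invented for illustration only.
ct_no_symptoms = [24.0, 29.5, 31.0, 33.2, 35.1]
ct_symptoms = [15.3, 18.9, 22.4, 27.0, 30.8]

# With one binary predictor, median regression fits each group's median,
# so the coefficient is the difference between the two group medians.
intercept = median(ct_no_symptoms)
coefficient = median(ct_symptoms) - intercept

print(f"intercept={intercept:.1f}, symptoms coefficient={coefficient:.1f}")
```

Modelling the median rather than the mean makes the estimates less sensitive to the long tails seen in Ct distributions.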
RESULTS
Number and percentage of positive swabs
From 26 April to 11 October 2020, 232,189 participants from 116,008 households had results from median 3 (IQR 2-5, range 1-13) nose and throat swabs each (53,517(23%) had only one result to date). 21,747(9%), 16,186(7%), 17,348(7%), 37,356(16%), 95,184(41%) and 44,368(19%) were recruited in April/May, June, July, August, September and October respectively. Of 843,851 swab test results, 1892 (0.224%, 95% CI 0.214-0.234%) were positive, in 1,516 individuals from 1,209 households. Of these participants, 625(41%) were positive at their first test in the study and 891 (59%) subsequently, after median 2 negative tests (IQR 1-4, range 1-9).
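The overall positivity and its confidence interval can be reproduced from the two counts above. The sketch assumes a simple normal-approximation (Wald) interval; the text does not state which interval method was used, and the function name is illustrative.

```python
import math


def wald_interval(successes, n, z=1.96):
    """Proportion with a normal-approximation confidence interval (assumed method)."""
    p = successes / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Counts from the text: 1892 positives among 843,851 swab results.
p, lower, upper = wald_interval(1892, 843_851)
print(f"{p:.3%} (95% CI {lower:.3%}-{upper:.3%})")
```

With these counts the approximation agrees with the reported 0.224% (0.214-0.234%) to three decimal places. The interval treats swabs as independent, which ignores repeat tests within participants and households.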
Viral characteristics
Overall, 1362(72%), 185(10%) and 345(18%) swabs were positive for three, two or one gene(s) respectively (Table 1). Where multiple genes were detected, the Cts were highly correlated (Spearman rho=0.99, p<0.0001). Taking the per-swab mean Ct across positive genes, the overall median Ct was 26.2 (IQR 19.7-31.1; range 10.3-37.6), varying strongly by number of genes detected (Kruskal-Wallis p=0.0001), but not by their specific pattern after adjusting for number (p=0.58). Based on linearity data (Supplementary Figure 1), this corresponds to a median viral load of ∼2,500 dC/ml (IQR 80-240,000). Only four Ct values >37 were recorded (one S positive only (May); three N positive only (October)).
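As a rough consistency check on the Ct-to-viral-load conversion, the sketch below draws a log-linear line through the two reported IQR endpoints and interpolates the median. This is an assumption for illustration only; the actual calibration comes from the linearity data in Supplementary Figure 1, and the constant and function names here are hypothetical.

```python
import math

# Reported pairs from the text: (Ct, approximate viral load in dC/ml).
CT_LOW, LOAD_AT_CT_LOW = 19.7, 240_000.0
CT_HIGH, LOAD_AT_CT_HIGH = 31.1, 80.0

# Change in log10(load) per Ct cycle implied by those two points (assumed linear).
SLOPE = (math.log10(LOAD_AT_CT_HIGH) - math.log10(LOAD_AT_CT_LOW)) / (CT_HIGH - CT_LOW)


def approx_load(ct):
    """Interpolated viral load (dC/ml) under the assumed log-linear relationship."""
    return 10 ** (math.log10(LOAD_AT_CT_LOW) + SLOPE * (ct - CT_LOW))


fold_per_cycle = 10 ** (-SLOPE)  # fold-change in load per one-cycle drop in Ct

print(f"load at Ct 26.2: about {approx_load(26.2):,.0f} dC/ml")
print(f"fold-change per cycle: about {fold_per_cycle:.2f}")
```

Under this assumption the interpolated load at the median Ct of 26.2 is close to the reported ∼2,500 dC/ml, and each one-cycle decrease in Ct corresponds to roughly a doubling of viral load. The line should not be extrapolated beyond the Ct range used to build it.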
Evidence supporting positive results
1547(82%), 233(12%) and 112(6%) positive tests had “higher”, “moderate” or “lower” evidence supporting genuine positivity (Table 2; definitions in Methods). Even though “higher” evidence was based only on number of genes detected, “higher” evidence positives were more likely to be symptomatic than “moderate” evidence positives (p<0.0001), but were similarly likely to have occupational risk factors (p=0.98). “Higher” evidence positives were more likely to occur in households with other positives (p<0.0001).
Predictors of Ct values
In multivariable regression models, Ct values were independently lower (i.e. viral loads higher) with more genes detected (10.4 lower in triple-gene vs single-gene positives (95% CI 11.3 to 9.5)), if symptoms were reported around the test (1.7 lower (2.4 to 1.0)), and at the first (vs. subsequent) positive per-participant (2.8 lower (3.7 to 2.0); all p<0.0001; Supplementary Table 1A), with by far the strongest effect associated with triple-gene positives. There was weak evidence of lower values in those reporting cough/fever/anosmia (2.1 lower (3.0 to 1.3)) vs. other symptoms (1.1 lower (2.1 to 0.0); heterogeneity p=0.06). Associations were similar but slightly attenuated for symptoms at the positive test. After adjusting for these factors, there was no evidence of independent effects of sex, ethnicity, age, deprivation or whether the positive result was the first test in the study (p>0.20, Supplementary Table 1A). Even after adjusting for number of genes detected, symptoms and first (vs. subsequent) positive, Ct values were 1.7 (2.4 to 1.0) lower in individuals where another household member was positive at any point in the study (p<0.0001; other effects similar).
However, number of genes detected and symptoms are both potential mediators of effects of demographic factors (Supplementary Figure 2); and there were strong effects of calendar time on positivity in different demographic subgroups[9] and on Ct values (Figure 1A). Excluding the potential mediators (number of genes detected, symptoms) and adjusting for visit date, Ct values remained independently lower (i.e. viral loads higher) at the first (vs. subsequent) positive in the study, but were also lower in those in the most deprived quintile (2.6 lower vs least deprived (4.2 to 1.1) p=0.001) and when the positive was not the first test in the study (1.4 lower (2.5 to 0.2), p=0.02) (Supplementary Table 1B). However, there was still no evidence of effects of sex (p=0.31), age (p=0.24) or ethnicity (p=0.14).
Of note, whilst single-gene positives without reported symptoms almost invariably had Ct>30, triple-gene positives without reported symptoms had widely varying Ct (Figure 1A). Further, whilst the percentage reporting symptoms increased linearly as Ct values dropped from 35 to 25, below 25, the percentages reporting symptoms stayed approximately constant (Figure 2).
Temporal changes in Ct values, evidence and symptomatic percentages
There were also strong effects of calendar time on proportions with any evidence of symptoms, or reporting cough/fever/anosmia (Figure 1C), and strength of supporting evidence (Figure 1D; all p<0.0001), with markedly fewer positives with Ct <30 (Figure 1B), very low percentages with symptoms at/around positive tests, and more “lower” evidence positives in July/August. However, during this period, even “higher” evidence positives had higher Ct vs. earlier and later (heterogeneity p=0.0001; Figure 3A). “Lower” evidence positives also formed a larger percentage of all tests during this period (0.017%; Figure 3B) and in April/early May (0.039%), when swabs with only the S gene detected were called positive. Interestingly, however, from September the percentage of “lower” evidence positives increased proportionately with “moderate” and “higher” evidence positives (Figure 3B).
Relationship with serostatus
Antibody results were available for 88 (6%) participants with positive swabs, but relatively few had results both before and after the first positive swab (Supplementary Figure 3). The majority (45/59, 76%) of those with antibody results before the first swab-positive were antibody-negative (Supplementary Figure 4), but seven participants appeared to have become infected despite antecedent high anti-spike antibody titres (Figure 4): five single positives without reported symptoms (three “higher”, two “lower” evidence), one with three positives with symptoms (“higher” evidence) and one with two positives without reported symptoms (“higher” evidence) more than two months apart and separated by four negative intervening RT-PCR swabs. Plausible seroconversion events were seen in three RT-PCR positive cases (Figure 4), with no evidence of seroconversion in 15: eight “higher”, two “moderate” and four “lower” evidence positives without reported symptoms, and one “moderate” evidence symptomatic positive (Figure 4). The remainder had unclear patterns or insufficient data (Supplementary Figure 5).
DISCUSSION
In this large community surveillance study, we found wide variation in Ct values (a proxy for viral load). Whilst Ct values were independently associated with symptoms at/around the test, as previously reported[10, 11], and with the number of genes detected, there was no evidence of association with sex, ethnicity or age (as previously reported for sex and age[12, 13]) and effects of symptoms were small compared with population-level variability. Notably triple-gene positives without reported symptoms had widely varying Ct, including many low levels (Figure 1A), potentially explaining variation in dispersion (“k”) and super-spreading events, particularly from those without symptoms but with low Ct/high viral loads[14].
Ct values also varied strongly over time, as did symptoms and evidence supporting positives, suggesting changing viral burden in infection cases, with less severe phenotypes during July/early August 2020. These observational data cannot by themselves establish whether viral fitness changed. During this time, higher Ct values were also noted in the English point-prevalence surveillance study, REACT[7], and lower virus levels in Lausanne, Switzerland[13]. One explanation is a larger relative contribution of false-positives when true prevalence is low. However, Ct values were higher even in “higher” evidence positives during this period, consistent with shifting viral burden. Such a shift may also explain the preceding shift towards “moderate” evidence positives and the concurrent higher percentage of “lower” evidence positives, since the less virus present, the less likely it is to be detected on multiple genes. Whilst these findings are consistent with lower viral inoculum during this period[15], we cannot assess whether this is predominantly due to behaviour (e.g. increased time outdoors, face mask use[16]) or other reasons (e.g. environmental/climatic factors).
We used laboratory, clinical and demographic evidence to classify our confidence in positive results. Around 80% had 2 or 3 genes detected (“higher” evidence), providing assurance in overall results, and all but four Ct values were under 37. Whilst Ct values are not directly comparable between studies, REACT has also validated a Ct threshold of 37 for single-gene positives for their test performed in Germany[17], and in the Public Health England (PHE) Schools study, only samples with Ct<37 were positive on repeat testing of the same swab at PHE laboratories[18]. Nevertheless, every diagnostic test has false-positives, so some of our single-gene “lower”, or even “moderate”, evidence positives are inevitably false. The false-positive rate would, however, be expected to be approximately constant over time, since it is either random or driven by external factors. Variation in the percentage of all tests accounted for by “lower” evidence positives, and in particular proportionate increases in “lower” evidence positives as “higher” evidence positives increased during September, support more genuinely lower-level infections occurring during the summer, and an overall false-positive rate for this test of below ∼0.005%, i.e. at least 99.995% specificity.
Since RT-PCR and antigen assays test for viral presence, it is more relevant to consider limits of detection, rather than “false-positives” per se. Although they were a small minority (6%), one question is whether single-gene positives with high Ct (≥34 in our study) solely represent long-term shedding of non-transmissible virus[19], with, for example, infectious virus recovered from only 8% (95% CI 3-18%) of samples with Ct>35 in a PHE study[20] and studies reporting no growth of virus for Ct thresholds from >24 to >34 or higher[21]. Whilst we have not directly assessed household transmission in this study, it was notable that Ct values were significantly lower in positives where anyone else in the same household was ever positive, supporting a role for greater within-household transmission with lower Ct values. Ct values were 1.4 higher in those positive at their first study test (where long-term shedders would be expected to be overrepresented), but these formed only 33% of the positives.
Although numbers are small, our evaluation of serological responses is one of few in the community to our knowledge, and highlights that a substantial proportion of these RT-PCR-positive cases do not appear to seroconvert. Unfortunately, whole genome sequence data were not available to confirm potential re-infections in a small proportion of individuals, but one case had “higher” evidence positive tests spanning four negative swabs with a long sampling interval (more than two months) between positive swabs (number 21, Figure 4), and six cases had positive swabs after negative swabs on a background of high anti-spike IgG titres (Figure 4). Presumed re-infections have been reported elsewhere[22], including in individuals without previous functional and/or durable antibody responses[23, 24], and may remain relevant to virus transmission, whether they occur with or without symptoms. Our data suggest that these may occur in the presence of anti-spike antibodies, which correlate with neutralising antibody titres. These antibody titres are unlikely to have been false-positives, given the context, persistence, and known diagnostic and analytical specificity of the assay[8], or to all reflect laboratory identifier errors, but further studies are clearly needed.
A major study strength is its design, being a large-scale community survey. However, this is also a limitation, since we were not able to comprehensively characterise individual positives. We may have underestimated the initial prevalence of symptoms due to originally asking about current symptoms, although this was predominantly at the earliest weekly visits (so only very transient symptoms between visits would have been missed). Similar rates of symptom reporting in the first and last third of the study suggest that this question was likely generously interpreted in any case. To minimise recall bias, we made no attempt to collect additional information on symptoms after positives were identified. This may partly explain why we observed higher rates of positive tests without reported symptoms than recent reviews[4, 5]; however, many studies in these reviews tested close contacts of index cases identified through symptoms, and such contacts might plausibly have higher viral loads.
Ultimately the importance of asymptomatic and low virus level infections depends on their transmissibility and their prevalence; regardless of limitations in symptom ascertainment, infection without recognition has the potential for onward transmission, and ascertaining such infections is likely critical for avoiding resurgence after lifting lockdown[25]. Our findings support the use of Ct values and genes detected more broadly in public testing programmes, which predominantly test symptomatic individuals and case contacts, as an “early warning” system for shifts in potential infectious load, and hence in transmission and the risks posed by individuals to others. This has recently also been proposed on the basis of theoretical work linking effective reproduction numbers to population-level Ct[26]. For example, declines in mean and median Ct values were apparent throughout August (Figure 1B), although positivity rates in the survey were only noted to increase in early September[9]. Ct data are widely available within laboratory management systems and could be used alongside available risk factor and symptom information to facilitate more informed and effective individual-level and public health responses to the SARS-CoV-2 pandemic.
DATA AVAILABILITY
De-identified study data are available for access by accredited researchers in the ONS Secure Research Service (SRS) for accredited research purposes under part 5, chapter 5 of the Digital Economy Act 2017. For further information about accreditation, contact Research.Support@ons.gov.uk or visit the SRS website.
FUNDING
This study is funded by the Department of Health and Social Care. ASW, EP, JVR, TEAP, NS, DE, KBP are supported by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Healthcare Associated Infections and Antimicrobial Resistance at the University of Oxford in partnership with Public Health England (PHE) (NIHR200915). ASW and TEAP are also supported by the NIHR Oxford Biomedical Research Centre. EP and KBP are also supported by the Huo Family Foundation. ASW is also supported by core support from the Medical Research Council UK to the MRC Clinical Trials Unit [MC_UU_12023/22] and is an NIHR Senior Investigator. PCM is funded by Wellcome (intermediate fellowship, grant ref 110110/Z/15/Z) and holds an NIHR BRC Senior Fellowship award. The views expressed are those of the authors and not necessarily those of the National Health Service, NIHR, Department of Health, or PHE.
CONFLICTS OF INTEREST
DWE declares lecture fees from Gilead, outside the submitted work. No other author has a conflict of interest to declare.
ACKNOWLEDGEMENTS
Office for National Statistics: Iain Bell, Ian Diamond, Alex Lambert, Pete Benton, Emma Rourke, Stacey Hawkes, Sarah Henry, James Scruton, Peter Stokes, Tina Thomas.
Office for National Statistics, Analysis: John Allen, Russell Black, Heather Bovill, David Braunholtz, Dominic Brown, Sarah Collyer, Megan Crees, Colin Daglish, Byron Davies, Hannah Donnarumma, Julia Douglas-Mann, Antonio Felton, Hannah Finselbach, Eleanor Fordham, Alberta Ipser, Joe Jenkins, Joel Jones, Katherine Kent, Geeta Kerai, Lina Lloyd, Victoria Masding, Ellie Osborn, Alpi Patel, Elizabeth Pereira, Tristan Pett, Melissa Randall, Donna Reeve, Palvi Shah, Ruth Snook, Ruth Studley, Esther Sutherland, Eliza Swinn, Heledd Thomas, Anna Tudor, Joshua Weston.
Office for National Statistics, Secure Research Service: Shayla Leib, James Tierney, Gabor Farkas, Raf Cobb, Folkert van Galen, Lewis Compton, James Irving, John Clarke, Rachel Mullis, Lorraine Ireland, Diana Airimitoaie, Charlotte Nash, Danielle Cox, Sarah Fisher, Zoe Moore, James McLean, Matt Kerby.
University of Oxford, Nuffield Department of Medicine: Ann Sarah Walker, Derrick Crook, Philippa C Matthews, Tim Peto, Emma Pritchard, Nicole Stoesser, Karina-Doris Vihta, Alison Howarth, George Doherty, James Kavanagh, Kevin K Chau, Stephanie B Hatch, Daniel Ebner, Lucas Martins Ferreira, Thomas Christott, Brian D Marsden, Wanwisa Dejnirattisai, Juthathip Mongkolsapaya, Sarah Hoosdally, Richard Cornall, David I Stuart, Gavin Screaton.
University of Oxford, Nuffield Department of Population Health: Koen Pouwels.
University of Oxford, Big Data Institute: David W Eyre.
University of Oxford, Radcliffe Department of Medicine: John Bell.
Oxford University Hospitals NHS Foundation Trust: Stuart Cox, Kevin Paddon, Tim James.
University of Manchester: Thomas House.
Public Health England: John Newton, Julie Robotham, Paul Birrell.
IQVIA: Helena Jordan, Tim Sheppard, Graham Athey, Dan Moody, Leigh Curry, Pamela Brereton.
National Biocentre: Ian Jarvis, Kirsty Howell, Bobby Mallick, Phil Eeles.
Glasgow Lighthouse Laboratory: Jodie Hay, Harper Vansteenhouse.
Footnotes
See Acknowledgements for the Coronavirus Infection Survey team