Contact Tracing of COVID-19 in Karnataka, India: Superspreading and Determinants of Infectiousness and Symptomaticity

Background: India has experienced the second largest outbreak of COVID-19 globally, yet there is a paucity of studies analysing contact tracing data in the region. Such studies can elucidate essential transmission metrics which can help optimize disease control policies. Methods: We analysed contact tracing data collected under the Integrated Disease Surveillance Programme from Karnataka, India between 9 March and 21 July 2020. We estimated metrics of disease transmission including the reproduction number (R), overdispersion (k), secondary attack rate (SAR), and serial interval. R and k were jointly estimated using a Bayesian Markov Chain Monte Carlo approach. We evaluated the effect of age and other factors on the risk of transmitting the infection, probability of asymptomatic infection, and mortality due to COVID-19. Findings: Up to 21 July, we found 111 index cases that crossed the super-spreading threshold of [≥]8 secondary cases. R and k were most reliably estimated at R 0.75 (95% CI, 0.62-0.91) and k 0.12 (0.11-0.15) for confirmed traced cases (n=956); and R 0.91 (0.72-1.15) and k 0.22 (0.17-0.27) from the three largest clusters (n=394). Among 956 confirmed traced cases, 8.7% of index cases had 14.4% of contacts but caused 80% of all secondary cases. Among 16715 contacts, overall SAR was 3.6% (3.4-3.9) and symptomatic cases were more infectious than asymptomatic cases (SAR 7.7% vs 2.0%; aRR 3.63 [3.04-4.34]). As compared to infectors aged 19-44 years, children were less infectious (aRR 0.21 [0.07-0.66] for 0-5 years and 0.47 [0.32-0.68] for 6-18 years). Infectors who were confirmed [≥]4 days after symptom onset were associated with higher infectiousness (aRR 3.01 [2.11-4.31]). Probability of symptomatic infection increased with age, and symptomatic infectors were 8.16 (3.29-20.24) times more likely to generate symptomatic secondaries. Serial interval had a mean of 5.4 (4.4-6.4) days with a Weibull distribution. Overall case fatality rate was 2.5% (2.4-2.7) which increased with age. Conclusion: We found significant heterogeneity in the individual-level transmissibility of SARS-CoV-2 which could not be explained by the degree of heterogeneity in the underlying number of contacts. To strengthen contact tracing in over-dispersed outbreaks, testing and tracing delays should be minimised, retrospective contact tracing should be considered, and contact tracing performance metrics should be utilised. Targeted measures to reduce potential superspreading events should be implemented. Interventions aimed at children might have a relatively small impact on reducing SARS-CoV-2 transmission owing to their low symptomaticity and infectivity. There is some evidence that symptomatic cases produce secondary cases that are more likely to be symptomatic themselves which may potentially cause a snowballing effect on infectiousness and clinical severity across transmission generations; further studies are needed to confirm this finding. Funding: Giridhara R Babu is funded by an Intermediate Fellowship by the Wellcome Trust DBT India Alliance (Clinical and Public Health Research Fellowship); grant number: IA/CPHI/14/1/501499.


Introduction
COVID-19, a pneumonia caused by the novel coronavirus SARS-CoV-2 originated in Wuhan, China (1). As of 18th October 2020, the disease has spread to over 200 countries and territories, causing over 39.81M cases and 1.11M deaths of which India contributed to 7.5M cases with over 100,000 deaths. (2) Karnataka, a south-Indian state inhabited by more than 61 million people, (3) confirmed its first COVID-19 case on 9th March 2020. By 18th October, Karnataka had the third-highest COVID-19 case burden of all states in India with 765,586 cumulative cases.
Contact tracing remains one of the key public health responses in infectious disease control with a history that can be traced back to the late 19th century. (4) In the present COVID-19 pandemic, contact tracing may achieve significant outbreak control when effective reproduction number is lower due to social distancing and nonpharmaceutical interventions (NPIs). (5) In addition, primary contact tracing data is extremely valuable in elucidating transmission characteristics of an infectious disease which can be used to inform policy.
A disease's basic reproductive number R0 describes the 'average' number of secondary infections (offsprings) generated by a single infected individual. Super-spreader events (SSEs) highlight a major limitation of the concept of R0, which is an average and does not capture the heterogeneity of infectiousness. (6) Each infected case does not produce R0 offspring; a small number of individuals may be responsible for a large percentage of secondary infections, whereas most others infect no one. When this occurs, the offspring distribution is said to be overdispersed. The dispersion parameter, k is smaller when superspreading plays a larger role in transmission. Studying overdispersion is crucial since most of the transmission can be eliminated if events and settings conducive to superspreading can be limited by implementing targeted measures, as opposed to overarching policies that would be needed if overdispersion was low and transmission was homogeneous. (7,8) Serial interval, defined as the interval between symptom onset of the index case and the secondary case in a transmission chain, is a key epidemiological measure that determines the spread of an infectious disease. The serial interval is an essential metric in epidemic transmission models and in estimating reproduction numbers used to evaluate the impact of interventions and to inform policy response. (9) Studies across the world estimate the serial interval of SARS-CoV-2 between 3.9-7.5 days. (10) Estimation requires high quality data from primary contact tracing which establishes linkage between transmission pairs and thus, data-backed evidence of serial interval of SARS-CoV-2 in India and other low-resource settings is extremely limited if any. Whereas the secondary attack rate which measures the risk of infection in contacts has been studied in India by Laxminarayanan et al., (11) a comprehensive study looking at superspreading, overdispersion and serial interval has not yet been done in India, mostly owing to a lack of data.
Contact tracing of COVID-19 in Karnataka was driven by multi-sectoral teams and backed by technology, enabling the state to have one of India's most effective contact tracing systems, at least during the early . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint epidemic. (12) Among all Indian states, Karnataka was found to have the highest quality of COVID-19 data reporting in daily bulletins released by the state government. (13) In this study, we aimed to gain insights for disease control by understanding the transmission dynamics of SARS-CoV-2 based on surveillance and tracing data from Karnataka. We estimated the reproduction number (R), overdispersion (k), secondary attack rate (SAR), and serial interval from data and reconstructed major transmission networks. We evaluated the effect of age and other factors on the probability of asymptomatic infection, risk of transmitting the infection further, and mortality due to COVID-19. We also attempt to quantify the effect of contact tracing on reducing onward transmission and on minimising testing delays.

Data sources
We used data generated through surveillance activities undertaken by the Integrated Disease Surveillance Program (IDSP) and the Department of Health and Family Welfare in accordance with national and state policies. Data was de-identified before extraction and analysis. We created a composite dataset by combining daily COVID-19 bulletins (14) released by the Government of Karnataka (up to 21 July 2020, after which individual case details were not available in daily bulletins) and linelist contact tracing data maintained by the state's Integrated Disease Surveillance Programme (up to 1 June 2020).
(15) For confirmed COVID-19 cases from 9 March to 21 July 2020 (n=71068), the government bulletins included the unique patient ID, age, sex, district of reporting, case outcome (followed up till 23 August), the unique ID of upstream contact (if known), travel history (if any), and case surveillance definition (if assigned). For confirmed cases from 9 March to 1 June 2020 (n=3404), the IDSP linelist added the symptom status (asymptomatic or symptomatic) at time of sample collection, date of symptom onset (n=261/308 symptomatic cases), date of sample collection and test results, and number of contacts traced (n=956/3404 cases). The two datasets were merged using the unique patient IDs which were common to both datasets. The study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. (16)

Case identification and contact tracing
The criteria for administering a COVID-19 test were periodically updated by the Indian Council of Medical Research (table S1). (17) A contact was defined as any individual who has been exposed to a confirmed case anytime between 2 days prior to symptom onset (in the positive case) and date of isolation (or maximum 14 days after onset of symptoms). All contacts were quarantined and monitored for 14 days. The duration (>15 minutes), the proximity (<1 meter), and the nature of exposure were taken into consideration when defining a contact. (18) Contacts were further divided into high-risk or low-risk contacts (supplementary S1). A COVID-19 test was administered to all symptomatic contacts and highrisk asymptomatic contacts between day 5 to 10 after contact. (19) For interstate and international travellers, screening was done at all ports of entry including airports, train stations and road checkposts . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint

Case categorisation and ascertainment of transmission pairs
We divided all cases into four mutually exclusive categories based on origin of infection: imported international (history of international travel within 14 days of symptom onset), imported domestic (history of travel from other Indian states into Karnataka), local with known origin (contact with a known COVID-19 case or present at a location or event linked to COVID-19 transmission, and no travel history outside Karnataka), and local cases with unknown origin (no travel history outside Karnataka and could not be traced to any known COVID-19 case or event). Additionally, we categorised cases on the basis of their surveillance case definition at the time of first presentation: Influenza-like illness (ILI: fever ≥38 C° and cough with onset within the last ten days), or Severe acute respiratory infection (SARI: fever ≥ 38C° and cough with onset within the last 10 days and requires hospitalization). This categorisation was independent of the categories based on origin and was assigned for only cases for whom this information was available. If a newly confirmed case had a history of contact with a previously confirmed case, we assumed that the new case acquired the infection from the previously known case; and hence linked them together as secondary case and index case respectively (probable infector-infectee pair). If a secondary case was in contact with 2 or more index cases, we linked the secondary case to the earliest index case.

Reproduction number (R) and overdispersion parameter (k)
We fit a negative binomial distribution to the observed offspring distribution using a Bayesian Markov Chain Monte Carlo approach and estimated effective reproduction number (R) as the mean, and degree of heterogeneity in transmission as the overdispersion parameter (k) of the fitted distribution. Sample mean and 2.5% and 97.5% percentiles based on post warmup samples were used to evaluate the posterior point estimates and 95% confidence intervals respectively. Since estimates of R and k are sensitive to the completeness of contact tracing, we separately analysed subgroups where contact tracing was known or expected to be comprehensive (supplementary S5 and table S2). Specific metrics of contact tracing adequacy and performance were not available in the dataset. (22) A cluster was defined as two or more confirmed infections with reported close contact. The superspreading threshold was defined as an index case causing eight or more secondary infections. (6,23)

Secondary attack rate, determinants of risk of infection among contacts and risk of symptomatic infection among cases, and case fatality rate
We computed secondary attack rates by dividing the number of positive contacts by the number of contacts traced. Only high-risk or close contacts were included (supplementary S1). We did a retrospective cohort analysis using a Poisson regression model to estimate adjusted relative risks of infection in contacts, and symptomatic infection in cases as a function of the predictor variables (supplementary S3 and S4). (24) Case fatality rate was calculated as the number of deceased cases divided by total reported cases.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint

Serial interval
The serial interval was calculated as the difference between the symptom onset dates of index case and secondary case in each transmission pair. We then fitted parametric distributions (gamma, lognormal, and weibull) to serial interval data using the maximum likelihood method. Akaike Information Criterion (AIC) was used to choose the optimal distribution, and confidence intervals were estimated using 10000 bootstraps.

Results
From 9 March to 21 July 2020, Karnataka reported 71068 cases with a median age of 37 years (IQR 27-50) of which 63.1% were males. Table 1 and Figure 1 give an epidemiologic summary of the studied outbreak with time and demographic characteristics. Early in the epidemic, most cases occurred in middle-aged adults which can be explained by their high mobility. A 'V' shaped pattern can be seen in is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint

Figure 1: Epidemiological summary of the COVID-19 outbreak in Karnataka, India from 9 March to 21 July 2020 (n=71068). [A] Incidence by daily confirmed cases across time. [B] Density scatterplot showing daily confirmed cases for various ages across time. [C] Distribution of population, COVID-19 cases and COVID-19 deaths across age groups. 10year age brackets have been used starting from 0-9 years, and the last bracket is ≥80 years. [D] Daily confirmed cases across time stratified by origin of infection-imported international, imported domestic, local with known origin, and local with unknown origin.
figure 1C, indicating that more older adults and children were infected with time. The incidence increased throughout the study period, with 55826 (78.55%) cases recorded in July alone. Most cases in March were internationally imported followed by low incidence local transmission in April and May. A majority of the 5105 local cases whose origin was known occurred from April to mid-June. The source of infection of 57670 (81.15%) locally transmitted cases could not be ascertained; most of these cases occurred after . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint mid-June. 20.3% of all cases were detected through symptomatic surveillance (17.1% ILI and 3.2% SARI). The median age of ILI cases was 40 (IQR 29-53) and SARI cases was 54 years (IQR 43-65). The mean delay from symptom onset to lab confirmation (testing delay) was 5.1 (4.7-5.6) days (n=261). Local cases with unknown origin had the longest delay among all categories at 5.7 (5.1-6.2) days (n=112). Detailed results of various delays are given in Table S4.

Overdispersion (k), reproduction number (R), and major transmission clusters
We found 111 instances where the index case crossed the super-spreading threshold of ≥8 secondary cases upto 21 July 2020 in Karnataka. A total of 1277 clusters were identified with 6424 linked cases till 21 July 2020 in Karnataka. There were 7 clusters with a size larger than 50 cases and 106 clusters with more than 10 cases each. We characterised the three largest clusters including the Bellary cluster (221 cases), the Delhi convention cluster (97 cases), and the Pharmaceutical company cluster (76 cases). The reconstructed transmission networks for these three clusters are shown in Figure 2. For 394 cases linked to these three largest clusters, the estimated R was 0.91 (0.72-1.15) and k was 0.22 (0.17-0.27). 12.4% of infectious cases caused 80% of all transmission, while 71.2% of cases did not lead to any further transmission among these three clusters ( figure 3D). In late-March, a cluster of infections was seeded by travellers returning from the religious convention at Nizammudin mosque in Delhi. In the Bellary cluster, the infection was seeded by a worker at a steel plant in Bellary (26), followed by rapid spread among the employees living in close-contact in dormitories. The initial infections occurred in the management department which had a high contact rate with employees from other departments thus facilitating spread. A similar cluster was reported in Nanjangud, Mysore (27) where a pharmaceutical company became the origin of a large cluster.

Figure 2: Transmission networks of three largest clusters of SARS-CoV-2 cases in Karnataka, India up to 21 July 2020.
The networks indicate the heterogeneity in transmission from infected cases, with a few patients causing most secondary cases. The Bellary cluster occurred in June and July, during which symptomatic status of cases was not available.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint

Figure 3: Transmission characteristics of SARS-CoV-2 in Karnataka, India. [A,B] Observed offspring distribution of [A] 956 cases with confirmed forward contact tracing and [B] 394 cases linked to the three largest clusters (Bellary cluster, Delhi convention cluster, and Pharmaceutical company cluster). Bars show observed frequency of the number of individuals infected by each case. [C,D] Proportion of all onward transmission (solid line) and proportion of all contacts (dashed line) due to a given proportion of infectious cases, where cases are ranked by infectiousness, for [C] 956 cases with confirmed forward contact tracing and [D] 394 cases linked to the three largest clusters. [E,F] Serial interval distribution of SARS-CoV-2 infections in Karnataka, India and the fitted gamma, lognormal and weibull distributions,
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020.

among [E] 53 infector-infectee pairs and [F] 51 infector-infectee pairs with positive serial intervals. 10000 bootstraps were used for estimating the 95% confidence limits shown by the shaded bands.
For 956 cases whose contacts were confirmed to have been traced, the estimated R was 0.75 (0.62-0.91) and k was 0.12 (0.11-0.15). 8

Secondary attack rate and determinants of risk of infection among contacts
The secondary attack rate (SAR) among all close contacts was 3.6% (95%CI, 3.4-3.9). Results are summarised in table 2 and 3. The median number of contacts identified for an index case were 11 (IQR 5-21). The median number of contacts per case were higher (15 [IQR ) when the index case was confirmed ≥4 days after symptom onset. Symptomatic index cases were more infectious than asymptomatic cases (SAR 7.7% vs 2.0%; aRR 3.63 [3.04-4.34]). As compared to infectors aged 19-44 years, children were less infectious even after controlling for other factors including symptomatic status (aRR=0.21 [0.07-0.66] for 0-5 years and 0.47 [0.32-0.68] for 6-18 years). Adults aged ≥45 years seemed to be more infectious, but this association disappeared when controlling for increased symptomaticity of older adults. Male infectors were found to be less infectious than females (aRR 0.78 [0.66-0.92]). Infectors who had a delay from symptom onset to confirmation of ≥4 days were associated with higher infectiousness (aRR=3.01 [2.11-4.31]). Local cases with unknown origin had the highest infectivity of the four categories by origin, with R of 1.04 (0.87-1.25) and SAR of 8.5% (7.7 -9.4). Cases identified through symptomatic surveillance had the highest SAR of any subgroup (14% and 9.8% for ILI and SARI respectively).

Serial Interval
After excluding asymptomatic cases, 54 infector-infectee pairs were identified where symptom onset dates for both infector and infectee were available. 35, 14 and 5 pairs were from March, April and May respectively. One pair was dropped for having a large negative serial interval (-19 days). Estimated parameters for the serial interval distribution are shown in table S3 and the fit is shown in figures 3E,F and S3. When analysing the 53 pairs, Weibull distribution was the best fit (AIC=-291) with a mean of 5.4 (4.4-6.4) and SD of 4.3 (3.0-5.1) days. When analysing serial intervals with positive values only (51 pairs), the Weibull distribution was again the best fit (AIC=-249) with a mean of 5.9 (5.0-6.9) and SD of 3.3 (2.4-4.1) days.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint

Determinants of presence of clinical symptoms in SARS-CoV-2 infection
Among 3404 cases till 1 June, 9.0% were symptomatic at the time of sample collection. Increasing age, male sex, and having a symptomatic infector were associated with a higher probability of being . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint  1.04-1.62) times more likely to be symptomatic than females. A secondary case was 8.16 (3.29-20.24) times more likely to be symptomatic if the index case was also symptomatic compared to if it was asymptomatic. Presence of one or more comorbidities was associated with increased symptomaticity but was not statistically significant (aRR=1.13 [0.92-1.39]). Overall (n -3404) 307 (9.0) --*Adjusted for age group, sex and presence of comorbidity in affected case, and symptom status of the index case. **Symptom status of index case was not available for 2579 out of 3404 cases whose infector was unknown (these cases were included in the model by making a separate category not shown in the table).

Case Fatality Rate
The case fatality rate (CFR) across time and case categories is presented in Table 1 and across age and sex in Figure 4. Among 71068 cases, the overall CFR was 2.5% (95% CI, 2.4-2.7%). Local cases with unknown origin had the highest CFR (3.0%), followed by local cases with known origin (0.84%), imported domestic cases (0.46%), and imported international cases (0.15%). In surveillance case . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review) preprint
The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint categories, SARI cases had a much higher CFR (23.66%) than ILI (2.42%). CFR steadily increased with age in both sexes, ranging from 0.07% in 0-9 years old to 16.31% in males and 17.55% in females of age 80 and above (Figure 4). The highest CFR was recorded for cases that were reported in the month of April (5.39%).

Discussion
Although the reproduction number R0 of the novel coronavirus has been well characterised, we find that R0 alone fails to capture the true picture of individual-level transmission dynamics. Overdispersion (k) ranged from 0.04 to 0.34 in our study, confirming that there is significant heterogeneity in transmission of the novel coronavirus. Importantly, we did not find significant underlying heterogeneity in the number of contacts. Figure 3C shows clustering in the number of secondary cases (concave line) while the number of contacts are homogenous as indicated by a relatively linear relation. Heterogeneity in transmission can be explained by heterogeneity in the number of contacts and/or the probability of infection per contact (infectivity level of index case and nature of exposure).(28, 29) Modelling studies indicate that SARS-CoV-2 SSEs occur when an infected person is briefly shedding at a very high viral load and has a high concurrent number of exposed contacts. (30) Since the patients which caused a majority of secondary cases did not have a concurrent larger share of total contacts (8.7% of infectious cases had 14.4% of contacts but caused 80% of transmission), our findings underscore the importance of the high infectivity of index case (at the time of exposure) and the nature of exposure in causing a successful SSE.
The most reliable estimates of overdispersion (k) in our study were 0.12 (0.11-0.15) for confirmed traced cases (n=956) and 0.22 (0.17-0.27) from the three clusters (n=394). Our estimates of k align with the lower range of current global estimates, albeit with smaller confidence intervals due to larger sample . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review) preprint
The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint size. A modelling study analysing global clusters estimated k at 0.10 (0.05-0.20). (31) A study of 1288 cases estimated k at 0.06 (0.05-0.07) and 0.20 (0.09-0.31) in two states of Indonesia. (32) In China, k was estimated at 0.25 (0.13-0.88) from 135 cases in Tianjin (33) and at 0·58 (0·35-1·18) from 391 cases in Shenzhen. (34) In Hong Kong, k was estimated at 0.43 (0.29-0.67) from 290 cases. (23) In Georgia USA, the overall k ranged from 0.32 to 0.49, with even lower values after shelter-in-place orders were issued. (35) Our findings suggest that super-spreading played a more dominant role in transmission in Karnataka, India, as compared to most high-income countries. Interestingly, phylogenetic studies indicate that remote clusters can be retrospectively linked to a previous SSE, indicating that superspreading may play an even larger role in the overall propagation of the epidemic than is detected through surveillance and tracing. (36) Ideally then, the true picture of epidemics can only be understood by analysing contact tracing, serological and phylogenetic data together, which can help plan adequate control measures.
Modelling studies indicate that the delay from symptom onset to confirmation (testing delay) is an essential determinant of the effectiveness of contact tracing. (5,37) Indeed, we found that infectors diagnosed four or more days after symptom onset led to a higher SAR among their contacts (9.6% vs 3.8%) and also had a higher number of contacts (median 15 vs 10.5). We found that this delay was lower for cases who had an early first contact with surveillance or contact tracing systems, namely, cases screened at entry ports (imported cases) and local cases from known contact lists. Imported cases had a low reproduction number compared to local cases (Table S2) which suggests that screening and registration of all persons at entry ports leading to early detection of these cases prevented most onward transmission. Cases captured through symptom-based surveillance and those whose source of infection was unknown had a higher SAR than all other case categories ( Table 2) and had a concurrently higher testing delay (table S4). SARI had lower infectiousness than ILI cases which could be explained by their hospitalisation preventing some transmission. Cases that were from known contact lists were confirmed 0.88 days (0.60-2.04) earlier and caused 60% less secondary infections than local cases whose origin was unknown (R 1.04 vs 0.42). These findings highlight the effect of contact tracing in reducing transmission and reassert the importance of minimising testing and tracing delays.
We estimated the relative risk (aRR) of infectivity at 3.63 (3.04-4.34) for symptomatic infectors as compared to asymptomatic carriers (SAR 7.7 vs 2.0%; R 2.04 vs 0.41) (table 2 and S2). Other studies have found that infectiousness also increases with disease severity. (38) This suggests that predictors of symptomatic infection and/or severe disease (like age) are by extension also predictors of increased infectiousness (Table 4). Accordingly, interventions aimed at children might have a relatively small impact on reducing SARS-CoV-2 transmission, as suggested by their low symptomaticity and severity and thus lower infectiousness than adults. (39) Our study also finds that children aged 6-18 are less infectious even after adjusting for lower symptomatic infections in this group. Interestingly, we found a strong association between the symptom status of the index and secondary case; symptomatic infectors were 8.16 (3.29-20.24) times more likely to generate symptomatic secondaries. These findings are corroborated by He D et al., who estimated relative risk of infectivity of symptomatics against that of asymptomatics at 3.9 (1. 5-11.8), and at 6.6 (2.0-34.7) when focusing on symptomatic secondaries. (40) With symptomatic infectors both, more likely to produce secondary cases in general and also more likely . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review) preprint
The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint to produce symptomatic secondaries who are themselves more infectious, it seems that a cascading effect of high transmission potential may play a role in amplifying COVID-19 outbreaks. Though it is known that nasopharyngeal viral load is higher in symptomatic cases and increases with severity in COVID-19, (41) it would be pertinent to explore whether a higher infectious dose from the infector influences symptom status and infection severity in the secondary case, something that could explain findings from this and He D et al. (40,42) We found one study estimating the serial interval of SARS-CoV-2 from contact tracing data in India; however, this study assumed the date of sample collection in asymptomatic cases as a proxy for symptom onset which heavily affects the reliability of their estimate. (43) Although we present reliable estimates of serial interval for COVID-19 for the first time from India, enhanced data sharing enabling real-time estimation to inform policy decisions is recommended to account for the temporal variation of serial interval as observed in China. (44) Given that the reproduction number is sensitive to the value of serial interval used for its estimation, it is prudent to select the serial interval distribution that fits best in context of the location and time phase of the epidemic. (45) Our estimated mean serial interval of 5.4 (4.4-6.4) agrees with existing evidence from global studies. (10) Our study has certain limitations. Firstly, symptomatic status was based on data collected at the time of sample collection and hence some cases recorded as asymptomatic may have developed symptoms later. (46)(47)(48) This would overestimate the proportion of asymptomatic infections and also the relative transmissibility by asymptomatics since presymptomatic cases have been shown to be more infectious than asymptomatic carriers. (40,47,49) Second, any amount of case and/or contact under-ascertainment during surveillance and contact tracing carries the potential to bias our results. Although we have attempted to minimise this bias by analysing subgroups with high reliability of data (Table S2), some degree of bias can still be expected. Since the completeness of contact tracing is inherently limited by memory recall and logistics, R estimated from contact tracing data cannot capture the entirety of the outbreak and can thus be expected to underestimate the true R. Third, details of settings of transmission and timing of exposure of contact to index case were not available for the vast majority of cases which precluded any insightful analysis on the same. Finally, the dates of symptom onset in our study may be subject to recall bias. Additionally, our results should be interpreted in the context of the NPIs in place at the time of the study. From 25 March to 8 June, physical distancing was mandated in Karnataka with closures of schools and most public places, restrictions on large gatherings, and mask wearing in public settings.
Our findings have a few important implications for optimizing policy. We find that even though surveillance, tracing, and social distancing may keep the reproduction number and hence transmission at low levels, super spreading is common in COVID-19 and carries the potential to acutely overwhelm surveillance and tracing systems. However, this presents an opportunity as well, in that outbreaks where a minority of cases cause most further transmission (high dispersion, low k) are much more amenable to control through measures that target the high-risk groups and settings responsible for most of the transmission. (6-8) Existing measures that limit potential super-spreading including bans on large gatherings and capping capacity in closed spaces are expected to remain beneficial. (28) Specifically, . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint emphasis should be given on backward or retrospective contact tracing which becomes increasingly effective as overdispersion increases, and tracing and testing delays should be minimized. (31,37) Evidence on the effect of symptom status on transmission suggests that measures targeted at children will not reduce transmission significantly. (39,49) Although more studies are needed, there is increasing evidence that symptomatic cases beget more symptomatic secondaries and may cause a snowballing effect on transmission across generations, (40) which has significant implications for both-transmission and morbidity control in COVID-19 outbreaks.

Declaration of interests:
The authors declare no conflicts of interest.

Acknowledgement:
We thank the India COVID-19 Apex Research Team (iCART) for enabling this research. GRB is funded by an Intermediate Fellowship by the Wellcome Trust DBT India Alliance (Clinical and Public Health research fellowship); grant number IA/CPHI/14/1/501499. We also thank one of the developers of www.covidtoday.in, Mr. Pratik Mandlecha for sharing his expertise. We thank the multi-sectoral contact tracing and surveillance teams and the Health Department of Karnataka whose efforts made this study possible.
Ethics approval: Surveillance and contact tracing activities wherein the data was collected were part of ongoing outbreak investigation mandated by state and national health authorities. The requirement for full review was waived by the Institutional Ethics Committee, Indian Institute of Public Health, Bengaluru of Public Health Foundation of India (IIPHHB/TRCIEC/211/2020 dated 25/12/2020).

Data sharing and code availability:
The bulletins dataset and all codes used to produce the results are available at github.com/CovidToday/covid19-karnataka. Interactive visualisations of cluster transmission networks can be found at kaclusters.covidtoday.in. The IDSP dataset can be made available upon reasonable request to the office of Commissioner (PP).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint

Role of funding source
The funding agency of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the article. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.    29 We fit a Poisson regression model with robust error variance to data on contacts' 30 recorded infection status (binary outcome: contact infected OR not infected) and 31 attributes of each contact's index cases to estimate adjusted relative risks for infection 32 of contacts as a function of these exposures. 6 For a subset of this data that included 33 only symptomatic index cases and their contacts, we fit another model which also 34 included the delay from symptom onset to confirmation of the index case (a proxy for 35 delay in case isolation and increasing infectious period spent in community) as a 36 predictor variable. Only high-risk/close contacts were included because the recording of 37 'low-risk contact' status was not deemed reliable in the accessed dataset. 38

39
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint 40 We fit a Poisson regression model with robust error variance to data on confirmed 41 COVID-19 cases' symptom status (binary outcome: case is symptomatic OR 42 asymptomatic) and attributes of each case and their index case (where known) to 43 estimate adjusted relative risks for developing symptoms in a COVID-19 case as a 44 function of these exposures. 6 45 46 47 S5. Minimizing bias due to incomplete contact tracing 48 The accuracy of reproduction number (R) and overdispersion (k) estimates from data 49 depends on the completeness and adequacy of the contact tracing efforts during the 50 sampled time. For example, cases with zero linked secondary cases in the dataset 51 could either be due to no further transmission from the case after adequate tracing, or 52 due to inadequate or absent contact tracing of the case leading to failure of identification 53 of secondary cases. To account for the possible bias arising out of inadequate contact 54 tracing specially at later stages of the study, we estimated R and k for multiple time 55 periods, and for cases from each category and from major clusters separately.

92
Direct and high-risk contacts include those who live in the same household with a confirmed case.

93
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint   . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) preprint The copyright holder for this this version posted December 30, 2020. ; https://doi.org/10.1101/2020.12.25.20248668 doi: medRxiv preprint