Abstract
Introduction and Goals SARS-CoV-2 is transmitted both in the community and within households. Social distancing and lockdowns reduce community transmission but do not directly address household transmission. We provide quantitative measures of household transmission based on empirical data, and estimate the contribution of households to overall spread. We highlight policy implications from our analysis of household transmission, and more generally, of changes in contact patterns under social distancing.
Methods We investigate the household secondary attack rate (SAR) for SARS-CoV-2, as well as Rh, which is the average number of within-household infections caused by a single index case. We identify previous works that estimated the SAR. We correct these estimates based on the false-negative rate of PCR testing and the failure to test asymptomatics. Results are pooled by a hierarchical Bayesian random-effects model to provide a meta-analysis estimate of the SAR. We estimate Rh using results from population testing in Vo’, Italy and contact tracing data that we curate from Singapore. The code and data behind our analysis are publicly available1.
Results We identified nine studies of the household secondary attack rate. Our modeling suggests the SAR is heterogeneous across studies. The pooled central estimate of the SAR is 30% but with a posterior 95% credible interval of (0%, 67%) reflecting this heterogeneity. This corresponds to a posterior mean for the SAR of 30% (18%, 43%) and a standard deviation of 15% (9%, 27%). If results are not corrected for false negatives and asymptomatics, the pooled central estimate for the SAR is 20% (0%, 43%). From the same nine studies, we estimate Rh to be 0.47 (0.13, 0.77). Using contact tracing data from Singapore, we infer an Rh value of 0.32 (0.22, 0.42). Population testing data from Vo’ yields an Rh estimate of 0.37 (0.34, 0.40) after correcting for false negatives and asymptomatics.
Interpretation Our estimates of Rh suggest that household transmission was a small fraction (5%-35%) of R before social distancing but a large fraction after (30%-55%). This suggests that household transmission may be an effective target for interventions. A remaining uncertainty is whether household infections actually contribute to further community transmission or are contained within households. This can be estimated given high-quality contact tracing data.
More broadly, our study points to emerging contact patterns (i.e., increased time at home relative to the community) playing a role in transmission of SARS-CoV-2. We briefly highlight another instance of this phenomenon (differences in contact between essential workers and the rest of the population), provide coarse estimates of its effect on transmission, and discuss how future data could enable a more reliable estimate.
1 Introduction
Lockdowns and stay-at-home orders have been deployed across the world to curb the growth of SARS-CoV-2, the virus that causes COVID-19. These policies reduce transmission by reducing overall mobility.
Since the introduction of lockdowns, community mobility has decreased by 28% −70% (Google, 2020), and the effective reproduction number R has decreased by an average of 48% (IQR 38-65%) across 16 US states (shown in Figure 4). At the same time, these policies have increased time spent at home and therefore likely increased household transmission of SARS-CoV-2: Google (2020) reports 15% −29% increases in time spent at home in these states. In the UK, Shanghai, and Wuhan, the majority of social contacts occurred in households after lockdown. The estimated proportion of contacts is 58% for the UK (up from 34% pre-lockdown from Jarvis et al. (2020)) and 79% and 94% for Shanghai and Wuhan respectively (Zhang et al., 2020). Household transmission played an important role in previous epidemics such as the 2009 H1N1 pandemic (Endo et al., 2019; Tsang et al., 2016; Lau et al., 2012). It may now also be an important factor in SARS-CoV-2 transmission, especially since community transmission has decreased and household transmission has increased under stay-at-home policies.
Estimates for secondary attack rate (SAR) and household reproduction number (Rh). Dashed lines show estimates from the original studies. Solid lines show 95% credible intervals from a Bayesian hierarchical model, which adjusts estimates for false negatives and asymptomatics where appropriate. In the left plot the Meta Estimates (orange) are the model’s pooled credible intervals over the mean SAR and standard deviation of SAR. In the right plot the Meta Estimates refer to the credible intervals over the mean and standard deviation of Rh. The relatively wide intervals for the mean along with large variance of the meta estimate of SAR and Rh are due to high variability across studies. If a study has a single asterisk, this means it was unnecessary to adjust for asymptomatics (only false negatives). The double asterisk means no adjustment was necessary.
Estimation of Rh from annotated contact tracing data. Nodes represent positive cases and their horizontal position indicates confirmation date. We consider the subset of the graph containing cases with confirmation date prior to cut-off date: target nodes are infected by sources. R can be estimated as and hence R = 13/7 for this cluster. There are 3 distinct households in the cluster and thick borders denote secondary household infections. The household reproductive number is Rh = 5/7.
Daily aggregate estimates of effective and household reproduction number in Singapore based on contact tracing data. We observe that in early phases of the epidemic Reff was greater than 1. With increased public awareness and contact tracing efforts the effective reproduction number has decreased steadily through March 15th. Throughout this time household transmission stayed constant, with Rh values in the 0.2 − 0.3 range. The ratio of infections attributable to households decreased sharply at the end of March due to large outbreaks in migrant worker dormitories. Even though their infections are not annotated as households, this indicates that co-habitation and proximity play a large role in transmission dynamics.
Estimated values of the reproduction number R pre- and post-lockdown in a subset of US states (top) and other countries (bottom). The growth rate was estimated by fitting an overdispersed Poisson distribution onto daily death statistics, as described in Yadlowsky et al. (2020). This was translated into a reproduction number R via the generation time distribution (c.f. (Ferretti et al., 2020)). 95% confidence intervals are shown.
Our aim is to estimate household transmission and determine its overall contribution to growth of SARS- CoV-2. We do so both via a meta-analysis of previous studies, a direct analysis of data collected in Vo’, Italy (Lavezzo et al., 2020), and analysis of contact tracing data that we curated from Singapore (Singapore COVID-19 Dashboard, 2020). We correct these data for false negatives and in some cases for selective testing of only symptomatic cases. Code for our analysis, together with the Singapore dataset, are available at https://github.com/andrewilyas/covid-household-transmission.
We estimate that household transmission contributes between 0.22 and 0.42 infections to the overall reproduction number. In comparison, overall reproduction numbers during stay-at-home policies ranged from 0.65 to 1.25. We conclude that household transmission is now a significant contributor to overall SARS-CoV-2 reproduction (see Figure 5 for region-specific estimates).
Left: Reproduction numbers for community transmission (Rc) and intra-household transmission (Rh) for the regions whose R values are shown in Figure 4. The overlaid contour plot shows level sets of the overall reproduction number R = Rh + Rc. Right: Estimated share of transmission attributable to household infections (Rh/R). In both graphs we assume Rh = 0.3 pre-lockdown—to obtain post-lockdown Rh we multiply by a mobility factor M, from Google’s estimates of average time spent in residential areas.
We next examine two folk theories that have sometimes been used to argue against attending to household transmission. The two theories are that preventing transmission within a household is either (a) futile, since transmission is inevitable or (b) low-impact, since secondary infections are “contained” in the household (Horowitz, 2020):
Household inevitability: No interventions to reduce household transmission are likely to work.
Household containment : People infected at home will stay at home and not cause community spread.
Our meta-analysis of the household secondary attack rate yielded a central estimate of 30% and estimates for individual studies ranging from 11% − 55%. This implies that household infection is not a certainty and can be mediated by differences in setting, which casts doubt on inevitability. In Section 4.2 we discuss policies for reducing household infection. More generally, our estimates imply that infection is not certain even among individuals who are in regular contact.
We formalize the household containment theory in terms of ratios of secondary transmission, and show that these can be estimated from well-annotated contact tracing data. Unfortunately, existing data are too noisy to reliably estimate containment. In Section 4.2 we outline how these rates can be approximated from aggregate data, and suggest future work for estimating them directly.
Our results underscore that mobility is an imperfect measure of infection risk, since decreases in mobility likely increase household infections. We suggest instead basing policy on contact patterns, which mediate transmission both for COVID-19 (Liu et al., 2020; Jarvis et al., 2020) and previous epidemics (Edmunds et al., 1997; Wallinga et al., 1999; 2006; Mossong et al., 2008; Hens et al., 2009; Eames et al., 2012). To this end, we explore contact heterogeneity caused by stay-at-home orders, due to the divide between essential workers and the rest of the population. An initial simulation study (Section 4.3) forecasts that decreasing transmission risk between essential workers is 8−35 times as effective as reducing other types of transmission.
2 Methods
The dynamics of disease spread are described via the effective reproduction number R, which measures the average number of new infections caused by each infected person i. To quantify household transmission, we decompose R into two components R = Rc + Rh. The community reproduction number Rc is the average number of new infections caused by an infected individual i outside i’s household. The intra-household reproduction number Rh is the average number of new infections caused by an infected individual i inside i’s household. The ratio measures the fraction of transmission occurring within households.
In addition to Rh, we consider two measures of intra-household spread that quantify individual risk of infection. The household secondary attack rate (SAR) is the probability an infected person i infects a specific household member j. Additionally, we define the household conditional risk of infection (CRI) to be the probability that j is infected, conditioned on household member i being infected. In contrast to the SAR, the CRI allows for j having been infected by someone other than i, and so CRI can be estimated without attributing infections to particular primary cases.
2.1 Challenges in estimating Rh, SAR and CRI from empirical data
Estimating the growth and prevalence of household cases requires detailed household-level data. The relevant information to estimate Rh, SAR, and CRI includes:
Testing results for all members of a household, including both positive and negative cases.
Infection attribution within a household, which involves identifying the primary cases in a household and constructing the household transmission chains.
Clinical outcome at the end of the study. If a case is still infectious at the end of a study the data might undercount the number of infections attributed to the case.
Assuming accurate household level data we can estimate Rh, SAR, and CRI as follows:
Rh can be estimated as the ratio of secondary household cases to total household cases. It requires accurate attribution of infections within households, but does not require negative case counts.
SAR can be estimated as the ratio of secondary household cases to total susceptible household members. It requires accurate attribution as well as negative test counts in households.
CRI can be estimated as the ratio of total number of infected pairs in households to total number of household pairs. It requires positive and negative case count at household level but not direct attributions.
In practice, estimating these quantities is made difficult by asymptomatic infections and false negatives. Existing evidence suggests that asymptomatic cases constitute 18-43% of all infections and may have different households transmission dynamics than symptomatic cases (Lavezzo et al., 2020; Gudbjartsson et al.; Ferretti et al., 2020; Streeck et al., 2020). However most studies predominantly test symptomatic individuals, which leads to biases in the estimates. We can correct the estimates to account for asymptomatic cases. We calibrate our corrections on studies that have tested a representative sample of the population, as in studies of Vo’, Italy and Gangelt, Germany (Lavezzo et al., 2020; Streeck et al., 2020).
Regarding false negatives, recent studies have shown high false-negative rates (FNR) for RT-PCR tests, and this rate varies over the time course of the disease. Kucirka et al. (2020) find an FNR of 20% for patients tested on Day 8 of infection and 67% on Day 4. The average FNR in the 10 days following onset of symptoms is estimated as 17% by Wikramaratna et al. (2020) and 30% by Kucirka et al. (2020). The FNR also varies between different kinds of swab (Wikramaratna et al., 2020) and between different labs (Kucirka et al., 2020). The problem of false negatives can be mitigated by correcting estimates using the FNR (Wikramaratna et al., 2020). However, in recent studies of household transmission (see Section 2.2.1), estimates of SAR have not been corrected for the FNR.
2.2 Procedure for estimating Rh, SAR, and CRI
In order to estimate Rh, SAR, and CRI and account for asymptomatics and false negatives we propose two approaches. First we adjust estimates from previous studies to account for false negatives and asymptomatics (see Section 2.2.1). Second, we estimate CRI on the blanket testing dataset from Vo’, Italy (Lavezzo et al., 2020) (see Section 2.2.3). Additionally, we have assembled our own dataset based on contact tracing data from Singapore, which we use to estimate Rh (see Section 2.2.2).
2.2.1. Adjusting SAR literature estimates
For the SAR, we searched PubMed and Google Scholar to find any previous work that estimates the household SAR from empirical data. We found a total of nine papers. Whenever appropriate, we recalculated the estimated SAR for each study to correct for the false-negative rate (FNR) of RT-PCR testing and the proportion of cases that are asymptomatic (denoted “AR”). We used a hierarchical Bayesian random effects model both for correcting estimates from individual studies and for pooling results to compute a meta-analysis estimate of the SAR. In the model, the SAR for study i (denoted SARi) is drawn from a Beta distribution and each study has a false-negative rate FNRi drawn from a prior based on estimates in the literature (Section 2.1). The proportion of asymptomatics AR is shared across studies and is also drawn from a prior based on existing literature. The likelihood of a household member testing positive is pi = SARi ∗ (1 − FNRi) ∗ (1 − AR) for studies where only symptomatic contacts were tested. To estimate the household reproduction number Rh for each study i, we adjust the total number of secondary cases using FNRi and AR and divide by the number of primary cases. Results are shown in Figure 1 and a full description of the model is found in Appendix B.
2.2.2. Singapore contact tracing data for estimating Rh
Singapore’s Ministry of Health has collected information about COVID-19 cases tracked via contact tracing (Singapore MOH, 2020). UpCode Academy published this data in an interactive dashboard (Singapore COVID-19 Dashboard, 2020). We extracted associated metadata for each positive case along with a directed graph providing information about the infection source. This resulted in a case and transmission network for 6588 patients. Confirmation dates for cases ranged from January, 23rd to April 19th.
We used the data to construct a transmission graph, where nodes correspond to cases and edges to infections. As terminology, we distinguish source cases (which are the cause of infections) from target cases (which are infected by sources). We define a cut-off date, such that all infections with confirmation date prior to the cut-off are labeled as sources. All the nodes that have an incident edge from a source are labeled as targets. A node in the infection graph can be both source and target. When relationships are known, edge annotations reflect the relationship between source and target cases (e.g. family member, colleague, contact). Rh can be thus estimated as the ratio of total number of household infections to total number of source cases. Since there is no annotation for “household”, we use the annotation “family member” as a proxy.
A schematic infection graph is shown in Figure 2. In the case of real data some of the edges are not direct source-target attributions, but are mediated by ‘Cluster’ nodes. Despite not always having direct infection attributions we can use temporal information to determine the number of sources and targets.
2.2.3. Data from Vo’ for estimating Rh and CRI
Most of the population in the town of Vo’, Italy was tested both at the start of a lockdown and 14 days later (Lavezzo et al., 2020). The two phases of testing covered 86% (2812 subjects) and 72% (2343 subjects) of the population in Vo’, respectively.
We analyzed data collected in order to estimate Rh and CRI. Following the same terminology as in the previous section, the date of the second round of testing acts as a cut-off date. Since the data does not trace infections to sources, we assume that all secondary household infections were caused by the primary case.
We estimate the FNR by assuming that symptomatic cases in households with other confirmed infections are positive cases.
3 Results
3.1 SAR and Rh estimates from previous studies
Estimates of the SAR for each study are shown in Figure 1 and Table 4 and precise counts and estimates in Tables 3 and 4. Our Bayesian model allows for the possibility of heterogeneity in the SAR across studies and we find substantial heterogeneity. Figure 1 shows that some 95% credible intervals for the SAR do not overlap. The pooled central estimate of the SAR is 30% with 95% credible interval (0%, 67%), which captures the model’s posterior prediction for the SAR for a new study. If we do not correct estimates for false-negatives and asymptomatics, the central estimate is 20% (0%, 43%). The heterogeneity of SAR across studies is partly due to variation in the false-negative rate, which we model but do not observe directly. It may also be due to variation in the cohort of primary cases and their behavior in the household. As a point of comparison, the household SAR has been estimated to be 15% for H1N1 influenza (Carcione et al., 2011) and 8% for SARS (Lau et al., 2004).
Estimates of household transmission quantities. Literature estimates were corrected for FNR and AR, whenever appropriate, and pooled via a hierarchical Bayesian model. Vo’ and Singapore estimates are based on original analysis of the respective datasets. The Rh range for Singapore is the range of central estimates for different cut-off dates. The Rh interval for Vo’ is the confidence intervals derived from normal approximations. We do not report confidence intervals for CRI.
Estimates of CRI and Rh and SAR assuming different FNR values for Vo’ data. lack of awareness of infection risk in early February and violation of single index case assumption. In the other direction, Vo’ has a smaller-than-typical mean household size of 2.1.
One concern about previous studies is that asymptomatics are under-represented among the primary cases. This could also contribute to the observed heterogeneity. Table 4 shows estimates of the CRI based on random population testing. These estimates (0.31 for Gangelt, Germany and 0.50 for Vo’, Italy) are consistent with our central estimate of SAR, which predicts CRI ≈ 0.41. This provides some evidence that our SAR estimates are robust to under-representation of asymptomatics.
3.2 Estimates from Singapore contact tracing data
In Singapore, the quality of annotations appears to have degraded over time. Therefore, we limited our analysis to consider source cases that were confirmed prior to March 27th 2020, along with their corresponding target cases (even if the target cases are confirmed later). We believe this cut-off date was early enough to ensure data quality. This subset of the infection graph contains a total of 1076 nodes and 599 edges. Out of the total number of cases, 710 are connected to other known cases forming 111 clusters. Among these, 417 out of 710 are source cases with confirmation date prior to the cut-off. The remaining 366 out of 1076 are not traced to any other case.
The effective reproductive number R is the average number of new infections generated by each case. We can estimate R from our transmission graph by computing the ratio of total number infections to total number of source cases. The data contains 599 total infections and 417 source cases, which results in an estimated R of 1.44 (0.77-2.31). Figure 3 contains estimates for cut-off dates other than March 27th, central estimates for which range between 0.90 and 4.93. Regarding household transmission, there are 108 household infections. Assuming FNR of 20% the corrected estimate for Rh is 0.32 (0.22-0.42). The corrected central estimate ranges from 0.19 to 0.34 depending on the cut-off date (see Figure 3; also Figure 8 and Table 5 in Appendix A).
3.3 Estimates from blanket testing data
In Vo’, Italy, the first round of testing recorded 73 positive cases among 53 households with a total of 137 members. The second round of testing identified 7 more positive cases, 3 of which were in households with prior reported cases. The CRI can be estimated as the ratio between the total number of ordered pairs among infected individuals in the same household (60) and the total number of ordered household pairs where the first individual in the pair is infected (136). This yields a point estimate for.
Assuming that secondary infections are caused by the primary case in a household, there 73 sources, 23 (73+3-53) targets (household infections), and 84 (137-53) susceptible individuals (household contacts). This yields a point estimate for Rh of . Due to the nature of blanket testing we cannot determine the number of index cases, which makes estimates of SAR unreliable.
Correcting for false negatives and asymptomatics
Out of subjects residing in households with at least one conformed case, 7 have experienced symptoms but have tested negative. We can adjust for false negatives by considering the distribution of case counts across households. Assuming an asymptomatic rate of 0.41 Lavezzo et al. (2020) (among confirmed cases), and a false negative rate (FNR) of 22%, the expected number of false negatives in infected households is 6.77, which corresponds to the observed value.
For FNR value of 22%, and average household size of 2.1.2, the corrected estimates are CRI= 0.54 and Rh = 0.37 ((0.34 − 0.40)). We estimate confidence intervals for Rh by making normal approximations to the conditional distribution of the corrected Rh values for each individual. In the case of CRI we cannot accurately quantify uncertainty due to heterogeneity in household sizes and difficulties in estimating the number of index cases in the community. (see Table 2 for other FNR values).
Confounds and heterogeneity
We might be concerned that testing decreased the household transmission due to earlier household isolation, as in (Bi et al., 2020). However, if we restrict to early cases (where the first household member has symptoms prior to the start of testing), the adjusted CRI is essentially unchanged at 48%.
Next we might ask about heterogeneity in CRI: perhaps spouses have a very high CRI rate. But among presumed partners (adult co-habitants in the same ten-year age range), CRI was still only 52%. Possibly more important is age: CRI is only 24% among individuals under 50 when adjusting for 15% FNR. There is evidence that younger people have a higher FNR, but even an FNR of 40% would only lead to an adjusted CRI of 33%.
Each of these subgroup analyses is on only a small number of total cases and so should be interpreted with caution. In addition, the Vo’ data may overestimate both Rh and the CRI due to an older population,
4 Discussion
4.1 The contribution of household transmission to reproduction number
In the previous section, we estimated the contribution of household transmission to the spread of COVID-19 through SAR, CRI, and the household reproduction number Rh from a variety of sources. Estimates for Rh based on contact tracing data and blanket testing data ranged from 0.2-0.37. Here we examine the extent to which this household transmission explains overall transmission levels. More formally, we study the ratio of the effective household reproductive number Rh to the effective reproductive number R.
We estimated R across geographic regions where enough data was available3 for both pre- and post-lockdown time periods (Figure 4). Specifically, we used an overdispersed (α = 0.1) Poisson model (Yadlowsky et al., 2020) fit to death data (Dong et al., 2020) to estimate the growth rate, which was translated into R using the generation time distribution ω(t) estimated by Ferretti et al. (2020).
Based on our results in Table 4, we assume that Rh = 0.3 for all regions pre-lockdown. After lockdown, location tracking data from Google shows that people spent more time in their households (an 11% increase in the United States (Google, 2020)). Hence we estimate post-lockdown Rh as 0.3 M, where M is the increase in average time spent inside residential areas (Google, 2020).
Figure 5 (left) shows a scatter plot of estimated community vs household reproduction numbers for each region we study. Figure 5 (right) shows a histogram of the estimated contribution of household transmission to total reproduction number both pre- and post-lockdown. The share of R attributed to household transmission increased from 0-25% pre-lockdown to 25-50% post-lockdown. This shows there could be meaningful benefits from interventions that reduce Rh.
4.2 Implications for household inevitability and containment
We previously identified two common arguments against intervening against household transmission: house-hold inevitability and household containment. We revisit these theories in light of our estimates.
Inevitability
Household inevitability posits that once a household member is infected, the other members will also become infected, thus making interventions futile. Our central estimate of the household SAR is 30% with less than 5% posterior probability assigned to the SAR being above 67%. This shows that household transmission is not inevitable. There is also evidence that the SAR can be reduced by behavioral interventions. Wang et al. (2020a) found that the SAR was lower in households where people wore masks at home, cleaned regularly with disinfectant, and avoided close contact with the primary cases. Li et al. (2020) found the SAR was 0% for households where the primary case was isolated on symptom onset. These results comport with what is known about the incubation and generation time (Jing et al., 2020; Ferretti et al., 2020) and the effectiveness of PPE (Feng et al., 2020). However, these results are both observational studies and we await more rigorous experiments.
Containment
Household containment posits that people infected in the household (as opposed to the community) are unlikely to cause further community infection. If true, this would imply that interventions to reduce household transmission are less valuable for reducing overall spread.
We can quantify this effect, if it exists, by considering secondary transmission rates from household and community infections. To simplify matters we refer to those infected via household transmission as “household cases” and those infected via community transmission as “community cases.” We then let Rc|h be the average number of community infections caused by a household case, and Rc|c the average number of community infections caused by a community case. Household containment asserts that Rc|h/Rc|c ≪ 1.
To our knowledge, there is no public dataset that allows us to test household containment empirically. Accurate contact tracing data would enable us to measure Rc|h and Rc|c directly by counting cases, but existing data does not reliably distinguish between primary and secondary infections within a household. Instead, we provide an a priori analysis. There are two relevant differences between people infected at home and those infected in the community:
Household cases may be demographically different (e.g., they could predominantly be children) or have different levels of exposure to the community (e.g., they could predominantly be non-essential workers).
Household cases may self-isolate earlier than community cases, based on observing that the primary case in the household is symptomatic.
These demographic and contact differences could decrease Rc|h/Rc|c: for example, if all household cases are non-essential workers and all community cases are essential workers, the differences in contact would decrease Rc|h/Rc|c by a factor of 3.5 (based on self-reported contact data from (Rothwell, 2020)). We also expect early isolation protocols to have an effect. The effect size should depend on the proportion of households that decide to use early isolation, and the probability of secondary transmission under the protocol. The former is dependent on policy and population behaviour, whereas the latter can be estimated from disease characteristics (generation time distribution, incubation time distribution, and asymptomatic rate; c.f. (Ferretti et al., 2020)).
More empirical data on these quantities, or more fine-grained contact tracing and testing data, is needed for better estimates and will facilitate a proper cost-benefit analysis around reducing household transmission.
4.3 Other types of contact heterogeneity
Our results highlight the fact that net community mobility gives an incomplete picture of overall transmission, and thus a contact-based view of transmission dynamics (Hens et al., 2009; Eames et al., 2012; Mossong et al., 2008) may be helpful. Contact-based models may be especially relevant in the presence of interventions that drastically shift contact patterns, such as the lockdown orders prompted by COVID-19 (Zhang et al., 2020). For example, lockdowns have amplified differences in contact between essential workers and the rest of the population: a recent Gallup panel (Rothwell, 2020) found that essential workers report 3.5 times as many contacts per day as other individuals. In this section, we briefly show how even outside of households, studying contact may help shed light on the effectiveness of policy interventions.
While the most accurate way to study contact in an epidemic is through a network or graph model (Meyers et al., 2005; Lloyd-Smith et al., 2005; Kiss et al., 2006), prior work has shown that even simple stratified compartmental models can be effective (Wallinga et al., 2006; Jarvis et al., 2020; Xia et al., 2013; Liu et al., 2020; Zhang et al., 2020) at predicting epidemic growth. These models partition the population into groups and are specified by the average level of contact between groups, rather than between individuals; a common choice is to assign groups to age ranges: e.g., the POLYMOD study (Mossong et al., 2008), or in the context of COVID-19, compartmental age models of the UK (Jarvis et al., 2020) and seven Chinese cities (Liu et al., 2020; Zhang et al., 2020).
Here, we use a compartmental model with only two groups: people staying at home (‘low contact’ or LC) and essential workers (‘high contact’ or HC). Transmission under our HC/LC model is determined by a two-by-two contact matrix M, where Mij is the number of contacts made by a member of group j ∈ {LC, HC} with members of group I ∈ {LC, HC}. Since we are not aware of high-quality data measuring these quantities directly during lockdowns, we instead obtain coarse estimates from available survey data. Using self-reported contact data (Rothwell, 2020; Jarvis et al., 2020), we obtain point estimates for the contact matrix in the United States (see Figure 6; methodology, sensitivity analysis, and an analagous estimate based on data from the UK are shown in the Appendix).
Contact matrix estimates for the United States using data from (Rothwell, 2020). Derivation can be found in the Appendix.
A common assumption for respiratory diseases such as SARS-CoV-2 is that exposure levels are proportional to contact levels (Edmunds et al., 1997; Wallinga et al., 1999; 2006). An additional implicit assumption of compartmental models is that infection rates between two groups are independent of the groups themselves. A more accurate model would account for disparities in infection rates (e.g., Zhang et al. (2020) find that social contacts tend to have a higher infection rate than workplace contacts)—this is also true of age-compartmental models, where the relative infectiousness of children and adults is not yet known (Jarvis et al., 2020). Under this assumption, the dominant eigenvalue of the contact matrix is a constant multiple of the reproductive number (more precisely, the contact matrix is a constant multiple of the “next-generation matrix” (Diekmann et al., 1990; den Driessche and Watmough, 1992), whose dominant eigenvalue is R). We can therefore estimate how changes in contact affect disease transmission, by looking at how each entry of the contact matrix affects the dominant eigenvalue (Caswell, 2006; Klepac et al., 2020).
Since the two groups in our model are easily identifiable, policymakers can target interventions to specific modes of interaction (e.g. by enforcing more stringent physical distancing in the workplace to reduce HC-HC contact, or providing PPE to workers with public-facing occupations to reduce HC-LC contact). To forecast the effect of such interventions, we consider the decrease in reproduction number R caused by a 10% decrease in each type of contact (Figure 7). The results predict that reducing contact between high-contact individuals will have disproportionate effect on reducing overall transmission. Specifically, our point estimate predicts a 10% reduction in contact between high-contact individuals (HC-HC contact) being 35x more effective than a 10% reduction in LC-LC contact, and 8x more effective than a 10% reduction in HC-LC contact.
The effect of reducing each contact mode by 10% under the simple compartmental model presented here.
These predictions are based on rather crude parameter estimates, and could be greatly improved by more direct measurements of contact structures, and by incorporating heterogeneity from other sources (e.g. by also stratifying the model on age or contact location (Jarvis et al., 2020; Liu et al., 2020; Zhang et al., 2020)).
Data Availability
Public data was used for the analysis. Contact tracing data, which was compiled by automatically aggregating data from an interactive dashboard, is available in a consolidated form in our public codebase.
A Data Tables
A.1 Case counts
Case counts for index cases, household contacts and household infections.
A.2 Intra-household reproduction number and SAR estimates
Estimates of household transmission quantities. Literature estimates were corrected for FNR and AR. The case counts are taken from the cited studies (see Table A.1). The credible intervals for the SAR and Rh estimates are obtained via MCMC sampling.
A.3 Contact tracing estimates by cut-off date
Reff and Rh values for ranging values of cut-off date.
A.4 Reproduction number estimates by cut-off date
Daily aggregate estimates of effective and household reproduction number in Singapore for cutoff dates ranging from the begining of epidemic to April 7th.
B. Hierarchical Model for Literature Estimates
We use the following Bayesian Hierarchical Model:
Graphical Model
Data
Ni − number of household contacts considered in each study
Ci − number of confirmed cases
1ARi − indicator; 0 - the study tested asymptomatics, 1 - otherwise
1FNRi − indicator; 0 - the study corrected for false negatives, 1 - otherwise
Priors
FNRi ∼ Uniform(0.15, 0.35)
AR ∼ Uniform(0.18, 0.43)
SARi ∼ Beta(α, β)
α, β ∼ Half Flat()
Likelihood
pi := SARi(1 − AR · 1ARi)(1 − FNRi · 1FNRi)
equivalently:
Ci|SARi, FNRi, AR ∼ Binomial(Ni, SARi(1 − AR · 1ARi)(1 − FNRi · 1FNRi))
We perform inference via MCMC sampling using PyMC34 with the the built-in NUTS Hoffman and Gelman (2014) sampler. We use 4 chains with 6000 iterations each. The burn-in period is 2000.
Figure 10 below contains histograms corresponding to posterior samples of SAR and Rh as well as the posterior samples of the mean of SAR and Rh respectively.
Samples from the posterior distribution of SAR and Rh.
C. Contact Matrix Estimation
We thus focus on a simple model analagous to the age stratification model, but with just two groups: high-contact individuals (HC) and low-contact individuals (LC). To simplify matters, we will treat the HC cluster as consisting of essential workers only, with the rest of the population belonging to the LC cluster. If we make assumptions analagous to those made in the age stratification model5, then once again the contact matrix M determines disease transmission via its dominant eigenvalue and eigenvector (Caswell, 2006). This observation leaves us with four relevant quantities to estimate: MHC,HC, MHC,LC, MLC,HC, and MLC,LC. Using self-reported contact data from the US (Rothwell, 2020) we provide estimates for each of these parameters below.
Total contacts
The most readily available statistic from the available data is the number of total contacts reported by members of each cluster. In the US, Rothwell (2020) find that those who are working report 13.9 contacts per day, whereas those not working report 4.0 contacts per day.
Population proportions
Allen et al. (2020) suggest that essential workers represent 40% of the US workforce, which as of 2018 represents 63% of the US population6. Since 29% of the US workforce is reportedly equipped to work from home7, we estimate that 18% of the population is doing essential work.
Estimating MHC,HC
This parameter corresponds to the number of essential workers that an essential worker will have close contact with, on average. This is equal to the number of contacts an essential worker has with co-workers, added to the average number of contacts an essential worker will have with other essential workers in the general population.
For coworker interactions, Rothwell (2020) report that 9.9 of the 13.9 interactions had by essential workers were at their places of work, or 71% of all contacts. This is likely to be an overestimate of coworker interaction, since it includes interactions with low-contact individuals at the essential worker’s workplace (e.g. clients at a bank or shoppers at a grocery store). To get a reasonable lower bound, we use the number of workplace contacts reported by essential workers in sectors that are not public-facing, namely construction, agriculture, and communications. Workplace contacts for these sectors range from 6.1-7.4, and so we take 6.8 as a central estimate, or 49% of all contacts. This is likely an underestimate, since it does not take into account high-contact jobs such as those in food processing or assembly lines.
To get MHC,HC, we have to add the number of interactions with coworkers to the number of interactions with essential workers in the general population. To estimate the latter, we take the remaining number of total essential worker contacts (e.g. if we estimate 9.9 coworker contacts, 4.0 remain) and use population proportions to estimate how many of them are with other essential workers (e.g. 0.18 · 4.0 = 0.7).
In the end, we give a central estimate of 9.35 for MHC,HC based on the range [8.1, 10.6].
Estimating MHC,LC, MLC,HC, and MLC,LC
Once we obtain an estimate of MHC,HC, the remaining parameters can be straightforwardly deduced. First, MLC,HC is the number of total essential worker contacts minus MHC,HC. This entry uniquely determines MHC,LC, since contact is symmetric and thus wHC · MLC,HC = (1 − wHC) · MHC,LC, where wHC is the proportion of essential workers in the population. After solving for MHC,LC, the final entry MLC,LC is obtained by subtracting MHC,LC from the total number of contacts reported by individuals in the LC cluster.
Footnotes
1 https://github.com/andrewilyas/covid-household-transmission
↵1 https://github.com/andrewilyas/covid-household-transmission
↵2 More accurately, 2.1 was the mean number of individuals in each household who were tested.
↵3 Regions were selected from a pool of all US states and over 100 other regions, and were included based on (a) availability of lockdown dates, (b) having enough pre-lockdown data to estimate R reliably. The exact calculations are given in our code release.
↵5 Concretely, that transmission depends only on the number of cluster-specific contacts, the infection probability of each contact, and the contact duration
↵6 https://www.bls.gov/emp/tables/civilian-labor-force-participation-rate.htm