Abstract
Introduction and Goals SARS-CoV-2 is transmitted both in the community and within households. Social distancing and lockdowns reduce community transmission but do not directly address household transmission. We provide quantitative measures of household transmission based on empirical data, and estimate the contribution of households to overall spread. We highlight policy implications from our analysis of household transmission, and more generally, of changes in contact patterns under social distancing.
Methods We investigate the household secondary attack rate (SAR) for SARS-CoV-2, as well as Rh, which is the average number of within-household infections caused by a single index case. We estimate the SAR through a meta-analysis of previous studies. Taking a Bayesian approach, we correct estimates for the false-negative rate (FNR) of testing and compute posterior credible intervals for SAR. We estimate Rh using results from population testing in Vo’, Italy and contact tracing data that we curate from Singapore. The code and data behind our analysis are publicly available1.
Results Our analysis of eight recent attack rate studies yields an SAR estimate of 0.28 (0.10, 0.61). From contact tracing data, we infer an Rh value of 0.32 (0.22, 0.42). Blanket testing data from Vo’ yields an Rh estimate of 0.37 (0.34, 040) and a CRI estimate of 0.5 after correcting for FNR and asymptomatic cases.
Interpretation Our estimates of Rh suggest that household transmission was a small fraction (5%-35%) of R before social distancing but a large fraction after (30%-55%). This suggests that household transmission may be an effective target for interventions. A remaining uncertainty is whether household infections actually contribute to further community transmission or are contained within households. This can be estimated given high-quality contact tracing data.
More broadly, our study points to emerging contact patterns (i.e., increased time at home relative to the community) playing a role in transmission of SARS-CoV-2. We briefly highlight another instance of this phenomenon (differences in contact between essential workers and the rest of the population), provide coarse estimates of its effect on transmission, and discuss how future data could enable a more reliable estimate.
1 Introduction
Lockdowns and stay-at-home orders have been deployed across the world to curb the growth of SARS-CoV-2, the virus that causes COVID-19. These policies reduce transmission by reducing overall mobility. Since the introduction of lockdowns, community mobility has decreased by 28% – 70% (Google, 2020), and the effective reproduction number R has decreased by an average of 48% (IQR 38-65%) across 16 regions studied.
At the same time, these policies have increased time spent at home (by 15% – 29% (Google, 2020)) and therefore likely increased household transmission of SARS-CoV-2. Household transmission played an important role in previous epidemics such as the 2009 H1N1 pandemic (Endo et al., 2019; Tsang et al., 2016; Lau et al., 2012). It may now also be an important factor in SARS-CoV-2 transmission, especially since community transmission has decreased and household transmission has increased under stay-at-home policies.
Our aim is to estimate household transmission and determine its overall contribution to growth of SARS-CoV-2. We do so both via a meta-analysis of previous studies, a direct analysis of data collected in Vo’, Italy (Ferretti et al., 2020), and analysis of contact tracing data that we curated from Singapore (Singapore COVID-19 Dashboard, 2020). We correct these data for false negatives and in some cases for selective testing (of only symptomatic cases). Code for our analysis, together with the Singapore dataset, are available at https://github.com/andrewilyas/covid-household-transmission.
We estimate that household transmission contributes between 0.22 and 0.42 infections to the overall reproduction number. In comparison, overall reproduction numbers during stay-at-home policies ranged from 0.65 to 1.25. We conclude that household transmission is now a significant contributor to overall SARS-CoV-2 reproduction (see Figure 4 for region-specific estimates).
We next examine two folk theories that have sometimes been used to argue against attending to household transmission. The two theories are that preventing transmission within a household is either (a) futile, since transmission is inevitable or (b) low-impact, since secondary infections are “contained” in the household (Horowitz, 2020):
Household inevitability: No interventions to reduce household transmission are likely to work.
Household containment: People infected at home will stay at home and not cause community spread.
Our meta-analysis of household secondary attack rate yielded a typical range of 10% – 61% with significant variability across studies. This implies that household infection is not a certainty and can be mediated by differences in setting, which casts doubt on inevitability. In Section 4.2 we discuss policies for reducing household infection. More generally, our estimates imply that infection is not certain even among individuals who are in regular contact.
We formalize the household containment theory in terms of ratios of secondary transmission, and show that these can be estimated from well-annotated contact tracing data. Unfortunately, existing data are too noisy to reliably estimate containment. In Section 4.2 we outline how these rates can be approximated from aggregate data, and suggest future work for estimating them directly.
Our results underscore that mobility is an imperfect measure of infection risk, since decreases in mobility likely increase household infections. We suggest instead basing policy on contact patterns, which mediate transmission both for COVID-19 (Liu et al., 2020; Jarvis et al., 2020) and previous epidemics (Edmunds et al., 1997; Wallinga et al., 1999; 2006; Mossong et al., 2008; Hens et al., 2009; Eames et al., 2012). To this end, we explore contact heterogeneity caused by stay-at-home orders, due to the divide between essential workers and the rest of the population. An initial simulation study (Section 4.3) forecasts that decreasing transmission risk between essential workers is 8 – 35 times as effective as reducing other types of transmission.
2 Methods
The dynamics of disease spread are described via the effective reproduction number R, which measures the average number of new infections caused by each infected person i. To quantify household transmission, we decompose R into two components R = Rc + Rh. The community reproduction number Rc is the average number of new infections caused by an infected individual i outside i’s household. The intra-household reproduction number Rh is the average number of new infections caused by an infected individual i inside i’s household. The ratio measures the fraction of transmission occurring within households.
In addition to Rh, we consider two measures of intra-household spread that quantify individual risk of infection. The household secondary attack rate (SAR) is the probability an infected person i infects a specific household member j. Additionally, we define the household conditional risk of infection (CRI) to be the probability that j is infected, conditioned on household member i being infected. In contrast to the SAR, the CRI allows for j having been infected by someone other than i, and so CRI can be estimated without attributing infections to particular primary cases.
2.1 Challenges in estimating Rh, SAR and CRI from empirical data
Estimating the growth and prevalence of household cases requires detailed household-level data. The relevant information to estimate Rh, SAR, and CRI includes:
Testing results for all members of a household, including both positive and negative cases.
Infection attribution within a household, which involves identifying the primary cases in a household and constructing the household transmission chains.
Clinical outcome at the end of the study. If a case is still infectious at the end of a study the data might undercount the number of infections attributed to the case.
Assuming accurate household level data we can estimate Rh, SAR, and CRI as follows:
Rh can be estimated as the ratio of secondary household cases to total household cases. It requires accurate attribution of infections within households, but does not require negative case counts.
SAR can be estimated as the ratio of secondary household cases to total susceptible household members. It requires accurate attribution as well as negative test counts in households.
CRI can be estimated as the ratio of total number of infected pairs in households to total number of household pairs. It requires positive and negative case count at household level but not direct attributions.
In practice, estimating these quantities is made difficult by asymptomatic infections and false negatives. Existing evidence suggests that asymptomatic cases constitute 18-43% of all infections and may have different households transmission dynamics than symptomatic cases (Lavezzo et al., 2020; Gudbjartsson et al.; Ferretti et al., 2020; Streeck et al., 2020). However most studies predominantly test symptomatic individuals, which leads to biases in the estimates. We can correct the estimates to account for asymptomatic cases. We calibrate our corrections on studies that have tested a representative sample of the population, as in studies of Vo’, Italy and Gangelt, Germany (Lavezzo et al., 2020; Streeck et al., 2020).
Regarding false negatives, recent studies have shown a high false-negative rates (FNR) for RT-PCR tests, and this rate varies over the time course of the disease. Kucirka et al. (2020) find an FNR of 20% for patients tested on Day 8 of infection and 67% on Day 1. The average FNR in the 10 days following onset of symptoms is estimated as 17% by Wikramaratna et al. (2020) and 30% by Kucirka et al. (2020). The FNR also varies between different kinds of swab (Wikramaratna et al., 2020) and between different labs (Kucirka et al., 2020). The problem of false negatives can be mitigated by correcting estimates using the FNR (Wikramaratna et al., 2020). However, in recent studies of household transmission (see Section 2.2.1), estimates of SAR have not been corrected for the FNR.
2.2 Procedure for estimating Rh, SAR, and CRI
In order to estimate Rh, SAR, and CRI and account for asymptomatics and false negatives we propose two approaches. First we adjust estimates from previous studies to account for false negatives and asymptomatics (see Section 2.2.1). Second, we estimate CRI on the blanket testing dataset from Vo’, Italy (Lavezzo et al., 2020) (see Section 2.2.3). Additionally, we have assembled our own dataset based on contact tracing data from Singapore, which we use to estimate Rh (see Section 2.2.2).
2.2.1 Adjusting SAR literature estimates
For the SAR, we searched PubMed and Google Scholar to find any previous work that estimates the household SAR from empirical data. We found a total of eight papers. Whenever appropriate, we recalculate the SAR to correct for the FNR of RT-PCR testing and the asymptomatic rate (AR). We define a Bayesian model that relates SAR, FNR and AR. We assume uniform priors for FNR and AR on intervals reported in literature (FNR (0.15-0.35) and AR (0.18, 0.43)). For studies that only testest symptomatic household members we model the probability that a household member is infected, as: p = SAR * (1 − FNR) * (1 − AR). After correcting the positive case counts, we recalculated 95% credible intervals via rejection sampling. After adjusting credible intervals for SAR we compute ranges for the corresponding Rh values.
2.2.2 Singapore contact tracing data for estimating Rh
Singapore’s Ministry of Health has collected information about COVID-19 cases tracked via contact tracing (Singapore MOH, 2020). UpCode Academy published this data in an interactive dashboard (Singapore COVID-19 Dashboard, 2020). We extracted associated metadata for each positive case along with a directed graph providing information about the infection source. This resulted in a case and transmission network for 6588 patients. Confirmation dates for cases ranged from January, 23rd to April 19th.
We used the data to construct a transmission graph, where nodes correspond to cases and edges to infections. As terminology, we distinguish source cases (which are the cause of infections) from target cases (which are infected by sources). When relationships are known, edge annotations reflect the relationship between source and target cases (e.g. family member, colleague, contact). Rh can be thus estimated as the ratio of total number of household infections to total number of source cases. Since there is no annotation for “household”, we use the annotation “family member” as a proxy.
A schematic infection graph is shown in Figure 1. In the case of real data some of the edges are not direct source-target attributions, but are mediated by ‘Cluster’ nodes. Despite not always having direct infection attributions we can use temporal information to determine the number of sources and targets.
2.2.3 Data from Vo’ for estimating Rh and CRI
Most of the population in the town of Vo’, Italy was tested both at the start of a lockdown and 14 days later (Lavezzo et al., 2020). The two phases of testing covered 86% (2812 subjects) and 72% (2343 subjects) of the population in Vo’, respectively.
We analyzed data collected in order to estimate Rh and CRI. Following the same terminology as in the previous section, the date of the second round of testing acts as a cut-off date. Since the data does not trace infections to sources, we assume that all secondary household infections were caused by the primary case.
We estimate the FNR by assuming that symptomatic cases in household with other confirmed infections are positive cases.
3 Results
3.1 Estimates from literature
Estimates of SAR are shown in Table 1. The estimates vary considerably, with some confidence intervals having no overlap. This could be explained by variation in FNR and the proportion of asymptomatic primary cases. Studies also varied in how quickly primary cases were isolated from their households (either at home or in hospitals), which likely explains part of the variation. Overall, we conclude that the SAR is in the range 10-61%, assuming some effort is made to isolate symptomatic household members. This is similar to the household SAR for H1N1 influenza (15%) and SARS (15%) (Streeck et al., 2020).
3.2 Estimates from Singapore contact tracing data
In Singapore, the quality of annotations appears to have degraded over time. Therefore, we limited our analysis to consider source cases that were confirmed prior to March 27th 2020, along with their corresponding target cases (even if the target cases are confirmed later). We believe this cut-off date was early enough to ensure data quality. This subset of the infection graph contains a total of 1076 nodes and 599 edges. Out of the total number of cases, 710 are connected to other known cases forming 111 clusters. Among these, 417 out of 710 are source cases with confirmation date prior to the cut-off. The remaining 366 out of 1076 are not traced to any other case.
The effective reproductive number R is the average number of new infections generated by each case. We can estimate R from our transmission graph by computing the ratio of total number infections to total number of source cases. The data contains 599 total infections and 417 source cases, which results in an estimated R of 1.44 (0.77-2.31). Figure 2 contains estimates for cut-off dates other than March 27th, central estimates for which range between 0.90 and 4.93. Regarding household transmission, there are 108 household infections. Assuming FNR of 20% the corrected estimate for Rh is 0.32 (0.22-0.42).
The corrected central estimate ranges from 0.19 to 0.34 depending on the cut-off date (see Figure 2; also Figure 7 and Table 4 in Appendix A).
3.3 Estimates from blanket testing data
In Vo’, Italy, the first round of testing recorded 73 positive cases among 53 households with a total of 137 members. The second round of testing identified 7 more positive cases, 3 of which were in households with prior reported cases. The CRI can be estimated as the ratio between the total number of ordered pairs among infected individuals in the same household (60) and the total number of ordered household pairs where the first individual in the pair is infected (136). This yields a point estimate for CRI of .
Assuming that secondary infections are caused by the primary case in a household, there 73 sources, 23 (73+3-53) targets (household infections), and 84 (137-53) susceptible individuals (household contacts). This yields a point estimate for Rh of . Due to the nature of blanket testing we cannot determine the number of index cases, which makes estimates of SAR unreliable.
Correcting for false negatives and asymptomatics Out of subjects residing in households with at least one conformed case, 7 have experienced symptoms but have tested negative. We can adjust for false negatives by considering the distribution of case counts across households. Assuming an asymptomatic rate of 0.41 Lavezzo et al. (2020) (among confirmed cases), and a false negative rate (FNR) of 22%, the expected number of false negatives in infected households is 6.77, which corresponds to the observed value. For FNR value of 22%, and average household size of 2.1.2, the corrected estimates are CRI= 0.54 and Rh = 0.37 ((0.34 − 0.40)). We estimate confidence intervals for Rh by making normal approximations to the conditional distribution of the corrected Rh values for each individual. In the case of CRI we cannot
Confounds and heterogeneity. We might be concerned that testing decreased the household transmission due to earlier household isolation, as in (Bi et al., 2020). However, if we restrict to early cases (where the first household member has symptoms prior to the start of testing), the adjusted CRI is essentially unchanged at 48%.
Next we might ask about heterogeneity in CRI: perhaps spouses have a very high CRI rate. But among presumed partners (adult co-habitants in the same ten-year age range), CRI was still only 52%. Possibly more important is age: CRI is only 24% among individuals under 50 when adjusting for 15% FNR. There is evidence that younger people have a higher FNR, but even an FNR of 40% would only lead to an adjusted CRI of 33%.
Each of these subgroup analyses is on only a small number of total cases and so should be interpreted with caution. In addition, the Vo’ data may overestimate both Rh and the CRI due to an older population, lack of awareness of infection risk in early February and violation of single index case assumption. In the other direction, Vo’ has a smaller-than-typical mean household size of 2.1.
4 Discussion
4.1 The contribution of household transmission to reproduction number
In the previous section, we estimated the contribution of household transmission to the spread of COVID-19 through SAR, CRI, and the household reproduction number Rh from a variety of sources. Estimates for Rh based on contact tracing data and blanket testing data ranged from 0.2-0.37. Here we examine the extent to which this household transmission explains overall transmission levels. More formally, we study the ratio of the effective household reproductive number Rh to the effective reproductive number R.
We estimated R across geographic regions where enough data was available3 for both pre- and post-lockdown time periods (Figure 3). Specifically, we used an overdispersed (α = 0.1) Poisson model (Yadlowsky et al., 2020) fit to death data (Dong et al., 2020) to estimate the growth rate, which was translated into R using the generation time distribution ω(t) estimated by Ferretti et al. (2020).
Based on our results in Table 1, we assume that Rh = 0.3 for all regions pre-lockdown. After lockdown, location tracking data from Google shows that people spent more time in their households (an 11% increase in the United States (Google, 2020)). Hence we estimate post-lockdown Rh as 0.3 · M, where M is the increase in average time spent inside residential areas (Google, 2020).
Figure 4 (left) shows a scatter plot of estimated community vs household reproduction numbers for each region we study. Figure 4 (right) shows a histogram of the estimated contribution of household transmission to total reproduction number both pre- and post-lockdown. The share of R attributed to household transmission increased from 0-25% pre-lockdown to 25-50% post-lockdown. This shows there could be meaningful benefits from interventions that reduce Rh.
4.2 Implications for household inevitability and containment
We previously identified two common arguments against reducing household transmission: household inevitability and household containment. We now revisit these theories in light of our estimates.
Inevitability. Household inevitability posits that once a household member is infected, the other members are sure to be infected, thus making interventions futile. While we estimate that household transmission contributes significantly to the effective reproduction number of COVID-19, household SAR is unlikely to be above 25% and so transmission is not inevitable. Based on what is known about the incubation and generation time (Jing et al., 2020; Ferretti et al., 2020), the SAR could be reduced by immediately isolating any symptomatic household members in a separate room. To our knowledge, this policy has not been systematically studied. Li et al. (2020) report the SAR was 0% for cases who isolated on symptom onset, but only study 14 primary cases. We expect interventions that reduce transmission outside the household-such as PPE (Feng et al., 2020), physical distancing, household hygiene, and spending social time in ventilated or outdoor spaces (Qian et al., 2020)-would also reduce the household SAR, but there is not yet strong evidence supporting this.
Containment. Household containment posits that people infected in the household (as opposed to the community) are unlikely to cause further community infection. If true, this would imply that interventions to reduce household transmission are less valuable for reducing overall spread.
We can quantify this effect, if it exists, by considering secondary transmission rates from household and community infections. To simplify matters we refer to those infected via household transmission as “household cases” and those infected via community transmission as “community cases.” We then let Rc|h be the average number of community infections caused by a household case, and Rc|c the average number of community infections caused by a community case. Household containment asserts that Rc|h/Rc|c ≪ 1.
To our knowledge, there is no public dataset that allows us to test household containment empirically. Accurate contact tracing data would enable us to measure Rc|h and Rc|c directly by counting cases, but existing data does not reliably distinguish between primary and secondary infections within a household. Instead, we provide an a priori analysis. There are two relevant differences between people infected at home and those infected in the community:
Household cases may be demographically different (e.g., they could predominantly be children) or have different levels of exposure to the community (e.g., they could predominantly be non-essential workers).
Household cases may self-isolate earlier than community cases, based on observing that the primary case in the household is symptomatic.
These demographic and contact differences could decrease Rc|h/Rc|c: for example, if all household cases are non-essential workers and all community cases are essential workers, the differences in contact would decrease Rc|h/Rc|c by a factor of 3.5 (based on self-reported contact data from (Rothwell, 2020)). We also expect early isolation protocols to have an effect. The effect size should depend on the proportion of households that decide to use early isolation, and the probability of secondary transmission under the protocol. The former is dependent on policy and population behaviour, whereas the latter can be estimated from disease characteristics (generation time distribution, incubation time distribution, and asymptomatic rate; c.f. (Ferretti et al., 2020)).
More empirical data on these quantities, or more fine-grained contact tracing and testing data, is needed for better estimates and will facilitate a proper cost-benefit analysis around reducing household transmission.
4.3 Other types of contact heterogeniety
Our results highlight the fact that net community mobility gives an incomplete picture of overall transmission, and thus a contact-based view of transmission dynamics (Hens et al., 2009; Eames et al., 2012; Mossong et al., 2008) may be helpful. Contact-based models may be especially relevant in the presence of interventions that drastically shift contact patterns, such as lockdown orders. For example, lockdowns have amplified differences in contact between essential workers and the rest of the population: a recent Gallup panel (Rothwell, 2020) found that essential workers report 3.5 times as many contacts per day as other individuals. In this section, we briefly show how even outside of households, studying contact may help shed light on the effectiveness of policy interventions.
While the most accurate way to study contact in an epidemic is through a network or graph model (Meyers et al., 2005; Lloyd-Smith et al., 2005; Kiss et al., 2006), prior work has shown that even simple stratified compartmental models can be effective (Wallinga et al., 2006; Jarvis et al., 2020; Xia et al., 2013; Liu et al., 2020) at predicting epidemic growth. These models partition the population into groups and are specified by the average level of contact between groups, rather than between individuals; a common choice is to assign groups to age ranges (e.g. the POLYMOD study (Mossong et al., 2008)).
Here, we use a compartmental model with only two groups: people staying at home (‘low contact’ or LC) and essential workers (‘high contact’ or HC). Transmission under our HC/LC model is determined by a two-by-two contact matrix M, where Mij is the number of contacts made by a member of group j ∈ {LC, HC} with members of group i ∈ {LC, HC}. Since we are not aware of high-quality data measuring these quantities directly during lockdowns, we instead obtain coarse estimates from available survey data. Using self-reported contact data (Rothwell, 2020; Jarvis et al., 2020), we obtain point estimates for the contact matrix in the United States (see Figure 5; methodology, sensitivity analysis, and an analagous estimate based on data from the UK are shown in the Appendix).
A common assumption for respiratory diseases such as SARS-CoV-2 is that exposure levels are proportional to contact levels (Edmunds et al., 1997; Wallinga et al., 1999; 2006). Under this assumption, the dominant eigenvalue of the contact matrix is a constant multiple of the reproductive number (more precisely, the contact matrix is a constant multiple of the “next-generation matrix” (Diekmann et al., 1990; den Driessche and Watmough, 1992), whose dominant eigenvalue is R). We can therefore estimate how changes in contact affect disease transmission, by looking at how each entry of the contact matrix affects the dominant eigenvalue (Caswell, 2006; Klepac et al., 2020).
Since the two groups in our model are easily identifiable, policymakers can target interventions to specific modes of interaction (e.g. by enforcing more stringent physical distancing in the workplace to reduce HC-HC contact, or providing PPE to workers with public-facing occupations to reduce HC-LC contact). To forecast the effect of such interventions, we consider the decrease in reproduction number R caused by a 10% decrease in each type of contact (Figure 6). The results predict that reducing contact between high-contact individuals will have disproportionate effect on reducing overall transmission. Specifically, our point estimate predicts a 10% reduction in contact between high-contact individuals (HC-HC contact) being 35x more effective than a 10% reduction in LC-LC contact, and 8x more effective than a 10% reduction in HC-LC contact.
These predictions are based on rather crude parameter estimates, and could be greatly improved by more direct measurements of contact structures, and by incorporating heterogeneity from other sources (e.g. by also stratifying the model on age (Jarvis et al., 2020) or contact location (Liu et al., 2020)).
Data Availability
Public data was used for the analysis. Contact tracing data, which was compiled by automatically aggregating data from an interactive dashboard, is available in a consolidated form in our public codebase.
B Contact Matrix Estimation
We thus focus on a simple model analagous to the age stratification model, but with just two groups: high-contact individuals (HC) and low-contact individuals (LC). To simplify matters, we will treat the HC cluster as consisting of essential workers only, with the rest of the population belonging to the LC cluster. If we make assumptions analagous to those made in the age stratification model4, then once again the contact matrix M determines disease transmission via its dominant eigenvalue and eigenvector (Caswell, 2006). This observation leaves us with four relevant quantities to estimate: MHC,HC, MHC,LC, MLC,HC, and MLC,LC. Using self-reported contact data from the US (Rothwell, 2020) we provide estimates for each of these parameters below.
Total contacts. The most readily available statistic from the available data is the number of total contacts reported by members of each cluster. In the US, Rothwell (2020) find that those who are working report 13.9 contacts per day, whereas those not working report 4.0 contacts per day.
Population proportions. Allen et al. (2020) suggest that essential workers represent 40% of the US workforce, which as of 2018 represents 63% of the US population5. Since 29% of the US workforce is reportedly equipped to work from home6, we estimate that 18% of the population is doing essential work.
Estimating MHC,HC. This parameter corresponds to the number of essential workers that an essential worker will have close contact with, on average. This is equal to the number of contacts an essential worker has with co-workers, added to the average number of contacts an essential worker will have with other essential workers in the general population.
For coworker interactions, Rothwell (2020) report that 9.9 of the 13.9 interactions had by essential workers were at their places of work, or 71% of all contacts. This is likely to be an overestimate of coworker interaction, since it includes interactions with low-contact individuals at the essential worker’s workplace (e.g. clients at a bank or shoppers at a grocery store). To get a reasonable lower bound, we use the number of workplace contacts reported by essential workers in sectors that are not public-facing, namely construction, agriculture, and communications. Workplace contacts for these sectors range from 6.1-7.4, and so we take 6.8 as a central estimate, or 49% of all contacts. This is likely an underestimate, since it does not take into account high-contact jobs such as those in food processing or assembly lines.
To get MHC,HC, we have to add the number of interactions with coworkers to the number of interactions with essential workers in the general population. To estimate the latter, we take the remaining number of total essential worker contacts (e.g. if we estimate 9.9 coworker contacts, 4.0 remain) and use population proportions to estimate how many of them are with other essential workers (e.g. 0.18 · 4.0 = 0.7).
In the end, we give a central estimate of 9.35 for MHC,HC based on the range [8.1,10.6].
Estimating MHC,LC, MLC,HC, and MLC,LC. Once we obtain an estimate of MHC,HC, the remaining parameters can be straightforwardly deduced. First, MLC,HC is the number of total essential worker contacts minus MHC,HC. This entry uniquely determines MHC,LC, since contact is symmetric and thus wHC · MLC,HC = (1 − wHC) · MHC,LC, where wHC is the proportion of essential workers in the population. After solving for MHC,LC, the final entry MLC,LC is obtained by subtracting MHC,LC from the total number of contacts reported by individuals in the LC cluster.
Footnotes
↵1 https://github.com/andrewilyas/covid-household-transmission
↵2 More accurately, 2.1 was the mean number of individuals in each household who were tested.
↵3 Regions were selected from a pool of all US states and over 100 other regions, and were included based on (a) availability of lockdown dates, (b) having enough pre-lockdown data to estimate Rreliably. The exact calculations are given in our code release.
↵4 Concretely, that transmission depends only on the number of cluster-specific contacts, the infection probability of each contact, and the contact duration
↵5 https://www.bls.gov/emp/tables/civilian-labor-force-participation-rate.htm