Estimating Household Transmission of SARS-CoV-2

,

Interpretation.Our estimates of R h suggest that household transmission was a small fraction (5%-35%) of R before social distancing but a large fraction after (30%-55%).This suggests that household transmission may be an effective target for interventions.A remaining uncertainty is whether household infections actually contribute to further community transmission or are contained within households.This can be estimated given high-quality contact tracing data.
More broadly, our study points to emerging contact patterns (i.e., increased time at home relative to the community) playing a role in transmission of SARS-CoV-2.We briefly highlight another instance of this phenomenon (differences in contact between essential workers and the rest of the population), provide coarse estimates of its effect on transmission, and discuss how future data could enable a more reliable estimate.

Introduction
Lockdowns and stay-at-home orders have been deployed across the world to curb the growth of SARS-CoV-2, the virus that causes COVID-19.These policies reduce transmission by reducing overall mobility.
Since the introduction of lockdowns, community mobility has decreased by 28% − 70% (Google, 2020), and the effective reproduction number R has decreased by an average of 48% (IQR 38-65%) across 16 US states (shown in Figure 4).At the same time, these policies have increased time spent at home and therefore likely increased household transmission of SARS-CoV-2: Google (2020) reports 15% − 29% increases in time spent at home in these states.In the UK, Shanghai, and Wuhan, the majority of social contacts occurred in households after lockdown.The estimated proportion of contacts is 58% for the UK (up from 34% pre-lockdown from Jarvis et al. (2020)) and 79% and 94% for Shanghai and Wuhan respectively (Zhang et al., 2020).Household transmission played an important role in previous epidemics such as the 2009 H1N1 pandemic (Endo et al., 2019;Tsang et al., 2016;Lau et al., 2012).It may now also be an important factor in SARS-CoV-2 transmission, especially since community transmission has decreased and household transmission has increased under stay-at-home policies.
Our aim is to estimate household transmission and determine its overall contribution to growth of SARS-CoV-2.We do so both via a meta-analysis of previous studies, a direct analysis of data collected in Vo', Italy (Lavezzo et al., 2020), and analysis of contact tracing data that we curated from Singapore (Singapore COVID-19 Dashboard, 2020).We correct these data for false negatives and in some cases for selective testing of only symptomatic cases.Code for our analysis, together with the Singapore dataset, are available at https://github.com/andrewilyas/covid-household-transmission.
We estimate that household transmission contributes between 0.22 and 0.42 infections to the overall reproduction number.In comparison, overall reproduction numbers during stay-at-home policies ranged from 0.65 to 1.25.We conclude that household transmission is now a significant contributor to overall SARS-CoV-2 reproduction (see Figure 5 for region-specific estimates).
We next examine two folk theories that have sometimes been used to argue against attending to household transmission.The two theories are that preventing transmission within a household is either (a) futile, since transmission is inevitable or (b) low-impact, since secondary infections are "contained" in the household (Horowitz, 2020): 1. Household inevitability: No interventions to reduce household transmission are likely to work.
2. Household containment: People infected at home will stay at home and not cause community spread.
Our meta-analysis of the household secondary attack rate yielded a central estimate of 30% and estimates for individual studies ranging from 11% − 55%.This implies that household infection is not a certainty and can be mediated by differences in setting, which casts doubt on inevitability.In Section 4.2 we discuss policies for reducing household infection.More generally, our estimates imply that infection is not certain even among individuals who are in regular contact.
We formalize the household containment theory in terms of ratios of secondary transmission, and show that these can be estimated from well-annotated contact tracing data.Unfortunately, existing data are too noisy to reliably estimate containment.In Section 4.2 we outline how these rates can be approximated from aggregate data, and suggest future work for estimating them directly.
Our results underscore that mobility is an imperfect measure of infection risk, since decreases in mobility likely increase household infections.We suggest instead basing policy on contact patterns, which mediate transmission both for COVID-19 (Liu et al., 2020;Jarvis et al., 2020) and previous epidemics (Edmunds et al., 1997;Wallinga et al., 1999;2006;Mossong et al., 2008;Hens et al., 2009;Eames et al., 2012).To this end, we explore contact heterogeneity caused by stay-at-home orders, due to the divide between essential workers and the rest of the population.An initial simulation study (Section 4.3) forecasts that decreasing transmission risk between essential workers is 8−35 times as effective as reducing other types of transmission.

Methods
The dynamics of disease spread are described via the effective reproduction number R, which measures the average number of new infections caused by each infected person i.To quantify household transmission, we decompose R into two components R = R c + R h .The community reproduction number R c is the average number of new infections caused by an infected individual i outside i's household.The intra-household reproduction number R h is the average number of new infections caused by an infected individual i inside i's household.The ratio R h R measures the fraction of transmission occurring within households.In addition to R h , we consider two measures of intra-household spread that quantify individual risk of infection.The household secondary attack rate (SAR) is the probability an infected person i infects a specific household member j.Additionally, we define the household conditional risk of infection (CRI) to be the probability that j is infected, conditioned on household member i being infected.In contrast to the SAR, the CRI allows for j having been infected by someone other than i, and so CRI can be estimated without attributing infections to particular primary cases.

Challenges in estimating R h , SAR and CRI from empirical data
Estimating the growth and prevalence of household cases requires detailed household-level data.The relevant information to estimate R h , SAR, and CRI includes: • Testing results for all members of a household, including both positive and negative cases.
• Infection attribution within a household, which involves identifying the primary cases in a household and constructing the household transmission chains.
• Clinical outcome at the end of the study.If a case is still infectious at the end of a study the data might undercount the number of infections attributed to the case.
Assuming accurate household level data we can estimate R h , SAR, and CRI as follows: • R h can be estimated as the ratio of secondary household cases to total household cases.It requires accurate attribution of infections within households, but does not require negative case counts.
• SAR can be estimated as the ratio of secondary household cases to total susceptible household members.It requires accurate attribution as well as negative test counts in households.
• CRI can be estimated as the ratio of total number of infected pairs in households to total number of household pairs.It requires positive and negative case count at household level but not direct attributions.
In practice, estimating these quantities is made difficult by asymptomatic infections and false negatives.Existing evidence suggests that asymptomatic cases constitute 18-43% of all infections and may have different households transmission dynamics than symptomatic cases (Lavezzo et al., 2020;Gudbjartsson et al.;Ferretti et al., 2020;Streeck et al., 2020).However most studies predominantly test symptomatic individuals, which leads to biases in the estimates.We can correct the estimates to account for asymptomatic cases.We calibrate our corrections on studies that have tested a representative sample of the population, as in studies of Vo', Italy and Gangelt, Germany (Lavezzo et al., 2020;Streeck et al., 2020).
Regarding false negatives, recent studies have shown high false-negative rates (FNR) for RT-PCR tests, and this rate varies over the time course of the disease.Kucirka et al. (2020) find an FNR of 20% for patients tested on Day 8 of infection and 67% on Day 4. The average FNR in the 10 days following onset of symptoms is estimated as 17% by Wikramaratna et al. (2020) and 30% by Kucirka et al. (2020).The FNR also varies between different kinds of swab (Wikramaratna et al., 2020) and between different labs (Kucirka et al., 2020).The problem of false negatives can be mitigated by correcting estimates using the FNR (Wikramaratna et al., 2020).However, in recent studies of household transmission (see Section 2.2.1), estimates of SAR have not been corrected for the FNR.

Procedure for estimating R h , SAR, and CRI
In order to estimate R h , SAR, and CRI and account for asymptomatics and false negatives we propose two approaches.First we adjust estimates from previous studies to account for false negatives and asymptomatics (see Section 2.2.1).Second, we estimate CRI on the blanket testing dataset from Vo', Italy (Lavezzo et al., 2020)  In the right plot the Meta Estimates refer to the credible intervals over the mean and standard deviation of R h .The relatively wide intervals for the mean along with large variance of the meta estimate of SAR and R h are due to high variability across studies.If a study has a single asterisk, this means it was unnecessary to adjust for asymptomatics (only false negatives).The double asterisk means no adjustment was necessary.

Adjusting SAR literature estimates
For the SAR, we searched PubMed and Google Scholar to find any previous work that estimates the household SAR from empirical data.We found a total of nine papers.Whenever appropriate, we recalculated the estimated SAR for each study to correct for the false-negative rate (FNR) of RT-PCR testing and the proportion of cases that are asymptomatic (denoted "AR").We used a hierarchical Bayesian random effects model both for correcting estimates from individual studies and for pooling results to compute a meta-analysis estimate of the SAR.In the model, the SAR for study i (denoted SAR i ) is drawn from a Beta distribution and each study has a false-negative rate FNR i drawn from a prior based on estimates in the literature (Section 2.1).The proportion of asymptomatics AR is shared across studies and is also drawn from a prior based on existing literature.The likelihood of a household member testing positive is for studies where only symptomatic contacts were tested.To estimate the household reproduction number R h for each study i, we adjust the total number of secondary cases using FNR i and AR and divide by the number of primary cases.Results are shown in Figure 1 and a full description of the model is found in Appendix B.

Singapore contact tracing data for estimating R h
Singapore's Ministry of Health has collected information about COVID-19 cases tracked via contact tracing (Singapore MOH, 2020).UpCode Academy published this data in an interactive dashboard (Singapore COVID-19 Dashboard, 2020).We extracted associated metadata for each positive case along with a directed graph providing information about the infection source.This resulted in a case and transmission network for 6588 patients.Confirmation dates for cases ranged from January, 23rd to April 19th.We used the data to construct a transmission graph, where nodes correspond to cases and edges to infections.As terminology, we distinguish source cases (which are the cause of infections) from target cases (which are infected by sources).We define a cut-off date, such that all infections with confirmation date prior to the cut-off are labeled as sources.All the nodes that have an incident edge from a source are labeled as targets.A node in the infection graph can be both source and target.When relationships are known, edge annotations reflect the relationship between source and target cases (e.g.family member, colleague, contact).R h can be thus estimated as the ratio of total number of household infections to total number of source cases.Since there is no annotation for "household", we use the annotation "family member" as a proxy.
A schematic infection graph is shown in Figure 2. In the case of real data some of the edges are not direct source-target attributions, but are mediated by 'Cluster' nodes.Despite not always having direct infection attributions we can use temporal information to determine the number of sources and targets.

Data from Vo' for estimating R h and CRI
Most of the population in the town of Vo', Italy was tested both at the start of a lockdown and 14 days later (Lavezzo et al., 2020).The two phases of testing covered 86% (2812 subjects) and 72% (2343 subjects) of the population in Vo', respectively.
We analyzed data collected in order to estimate R h and CRI.Following the same terminology as in the previous section, the date of the second round of testing acts as a cut-off date.Since the data does not trace infections to sources, we assume that all secondary household infections were caused by the primary case.
We estimate the FNR by assuming that symptomatic cases in households with other confirmed infections are positive cases.3 Results

SAR and R h estimates from previous studies
Estimates of the SAR for each study are shown in Figure 1 and Table 4 and precise counts and estimates in Tables 3 and 4. Our Bayesian model allows for the possibility of heterogeneity in the SAR across studies and we find substantial heterogeneity.Figure 1 shows that some 95% credible intervals for the SAR do not overlap.The pooled central estimate of the SAR is 30% with 95% credible interval (0%, 67%), which captures the model's posterior prediction for the SAR for a new study.If we do not correct estimates for false-negatives and asymptomatics, the central estimate is 20% (0%, 43%).The heterogeneity of SAR across studies is partly due to variation in the false-negative rate, which we model but do not observe directly.It may also be due to variation in the cohort of primary cases and their behavior in the household.As a point of comparison, the household SAR has been estimated to be 15% for H1N1 influenza (Carcione et al., 2011) and 8% for SARS (Lau et al., 2004).One concern about previous studies is that asymptomatics are under-represented among the primary cases.This could also contribute to the observed heterogeneity.Table 4 shows estimates of the CRI based on random population testing.These estimates (0.31 for Gangelt, Germany and 0.50 for Vo', Italy) are consistent with our central estimate of SAR, which predicts CRI ≈ 0.41.This provides some evidence that our SAR estimates are robust to under-representation of asymptomatics.

Estimates from Singapore contact tracing data.
In Singapore, the quality of annotations appears to have degraded over time.Therefore, we limited our analysis to consider source cases that were confirmed prior to March 27th 2020, along with their corresponding target cases (even if the target cases are confirmed later).We believe this cut-off date was early enough to ensure data quality.This subset of the infection graph contains a total of 1076 nodes and 599 edges.Out of the total number of cases, 710 are connected to other known cases forming 111 clusters.Among these, 417 out of 710 are source cases with confirmation date prior to the cut-off.The remaining 366 out of 1076 are not traced to any other case.
The effective reproductive number R is the average number of new infections generated by each case.We can estimate R from our transmission graph by computing the ratio of total number infections to total number of source cases.The data contains 599 total infections and 417 source cases, which results in an estimated R of 1.44 (0.77-2.31).Figure 3 contains estimates for cut-off dates other than March 27th, central estimates for which range between 0.90 and 4.93.Regarding household transmission, there are 108 household infections.Assuming FNR of 20% the corrected estimate for R h is 0.32 (0.22-0.42).
The corrected central estimate ranges from 0.19 to 0.34 depending on the cut-off date (see Figure 3; also Figure 8 and Table 5 in Appendix A).

Estimates from blanket testing data
In Vo', Italy, the first round of testing recorded 73 positive cases among 53 households with a total of 137 members.The second round of testing identified 7 more positive cases, 3 of which were in households with prior reported cases.The CRI can be estimated as the ratio between the total number of ordered pairs among infected individuals in the same household (60) and the total number of ordered household pairs where the first individual in the pair is infected (136).This yields a point estimate for CRI of 0.44 = 60 136 .Assuming that secondary infections are caused by the primary case in a household, there 73 sources, 23 (73+3-53) targets (household infections), and 84 (137-53) susceptible individuals (household contacts).This yields a point estimate for R h of 0.31 = 23 73 .Due to the nature of blanket testing we cannot determine the number of index cases, which makes estimates of SAR unreliable.
Correcting for false negatives and asymptomatics.Out of subjects residing in households with at least one conformed case, 7 have experienced symptoms but have tested negative.We can adjust for false negatives by considering the distribution of case counts across households.Assuming an asymptomatic rate of 0.41 Lavezzo et al. (2020) (among confirmed cases), and a false negative rate (FNR) of 22%, the expected number of false negatives in infected households is 6.77, which corresponds to the observed value.
Cut-off Date For FNR value of 22%, and average household size of 2.1.2, the corrected estimates are CRI= 0.54 and R h = 0.37 ((0.34 − 0.40)).We estimate confidence intervals for R h by making normal approximations to the conditional distribution of the corrected R h values for each individual.In the case of CRI we cannot accurately quantify uncertainty due to heterogeneity in household sizes and difficulties in estimating the number of index cases in the community.(see Table 2 for other FNR values).
Confounds and heterogeneity.We might be concerned that testing decreased the household transmission due to earlier household isolation, as in (Bi et al., 2020).However, if we restrict to early cases (where the first household member has symptoms prior to the start of testing), the adjusted CRI is essentially unchanged at 48%.Next we might ask about heterogeneity in CRI: perhaps spouses have a very high CRI rate.But among presumed partners (adult co-habitants in the same ten-year age range), CRI was still only 52%.Possibly more important is age: CRI is only 24% among individuals under 50 when adjusting for 15% FNR.There is evidence that younger people have a higher FNR, but even an FNR of 40% would only lead to an adjusted CRI of 33%.
Each of these subgroup analyses is on only a small number of total cases and so should be interpreted with caution.In addition, the Vo' data may overestimate both R h and the CRI due to an older population, lack of awareness of infection risk in early February and violation of single index case assumption.In the other direction, Vo' has a smaller-than-typical mean household size of 2.1.

The contribution of household transmission to reproduction number
In the previous section, we estimated the contribution of household transmission to the spread of COVID-19 through SAR, CRI, and the household reproduction number R h from a variety of sources.Estimates for R h based on contact tracing data and blanket testing data ranged from 0.2-0.37.Here we examine the extent to which this household transmission explains overall transmission levels.More formally, we study the ratio of the effective household reproductive number R h to the effective reproductive number R.
We estimated R across geographic regions where enough data was available3 for both pre-and postlockdown time periods (Figure 4).Specifically, we used an overdispersed (α = 0.1) Poisson model (Yadlowsky et al., 2020) fit to death data (Dong et al., 2020) to estimate the growth rate, which was translated into R using the generation time distribution ω(t) estimated by Ferretti et al. (2020).
Based on our results in Table 4, we assume that R h = 0.3 for all regions pre-lockdown.After lockdown, location tracking data from Google shows that people spent more time in their households (an 11% increase in the United States (Google, 2020)).Hence we estimate post-lockdown R h as 0.3 • M , where M is the increase in average time spent inside residential areas (Google, 2020).
Figure 5 (left) shows a scatter plot of estimated community vs household reproduction numbers for each region we study.Figure 5 (right) shows a histogram of the estimated contribution of household transmission to total reproduction number both pre-and post-lockdown.The share of R attributed to household transmission increased from 0-25% pre-lockdown to 25-50% post-lockdown.This shows there could be meaningful benefits from interventions that reduce R h .

Implications for household inevitability and containment
We previously identified two common arguments against intervening against household transmission: household inevitability and household containment.We revisit these theories in light of our estimates.
Inevitability.Household inevitability posits that once a household member is infected, the other members will also become infected, thus making interventions futile.Our central estimate of the household SAR is 30% with less than 5% posterior probability assigned to the SAR being above 67%.This shows that household transmission is not inevitable.There is also evidence that the SAR can be reduced by behavioral interventions.Wang et al. (2020a) found that the SAR was lower in households where people wore masks at home, cleaned regularly with disinfectant, and avoided close contact with the primary cases.Li et al. (2020) found the SAR was 0% for households where the primary case was isolated on symptom onset.These results comport with what is known about the incubation and generation time (Jing et al., 2020;Ferretti et al., 2020) and the effectiveness of PPE (Feng et al., 2020).However, these results are both observational studies and we await more rigorous experiments.The growth rate was estimated by fitting an overdispersed Poisson distribution onto daily death statistics, as described in Yadlowsky et al. (2020).This was translated into a reproduction number R via the generation time distribution (c.f.(Ferretti et al., 2020)).95% confidence intervals are shown.
Containment.Household containment posits that people infected in the household (as opposed to the community) are unlikely to cause further community infection.If true, this would imply that interventions to reduce household transmission are less valuable for reducing overall spread.We can quantify this effect, if it exists, by considering secondary transmission rates from household and community infections.To simplify matters we refer to those infected via household transmission as "household cases" and those infected via community transmission as "community cases."We then let R c|h be the average number of community infections caused by a household case, and R c|c the average number of community infections caused by a community case.Household containment asserts that R c|h /R c|c 1.To our knowledge, there is no public dataset that allows us to test household containment empirically.Accurate contact tracing data would enable us to measure R c|h and R c|c directly by counting cases, but existing data does not reliably distinguish between primary and secondary infections within a household.Instead, we provide an a priori analysis.There are two relevant differences between people infected at home and those infected in the community: 1. Household cases may be demographically different (e.g., they could predominantly be children) or have different levels of exposure to the community (e.g., they could predominantly be non-essential workers).
2. Household cases may self-isolate earlier than community cases, based on observing that the primary case in the household is symptomatic.
These demographic and contact differences could decrease R c|h /R c|c : for example, if all household cases are non-essential workers and all community cases are essential workers, the differences in contact would decrease R c|h /R c|c by a factor of 3.5 (based on self-reported contact data from (Rothwell, 2020)).We also expect early isolation protocols to have an effect.The effect size should depend on the proportion of households that decide to use early isolation, and the probability of secondary transmission under the In both graphs we assume R h = 0.3 pre-lockdown-to obtain post-lockdown R h we multiply by a mobility factor M , from Google's estimates of average time spent in residential areas.
protocol.The former is dependent on policy and population behaviour, whereas the latter can be estimated from disease characteristics (generation time distribution, incubation time distribution, and asymptomatic rate; c.f. (Ferretti et al., 2020)).
More empirical data on these quantities, or more fine-grained contact tracing and testing data, is needed for better estimates and will facilitate a proper cost-benefit analysis around reducing household transmission.

Other types of contact heterogeneity
Our results highlight the fact that net community mobility gives an incomplete picture of overall transmission, and thus a contact-based view of transmission dynamics (Hens et al., 2009;Eames et al., 2012;Mossong et al., 2008) may be helpful.Contact-based models may be especially relevant in the presence of interventions that drastically shift contact patterns, such as the lockdown orders prompted by COVID-19 (Zhang et al., 2020).For example, lockdowns have amplified differences in contact between essential workers and the rest of the population: a recent Gallup panel (Rothwell, 2020) found that essential workers report 3.5 times as many contacts per day as other individuals.In this section, we briefly show how even outside of households, studying contact may help shed light on the effectiveness of policy interventions.
While the most accurate way to study contact in an epidemic is through a network or graph model (Meyers et al., 2005;Lloyd-Smith et al., 2005;Kiss et al., 2006), prior work has shown that even simple stratified compartmental models can be effective (Wallinga et al., 2006;Jarvis et al., 2020;Xia et al., 2013;Liu et al., 2020;Zhang et al., 2020) at predicting epidemic growth.These models partition the population into groups and are specified by the average level of contact between groups, rather than between individuals; a common choice is to assign groups to age ranges: e.g., the POLYMOD study (Mossong et al., 2008), or in the context of COVID-19, compartmental age models of the UK (Jarvis et al., 2020) and seven Chinese cities (Liu et al., 2020;Zhang et al., 2020).
Here, we use a compartmental model with only two groups: people staying at home ('low contact' or LC) and essential workers ('high contact' or HC).Transmission under our HC/LC model is determined by a twoby-two contact matrix M , where M ij is the number of contacts made by a member of group j ∈ {LC, HC} with members of group i ∈ {LC, HC}.Since we are not aware of high-quality data measuring these quantities directly during lockdowns, we instead obtain coarse estimates from available survey data.Using self-reported contact data (Rothwell, 2020;Jarvis et al., 2020), we obtain point estimates for the contact matrix in the United States (see Figure 6; methodology, sensitivity analysis, and an analagous estimate based on data from the UK are shown in the Appendix).
A common assumption for respiratory diseases such as SARS-CoV-2 is that exposure levels are proportional to contact levels (Edmunds et al., 1997;Wallinga et al., 1999;2006).An additional implicit assumption of compartmental models is that infection rates between two groups are independent of the groups themselves.A more accurate model would account for disparities in infection rates (e.g., Zhang et al. (2020) find that social contacts tend to have a higher infection rate than workplace contacts)-this is also true of age-compartmental models, where the relative infectiousness of children and adults is not yet known (Jarvis et al., 2020).Under this assumption, the dominant eigenvalue of the contact matrix is a constant multiple of the reproductive number (more precisely, the contact matrix is a constant multiple of the "next-generation matrix" (Diekmann et al., 1990;den Driessche and Watmough, 1992), whose dominant eigenvalue is R).We can therefore estimate how changes in contact affect disease transmission, by looking at how each entry of the contact matrix affects the dominant eigenvalue (Caswell, 2006;Klepac et al., 2020).
Since the two groups in our model are easily identifiable, policymakers can target interventions to specific modes of interaction (e.g. by enforcing more stringent physical distancing in the workplace to reduce HC-HC contact, or providing PPE to workers with public-facing occupations to reduce HC-LC contact).To forecast the effect of such interventions, we consider the decrease in reproduction number R caused by a 10% decrease in each type of contact (Figure 7).The results predict that reducing contact between high-contact individuals will have disproportionate effect on reducing overall transmission.Specifically, our point estimate predicts a 10% reduction in contact between high-contact individuals (HC-HC contact) being 35x more effective than a 10% reduction in LC-LC contact, and 8x more effective than a 10% reduction in HC-LC contact.
These predictions are based on rather crude parameter estimates, and could be greatly improved by more direct measurements of contact structures, and by incorporating heterogeneity from other sources (e.g. by also stratifying the model on age or contact location (Jarvis et al., 2020;Liu et al., 2020;Zhang et al., 2020)).

B Hierarchical Model for Literature Estimates
We use the following Bayesian Hierarchical Model: Likelihood: We perform inference via MCMC sampling using PyMC34 with the the built-in NUTS Hoffman and Gelman (2014) sampler.We use 4 chains with 6000 iterations each.The burn-in period is 2000.
Figure 10 below contains histograms corresponding to posterior samples of SAR and R h as well as the posterior samples of the mean of SAR and R h respectively.

C Contact Matrix Estimation
We thus focus on a simple model analagous to the age stratification model, but with just two groups: high-contact individuals (HC) and low-contact individuals (LC).To simplify matters, we will treat the HC cluster as consisting of essential workers only, with the rest of the population belonging to the LC cluster.
If we make assumptions analagous to those made in the age stratification model5 , then once again the contact matrix M determines disease transmission via its dominant eigenvalue and eigenvector (Caswell, 2006).This observation leaves us with four relevant quantities to estimate: M HC,HC , M HC,LC , M LC,HC , and M LC,LC .Using self-reported contact data from the US (Rothwell, 2020) we provide estimates for each of these parameters below.
Total contacts.The most readily available statistic from the available data is the number of total contacts reported by members of each cluster.In the US, Rothwell (2020) find that those who are working report 13.9 contacts per day, whereas those not working report 4.0 contacts per day.
Population proportions.Allen et al. (2020) suggest that essential workers represent 40% of the US workforce, which as of 2018 represents 63% of the US population6 .Since 29% of the US workforce is reportedly equipped to work from home7 , we estimate that 18% of the population is doing essential work.
Estimating M HC,HC .This parameter corresponds to the number of essential workers that an essential worker will have close contact with, on average.This is equal to the number of contacts an essential worker has with co-workers, added to the average number of contacts an essential worker will have with other essential workers in the general population.
For coworker interactions, Rothwell (2020) report that 9.9 of the 13.9 interactions had by essential workers were at their places of work, or 71% of all contacts.This is likely to be an overestimate of coworker interaction, since it includes interactions with low-contact individuals at the essential worker's workplace (e.g.clients at a bank or shoppers at a grocery store).To get a reasonable lower bound, we use the number of workplace contacts reported by essential workers in sectors that are not public-facing, namely construction, agriculture, and communications.Workplace contacts for these sectors range from 6.1-7.4,and so we take 6.8 as a central estimate, or 49% of all contacts.This is likely an underestimate, since it does not take into account high-contact jobs such as those in food processing or assembly lines.
To get M HC,HC , we have to add the number of interactions with coworkers to the number of interactions with essential workers in the general population.To estimate the latter, we take the remaining number of total essential worker contacts (e.g. if we estimate 9.9 coworker contacts, 4.0 remain) and use population proportions to estimate how many of them are with other essential workers (e.g.0.18 • 4.0 = 0.7).
In the end, we give a central estimate of 9.35 for M HC,HC based on the range [8.1, 10.6].
Estimating M HC,LC , M LC,HC , and M LC,LC .Once we obtain an estimate of M HC,HC , the remaining parameters can be straightforwardly deduced.First, M LC,HC is the number of total essential worker contacts minus M HC,HC .This entry uniquely determines M HC,LC , since contact is symmetric and thus w HC • M LC,HC = (1 − w HC ) • M HC,LC , where w HC is the proportion of essential workers in the population.After solving for M HC,LC , the final entry M LC,LC is obtained by subtracting M HC,LC from the total number of contacts reported by individuals in the LC cluster.

Figure 2 :
Figure 2: Estimation of R h from annotated contact tracing data.Nodes represent positive cases and their horizontal position indicates confirmation date.We consider the subset of the graph containing cases with confirmation date prior to cut-off date: target nodes are infected by sources.R can be estimated as R = #targets #sources and hence R = 13/7 for this cluster.There are 3 distinct households in the cluster and thick borders denote secondary household infections.The household reproductive number is R h = 5/7.

Figure 3 :
Figure3: Daily aggregate estimates of effective and household reproduction number in Singapore based on contact tracing data.We observe that in early phases of the epidemic R ef f was greater than 1.With increased public awareness and contact tracing efforts the effective reproduction number has decreased steadily through March 15th.Throughout this time household transmission stayed constant, with R h values in the 0.2 − 0.3 range.The ratio of infections attributable to households decreased sharply at the end of March due to large outbreaks in migrant worker dormitories.Even though their infections are not annotated as households, this indicates that co-habitation and proximity play a large role in transmission dynamics.

Figure 4 :
Figure4: Estimated values of the reproduction number R pre-and post-lockdown in a subset of US states (top) and other countries (bottom).The growth rate was estimated by fitting an overdispersed Poisson distribution onto daily death statistics, as described inYadlowsky et al. (2020).This was translated into a reproduction number R via the generation time distribution (c.f.(Ferretti et al., 2020)).95% confidence intervals are shown.

Figure 5 :
Figure 5: Left: Reproduction numbers for community transmission (R c ) and intra-household transmission (R h ) for the regions whose R values are shown in Figure 4.The overlaid contour plot shows level sets of the overall reproduction number R = R h + R c .Right: Estimated share of transmission attributable to household infections (R h /R).In both graphs we assume R h = 0.3 pre-lockdown-to obtain post-lockdown R h we multiply by a mobility factor M , from Google's estimates of average time spent in residential areas.

Figure 6 :Figure 7 :
Figure6: Contact matrix estimates for the United States using data from(Rothwell, 2020).Derivation can be found in the Appendix.

Figure 8 :
Figure 8: Daily aggregate estimates of effective and household reproduction number in Singapore for cutoff dates ranging from the begining of epidemic to April 7th.

Figure 10 :
Figure 10: Samples from the posterior distribution of SAR and R h .
(see Section 2.2.3).Additionally, we have assembled our own dataset based on contact tracing data from Singapore, which we use to estimate R h (see Section 2.2.2).

Table 1 :
Estimates of household transmission quantities.Literature estimates were corrected for FNR and AR, whenever appropriate, and pooled via a hierarchical Bayesian model.Vo' and Singapore estimates are based on original analysis of the respective datasets.The R h range for Singapore is the range of central estimates for different cut-off dates.The R h interval for Vo' is the confidence intervals derived from normal approximations.We do not report confidence intervals for CRI.
h estimate.See Table A.1 for breakdown of each study and Table 4 for study-level estimates with and without corrections.See Figure 10 for histograms of posterior samples.Meta-analysis of 9 studies sd(SAR) sd(R h ) 0.17 (0.09-0.27) 0.15 (0.09-0.23)Central estimate and 95% credible intervals over standard deviation of SAR and R h estimate.

Table 2 :
Estimates of CRI and R h and SAR assuming different FNR values for Vo' data.

Table 3 :
Case counts for index cases, household contacts and household infections.

Table 4 :
Estimates of household transmission quantities.Literature estimates were corrected for FNR and AR.The case counts are taken from the cited studies (see TableA.1).The credible intervals for the SAR and R h estimates are obtained via MCMC sampling.