Abstract
Objective Establishing a social contact data sharing initiative and an online tool to assess mitigation strategies for COVID-19.
Results Using our online tool and the available social contact data, we illustrate that social distancing could have a considerable impact on reducing COVID-19 transmission. The size of the effect depends on assumptions made about disease-specific characteristics and on the choice of intervention(s).
Introduction
Given the expanding pandemic of SARS-CoV-2, which causes COVID-19, it is of great importance to consider and plan intervention strategies that slow down SARS-CoV-2 spread, with the aim of flattening the epidemic peak and thus reducing surge capacity problems in health care provision and essential supplies. This may also buy time for interventions, such as specific antivirals (and perhaps vaccines), to become available for widespread use [1]. Social distancing on a large scale, first at the epicentre of the outbreak in Wuhan and later in other locations, was shown to slow down SARS-CoV-2 spread (e.g. in Shanghai [2]).
Social contact surveys have proven to be an invaluable source of information about how people mix in the population [3–5]. They have been shown to explain close contact infectious disease data well [6–8]. During the A(H1N1)v2009 pandemic, contact survey data were used to reproduce the observed incidence patterns of the emerging outbreak [9]. Hens et al. [10] used social contact data collected in the POLYMOD project [4] to quantify the impact of school closure on the spread of airborne infections. This was done by comparing the basic reproduction number, or the average number of secondary infections caused by a single infectious individual in a completely susceptible population, derived using mixing patterns observed on weekends or during a holiday period with those derived using mixing patterns observed on weekdays. By considering mixing patterns at different locations including or excluding the contribution of some of these locations, social distancing measures can be mimicked and their impact on disease spread can be investigated to potentially guide policy makers.
In this research note, we highlight a social contact data sharing initiative we recently launched and present an online tool to facilitate access to these data. We build upon the socialmixr R package [11] and hope to contribute to the analysis of social distancing measures. As a case study, we use the tool to quantify the potential impact of school closures and of a shift of workers from a common workplace to teleworking at home.
Data and Methods
Data collection and formatting
The social contact data sharing initiative started under the umbrella of the ERC consolidator grant “TransMID” (Grant number: 682540). Following a systematic review [3], the authors of publications describing different social contact surveys were contacted to share their data, subject to ethical approvals and GDPR compliance. These authors were either requested to format their data according to guidelines we developed during a TransMID Social Contact Data Hackathon (on 6 & 7 November 2017), or the data were reformatted by TVH and PC.
Each survey is split into multiple files to capture data on participants, contacts, survey days, households and time use. For each data type, there is one “common” file in which variables that are available in most contact surveys are included; and an “extra” file in which more specific variables related to the survey are included. Each data set contains a dictionary to interpret the columns correctly (see socialcontactdata.org for more information).
Social contact rates
To extrapolate survey data to the country level, we apply participant weights to account for age and for the type of day on which they participated. Reference data on demography are based on the United Nations’ World Population Prospects 2015 provided by the wpp2015 package [12]. Weights for type of day account for the proportion of weekdays (5/7) and weekend days (2/7). We constrain weights to a maximum of 3 to limit the influence of single participants. We denote by $w_{it}^{d}$ the weight for participant t of age i who was surveyed on day type d ∈ {weekday, weekend}.
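The day-type weighting can be illustrated with a minimal post-stratification sketch. It assumes that a weight is the target share of a group divided by its share in the sample, truncated at 3; the function name and the toy sample are hypothetical, and the exact normalisation used by the tool is not described in the text. Age weights would follow the same pattern with population shares as the target.

```python
from collections import Counter

MAX_WEIGHT = 3.0  # cap on participant weights, as stated in the text


def poststratification_weights(groups, target_share, max_weight=MAX_WEIGHT):
    """Weight each participant by target share / sample share of their group.

    Assumption: simple post-stratification followed by truncation.
    """
    counts = Counter(groups)
    n = len(groups)
    return [
        min(target_share[g] / (counts[g] / n), max_weight)
        for g in groups
    ]


# Hypothetical sample: 8 weekday diaries and 2 weekend diaries.
day_types = ["weekday"] * 8 + ["weekend"] * 2
day_weights = poststratification_weights(
    day_types, {"weekday": 5 / 7, "weekend": 2 / 7}
)
print(round(day_weights[0], 3), round(day_weights[-1], 3))
```

In this toy sample weekend diaries are under-represented (20% of the sample against a target of 2/7), so they receive a weight above 1, while weekday diaries are weighted down.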
The (i, j)th element of the (weighted) social contact matrix, $m_{ij}$, represents the mean number of contacts with people in age class j during one day reported by a respondent in age class i and can be estimated by:

$$m_{ij} = \frac{\sum_{t=1}^{T_i} w_{it}^{d}\, y_{ijt}}{\sum_{t=1}^{T_i} w_{it}^{d}},$$

where $T_i$ is the number of participants of age i and $y_{ijt}$ denotes the reported number of contacts made by participant t of age i with someone of age j.
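As an illustration, the weighted mean number of contacts per age class can be computed as below. This is a sketch on made-up diaries, not the socialmixr implementation; the tuple layout and the function name are assumptions.

```python
def weighted_contact_matrix(participants, n_age_classes):
    """Weighted mean number of daily contacts by age class.

    participants: iterable of (age_class, weight, contacts) where contacts[j]
    is the number of reported contacts with people in age class j.
    """
    numer = [[0.0] * n_age_classes for _ in range(n_age_classes)]
    denom = [0.0] * n_age_classes
    for i, weight, contacts in participants:
        denom[i] += weight
        for j in range(n_age_classes):
            numer[i][j] += weight * contacts[j]
    # Age classes without participants yield NaN rather than a division error.
    return [
        [numer[i][j] / denom[i] if denom[i] > 0 else float("nan")
         for j in range(n_age_classes)]
        for i in range(n_age_classes)
    ]


# Hypothetical diaries for two age classes.
diaries = [
    (0, 1.0, [4, 2]),
    (0, 3.0, [8, 2]),
    (1, 1.0, [1, 5]),
]
m = weighted_contact_matrix(diaries, 2)
print(m)
```

The second diary carries three times the weight of the first, so the first row is pulled towards its reported contacts.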
By nature, contacts are reciprocal and thus $m_{ij} N_i$ should be equal to $m_{ji} N_j$. Due to differences in reporting, reciprocity needs to be imposed by considering

$$m_{ij}^{rec} = \frac{m_{ij} N_i + m_{ji} N_j}{2 N_i},$$

with $N_i$ the population size in age class i [13]. This reciprocity might not hold for specific contact types, e.g. contacts at work for retail workers are most likely not contacts at work for their customers. Therefore, reciprocity should not always be imposed.
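A small sketch of imposing reciprocity, assuming the total number of contacts between two age classes is taken as the average of the two reported totals; the matrix and population sizes are hypothetical.

```python
def impose_reciprocity(m, pop):
    """Return a contact matrix for which m[i][j] * pop[i] == m[j][i] * pop[j].

    Assumption: the two reported contact totals between classes i and j
    are averaged and then divided by the population size of class i.
    """
    n = len(m)
    return [
        [(m[i][j] * pop[i] + m[j][i] * pop[j]) / (2 * pop[i]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical contact matrix and population sizes for two age classes.
m = [[7.0, 2.0], [1.0, 5.0]]
pop = [100, 300]
m_rec = impose_reciprocity(m, pop)
print(m_rec[0][1] * pop[0], m_rec[1][0] * pop[1])
```

After the adjustment, the total number of contacts from class 0 to class 1 equals the total from class 1 to class 0, and the diagonal is unchanged.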
Transmission dynamics
The next generation matrix G with elements $g_{ij}$ indicates the average number of secondary infections in age class i through the introduction of a single infectious individual of age class j into a fully susceptible population [14]. The next generation matrix is defined by:

$$G = D \cdot M \cdot q,$$

with D the mean duration of infectiousness, M the contact matrix and q a proportionality factor [8; 10]. The proportionality factor q can be age-dependent and combines several characteristics that are related to susceptibility and infectiousness. It can also be considered a correction factor expressing the extent to which the contact matrix represents a proxy for the circumstances under which transmission between infectious and susceptible persons occurs for the particular pathogen under analysis.
The basic reproduction number R0 can be calculated as the dominant eigenvalue of the next generation matrix. The expected incidence by age is proportional to the leading right eigenvector of G [4].
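The dominant eigenvalue and the leading right eigenvector can be approximated with power iteration, as in the sketch below. It assumes a constant (scalar) q, and the values of D, q and M are purely illustrative, not estimates for any pathogen.

```python
def dominant_eigenpair(matrix, max_iter=1000, tol=1e-12):
    """Power iteration for a non-negative matrix.

    Returns (dominant eigenvalue, right eigenvector scaled to sum to 1).
    Assumes a unique dominant eigenvalue and a matrix that is not all zeros.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(w)  # v sums to 1, so this tends to the eigenvalue
        new_v = [x / eigenvalue for x in w]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return eigenvalue, v


# Illustrative inputs: D in days, constant q, and a 2x2 contact matrix.
D, q = 5.0, 0.1
M = [[4.0, 2.0], [1.0, 3.0]]
G = [[D * q * value for value in row] for row in M]

R0, relative_incidence = dominant_eigenpair(G)
print(round(R0, 6), [round(x, 6) for x in relative_incidence])
```

The eigenvector is scaled to sum to 1, so it can be read directly as the expected share of incidence in each age class.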
Interventions
We focus on interventions and how they affect R0 and the relative incidence. By cancelling disease-specific features (though these could be readily implemented by allowing the proportionality factor q to be age-dependent), we focus on the impact of adjusted social contact patterns only, in line with the so-called social contact hypothesis [6].
To estimate the relative change in R0, we used the R0 ratio:

$$\frac{R_{0,a}}{R_{0,b}} = \frac{\max(\mathrm{eigen}(G_a))}{\max(\mathrm{eigen}(G_b))},$$

where indices a and b refer to the different conditions and max(eigen(·)) denotes the dominant eigenvalue. The R0 ratio can be estimated using only social contact rates when assuming q to be constant since the normalizing constants cancel [10]. Under the same condition, the ratio of relative incidences is given by the ratio of normalised right eigenvectors for conditions a and b, respectively.
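The sketch below illustrates why the constants cancel: with a constant q, the R0 ratio computed from the contact matrices alone equals the ratio computed from the next generation matrices. The matrices, D and q are hypothetical, with condition a representing an intervention and condition b the baseline.

```python
def dominant_eigenpair(matrix, max_iter=1000, tol=1e-12):
    """Power iteration: (dominant eigenvalue, right eigenvector summing to 1).

    Assumes a non-negative matrix with a unique dominant eigenvalue.
    """
    n = len(matrix)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(max_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(w)
        new_v = [x / eigenvalue for x in w]
        delta = max(abs(a - b) for a, b in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    return eigenvalue, v


def scale(matrix, factor):
    return [[factor * value for value in row] for row in matrix]


M_b = [[4.0, 2.0], [1.0, 3.0]]  # hypothetical baseline contacts
M_a = [[2.0, 2.0], [1.0, 3.0]]  # hypothetical contacts under intervention

lam_a, vec_a = dominant_eigenpair(M_a)
lam_b, vec_b = dominant_eigenpair(M_b)
r0_ratio = lam_a / lam_b
incidence_ratio = [x / y for x, y in zip(vec_a, vec_b)]

# Same ratio from the next generation matrices with constant D and q.
D, q = 5.0, 0.1
r0_ratio_from_G = (
    dominant_eigenpair(scale(M_a, D * q))[0]
    / dominant_eigenpair(scale(M_b, D * q))[0]
)
print(round(r0_ratio, 6), round(r0_ratio_from_G, 6))
```

In this toy example the relative incidence ratio falls below 1 in the first age class and rises above 1 in the second. This reflects a shift in the age distribution of cases, not an increase in absolute incidence.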
Based on the reported contact locations, it is possible to exclude or reduce subsets of the social contact data. To do so, contacts at multiple locations were assigned to a single location in the following hierarchical order: (1) contacts at home, (2) contacts at work, (3) contacts at school, (4) contacts during transportation, (5) contacts during leisure activities and (6) contacts in other locations. For example, school closure can be simulated by excluding all contacts that took place “at school” before calculating $m_{ij}$. To account for an increase in telework to proportion $p^{telework}$, we account for the observed social contacts at work $m_{ij}^{work}$ and the observed proportion of telework $p_{obs}^{telework}$:

$$m_{ij}^{telework} = m_{ij}^{work}\, \frac{1 - p^{telework}}{1 - p_{obs}^{telework}}$$

To simulate the effect of telework and school closure, the social contact matrix M is calculated as:

$$M = M^{home} + M^{telework} + M^{transport} + M^{leisure} + M^{other}$$
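The location hierarchy and the two interventions can be sketched as follows. The sketch assumes that work contacts scale with the share of people still at the workplace, i.e. by (1 − p) / (1 − p_obs), and that school contacts are dropped entirely under school closure. Location labels, function names and matrices are hypothetical.

```python
# Hierarchical order from the text; the labels themselves are hypothetical.
HIERARCHY = ["home", "work", "school", "transport", "leisure", "other"]


def assign_location(reported_locations):
    """Assign a contact reported at several locations to the first one
    in the hierarchy; returns None if no known location was reported."""
    for location in HIERARCHY:
        if location in reported_locations:
            return location
    return None


def intervention_matrix(by_location, p_telework, p_observed, close_schools):
    """Sum location-specific contact matrices under telework / school closure.

    Assumption: work contacts are scaled by (1 - p_telework) / (1 - p_observed).
    """
    if not 0 <= p_observed <= p_telework <= 1 or p_observed == 1:
        raise ValueError("require 0 <= p_observed <= p_telework <= 1, p_observed < 1")
    work_scale = (1 - p_telework) / (1 - p_observed)
    n = len(next(iter(by_location.values())))
    total = [[0.0] * n for _ in range(n)]
    for location, matrix in by_location.items():
        if location == "school" and close_schools:
            continue  # school closure: exclude all contacts at school
        factor = work_scale if location == "work" else 1.0
        for i in range(n):
            for j in range(n):
                total[i][j] += factor * matrix[i][j]
    return total


# Hypothetical matrices for two age classes (other locations omitted).
by_location = {
    "home": [[1.0, 1.0], [1.0, 1.0]],
    "work": [[0.0, 0.0], [0.0, 1.9]],
    "school": [[3.0, 0.0], [0.0, 0.0]],
}
baseline = intervention_matrix(by_location, 0.05, 0.05, close_schools=False)
scenario = intervention_matrix(by_location, 0.50, 0.05, close_schools=True)
print(baseline, scenario)
```

Raising an error when the requested telework proportion is below the observed one reflects that only increases in telework are considered.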
R-Shiny Application
We used the R package shiny [15] to build an interactive web application to access and visualise the social contact data. This application consists of a user interface (UI) and server script that use data processing algorithms based on the socialmixr package [11]. The UI enables the selection of a country, age categories, type of day (weekday, weekend, holiday, regular), contact duration (<15min, >15min, >1h, >4h), contact intensity (physical or non-physical) and gender (female-male, male-female, male-male, female-female). Using a selection box, the user can opt to disable the assumption of reciprocity and participant weights by age and type of day. Finally, the user can enable reactive strategies such as school closure and increase the level of telework. Please note that the proportion of telework can only increase given a specified observed proportion. The extrapolation of social contact matrices given reductions in telework falls outside the scope of this project.
Based on the selected inputs shown on the left-hand side of the screen, the social contact matrix M is plotted on the right-hand side. We use a color scale to indicate the number of contacts and superimpose the numeric values on the figure. Below this figure, the principal results of the social contact analysis are printed: the elements of M along with participant info. For reciprocal and/or weighted matrices, the demography data and weights used are also displayed. If reactive strategies are selected, the effects in terms of R0, M and the relative incidence ratios are presented. All results can be downloaded as an RData file. Note that we will continue to develop this tool and thus the output, plots and scenarios might change in future editions.
COVID-19 case study
We estimate the effect of school closure and telework on disease transmission dynamics. To do this, we use 3 age classes: 0–18 years, 19–60 years and over 60 years of age. For each country, we calculate contact rates between age groups after excluding data from holiday periods. We assume no compensation behavior when people do not go to work or school, to simulate quarantine-like scenarios. We fixed the reference proportion of telework at 5%, in line with European observations [Eurostat, 2020; https://ec.europa.eu/eurostat/data/]. We analyse the change in transmission dynamics with 20%, 35% and 50% telework, with and without school closure, based on earlier survey responses on the possibilities for employees to conduct their work activities remotely as teleworkers [16].
Results and Discussion
The http://www.socialcontactdata.org initiative, status 1 March 2020, includes data for Belgium, Finland, Germany, Italy, Luxembourg, Netherlands, Poland and the UK from the POLYMOD study [4], as well as data from further studies of social mixing in France [17], China [18], Hong Kong [19], Peru [20], UK [21], Russia [22], Zimbabwe [23], South Africa and Zambia [24]. All data are available on Zenodo [25–33] and can be retrieved from within R using the socialmixr package. Survey details are provided in the systematic review of Hoang et al. [3]. The data sets for France and Zimbabwe contain multiple days per participant, hence we selected the first day for each participant (to minimise the effect of reporting fatigue).
Online tool
The SOcial Contact RATES (Socrates) data tool enables quick and convenient generation of social contact matrices, relevant for the spread of infectious diseases. Figure 1 presents a screenshot of the user interface. The analysis of school closure and increasing the proportion of telework is only one demonstration of the potential uses of this platform. The options and potential of using social contact patterns to simulate infectious disease transmission seem endless, and we hope with this initiative to support data-driven modeling endeavours. We provide the source code via github.com/lwillem/.
COVID-19 case study
Figure 2 shows the effect of an increasing proportion of telework by country with and without school closure. For most countries, we predict a 10% decrease in R0 with a telework proportion of 50%. In some countries, like China, Poland and Hong Kong, the reduction is slightly higher. The analysis for Peru shows little impact of telework. This can be explained by the observation of Grijalva et al. [20] that participants reported few contacts at work, whereas a substantial proportion of contacts was reported at the market or on the street. Cultural differences in how “at work” is understood should be taken into account when interpreting results.
The effect of school closure is country-specific, e.g. a 10% decrease in R0 for Belgium and Vietnam, which is similar in effect size to an increase in telework up to 50%. For other countries, e.g. Italy, Luxembourg and France, we predict school closure to decrease R0 by 20%.
The relative incidence, as presented in Figure 3, shows the impact of school closure compared to an increase in telework. The predicted relative incidence in people 19–60 years of age decreases with an increasing proportion of telework. That is, this measure provides some protection from exposure, which might be of interest if these age groups are more vulnerable compared to children, as is the case for COVID-19 [34]. The relative incidence in the age group above 60 years of age increases in both situations compared to no intervention. This does not imply that the absolute number of cases in this age group would rise. It only means that the risk of infection in other age groups is more affected by the intervention (which reduces overall incidence) relative to their normal social contact behavior. Given that our intervention measures target only children and the population of working age, this observation is as expected.
Limitations
Most survey designs were derived from the POLYMOD survey design, though each survey had additional features and objectives which could provide useful additional information. Therefore, tools such as this one do not capture the full potential of each data set separately. The social contact analysis presented here focuses only on adapting school and work contacts. It does not capture compensation behavior due to not being at school or work, nor social distancing due to (pandemic) scares.
Our estimates only account for adapted social contact patterns and do not account for age-specific differences in susceptibility or shedding. For example, assuming that susceptibility and infectivity are lower for children would imply that school closure as an intervention would have less impact, whereas telework would have a larger impact. Our estimates also do not take into account travel restrictions or cancellation of public events, both of which may well have a large impact.
The application contains a local version of each data set, with some additional data reformatting, though our future aim is to add an option to directly use data from the Zenodo repository. Note that other social contact surveys are available on Zenodo, though we have not yet included those surveys because they have a different setup. For example, the study from China [18] recorded groups of contacts instead of unique records, and the study from the UK [21] recruited only infants. In addition, data from Zambia and South Africa had almost no information for individuals aged 0–18 years, and data from Zimbabwe did not include location. Therefore, these data were omitted here.
Data Availability
All data used in the paper are available on the website: http://www.socialcontactdata.org/ with the detailed links on https://zenodo.org/
Ethics approval and consent to participate
The social contact data sharing initiative is part of the ERC consolidator grant “TransMID”, which received ethical approval from the Hasselt University Medical Ethical Committee (CME2016/618).
Availability of data and materials
All data and materials are open source.
Competing interests
The authors declare no competing interests.
Funding
This work is part of a project that has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 682540 — TransMID) (TVH, PC and NH). SF was funded by a Wellcome Trust Senior Research Fellowship (210758/Z/18/Z)
Authors’ contributions
NH conceived the study. TVH and PC collected and formatted the social contact data. LW and NH wrote a first draft of the paper. LW, TVH, SF and NH developed the online tool. All authors contributed to the final version of the paper and approved the final manuscript.
Abbreviations
- UI: User Interface
- Socrates: SOcial Contact RATES