Introduction

SARS-CoV-2 was first identified in Wuhan, China1 in December 2019 and has since been imported into virtually every country and region in the world2,3. Understanding and tracking the sources of importation between countries can inform policies aimed at reducing the further spread of the virus and so support management of the pandemic4. It is particularly important now as countries aim to mitigate the introduction of highly transmissible variants of concern against which vaccines may be less effective5,6. The available brakes on imported SARS-CoV-2 cases include travel bans, quarantine measures, and testing of returning travellers7. These can be applied to all countries or targeted to high-risk countries, for variable durations, and with variable degrees of enforcement.

In England, from 17 March 2020 to 4 July 2020, the government advised against all non-essential travel worldwide8. Between 4 July 2020 and 1 February 2021, travel corridors were established with countries deemed to be low risk for COVID-19 (subject to assessment and change); travellers returning from these countries were no longer required to quarantine at home for 14 days. Persons returning from countries outside this list (except for exemptions e.g. specific employment) were required to quarantine at home (Fig. 1). This policy aimed to reduce the impact of travel-related SARS-CoV-2 cases in England9, possibly through limiting onwards transmission of SARS-CoV-210 and deterring travel to those countries. Upon identification of an imported case, contact tracing and quarantine/self-isolation measures can limit onwards transmission11. The PHE Isolation Assurance Service identified up to 97% self-reported compliance with travel-specific quarantine12. However, these data do not cover countries exempt from quarantine, do not include contact-tracing data, and are not linked to genomic data that would allow travel-related clusters to be evaluated.

Fig. 1: Case ascertainment and distribution during the study period.

a Timeline of the study period (27 May 2020 to 13 September 2020) and associated policy changes on travel introduced in England. Travel-related quarantine measures were assigned on a country-by-country basis from 4 July 2020. Travellers returning from countries that were on the ‘closed travel corridors’ list58 were required to quarantine for 14 days (*reduced to 10 days on 15/12/20) or, from 15 December 2020, could choose to self-isolate for 5 days and then pay for a SARS-CoV-2 diagnostic test (test and release). b Flow diagram and map of travel-related cases ascertained from Test and Trace data and subsequent genome availability. Cases were defined as ‘highly probable’ or ‘probable’. ‘Highly probable’ travel-related cases were defined as individuals who reported international travel as an activity in the two days before symptom onset/testing. On 12/08/2020 the additional facility to report international travel in the 7 days prior to symptom onset/testing became available; cases reported this way were also included in this study and defined as ‘probable’ travel-related cases. c Flow diagram of contacts of cases ascertained from Test and Trace data. d Countries where importations originated. Countries with fewer than five importations were excluded for confidentiality reasons. e Destinations of imported cases within England. Areas with fewer than three cases have been excluded. Q/C quality control.

Studies from numerous countries have used genome sequencing to complement epidemiological investigations in order to characterise importations of SARS-CoV-2 (Supplementary Table 1). Primarily these are in-depth case reports on small datasets but demonstrate the utility of genomics combined with contact tracing13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35. Here, we combine contact-tracing data from National Health Service (NHS) Test and Trace (T&T) for probable importation cases with genomes made available through the COVID-19 Genomics UK (COG-UK) consortium36 to characterise the known imported cases of SARS-CoV-2 into England and the effectiveness of 14-day quarantine on onwards transmission.

In this work, we compare the number of contacts reported per case prior to diagnosis between individuals returning from a country with a requirement to quarantine after travel and those who did not need to quarantine on return. We then identify unique genomes from imported cases and associated clusters of infections in the COG-UK genomic surveillance dataset in the four weeks following index case identification. Onward transmission is compared between index cases who returned from a country with a requirement to quarantine on return and those who returned from countries without such a requirement. Finally, we use the epidemiological data to investigate the origin of a divergent cluster of SARS-CoV-2 cases identified using genomics.

Results

Cases identified

Between 27 May 2020 and 13 September 2020, using contact-tracing data for all individuals who had tested positive for SARS-CoV-2 between those dates, we identified 4207 international travel-related cases in England. These individuals reported a total of 18,856 contacts. During this period, we identified 105,794 non-travel related cases that reported 233,182 contacts.

From the travel-related cases, 888 sequenced genomes were available for comparison to all UK genomic data (see Fig. 1 and Methods for details of case definition and identification, Supplementary Tables 2 and 3 show the case characteristics). Sequencing of community and hospital cases across the UK was carried out with the aim of providing approximately equal geographical coverage, as much as possible.

Return from European countries accounted for 85.9% (3612/4207) of travel-related cases; 51.2% (2155/4207) had visited one of Greece (21.0%, 882/4207), Croatia (16.3%, 685/4207) or Spain (14.0%, 589/4207) (Fig. 1 and Supplementary Table 4). For 284 cases the country of travel was unclear or unknown. Travel restrictions were first eased on 04/07/2020; only 2.9% of travel-related cases identified in this study were recorded before this date. For the countries associated with the highest numbers of imports, the number of cases per day imported from each country, along with the timing of travel restrictions for that country, is shown in Fig. 2. Geographical variations in imported cases across England were apparent, with the greatest number (28.6%, 1205/4207) in Greater London (representing ~15% of the population of England) (Fig. 1 and Supplementary Table 3).

Fig. 2: Frequency of importations over time for the four most common countries of travel reported by individuals testing positive for SARS-CoV-2 during the study period.

a–d SARS-CoV-2 case numbers in returning travellers for the four most common countries of travel reported by cases, representing 2379/4207 (56.5%) of known travel-related cases. The light-shaded areas represent the periods when each country had an open ‘travel-corridor’ and therefore no mandatory 14-day quarantine on return.

Contacts per case

The median number of reported contacts per travel-associated case was 3 (IQR 1–5), and 22% of cases reported no contacts. Some individuals reported a very large number of contacts: 9% reported more than 10, 3% more than 20, and 0.4% more than 50, with a maximum of 172.

Of the imported cases, 2010 were imported from a country with a quarantine requirement at the time of return, whereas 1900 were not required to quarantine on return. For 297 cases quarantine status could not be determined. The number of contacts was higher for cases without a travel restriction (mean = 6.0, median = 3, IQR = 1–7) compared to cases with a travel restriction (mean = 3.0, median = 2, IQR = 1–4).

Using a negative binomial regression model, after adjusting for the potential confounding factors of age, sex, date of the test, destination, and ethnicity, travelling from a country requiring quarantine on return was associated with an estimated reduction in the number of contacts of 40% (rate ratio (R.R.) = 0.60, 95% CI = 0.37–0.95; p = 0.03). Statistical modelling is fully described in the “Methods” section. Using this model, the estimated marginal mean number of reported contacts (adjusting for and averaging over all covariates: age, sex, date of the test, destination, and ethnicity) was 5.85 (95% CI = 3.7–9.3) when no quarantine was required compared to 3.50 (95% CI = 3.0–4.0) when travellers were required to quarantine. To address possible bias from a small number of cases with a large number of contacts, we recoded (top-coded) all cases with more than 10 contacts as having 10; the estimated rate ratio for the number of contacts per case, comparing individuals travelling from a country requiring quarantine on return with those not requiring quarantine, was slightly attenuated to 0.68 (0.48–0.98; p = 0.036).
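The top-coding step can be illustrated with a minimal sketch. The contact counts below are invented for illustration and are not study data, and the crude ratio of means is only an unadjusted stand-in for the rate ratio estimated by the negative binomial model.

```python
from statistics import mean

def top_code(counts, cap=10):
    """Recode every count above `cap` as `cap` (top-coding)."""
    return [min(c, cap) for c in counts]

def crude_rate_ratio(quarantine, no_quarantine):
    """Unadjusted ratio of mean contacts: quarantine / no quarantine."""
    return mean(quarantine) / mean(no_quarantine)

# Hypothetical contact counts per case (illustrative only, not study data).
quarantine_contacts = [0, 1, 2, 2, 3, 4, 12]
no_quarantine_contacts = [0, 1, 3, 5, 7, 9, 172]

raw_rr = crude_rate_ratio(quarantine_contacts, no_quarantine_contacts)
capped_rr = crude_rate_ratio(top_code(quarantine_contacts),
                             top_code(no_quarantine_contacts))

# Ratio of the estimated marginal means reported in the text (3.50 vs 5.85).
emm_ratio = 3.50 / 5.85
```

In this toy data a single very large count pulls the crude ratio far below 1, and top-coding moves it back towards 1, which is the direction of attenuation reported above. The ratio of the two reported marginal means is close to the reported rate ratio of 0.60.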

The number of contacts per case varied significantly with age group and over time (Fig. 3 and Supplementary Table 6). The number of contacts per case was greatest in the 16–20 age group who travelled to countries with no requirement for quarantine, with a marginal mean of 9.0 (95% CI = 5.6–14.5) but reduced to 4.7 (95% CI = 3.9–5.7) when quarantine was required—similar to other age-groups.

Fig. 3: The effect of travel restriction (14-day quarantine) on contacts per imported case of SARS-CoV-2.

The estimated marginal mean number of contacts per imported case a overall, b by age-group and c by date of test comparing countries with travel restriction guidance (closed ‘travel-corridors’) in place and those without (open ‘travel-corridors’). All points are estimated marginal means and are provided with 95% confidence intervals.

After adjusting for all other covariates, the reported number of contacts per imported case was lower in September than in May, June and July, whether or not a requirement to quarantine was in place. Following this observation of reduced contacts over time among travel-related cases, we examined the number of reported contacts over time among cases who did not travel, to check whether this was a general trend due to other COVID-19 measures. Among 105,794 cases recorded in our study period that were not associated with travel, 28,564 (27%) were excluded due to poor data quality. Among the remaining 77,230 cases we did not find a corresponding decrease over time in the number of contacts, with a mean of 1.6 contacts per case in May/June 2020 and around 2.3 contacts per case for the remaining period.

Onward transmission and genomic analysis

We next sought to quantify onward transmission from an imported case using genomics. High-quality sequencing data were available for 827/4207 (19.7%) cases (Fig. 1) and demographics of the sequenced cases were broadly similar to the entire travel-related cohort (Supplementary Table 3).

Imported genomes and onward transmission

To monitor onward spread we identified 186/827 (22.4%) imported cases with SARS-CoV-2 genomes that were sufficiently unique, as defined by their status as extinct or genetically distinct within a sub-lineage (see Methods). Of these, 146/186 isolates had not been sampled in the entire UK dataset in the 4 weeks prior, while a further 40 isolates were more than 3 single nucleotide polymorphisms (SNPs) from their closest matching sequence in the existing UK dataset; both criteria suggest genuinely new importations of the genotype.

Using an SNP matrix of imported genomes and associated travel-related metadata, we defined the number of importation events per imported genotype; these ranged from 1 to 39. The majority (119/186; 64%) of genomes were identified only once in imported cases, with 33 (18%) identified twice, 22 (12%) between 3 and 10 times and the remaining 12 (6%) between 11 times and the observed maximum of 39.

To compare the effect of the requirement to quarantine on the subsequent spread of likely imported cases, the entire COG-UK dataset was interrogated to identify isolates within 2 SNPs of these distinct imported cases that were sampled up to 4 weeks after the index importation case. There was variation in the number of subsequent (up to 4 weeks later) cases matching each genome (median 0; range 0–210, IQR 0–1). The majority of genomes (125/186, 67%) were not linked to any subsequent cases; 17 (9.1%) and 8 (4.3%) were linked to one and two cases, respectively; and the remaining 36 (19%) were matched to larger numbers of subsequent cases, including 6 that matched at least 50 later cases.

Association between travel restriction and onward transmission

To explore the association between onward transmission and travel restriction, we first excluded cases returning before 14 July 2020 to ensure that the time periods for index cases with and without a travel restriction overlapped. The proportion of imported genomes matching any subsequently detected case, and the number of new cases where at least one was detected, are shown in Fig. 4. Overall, 56/168 (33%) of genomes from cases that were genetically unique were detected in at least one subsequent case within the following four weeks. Among genomes identified from a country where a quarantine requirement was in place, 25% (20/81) were detected in at least one subsequent case, compared to 41% (29/71) when cases were imported from a country without a requirement to quarantine (Fig. 4a). The destination country for 16 index cases was unknown.
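Figure 4a reports bootstrapped confidence intervals for these two proportions. A minimal sketch of one way to obtain such intervals is given here, assuming a percentile bootstrap; the source does not state which bootstrap variant, number of resamples, or seed was used, so those choices are illustrative.

```python
import random

def bootstrap_proportion_ci(successes, n, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for a proportion, resampling 0/1 outcomes."""
    rng = random.Random(seed)
    data = [1] * successes + [0] * (n - successes)
    # Resample n outcomes with replacement and record each resampled proportion.
    props = sorted(sum(rng.choices(data, k=n)) / n for _ in range(n_boot))
    lower = props[int((alpha / 2) * n_boot)]
    upper = props[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Counts reported in the text: 20/81 (quarantine) and 29/71 (no quarantine).
ci_quarantine = bootstrap_proportion_ci(20, 81)
ci_no_quarantine = bootstrap_proportion_ci(29, 71)
```

The fixed seed makes the sketch reproducible; the resulting intervals will not necessarily match the published error bars exactly.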

Fig. 4: The effect of travel restrictions (14-day quarantine) on the subsequent spread of likely imported cases as determined by genomics.

a The proportion of imported cases with any matching genome detected over the four weeks following index test result. Error bars correspond to bootstrapped confidence intervals, n = 81 and 71. b The number of genomes matching the index case, with zeros excluded. The midline of the boxplot represents the median value; the lower limit of the box represents the first quartile (25th percentile), and the upper limit of the box represents the third quartile (75th percentile); the whiskers (upper and lower) extend to the largest and smallest value from the box, no further than 1.5*IQR from the box. a, b Compare countries with travel restriction guidance (closed ‘travel-corridors’) in place and those without (open ‘travel-corridors’). c The number of cases and number of importation events for each imported genome, stratified by whether the index case had returned from a country with a travel restriction in place.

The number of subsequent cases detected during the 4 weeks following the unique index case increased from a mean of 1.2 new cases when quarantine was required to 11.3 cases when there was no requirement, mainly because all nine genomes that went on to match more than 20 subsequent cases had an index case returning from a country without travel restriction (Fig. 4b). However, this difference can be explained entirely by the number of importations for each genome: these genomes all had high numbers of independent importations, and genomes with a high number of importations always had an index case returning from a country with no travel restriction in place at the time (Fig. 4c).

To test the statistical significance of the observed effects, a series of negative binomial regression models was fitted. In the four weeks following the index case, fewer genomically linked cases were reported when the index case was imported from a country with a requirement to quarantine compared to cases from a country with no requirement (unadjusted R.R. = 0.11, 95% CI = 0.04–0.28), but this effect was entirely explained once the number of imported cases for each genome was included as an ‘offset’ in the model and the date was adjusted for (R.R. = 0.83, 95% CI = 0.35–1.92; p = 0.655).
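One way to write a model of this form is sketched here. It assumes a log link, and the notation is introduced for illustration rather than taken from the source.

```latex
Y_i \sim \mathrm{NegBin}(\mu_i, \theta), \qquad
\log \mu_i = \beta_0 + \beta_1 Q_i + \beta_2 t_i + \log n_i
```

Here $Y_i$ is the number of subsequent genomically linked cases for imported genome $i$, $Q_i$ indicates whether the index case returned from a country with a quarantine requirement, $t_i$ is the date, $n_i$ is the number of importations of that genome, and $\theta$ is the dispersion parameter. Because $\log n_i$ enters with its coefficient fixed at 1 (the offset), $\exp(\beta_1)$ is a rate ratio for linked cases per importation rather than per genome.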

There was some evidence that imported cases with higher numbers of contacts for the index case gave rise to more cases in the subsequent month; however, the number of contacts was only known for the index case. This effect is also explained by the number of importations: there is a positive correlation (Spearman’s rho = 0.18, p = 0.018) between the number of contacts reported by the index case and the number of independent importations of each genome (Supplementary Fig. 1), and in regression models, once we adjust for the number of importations, there is no evidence for an association between the number of contacts of the index case and subsequent linked cases. However, it is possible that the number of contacts of other imported cases in each group remains an important factor.

Genomic identification of a large imported cluster

In order to demonstrate the utility of genomics in identifying a probable travel-related cluster of SARS-CoV-2 cases in England de novo, we ran the Polecat Clustering tool (https://cog-uk.github.io/polecat) on 14 September 2020 (including SARS-CoV-2 cases in the COG-UK dataset up to this date). An outlier cluster was observed (Supplementary Fig. 4). This cluster (UK1897) was associated with high diversity and a long stem length compared to samples from the UK, suggesting that this lineage evolved outside the UK. The geographic distribution of this lineage is shown in Supplementary Fig. 5 and likely represents multiple importations into the UK. This cluster contained the D614G mutation but no others associated with increased transmission. The root of the cluster was associated with a Swiss phylotype when linked to data in GISAID. During the course of the study period (4 August 2020 to 14 September 2020), there were 304 genomes corresponding to this cluster. These could be linked to 238 individuals, of whom 159 could be linked to a contact-tracing record. Of these 159, 143 had contact-tracing information indicating whether or not they had travelled internationally. Of these 143, 72 (50.3%) individuals had recently returned from abroad and were associated with 10 dispersed European countries (4 individuals had travelled to more than one European country), most commonly Croatia (35/72, 48.6%) (Supplementary Fig. 6). A further four cases were identified as contacts of individuals who had reported travel to mainland Europe. There was a trend towards an increasing proportion of cases that did not report travel over time, possibly representing dispersion and onwards local transmission of this lineage (Supplementary Fig. 7).

Characteristics of imported genomes

The 827 imported genomes reflected 238 UK lineages (see Supplementary Materials), of which 214 were seen fewer than 5 times (142 singletons) and 24 were seen 5 or more times (Supplementary Table 8). The most commonly observed were UK5 (152 genomes, 18.4%) and UK1897 (73 genomes, 8.8%). There were 39 global lineages within the genomes. The most commonly observed lineages were B.1.1 (159 genomes, 19.2%) and B.1.177 (128 genomes, 15.5%) (Supplementary Tables 9 and 10). Potentially functionally important mutations were also identified (Supplementary Table 11 and Supplementary Fig. 8): D614G, 824/827 (99.6%) cases; N439K, 65/827 (7.86%) of cases; A222V, 131/827 (15.84%) of cases. ΔH69/V70 was identified in 53 cases associated with lineage B.1.258. We evaluated the introduction of A222V (B.1.177) over time, demonstrating a clear epidemiological link to Spain through contact tracing (Supplementary Fig. 10). By the end of the study period, this variant was introduced from 16 separate countries indicating dispersion across Europe (Supplementary Fig. 11) corroborating findings by Hodcroft et al.37. The mutations co-occur, with the proportion of cases represented by these combinations varying over time (Supplementary Fig. 12).

Discussion

Here, we provide evidence, through the analysis of both contact-tracing data and genomics, that a mandatory 14-day quarantine was associated with fewer contacts for returning travellers with SARS-CoV-2 and less onward transmission of imported cases, with the reduced transmission likely mediated through fewer individual importations of each genome. From 27 May 2020 to 13 September 2020, 85.9% of importations of SARS-CoV-2 into England were from European countries, with three countries (Greece, Croatia, and Spain) accounting for 51.2% of all imported cases. Along with the requirement to quarantine or not, age was a significant determinant of onwards contacts, with younger age groups reporting more contacts, but the effect of a requirement to quarantine on the number of contacts was observed across all age groups. We have shown that after a period of national lockdown, systematic monitoring of imported genomes can identify sequences that are sufficiently unique and provide utility for monitoring onwards transmission.

Whilst the study period covers nearly 4 months, the importations were concentrated after the implementation of travel corridors; prior to this date travel was not advised38. The peaks in imports for each country occur at different times and with different epidemic curves, likely affected by patterns of travel as well as the prevalence of disease and local regulations within each country. For the most common destinations, barring Spain, imported cases appear to reduce after the closing of a travel corridor and the subsequent requirement to quarantine. The majority of importations from Greece came at the end of August and continued into September; there was no requirement to quarantine for travellers returning from Greece during this time period, and Greece was the source of the greatest number of imported SARS-CoV-2 cases during the study period. This highlights the need for active surveillance of imported cases of SARS-CoV-2 so that requirements to quarantine can be introduced in a timely manner. London accounts for 15.4% of the population of England39 and accounted for 11.4% (12,011/105,794) of SARS-CoV-2 infections during the study period; the region, however, accounted for 28.6% of imported SARS-CoV-2 cases, possibly reflecting a younger and more diverse demographic with cultural/family links abroad, and a concentration of international businesses and airports. The reported effective reproduction number (Rε) in London had a minimum lower-bound value of 0.6 and an upper-bound value of 1.3 during the study period, similar to the respective values of 0.7 and 1.2 observed for England40. This potentially indicates that imports are unlikely to have had a substantial impact on onward infection rates in this region.

The number of onwards contacts was significantly lower when the traveller was required to quarantine, highlighting the effectiveness of this policy. Age was also a significant determinant of onwards contacts, with the 16–20-year-old age group representing the greatest number of travel-related cases and the greatest number of onwards contacts per case. This identifies an opportunity to direct public health awareness campaigns to younger travellers, with the intention of promoting behaviours that will reduce the risk of SARS-CoV-2 acquisition and enhance compliance with quarantine on return.

We observed a reduction in the number of contacts per case over the summer 2020 period that appeared specific to the travel-related SARS-CoV-2 cases, as it was not replicated in the non-travel-related cases. There was also no apparent change in successful contact tracing to explain this difference (Supplementary Fig. 13). We speculate that this observation may be related to a change in traveller behaviour (e.g. due to rising cases in England or destination countries) or changes in the types of traveller (e.g. travellers visiting family versus those travelling for occupational reasons, or those with dependents versus those without). Understanding these temporal changes in traveller behaviour requires formal investigation with more detailed epidemiological information and replication in other countries. More broadly, non-travellers reported fewer contacts per case than travellers, which may indicate that travellers are a more sociable cohort and therefore one that may benefit from targeted public health messaging to reduce transmission risk.

The use of genomic sequencing allowed the identification of a cohort of unique genomes that could be monitored for cluster growth. The cluster size for genomes that were imported from a country without a requirement to quarantine on return was significantly larger than for those related to countries with mandatory quarantine in place, providing further assurance on the effectiveness of quarantine policy in reducing travel-related SARS-CoV-2 cases. This finding was explained by several large clusters, all of which came from countries with no quarantine requirement at the time and with high numbers of individual importation events. With the number of importation events per cluster taken into account, we did not observe an effect of quarantine on subsequent cluster size, suggesting that the largest effect of travel-related quarantine is through a net reduction in travel-related importations.

The Polecat Clustering Tool (https://cog-uk.github.io/polecat) highlighted a large cluster that developed largely through travel to Croatia. This analysis shows that programmatic analysis of genomics data can identify putative importation clusters. Integration with contact-tracing information was vital for the true picture of the sources of introduction and the subsequent spread, due to the bias of SARS-CoV-2 sequencing globally41. In this instance, an introduced lineage was associated with widespread dispersal and onward transmission during a period when England had limited social distancing measures42,43. The lineage, B.1.160, associated with this cluster is not associated with increased transmissibility but this method for the detection of expanding imported clusters could be useful for the investigation of newly introduced variants of concern.

Our study has several important limitations. The COG-UK dataset has limited sequencing coverage across England, meaning that detected cluster sizes will under-estimate absolute numbers, and there is a possibility of unsampled transmission chains despite our use of four SNPs to identify divergent clusters (Supplementary Fig. 14). The earliest reliable data available to identify whether individuals were required to quarantine on returning to England from travel abroad was the date of case sampling. Our study evaluates a period of time following a national lockdown with highly restricted movement across borders, which likely exaggerated the diversity of imported genomes compared to lineages circulating in England. Additionally, the quarantine guidance at this time specified 14 days; shorter periods, alone or combined with testing, may be as efficacious44. Outcomes such as travel and the number of contacts are self-reported; reporting bias is mitigated through mandatory completion of a passenger locator form to assist identification of returning travellers, while travel-related cases are seen as higher risk and are therefore referred to local public health agencies for targeted contact tracing. For genomic analysis, only the destination country and number of contacts for the index case in each cluster are used, irrespective of the number of imports. Finally, there will be an artificial reduction in cases at the end of the study period owing to the case incubation period and delays in testing and reporting, with data provided 3 days after study close.

Overall, we present an integrated epidemiological and genomic evaluation of what is, to our knowledge, the largest dataset of confirmed SARS-CoV-2 cases imported into England (or any other country). This study provides evidence for the effectiveness of 14-day quarantine in reducing contacts and in reducing, but not entirely preventing, onward transmission of imported cases, through reducing the number of importations. Our data highlight the possibility of targeted public health campaigns to reduce SARS-CoV-2 importations and onwards transmission. In conclusion, this study demonstrates how routine genomic epidemiology of travel-related cases could be used to monitor SARS-CoV-2 import cases to enable rapid refinement of travel policies.

Methods

Ethics

The COG-UK study protocol was approved by the Public Health England Research Ethics Governance Group (reference: R&D NR0195). Public Health England affiliated authors had access to identifiable Cambridgeshire community case data. This data was processed under Regulation 3 of The Health Service (Control of Patient Information) Regulations 2002—permitting the processing of confidential patient information for communicable disease and other risks to public health and as such, individual patient consent is not required. Other authors only had access to anonymised or summarised data.

Contact tracing and case identification

Contact-tracing data were obtained from Test and Trace (T&T). All cases and contacts had a field for demographic data, but this was not always reported (Supplementary Tables 3 and 4). ‘Highly probable’ travel-related cases were defined as individuals who reported international travel as an activity in the two days before symptom onset/testing. On 12/08/2020 the additional facility to report international travel in the seven days prior to symptom onset/testing became available; cases reported this way were also included in this study and defined as ‘probable’ travel-related cases.

Cases were asked to provide details of all contacts for activities in the 2 days prior to symptom-onset/testing (whichever is earliest) up to the time of completing the system in which contacts were gathered. Though there can be a discrepancy in the time taken for individuals to complete the contact tracing system, this is not expected to result in a material change in contact numbers as they are expected to be self-isolating after symptom-onset or testing positive. If any contacts become cases they would then also be included in T&T data as a case separately, but if they did not report direct travel themselves, then they would not meet the definition for a travel-associated case.

A contact is defined as an individual with whom a case has had face-to-face contact (less than 1 metre away), spent more than 15 min within 2 m, travelled in a car or other small vehicle, or sat close to on a plane.

Case identification from T&T data

Data included a free-text destination city or country. A free-text country and city search, run with a custom Python script on the travel-related T&T data, was used to identify the destination country. Results and remaining entries were manually checked and corrected (see Supplementary materials for more details).
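A free-text search of this kind might look like the following minimal sketch. The lookup table, function names, and example strings are hypothetical; the authors' actual script is not reproduced in the source.

```python
import re

# Hypothetical lookup of lower-case place names to countries (illustrative only).
PLACE_TO_COUNTRY = {
    "greece": "Greece",
    "zante": "Greece",
    "croatia": "Croatia",
    "split": "Croatia",
    "spain": "Spain",
    "barcelona": "Spain",
}

def match_countries(free_text):
    """Return the sorted list of countries whose names or cities appear in the text."""
    tokens = re.findall(r"[a-z]+", free_text.lower())
    return sorted({PLACE_TO_COUNTRY[t] for t in tokens if t in PLACE_TO_COUNTRY})

def needs_manual_check(free_text):
    """Flag entries that match no country or more than one country."""
    return len(match_countries(free_text)) != 1
```

Simple token matching will produce false positives (for example, "split" used as an ordinary word), which is one reason a manual check of the results, as described above, remains necessary.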

Requirement for quarantine

Persons returning from countries where travel-related quarantine was mandated (except for exemptions e.g. specific employment) were required to quarantine for 14 days at home. The package of measures used to help enforce this requirement included the need to complete a Passenger Locator Form prior to arrival in England, spot checks by the PHE Isolation Assurance Service, and referral to the police through the Border Force Criminal Justice Unit, who may issue fixed penalty notices.

Clinical samples, genome sequencing and quality control

Clinical samples were collected passively as part of national SARS-CoV-2 testing. This included both community testing through lighthouse labs (satellite SARS-CoV-2 testing laboratories) and testing through hospital diagnostic labs. Samples were sequenced at one of seventeen COG-UK sequencing sites (Fig. 1). The samples were prepared for sequencing using either the ARTIC45 or veSeq46 protocols and were sequenced using Illumina or Oxford Nanopore platforms. All samples were uploaded to and processed through COVID-CLIMB pipelines47,48. Genomes were aligned to the Wuhan Hu-1 reference genome (Genbank accession code: MN908947.3). Genomes that contained more than 10% missing data were excluded from further analysis to ensure high-quality phylogenetic analysis.
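The missing-data filter can be sketched as follows. This is a minimal illustration that assumes missing data are represented by 'N' or '-' characters in the aligned genome; the source does not specify how missingness was counted.

```python
def missing_fraction(sequence):
    """Fraction of positions that are 'N' or '-' (assumed markers of missing data)."""
    if not sequence:
        raise ValueError("empty sequence")
    seq = sequence.upper()
    return sum(base in "N-" for base in seq) / len(seq)

def passes_qc(sequence, max_missing=0.10):
    """Keep a genome unless more than `max_missing` of its positions are missing."""
    return missing_fraction(sequence) <= max_missing
```

A genome with exactly 10% missing positions is retained under this reading, since the stated rule excludes genomes with more than 10% missing data.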

Lineages and minor variants

Global and UK Lineages49 were assigned to each genome using Pangolin (https://github.com/cov-lineages/pangolin) with analysis performed on COVID-CLIMB48. Minor variants were pre-defined within the COG-UK database using type_variants (https://github.com/cov-ert/type_variants).

Identification of extinct and unique genomes

The 827 high-quality travel-related genomes were compared to the COG-UK dataset on 16/10/2020. Genomes were only compared to other genomes with the same UK lineage assigned by COG-UK, since we assume that no relatedness relevant to transmission exists between genomes of different UK lineages. A unique genome in the community was deemed to be one that was known to be from a travel-related case and that either belonged to a UK lineage that had not been sampled in the UK in the previous 4 weeks, or was more than 3 SNPs from its closest relative in the COG-UK dataset.

Within the same UK lineage, we then identified those genomes sampled within the 4 weeks prior to the genome of interest and determined the SNP distance between the sequence of interest and these genomes. Unique genomes were compared to sequences in the COG-UK dataset sampled within 2 and 4 weeks after their sampling date, to identify samples with the same UK lineage and within 2 SNPs. These would represent onward transmission or further introductions of similar genomes. The analysis was run with an in-house custom Python script developed by US and RM. Further detail is provided in the Supplementary materials.
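The two-step logic described above can be sketched as follows. This is not the in-house script. The record layout (dicts with `lineage`, `date` and `seq` keys), the function names, the handling of window boundaries and the choice to ignore ambiguous bases when counting SNPs are all assumptions made for illustration.

```python
from datetime import date, timedelta


def snp_distance(a, b):
    """Count aligned positions where both sequences have A/C/G/T and differ."""
    return sum(1 for x, y in zip(a, b)
               if x != y and x in "ACGT" and y in "ACGT")


def is_unique(query, background, window_weeks=4, unique_threshold=3):
    """True if the lineage was unsampled in the prior window, or if the
    closest same-lineage genome in that window is more than 3 SNPs away."""
    start = query["date"] - timedelta(weeks=window_weeks)
    recent = [g for g in background
              if g["lineage"] == query["lineage"]
              and start <= g["date"] < query["date"]]
    if not recent:
        return True
    closest = min(snp_distance(query["seq"], g["seq"]) for g in recent)
    return closest > unique_threshold


def onward_matches(query, background, weeks, max_snps=2):
    """Same-lineage genomes within 2 SNPs sampled in the following weeks."""
    end = query["date"] + timedelta(weeks=weeks)
    return [g for g in background
            if g["lineage"] == query["lineage"]
            and query["date"] < g["date"] <= end
            and snp_distance(query["seq"], g["seq"]) <= max_snps]
```

Calling `onward_matches` with `weeks=2` and `weeks=4` gives the two follow-up windows used for each unique genome.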

Identification of multiple introductions of a unique genome

We combined the available travel-related epidemiological data and genomic data to identify the number of importations represented by the travel-related clusters generated above. We used snp-dists (version 0.7.0) (https://github.com/tseemann/snp-dists) to calculate the pairwise SNP distances in an alignment of the 827 high-quality imported genomes. These genomes were aligned with MAFFT (version 7.471)50, outside of CLIMB-COVID pipelines, so minor differences in SNP distances relative to the entire COG-UK alignment are expected. We then identified imported genomes that were within 2 SNPs of each of the 186 unique imported genomes and were sampled in the 4 weeks after the unique imported case. This represented the number of importations of that genome in the 4-week period of interest corresponding to each unique travel-related cluster identified in the analysis above.
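A minimal sketch of this counting step is given below. It assumes that pairwise distances have already been computed and loaded into a dictionary keyed by unordered pairs of genome identifiers, and that the unique imported genome itself counts as one importation event; neither detail is stated in the text, and the function name is hypothetical.

```python
from datetime import date, timedelta


def count_importations(index_id, dates, distances, max_snps=2, weeks=4):
    """Count imported genomes within max_snps of the index genome that were
    sampled in the weeks following it.

    dates: genome id -> sampling date (imported genomes only)
    distances: frozenset({id1, id2}) -> pairwise SNP distance
    """
    start = dates[index_id]
    end = start + timedelta(weeks=weeks)
    count = 1  # assumption: the index importation counts as one event
    for other, sampled in dates.items():
        if other == index_id:
            continue
        close = distances[frozenset((index_id, other))] <= max_snps
        if close and start <= sampled <= end:
            count += 1
    return count
```

Starting the count at 1 keeps every value positive, which matters if the natural log of the count is later used as a model offset.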

Identification of a travel-related SARS-CoV-2 cluster

We used the Polecat clustering tool (https://cog-uk.github.io/polecat) to systematically identify outliers in the COG-UK genomic dataset and link them to contact-tracing data.

Statistical analysis

Statistical models for the number of contacts per case and the number of onward cases per imported genotype were estimated using the glmmTMB package (version 1.0.1)51, with marginal means and effects calculated using the emmeans package (version 1.5.2-1)52 for R (version 4.0.2)53. Figures were generated using R (version 4.0.2) and Microsoft Excel (version 1908).

The number of contacts per case was modelled using negative binomial regression to estimate the effect of travel-related quarantine and whether this varied by age group, sex of the index case and calendar date. Travel destination and ethnic group were included as random effects. Negative binomial regression models were also used for the number of onward cases per imported genome, using calendar date as a covariate and the natural log of the number of importation events for each genome as an offset variable. Model validation was performed by simulating from the estimated models and comparing the distributions of observed and modelled outcomes. For the contacts modelling, the initial negative binomial regression model did not reproduce the number of large outliers seen in the original data; additional models were therefore estimated using data with these observations dropped or top-coded, to check that the reported estimates of effects were not driven by these observations. For the genomic data, the simulated and observed distributions were closely aligned.
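For readers less familiar with offsets, the onward-cases model described above can be written out as follows. The notation is introduced here for illustration and is not taken from the original analysis: Y_g is the number of onward cases for imported genome g, t_g the calendar date, n_g the number of importation events, and θ a dispersion parameter whose exact parameterisation is not specified in the text.

```latex
\begin{align}
Y_g &\sim \mathrm{NegBin}(\mu_g, \theta), \\
\log \mu_g &= \beta_0 + \beta_1 t_g + \log n_g .
\end{align}
```

Because the coefficient on log n_g is fixed at 1, exp(β_0 + β_1 t_g) = μ_g / n_g, so the model coefficients describe the expected number of onward cases per importation event rather than per genome.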

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.