Genetic structure of SARS-CoV-2 in Western Germany reflects clonal superspreading and multiple independent introduction events

We whole-genome sequenced 55 SARS-CoV-2 isolates from Western Germany and investigated the genetic structure of SARS-CoV-2 outbreaks in the Heinsberg district and Düsseldorf. While the genetic structure of the Heinsberg outbreak indicates a clonal origin, reflective of superspreading dynamics during the carnival season, distinct viral strains are circulating in Düsseldorf, reflecting the city's international links. Limited detection of Heinsberg strains in the Düsseldorf area despite geographical proximity may reflect efficient containment and contact tracing efforts.

Since its emergence in the Chinese city of Wuhan in late 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has infected more than 2 million individuals and led to more than 130,000 deaths worldwide 1. More than 10,000 globally sourced SARS-CoV-2 genomes are publicly available, and powerful data sharing and analysis platforms like GISAID 2 and Nextstrain 3 enable the collaborative analysis of viral population structure on a global level. Additional insights into transmission dynamics can be gained from focused investigations of individual outbreaks and by integrating genomic data with classical epidemiology.
Here we report on the genetic structure of SARS-CoV-2 in North Rhine-Westphalia, Germany's most populous state. Our analysis includes the "Heinsberg outbreak", which comprises a superspreading event at a carnival session in Gangelt (a small municipality of about 12,000 inhabitants on the border between Germany and the Netherlands), and subsequent outbreak dynamics in the state capital Düsseldorf, located 70 km from Gangelt and an international economic and air travel hub of about 600,000 inhabitants.

SARS-CoV-2 Genome Sequencing
The Institute of Virology at Düsseldorf University Hospital was one of the first labs to offer SARS-CoV-2 diagnostics in Western Germany. A total of 55 SARS-CoV-2 isolate samples were acquired from diagnostic swabs sent to the Institute: 10 directly linked to the Heinsberg outbreak (obtained from medical practices in the Heinsberg district or from Heinsberg district residents treated at Düsseldorf University Hospital) and 45 from the city of Düsseldorf and surrounding districts. RNA extraction and reverse transcription were carried out as previously described 4. DNA amplification and sequencing on the Oxford Nanopore platform were carried out according to the ARTIC protocol 5,6 (Supplementary Text), yielding between 31 and 582 Mb of raw sequencing data per sample (Supplementary Table S1). Bioinformatic analysis was based on the ARTIC pipelines, followed by additional manual curation (Supplementary Text), yielding completely resolved genomes with 2 to 13 variant positions (Supplementary Table S2) relative to the SARS-CoV-2 reference genome 7. Of note, we observed evidence for multi-allelic variant positions in 11 of 55 samples (Supplementary Table S2); for one such sample (NRW-39; 13 positions called as multi-allelic), PCR was repeated and a separate sequencing run was carried out, confirming the detected multi-allelic variant positions (Supplementary Text). Further work is necessary to investigate whether multi-allelic variant calls represent true within-patient strain variation. In a proof-of-concept experiment, we also successfully sequenced reverse-transcribed viral cDNA from patient material without an intermediate PCR-based amplification step (Supplementary Text), potentially enabling simplified sample preparation and increased read lengths for some samples in the future. Our study was IRB-approved by the ethics committee of the Heinrich Heine University Düsseldorf (#2020-839).
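The two quantities reported above, the number of variant positions relative to the reference and the presence of multi-allelic positions, can be illustrated with a minimal Python sketch. This is not the ARTIC pipeline or the authors' curation procedure (those are described in the Supplementary Text); the function names, the toy sequences, and the 20% allele-fraction threshold are assumptions chosen purely for illustration.

```python
from collections import Counter


def variant_positions(reference: str, genome: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) per substitution.

    Assumes the two sequences are already aligned and of equal length.
    Positions where the sample base is 'N' (unresolved) are skipped.
    """
    if len(reference) != len(genome):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, ref_base, alt_base)
        for i, (ref_base, alt_base) in enumerate(zip(reference, genome))
        if ref_base != alt_base and alt_base != "N"
    ]


def is_multiallelic(base_counts: Counter, min_fraction: float = 0.2) -> bool:
    """True if at least two alleles each reach min_fraction of the reads.

    min_fraction is a hypothetical threshold, not a value from the study.
    """
    total = sum(base_counts.values())
    if total == 0:
        return False
    supported = [b for b, n in base_counts.items() if n / total >= min_fraction]
    return len(supported) >= 2


# Toy sequences (not real SARS-CoV-2 data).
toy_reference = "ACGTACGTAC"
toy_sample = "ACGTTCGNAC"
toy_variants = variant_positions(toy_reference, toy_sample)

# Toy read counts at a single position.
mixed_site = Counter({"C": 70, "T": 30})
clean_site = Counter({"C": 98, "T": 2})
```

In practice, variant calls come from read alignments rather than from a direct comparison of consensus strings, so the sketch only conveys what is being counted.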

Analysis of the Heinsberg Outbreak
First cases of SARS-CoV-2 infection in Germany were detected in late January 2020 and could be linked to recent travel to Northern Italy and China 8. On 24th and 25th February 2020, however, a married couple from the Heinsberg district with no known travel history to SARS-CoV-2 risk areas were diagnosed with SARS-CoV-2; by 28th February 2020, the number of confirmed infections in the Heinsberg district had grown to 37; by 22nd April 2020, to >1,700 9. Contact tracing later showed that many of the early SARS-CoV-2 cases could be linked to a carnival session in the municipality of Gangelt, part of the Heinsberg district, held on 15th February 2020 8. The "Heinsberg outbreak" represented one of the first large-scale SARS-CoV-2 outbreaks in Germany, seeded by community transmission and amplified by superspreading-type dynamics. Genomic analysis of 10 SARS-CoV-2 isolates from the Heinsberg outbreak, sampled between 25th and 28th February and including the first two diagnosed cases, demonstrated the clonal origin of the outbreak (Figure 1); all Heinsberg samples shared the [...] (Supplementary Table S2).

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted April 30, 2020.

Viral diversity in the Heinsberg samples varied between 2 and 6 variants relative to the SARS-CoV-2 reference genome 7, and 5 distinct viral haplotypes could be identified (Supplementary Table S2). An analysis (Supplementary Text) of other publicly available SARS-CoV-2 sequences did not reveal an obvious origin of the Heinsberg outbreak (Supplementary Table S4); the Heinsberg isolates are not related to early sequences from other German outbreak areas (Bavaria, Baden-Württemberg), and, despite intense Dutch viral sampling efforts (585 available viral genomes from the Netherlands at the time of analysis), our analysis identified only 2 closely related isolates from the Netherlands (one collected on 21st March, the other with undefined collection date). The role of the first two patients' short vacation in the Netherlands seven days prior to the Gangelt carnival session 10 thus remains ambiguous, although the timing is suggestive in terms of reported SARS-CoV-2 incubation periods 11. Moreover, large numbers of closely related isolates are circulating in many countries, for example England, Wales, and Iceland (Supplementary Table S4). The small number of stem variants (2) compared to the maximum number of per-isolate variants (6) in the Heinsberg isolates, likely acquired over a period of a few weeks, is compatible with a relatively recent introduction from China.
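To make the terms "stem variants" and "distinct haplotypes" concrete, the following Python sketch computes both from per-isolate variant sets. It assumes that stem variants are those shared by every isolate in the outbreak and that a haplotype is a unique combination of variants; the isolate names and variant labels are hypothetical toy values, not the data in Supplementary Table S2.

```python
from collections import defaultdict

# Hypothetical toy data: isolate name -> set of variant labels relative to
# the reference genome. These are NOT the variants reported in the study.
toy_isolates = {
    "iso_a": frozenset({"v1", "v2"}),
    "iso_b": frozenset({"v1", "v2"}),
    "iso_c": frozenset({"v1", "v2", "v3"}),
    "iso_d": frozenset({"v1", "v2", "v4", "v5"}),
}


def stem_variants(isolates: dict[str, frozenset[str]]) -> frozenset[str]:
    """Variants present in every isolate (the shared 'stem' of a clonal outbreak)."""
    variant_sets = list(isolates.values())
    if not variant_sets:
        return frozenset()
    return frozenset.intersection(*variant_sets)


def group_haplotypes(isolates: dict[str, frozenset[str]]) -> dict[frozenset[str], list[str]]:
    """Group isolates that carry exactly the same set of variants."""
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, variants in isolates.items():
        groups[variants].append(name)
    return dict(groups)


stem = stem_variants(toy_isolates)
haplotypes = group_haplotypes(toy_isolates)
max_variants = max(len(v) for v in toy_isolates.values())
```

In the toy data the stem contains two variants while the most divergent isolate carries four, mirroring the kind of comparison (2 stem variants versus up to 6 per-isolate variants) made in the text.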

Düsseldorf Outbreak Dynamics
The first SARS-CoV-2 cases in Düsseldorf, 70 km from Gangelt, were diagnosed in early March 2020 12; as of 21st April 2020, the outbreak had grown to more than 900 cases 13. The set of 55 whole-genome-sequenced isolates included 45 samples from Düsseldorf and nearby districts, collected between 3rd and 23rd March. A minimum spanning tree analysis of unambiguously resolved viral sequences (Figure 1) showed that there were at least 5 clusters of viral haplotypes circulating in the Düsseldorf area; the number of variant positions relative to the SARS-CoV-2 reference genome in the Düsseldorf samples varied between 2 and 13 (Supplementary Table S2). Closely related strains (distance 0 or 1) were found in the United States, the United Kingdom, Australia, and many other countries (Supplementary Table S4), strongly indicating multiple independent introduction events. Of note, 4 "Düsseldorf area" isolates clustered with the Heinsberg outbreak (Figure 1); of these, 2 were collected from residents of a district next to Heinsberg treated at Düsseldorf University Hospital, and 2 remained of unclear origin (patient data not available). Thus, there was no evidence for widespread community circulation of Heinsberg-derived SARS-CoV-2 strains in the Düsseldorf area.
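A minimum spanning tree over viral haplotypes can be sketched in a few lines of Python. The version below uses Prim's algorithm and defines the distance between two haplotypes as the number of variants present in one but not the other; both choices are assumptions made for illustration, since the exact method used for Figure 1 is described only in the Supplementary Text. The haplotype names and variant labels are hypothetical.

```python
def haplotype_distance(a: frozenset[str], b: frozenset[str]) -> int:
    """Number of variants present in exactly one of the two haplotypes."""
    return len(a ^ b)


def minimum_spanning_tree(
    haplotypes: dict[str, frozenset[str]],
) -> list[tuple[str, str, int]]:
    """Prim's algorithm on the complete graph of pairwise haplotype distances.

    Returns a list of (node_in_tree, newly_added_node, distance) edges.
    Ties are broken by node name so the result is deterministic.
    """
    names = sorted(haplotypes)
    if not names:
        return []
    in_tree = {names[0]}
    edges: list[tuple[str, str, int]] = []
    while len(in_tree) < len(names):
        # Cheapest edge connecting the current tree to a node outside it.
        dist, u, v = min(
            (haplotype_distance(haplotypes[u], haplotypes[v]), u, v)
            for u in in_tree
            for v in names
            if v not in in_tree
        )
        edges.append((u, v, dist))
        in_tree.add(v)
    return edges


# Hypothetical toy haplotypes (not the study data).
toy_haplotypes = {
    "h1": frozenset({"v1", "v2"}),
    "h2": frozenset({"v1", "v2", "v3"}),
    "h3": frozenset({"v1", "v2", "v3", "v4"}),
    "h4": frozenset({"v9"}),
}

tree_edges = minimum_spanning_tree(toy_haplotypes)
```

Long edges in such a tree (here the one joining `h4`) separate groups of haplotypes that share few variants, which is the intuition behind reading clusters off the tree.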

Illumina Validation
To verify the accuracy of Nanopore-based viral assembly, additional Illumina sequencing was carried out for the first 11 samples of our cohort (Supplementary Table S1) [...] (Supplementary Table S3).

Discussion
As SARS-CoV-2 case numbers and the social and economic consequences of social distancing and lockdown measures continue to rise, many countries are facing difficult trade-offs. Improved methods to characterize the dynamics of viral transmission are urgently required. Here we have investigated the genetic structure of two related SARS-CoV-2 outbreaks in Western Germany using Nanopore sequencing, which has additional applications in many fields such as human genetics 15 and microbial metagenomics 16. We have demonstrated the clonal origin of the Heinsberg outbreak, consistent with existing epidemiological data on a carnival session in Gangelt as the epicentre of the outbreak. The lack of association between the Heinsberg samples and other early German outbreak isolates is suggestive of a separate introduction event, possibly via the Netherlands, China, or a third country. By contrast, SARS-CoV-2 isolates circulating in Düsseldorf are highly polyclonal and can be grouped into at least 5 clusters of viral haplotypes. Despite the geographical proximity between Heinsberg and Düsseldorf, only 4 of 36 unambiguously resolved samples from the Düsseldorf area clustered with the Heinsberg outbreak, and 2 of these were derived from residents of a district neighbouring Heinsberg. Limited detection of Heinsberg strains in the Düsseldorf area may reflect the efficacy of the contact tracing efforts conducted by the German public health authorities; of note, "lockdown"-type restrictions with limits on public gatherings in Germany were only imposed on 23rd March 2020 17, i.e. on the day on which the last sample of our study was collected. More extensive sampling of SARS-CoV-2 isolates from Western Germany will be required to investigate the effect of various containment measures on transmission chains at a genomic level.
Consistent with reports from Iceland 18, New York 19, and data on Nextstrain, our study has demonstrated the simultaneous circulation of distinct viral haplotypes in a metropolitan region; even within the Heinsberg outbreak, we could identify 5 distinct viral haplotypes. Importantly, as SARS-CoV-2 genomes continue to diverge as part of ongoing viral evolution, the application of genomic epidemiology 20,21 for the identification and targeted interruption of viral transmission chains will become increasingly feasible.