Spatial Visualization of Cluster-Specific COVID-19 Transmission Network in South Korea During the Early Epidemic Phase

Background: Coronavirus disease 2019 (COVID-19) has been spreading rapidly throughout China and other countries, including South Korea. As of March 12, 2020, a total of 7,869 cases and 66 deaths had been documented in South Korea. Although the first confirmed case in South Korea was identified on January 20, 2020, the number of confirmed cases showed rapid growth on February 19, 2020, with a total of 1,261 cases and 12 deaths according to the Korea Centers for Disease Control and Prevention (KCDC).
Method: Using data on confirmed COVID-19 cases in South Korea that are publicly available from the KCDC, this paper creates spatial visualizations of COVID-19 transmission between January 20, 2020 and February 19, 2020.
Results: Using spatial visualization, this paper identifies two early transmission clusters in South Korea (the Daegu cluster and the capital area cluster). Using a degree-weighted centrality measure, it proposes potential super-spreaders of the virus in the visualized clusters.
Conclusion: Compared with epidemiological measures such as the basic reproduction number, spatial visualizations of cluster-specific transmission networks and the proposed centrality measure may be more useful for characterizing super-spreaders and the spread of the virus, especially in the early epidemic phase.


Introduction
The first pneumonia cases of unknown origin were identified in Wuhan in early December 2019 [1].
Since then, coronavirus disease 2019 (COVID-19) has been spreading rapidly throughout China and other countries, including South Korea. As of March 17, 2020, a total of 198,181 laboratory-confirmed cases and 7,965 deaths had been documented globally. The World Health Organization (WHO) has declared COVID-19 a public health emergency of international concern [2]. The earliest confirmed patients in South Korea had either visited or come from China. Secondary and tertiary transmissions have occurred since then, leading to an accelerating rate of transmission in South Korea. As of March 17, 2020, a total of 8,320 cases and 81 deaths had been documented in South Korea.

Method
With the launch of a COVID-19 data hub, officials from the White House and other national organizations issued a call to action for researchers in a multitude of disciplines such as computer science, epidemiology, economics, and statistics. Open-access resources such as epidemiological data, interactive web-based dashboards, and descriptive statistics have informed many about the current state of the pandemic [3,4]. As part of a concomitant effort to combat the virus and to better understand its etiology, the Korea Centers for Disease Control and Prevention (KCDC), an organization under the South Korean Ministry of Health and Welfare, has made many datasets on confirmed COVID-19 cases in South Korea available online [5]. The datasets include only confirmed COVID-19 patients, each with a unique numeric patient identifier, geographical data, and infection information where available. The epidemiological dataset records the region of the affected patient, the identifier of the person who infected the patient, and the number of contacts the patient had with other people. The aim of this report is to create spatial visualizations of early COVID-19 transmission networks in South Korea using these data, which may indicate transmission patterns for each network.
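The records described above (patient identifier, region, identifier of the infector, number of contacts) map naturally onto a directed graph. The following is a minimal sketch, not the authors' pipeline: the field names `patient_id`, `region`, `infected_by`, and `contact_number` and all the values are hypothetical placeholders, and the connected-component search is only one plausible way to group linked cases into clusters.

```python
from collections import defaultdict

# Hypothetical records: field names and values are illustrative only and are
# not taken from the KCDC datasets.
records = [
    {"patient_id": 1, "region": "Region A", "infected_by": None, "contact_number": 10},
    {"patient_id": 2, "region": "Region A", "infected_by": 1, "contact_number": 3},
    {"patient_id": 3, "region": "Region A", "infected_by": 1, "contact_number": 0},
    {"patient_id": 4, "region": "Region B", "infected_by": None, "contact_number": 25},
    {"patient_id": 5, "region": "Region B", "infected_by": 4, "contact_number": 2},
    {"patient_id": 6, "region": "Region B", "infected_by": None, "contact_number": 1},
]


def build_edges(rows):
    """Return directed (infector, infectee) edges for rows with a known infector."""
    return [(r["infected_by"], r["patient_id"]) for r in rows if r["infected_by"] is not None]


def connected_components(edge_list):
    """Group linked patients into clusters, ignoring edge direction."""
    neighbors = defaultdict(set)
    for src, dst in edge_list:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seen = set()
    components = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited patient.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


edges = build_edges(records)
clusters = connected_components(edges)
print(edges)
print(clusters)
```

Patients with no known infector and no known infectees (patient 6 in this toy data) do not appear in any edge, so they fall outside every cluster. An edge list of this kind could then be loaded into a network tool for layout and styling.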
The time series data on COVID-19 status in South Korea are analyzed to provide updated statistics. Using a spatial visualization of confirmed patients during the early epidemic phase, two major clusters are identified. As of March 12, 2020, 7,869 positive cases had been documented in South Korea, of which 70 include the identifier of the person who infected them. Although the first confirmed case in South Korea was identified on January 20, 2020, the number of confirmed cases showed rapid growth on February 19, 2020, with a total of 1,261 cases and 12 deaths according to the KCDC [6]. As of March, newly reported cases in South Korea suggest that the numbers of positive cases and deaths are declining and that new cases remain within known clusters. Therefore, identifying the early clusters and examining the confirmed cases in them, from January 20, 2020 to February 19, 2020, is crucial because these clusters remain the longest-lasting sources of transmission. Of the 70 patients, only the subset infected by confirmed cases during the early epidemic phase (January 20, 2020 to February 19, 2020) is used to build the network from the epidemiological data and to visualize the transmission networks of these two clusters. All analyses and visualizations are performed using the ggplot2 package in R [7] and Cytoscape [8].

This medRxiv preprint (version posted March 20, 2020; doi: https://doi.org/10.1101/2020.03.18.20038638) was not certified by peer review. The copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Results
The time series data contain both overall statistics, such as the number of tests, and geographical data within South Korea from January 20, 2020 to March 12, 2020. Figure 1 shows the cumulative COVID-19 statistics over this period. Since early February, there has been an exponential increase in the number of tested cases, most of which were negative. Of the 67 patients confirmed positive in South Korea as of February 19, 2020, geographical data are available for 56, which allows their route patterns to be visualized. Figure 2 depicts the spatial distribution of the two large clusters of COVID-19 as of February 2020.
In this graph, the 6th case made physical contact with 17 different individuals and resulted in four new cases. The 6th case is unlikely to be a super-spreader given the low number of physical contacts with other individuals before being treated. Although this cluster represents the largest connected component of the entire visualized infection network, it is reported that no further cases have been added to this cluster since February 21, 2020 [9].
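The reasoning about the 6th case combines two quantities: the number of secondary cases (the out-degree in the transmission network) and the number of reported contacts. The sketch below illustrates one way to combine them. The score is a hypothetical stand-in and not the degree-weighted centrality measure used in this paper, whose definition is not reproduced here. Only the two figures for `case_6` (4 secondary cases, 17 contacts) come from the text; every label and every other number is a made-up placeholder.

```python
from collections import Counter

# Directed (infector, infectee) edges. Only the counts for "case_6" come from
# the text; "case_x" and all infectee labels are hypothetical placeholders.
edges = [
    ("case_6", "p1"), ("case_6", "p2"), ("case_6", "p3"), ("case_6", "p4"),
    ("case_x", "p5"), ("case_x", "p6"),
]

# Reported number of contacts per infector (17 is from the text, 500 is invented).
contacts = {"case_6": 17, "case_x": 500}

# Out-degree = number of secondary cases attributed to each infector.
out_degree = Counter(infector for infector, _ in edges)


def contact_weighted_degree(case):
    """Hypothetical score: secondary cases multiplied by reported contacts."""
    return out_degree[case] * contacts.get(case, 0)


# Rank infectors from highest to lowest score.
ranking = sorted(out_degree, key=contact_weighted_degree, reverse=True)
for case in ranking:
    print(case, out_degree[case], contacts.get(case, 0), contact_weighted_degree(case))
```

Under this toy score, a case with fewer secondary cases but far more contacts ranks above the 6th case, which mirrors the argument that a low contact count makes the 6th case an unlikely super-spreader.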

There are also a number of different mathematical models that can be used to calculate the reproduction number under different probability distributions, which further complicates its interpretation. Instead, visualizing the transmission networks could be useful for understanding the spread of the virus.
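To illustrate why the choice of distribution matters, the sketch below applies a standard relation between the epidemic growth rate and the reproduction number under three assumed generation-interval distributions. It is not taken from this paper's analysis, and the growth rate and mean generation interval are hypothetical values chosen only to show that the same inputs yield different estimates.

```python
import math


def r_exponential(growth_rate, mean_interval):
    """Reproduction number if the generation interval is exponentially distributed."""
    return 1.0 + growth_rate * mean_interval


def r_fixed(growth_rate, mean_interval):
    """Reproduction number if every generation interval equals the mean."""
    return math.exp(growth_rate * mean_interval)


def r_gamma(growth_rate, mean_interval, shape):
    """Reproduction number if the generation interval is gamma distributed."""
    return (1.0 + growth_rate * mean_interval / shape) ** shape


# Hypothetical inputs, not estimates from the KCDC data.
growth_rate = 0.1    # per day
mean_interval = 5.0  # days

estimates = {
    "exponential": r_exponential(growth_rate, mean_interval),
    "gamma (shape=2)": r_gamma(growth_rate, mean_interval, shape=2.0),
    "fixed": r_fixed(growth_rate, mean_interval),
}
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

With identical inputs, the three assumptions give three different reproduction numbers, ordered from the exponential case (lowest) to the fixed-interval case (highest).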

Discussion
What happened in China shows that quarantine, social distancing, and isolation of infected populations may be able to contain the epidemic [11]. This is encouraging for the many countries where COVID-19 is beginning to spread. South Korea once had the fastest-growing rate of infection outside of China.
Korea's confirmed cases rose rapidly after the identification of the super node in the Daegu cluster in late February. Since then, the country has shown success in its mitigation efforts, with declines in both newly confirmed cases and deaths. The majority of new cases originate from the original clusters, one of which likely contains a super-spreader, as suggested by the spatial network generated.
Similar observations were made during the Middle East respiratory syndrome (MERS) outbreak in South Korea, where the syndrome was spread rapidly by super-spreaders [12]. Therefore, it is important to have a better understanding of these clusters during the early epidemic phase, and visualizing them may help us understand how the virus is being spread. Spatial networks can identify potential super-spreaders within the clusters. Identifying these early clusters and super-spreaders may not only reduce the spread of the virus but also inform policymaking, such as enforced social distancing or quarantine.