Social distance and SARS memory: impact on the public awareness of 2019 novel coronavirus (COVID-19) outbreak

This study examines publicly available online search data in China to investigate the spread of public awareness of the 2019 novel coronavirus (COVID-19) outbreak. We found that cities that suffered from SARS and have greater migration ties to the epicentre, Wuhan, had earlier, stronger and more durable public awareness of the outbreak. Our data indicate that forty-eight such cities developed awareness up to 19 days earlier than 255 comparable cities, giving them an opportunity to better prepare. This study suggests that it is important to consider memory of prior catastrophic events as they will influence the public response to emerging threats.


Introduction
Public awareness is important in managing the spread of infectious diseases. Individual actions, such as increased attention to hygiene and avoiding crowds, can reduce disease spread. Awareness also supports rapid identification and treatment of new cases and facilitates collective responses, such as closures of schools or transit systems 1 . In the modern world, diseases can move faster than ever due to the growing movement of people between cities, regions and countries. However, digital technology means information can move even faster, providing an opportunity for individuals and communities to protect themselves ahead of the disease itself arriving. 2 This study considers the spread, and persistence, of public awareness of the novel Wuhan coronavirus which emerged in late 2019. During the first few weeks of this outbreak there was little coverage from mainstream media outlets, providing an unusual opportunity to study the spread of awareness of an emerging disease via other channels.
Previous studies show that the spread of awareness is strongly related to the physical locations of individuals in a social network in relation to the unfolding events [2][3][4] , termed the social distance effect. In online social networks, people with more connections tend to receive earlier warnings of catastrophic events. For example, in Hurricane Sandy in the USA, Twitter users with more followers had an awareness lead-time of up to 26 hours than less connected users 4 . Moreover, the magnitude of awareness increases over decreasing distances to the epidemic centers. For example, public awareness in Weibo, a Chinese social media platform, was two orders of magnitude stronger for the H7N9 influenza outbreak that occurred in China than the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) outbreak that occurred elsewhere 5 .
Experience of similar events, such as outbreaks of H5NI influenza in 2001, SARS (Severe Acute Respiratory Syndrome) in 2003, H1N1 influenza in 2009 and Ebola in 2014 is also likely to influence awareness. In China, the outbreak of SARS between 2003 and 2004 caused a total of 7,429 reported cases and 685 deaths 6 , and had a lasting traumatic impact on survivors and communities 7,8 . In this work, we set out to test whether public awareness of the new disease outbreak is related to social distance from the centre of the epidemic and past experience of the SARS epidemic in 2003. The SARS outbreak was 17 years ago, but its horror might still condition public awareness of lethal infectious diseases. To the best of our knowledge, few studies were carried out to understand how past severe outbreaks affect public awareness when a new outbreak occurs. This study estimates the post-SARS effect, called SARS memory effect, on the current outbreak.
We use the continuing Wuhan coronavirus outbreak as our case study to estimate the effects of social distance and SARS memory on the spread of public awareness. In late 2019, a new coronavirus, designated as COVID-19, was identified in Wuhan, the capital of China's Hubei province 9 . As of February 1st, 2020, the virus has caused approximately 11,184 cases and 258 deaths in 21 countries with the majority of cases in mainland China 10 . The first symptoms were reported on December 1st, 2019 11 , but there was no solid evidence of human-to-human transmission until January 10 th , 2020, when a patient, who did not travel to Wuhan, became infected with the virus after several days of contact with four family members 12 . However, there was little information available to the public until an official announcement about human-to-human transmission of the virus on January 20 th , 2020 13 . Wuhan City, long known as the "Nine Provinces" thoroughfare ("九省通衢"), is in the central part of China, serving as a major transportation hub transiting more than 120 million passengers every year 14 . The massive numbers of transits provided a perfect opportunity for the virus to spread. Another feature is the timing of the outbreak, close to the Spring Festival travel season, Chunyun ("春运"), which started on January 10 th , 2020. While the virus spread across almost all provinces, 16 cities had been locked down by January 23 rd , 2020 15 . Accordingly, this study focuses on the time period between December 15 th , 2019 and Jan 23 rd , 2020.

Public awareness measurement
Seeking epidemic-related information online can provide an indicator of public awareness of this new disease. In this study, we use the Baidu Search Index (BSI), available publicly at http://index.baidu.com, to measure the public awareness over time and locations (e.g., city). The total number of internet users using the Baidu search engine reached 649 million in 2014, accounting for 47.9% of the national population 16 . BSI has been used to predict epidemic outbreak 17 , HIV/AIDS incidence 18 and tourism flows 19 , suggesting BSI can provide a representative proxy for public awareness. BSI provides a weighted index for each search term. In this study, we used the term "Wuhan pneumonia" ("武汉肺 炎"), as the public has widely used. We also tried "novel coronavirus" ("新型冠状病毒") and "Wuhan outbreak" ("武汉爆发"), but the former did not exhibit a search surge, and the latter was not indexed by Baidu. Due to the privacy concern, Baidu masks daily readings that are below 57 as zero. Therefore, we used the maximum BSI value of the search term "common cold" ("感冒") between Dec 10 th and 31 th , 2019, to control the size effect for each city. We use the Ljung-Box test 20 to estimate whether or not the daily readings of "common cold" are stationary. As a result, the daily readings of 18 out of 364 cities were found to be non-stationary, so they were excluded from this study. The magnitude of public awareness of the Wuhan outbreak over time and city ∈ {1, … , 346} can be represented as The earliest day the magnitude of public awareness exceeds the arbitrary thresholds ∈ {1.5, 2, 3, 4} is defined as the earliest warning day, ( ) , for city . We also define the starting day of Chunyun as ℎ , indicating the onset day when it is likely the virus would reach all cities. As Chunyun transited approximately 3 billion million passengers in 40 days in 2019 21 , crowded transport hubs create perfect opportunities for the virus to spread. Therefore, the earlier the lead-time awareness, the better for infection control. The lead-time of awareness for city is thus defined as: Awareness typically follows a cyclical process, called the unaware-aware-unaware (UAU) process, as time passes. Keeping the public at a high level of awareness could help mitigate the virus transmission process. Therefore, we also measure the awareness retention rate as the average of magnitude from the next day of ( ) to the day of Chunyun over the magnitude at ( ) .

Measuring social distances
Microblogs (e.g., Weibo) and private social media (e.g., WeChat) are the primary communication tools used by most Chinese people 16 . While information flows cannot be observed directly, empirical studies show that social networks are influenced by long distance travel. 22 We therefore use migration flows as a proxy for long-distance information flows To be more specific, if workers born and raised in city A now work in city B, they are likely to relate information about an epidemic in city B back to friends and family in city A. This is particularly relevant in the Chinese context, where migrant workers account for more than one-third of the working population 23 . is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint proportional to the numbers of transit passengers at the corresponding city. The network does not include Hong Kong, Macau and Taiwan, as the Baidu Migration Matrix does not include them.
We use the migration flows extracted from the Baidu Migration Matrix (BMM) to build a migration network (Fig. 1). We then compute the shortest steps between any city to Wuhan, deriving the variable social distances for city as ∈ (1,8). Wuhan and the cities located in Hubei province have = 1, while cities located far away from Wuhan tend to have larger values, e.g., = ℎ = 6.
Moreover, we added = = 1, based on the rationale that Hong Kong has more airline traffic flows to Wuhan than Shanghai 24 , which has = ℎ ℎ = 1. Macau and Taiwan both have frequent traffic flows to Hong Kong 24 , so we added = = 2 and = = 2.

Measuring SARS memory
We collected all reported SARS cases in mainland China, Hong Kong, Taiwan and Macau, and assign the numbers of cases to each city as , ∈ {1, … , 346}. The values range between zero (no reported cases) and 2,521 ( = ), with an average of 70.8 and a median of 4 cases in cities with at least one case reported. We then use the logarithm, as most cities reported zero cases of SARS resulting in:

Estimating social distance and SARS memory effects
We build three groups of regression models to estimate the effects of social distance and SARS memory on public awareness measures Δ , ( ) COVID−19 and Δ respectively. _ _ represents the gross domestic product (GDP) per capita for city .
indicates whether or not city has sub-provincial or greater administrative power. Sub-provincial cities are mostly capitals of the provinces in which they are located, or important cities designated by the central government. Four cities, including Beijing, Shanghai, Tianjin and Chongqing, which are under direct control of the central government are also labelled as = 1. Those sub-provincial and above cities have much better facilities and expertise for infection control than other cities 25 , so we assume residents could be more alert.
is used to control the effects of administrative level. We also introduce Euclidean distances as a control variable, denoted as .
All data necessary to replicate the analysis is attached as S2 of the SM.

Results
The early warnings of the outbreak As early as Dec 31 st , 2019, most of the cities (326 out of 346) have exhibited at least some awareness of the emerging Wuhan outbreak (Fig. 2B). However, awareness then decreased until Jan 19 th , 2020, one day before the Chinese Centre for Disease Control and Prevention confirmed human-to-human transmissions of the novel coronavirus. Since Jan 20 th , 2020, overall awareness has increased by a magnitude of at least five, demonstrating significant awareness across all cities (Fig. 2B). Awareness remained low as the epidemic spread, falling close to its lowest point on the starting day of Chunyun (Jan 10 th , 2020). Considering cities that showed initial novel coronavirus awareness levels at least 1.5 times that of the search term "common cold", we found a total of 166 alert cities as early as Dec 31 st , 2019 (48 cities at a tighter threshold of = 3.0 times, illustrated in Fig. 2A). However, awareness decreased significantly during Chunyun.
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 16, 2020. . The evolution of public awareness over time followed an unusual pattern. In a typical UAU process, people are unaware of emerging catastrophic events until they are told by their social contacts. They remain aware during the event, and awareness then fades subsequently 2,26 . However, during the Wuhan outbreak, the public experienced a process as aware-unaware-aware, with public awareness declining during the early phase of the outbreak.
Dividing cities into two groups of equal size according to the numbers of reported SARS cases, we found the cities that had been struck by SARS to be more alert ( p < 0.05 ) during onset (2B). Therefore, we believe the SARS memory still conditions public awareness. We provide evidence of its effects at the end of this section.

Awareness advantage
Three features define the awareness advantage of alert cities, including early awareness, strong magnitude and high retention of awareness. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Lead-time advantage
The copyright holder for this this version posted March 16, 2020. . days and a median of -10 days. Forty-eight cities emerge with early signals of public awareness, as early as Dec 31 st , 2019, while for most others (255 cities), awareness is as late as Jan 20 th , 2020 ( Fig.  2A). The cities of substantial lead-time advantage are those either closed to Wuhan in terms of social distances or struck by the SARS outbreak. For example, Changchun in Jilin province with 34 SARS cases and far away from Wuhan still achieved a ten days lead-time advantage. The cities that did not exhibit awareness, such as Qaramay and Heihe, are mainly located far away from Wuhan and did not suffer from the SARS outbreak.

Magnitude of awareness
In terms of the magnitude of awareness, all 346 cities exhibit at least some awareness during the onset. The values of

Retention of awareness
Even though most of the cities exhibit at least some awareness as early as Dec 31 st , 2019, only a few retain it over the following weeks as the virus began to spread. The retention rates, Δ , range between zero and 137%, with an average of 54% and a median of 55%. Eight cities lost awareness before Chunyun, while four cities developed greater awareness. Xilingol League in Inner Mongolia ranked 4 th , with a retention rate at 103%. Xilingol is far away from Wuhan in terms of social distance, but it was struck by SARS. It is worth noting that a confirmed case of plague was reported in Xilingol on Nov 16 th , 2019, only 45 days before the Wuhan outbreak.

Estimation of the social distance and SARS memory effects
The effects of social distance and SARS memory on the lead-time advantage are estimated according to Eq. 4, controlled by Euclidean distances, GDP per capita and the city's administrative level (Table 1). We found that, in model (3) in Table 1, exhibits positive effects, while shows a negative association with awareness. That means cities of strong SARS memory and which are closer to Wuhan in terms of Social distances develop early awareness. Moreover, the interaction term * exhibits negative effects, indicating that the SARS memory effect becomes stronger where cities are closer to Wuhan in terms of social distances. While controlling the model with Euclidean distances (model (5) in Table 1), we found that SARS memory effect becomes non-significant, but social distance and its interaction with SARS memory hold. Meanwhile, Euclidean distances effect are non-significant, even though it exhibits negative effect alone in model (4) in Table 1.
We further control the model with GDP per capita and administrative level (model (6) & (7) in Table  1). Model (6) is more favorable than model (7), because the former achieves slightly lower AIC scores (Akaike's Information Criteria) 27 at 1893.132 with fewer degrees of freedom (df = 8) than the latter (AIC = 1894.160, df = 9). In model (6) in Table 1, we found that both social distance and Euclidean distances exhibit negative effects, but the social distance effects decrease almost half compared to model (5) in Table 1. The SARS memory effects hold. Also, the interaction term * is still significant, which means cities of stronger SARS memory will develop more lead-time advantage, particularly when they are closer to Wuhan. GDP per capita and the binary variable exhibit significant positive effects on the lead-time advantage.

Dependent variable:
Lead-time Δ ( = 3.0) . CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint Note: *p**p***p<0.01 The effects of social distance and SARS memory on the magnitude of awareness are estimated according to Eq. 5 (Table 2). Similar to the findings in Table 1, memory positively affects public awareness in all models. Social distances show a significant negative effect only in the models without controlling variables (model (1), (2) & (3) in Table 2). However, the interaction term between social distances with SARS memory show a significant negative effect. When we control by Euclidean distances, GDP per capita and the administrative level (model (5) and (6) in Table 2), those effects still hold. Moreover, the effects of administrative level and development level both exhibit positive effects on the magnitude of awareness. We hypothesize that residents with better education (proxied by GDP per capita) better understand the danger of deadly infectious diseases and, accordingly, tend to seek up-to-date information online.

Dependent variable:
Magnitude of awareness at the earliest day of awareness: is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint Note: *p**p***p<0.01 The effects of social distance and SARS memory on retention of awareness are estimated according to Eq. 6. Unlike the results in Tables 1 and 2, we observe no effects from SARS memory (model (6) in Table 3). When we control Euclidean distances, the development level and the administrative level, the explanatory power of the model is still relatively weak (Adj. 2 = 0.104). It seems the decreasing awareness is a collective behavior that occurred simultaneously. Interestingly, social distances have a significant effect while the Euclidean distances do not. Development level exhibits positive effects, which suggests residents of better educated cities could be more alert during the epidemic onset. However, administrative level shows a negative effect. It seems residents living in important cities (in terms of administrative power) lost interest in the infectious diseases before Chunyun. is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint

Discussion
The Wuhan coronavirus outbreak is still striking China, with thousands of cases and dozens of deaths reported every day. From this study, we found that the spread of public awareness varies markedly across Chinese cities. Through controlling for development, administrative levels, and Euclidean distances, we observe cities that were struck by SARS and have more migration to the epicentre, Wuhan, showed earlier, stronger and more durable public awareness of the outbreak. Moreover, 48 cities had developed public awareness as early as Dec 31 st , 2019, with up to 19 days of lead-time advantage against some other 255 cities. The study suggests that memory of previous events, as well as social links to an emerging threat, may influence public behaviour. Greater awareness could help slow the spread of a disease, for example through increased attention to hygiene, mask-wearing and reduced interpersonal contact. It might also facilitate collective responses such as enforced quarantine measures. However, in some circumstances enhanced awareness could have negative impacts, such as unnecessary panic or ostracism of groups perceived as being at greater risk of infection Due to the lack of infection statistics, we cannot yet statistically estimate the effect of public awareness on the subsequent seriousness of the outbreak. We note that Xilingol League in Inner Mongolia, which had relatively stronger and more durable public awareness, had fewer cases (two cases as reported at Feb 7 th , 2020 10 ) than other cities in the same province (totally 50 cases, with an average of 4.55 cases per city 10 ).
To the best of our knowledge, this study is the first to investigate how memory of previous catastrophic events, e.g., SARS, and social distances could affect the spread of public awareness. Further studies will be needed to understand whether this holds in other context, beyond the unusual and tragic circumstances of the noel coronavirus in Wuhan.
File S1: Baidu Search Index data File S2: The data necessary to replicate the analysis of this study.
. CC-BY 4.0 International license It is made available under a perpetuity.
is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint is the author/funder, who has granted medRxiv a license to display the preprint in (which was not certified by peer review) preprint The copyright holder for this this version posted March 16, 2020. . https://doi.org/10.1101/2020.03.11.20033688 doi: medRxiv preprint