Abstract
This study examines publicly available online search data in China to investigate the spread of public awareness of the 2019 novel coronavirus (COVID-19) outbreak. We found that cities that suffered from SARS and have greater migration ties to the epicentre, Wuhan, had earlier, stronger and more durable public awareness of the outbreak. Our data indicate that forty-eight such cities developed awareness up to 19 days earlier than 255 comparable cities, giving them an opportunity to better prepare. This study suggests that it is important to consider memory of prior catastrophic events as they will influence the public response to emerging threats.
Introduction
Public awareness is important in managing the spread of infectious diseases. Individual actions, such as increased attention to hygiene and avoiding crowds, can reduce disease spread. Awareness also supports rapid identification and treatment of new cases and facilitates collective responses, such as closures of schools or transit systems1. In the modern world, diseases can move faster than ever due to the growing movement of people between cities, regions and countries. However, digital technology means information can move even faster, providing an opportunity for individuals and communities to protect themselves ahead of the disease itself arriving.2 This study considers the spread, and persistence, of public awareness of the novel Wuhan coronavirus which emerged in late 2019. During the first few weeks of this outbreak there was little coverage from mainstream media outlets, providing an unusual opportunity to study the spread of awareness of an emerging disease via other channels.
Previous studies show that the spread of awareness is strongly related to the physical locations of individuals in a social network in relation to the unfolding events2–4, termed the social distance effect. In online social networks, people with more connections tend to receive earlier warnings of catastrophic events. For example, in Hurricane Sandy in the USA, Twitter users with more followers had an awareness lead-time of up to 26 hours than less connected users4. Moreover, the magnitude of awareness increases over decreasing distances to the epidemic centers. For example, public awareness in Weibo, a Chinese social media platform, was two orders of magnitude stronger for the H7N9 influenza outbreak that occurred in China than the Middle East Respiratory Syndrome Coronavirus (MERS-CoV) outbreak that occurred elsewhere5.
Experience of similar events, such as outbreaks of H5NI influenza in 2001, SARS (Severe Acute Respiratory Syndrome) in 2003, H1N1 influenza in 2009 and Ebola in 2014 is also likely to influence awareness. In China, the outbreak of SARS between 2003 and 2004 caused a total of 7,429 reported cases and 685 deaths6, and had a lasting traumatic impact on survivors and communities 7,8. In this work, we set out to test whether public awareness of the new disease outbreak is related to social distance from the centre of the epidemic and past experience of the SARS epidemic in 2003. The SARS outbreak was 17 years ago, but its horror might still condition public awareness of lethal infectious diseases. To the best of our knowledge, few studies were carried out to understand how past severe outbreaks affect public awareness when a new outbreak occurs. This study estimates the post-SARS effect, called SARS memory effect, on the current outbreak.
We use the continuing Wuhan coronavirus outbreak as our case study to estimate the effects of social distance and SARS memory on the spread of public awareness. In late 2019, a new coronavirus, designated as COVID-19, was identified in Wuhan, the capital of China’s Hubei province9. As of February 1st, 2020, the virus has caused approximately 11,184 cases and 258 deaths in 21 countries with the majority of cases in mainland China10. The first symptoms were reported on December 1st, 201911, but there was no solid evidence of human-to-human transmission until January 10th, 2020, when a patient, who did not travel to Wuhan, became infected with the virus after several days of contact with four family members12. However, there was little information available to the public until an official announcement about human-to-human transmission of the virus on January 20th, 202013. Wuhan City, thoroughfare (“九省通衢”), is in the central part of China, serving as a major transportation hub transiting more than 120 million passengers every year14. The massive numbers of transits provided a perfect opportunity for the virus to spread. Another feature is the timing of the outbreak, close to the Spring Festival travel season, Chunyun (“春运”), which started on January 10th, 2020. While the virus spread across almost all provinces, 16 cities had been locked down by January 23rd, 202015. Accordingly, this study focuses on the time period between December 15th, 2019 and Jan 23rd, 2020.
Data and methods
Public awareness measurement
Seeking epidemic-related information online can provide an indicator of public awareness of this new disease. In this study, we use the Baidu Search Index (BSI), available publicly at http://index.baidu.com, to measure the public awareness over time and locations (e.g., city). The total number of internet users using the Baidu search engine reached 649 million in 2014, accounting for 47.9% of the national population16. BSI has been used to predict epidemic outbreak17, HIV/AIDS incidence18 and tourism flows19, suggesting BSI can provide a representative proxy for public awareness. BSI provides a weighted index for each search term. In this study, we used the term “Wuhan pneumonia” (“武汉肺炎”), as the public has widely used. We also tried “novel coronavirus” (“新型冠状病毒”) and “Wuhan outbreak” (“武汉爆发”), but the former did not exhibit a search surge, and the latter was not indexed by Baidu. Due to the privacy concern, Baidu masks daily readings that are below 57 as zero. Therefore, we used the maximum BSI value of the search term “common cold” (“感冒”) between Dec 10th and 31th, 2019, to control the size effect for each city. We use the Ljung-Box test20 to estimate whether or not the daily readings of “common cold” are stationary. As a result, the daily readings of 18 out of 364 cities were found to be non-stationary, so they were excluded from this study. The magnitude of public awareness of the Wuhan outbreak over time t and city i ∈ {1, …, 346} can be represented as as defined below. and represent the BSI values of the search terms “Wuhan pneumonia” and “common cold” respectively. The BSI raw data is provided in S1 of the supplementary materials (SM).
The earliest day the magnitude of public awareness exceeds the arbitrary thresholds C ∈ {1.5, 2, 3, 4} is defined as the earliest warning day, twarming(i), for city i. We also define the starting day of Chunyun as tchunyun, indicating the onset day when it is likely the virus would reach all cities. As Chunyun transited approximately 3 billion million passengers in 40 days in 201921, crowded transport hubs create perfect opportunities for the virus to spread. Therefore, the earlier the lead-time awareness, the better for infection control. The lead-time of awareness for city i is thus defined as:
Awareness typically follows a cyclical process, called the unaware-aware-unaware (UAU) process, as time passes. Keeping the public at a high level of awareness could help mitigate the virus transmission process. Therefore, we also measure the awareness retention rate as the average of magnitude from the next day of twarming(i) to the day of Chunyun over the magnitude at twarming(i).
Measuring social distances
Microblogs (e.g., Weibo) and private social media (e.g., WeChat) are the primary communication tools used by most Chinese people16. While information flows cannot be observed directly, empirical studies show that social networks are influenced by long distance travel.22 We therefore use migration flows as a proxy for long-distance information flows To be more specific, if workers born and raised in city A now work in city B, they are likely to relate information about an epidemic in city B back to friends and family in city A. This is particularly relevant in the Chinese context, where migrant workers account for more than one-third of the working population23.
We use the migration flows extracted from the Baidu Migration Matrix (BMM) to build a migration network (Fig. 1). We then compute the shortest steps between any city to Wuhan, deriving the variable social distances for city i as Di ∈ (1, 8). Wuhan and the cities located in Hubei province have Di = 1, while cities located far away from Wuhan tend to have larger values, e.g., Di=Lhasa = 6.
Moreover, we added Di=Hong Kong = 1, based on the rationale that Hong Kong has more airline traffic flows to Wuhan than Shanghai24, which has Di=Shanghai = 1. Macau and Taiwan both have frequent traffic flows to Hong Kong24, so we added Di=Macau = 2 and Di=Taiwan = 2.
Measuring SARS memory
We collected all reported SARS cases in mainland China, Hong Kong, Taiwan and Macau, and assign the numbers of cases to each city as SARSi, i ∈ {1, …, 346}. The values range between zero (no reported cases) and 2,521 (i = Beijing), with an average of 70.8 and a median of 4 cases in cities with at least one case reported. We then use the logarithm, as most cities reported zero cases of SARS resulting in:
Estimating social distance and SARS memory effects
We build three groups of regression models to estimate the effects of social distance and SARS memory on public awareness measures and ΔOi respectively. GDP_per_capitairepresents the gross domestic product (GDP) per capita for city i. SubProvinciali indicates whether or not city i has sub-provincial or greater administrative power. Sub-provincial cities are mostly capitals of the provinces in which they are located, or important cities designated by the central government. Four cities, including Beijing, Shanghai, Tianjin and Chongqing, which are under direct control of the central government are also labelled as SubProvinciali = 1. Those sub-provincial and above cities have much better facilities and expertise for infection control than other cities25, so we assume residents could be more alert. SubProvinciali is used to control the effects of administrative level. We also introduce Euclidean distances as a control variable, denoted as di.
All data necessary to replicate the analysis is attached as S2 of the SM.
Results
The early warnings of the outbreak
As early as Dec 31st, 2019, most of the cities (326 out of 346) have exhibited at least some awareness of the emerging Wuhan outbreak (Fig. 2B). However, awareness then decreased until Jan 19th, 2020, one day before the Chinese Centre for Disease Control and Prevention confirmed human-to-human transmissions of the novel coronavirus. Since Jan 20th, 2020, overall awareness has increased by a magnitude of at least five, demonstrating significant awareness across all cities (Fig. 2B). Awareness remained low as the epidemic spread, falling close to its lowest point on the starting day of Chunyun (Jan 10th, 2020). Considering cities that showed initial novel coronavirus awareness levels at least 1.5 times that of the search term “common cold”, we found a total of 166 alert cities as early as Dec 31st, 2019 (48 cities at a tighter threshold of C = 3.0 times, illustrated in Fig. 2A). However, awareness decreased significantly during Chunyun.
The evolution of public awareness over time followed an unusual pattern. In a typical UAU process, people are unaware of emerging catastrophic events until they are told by their social contacts. They remain aware during the event, and awareness then fades subsequently2,26. However, during the Wuhan outbreak, the public experienced a process as aware-unaware-aware, with public awareness declining during the early phase of the outbreak.
Dividing cities into two groups of equal size according to the numbers of reported SARS cases, we found the cities that had been struck by SARS to be more alert (p < 0.05) during onset (2B). Therefore, we believe the SARS memory still conditions public awareness. We provide evidence of its effects at the end of this section.
Awareness advantage
Three features define the awareness advantage of alert cities, including early awareness, strong magnitude and high retention of awareness.
Lead-time advantage
During onset, between Dec 31st, 2019 and Jan 23rd, 2020, 266 cities exhibited significant public awareness (using a threshold at C = 3.0; 210 cities at C = 4.0; 314 cities at C = 2.0; and 322 cities at C = 1.5). The lead-time advantage Δti ranges between −13 and 10 (C = 3.0), with an average of −7.4 days and a median of −10 days. Forty-eight cities emerge with early signals of public awareness, as early as Dec 31st, 2019, while for most others (255 cities), awareness is as late as Jan 20th, 2020 (Fig. 2A). The cities of substantial lead-time advantage are those either closed to Wuhan in terms of social distances Di or struck by the SARS outbreak. For example, Changchun in Jilin province with 34 SARS cases and far away from Wuhan still achieved a ten days lead-time advantage. The cities that did not exhibit awareness, such as Qaramay and Heihe, are mainly located far away from Wuhan and did not suffer from the SARS outbreak.
Magnitude of awareness
In terms of the magnitude of awareness, all 346 cities exhibit at least some awareness during the onset. The values of range between 0.50 and 61.51, with an average of 2.18 and a median of 1.49. Wuhan undoubtedly ranked the first, while Wuzhong in Ningxia ranked last. Cities in Hubei province exhibit much greater awareness, with an average of 7.83 and a median of 3.46. Shenzhen, Shanghai and Beijing also had high scores at 10.40, 10.65, and 9.78 respectively. Those three cities are both close to Wuhan in terms of social distances Di and struck by SARS.
Retention of awareness
Even though most of the cities exhibit at least some awareness as early as Dec 31st, 2019, only a few retain it over the following weeks as the virus began to spread. The retention rates, ΔOi, range between zero and 137%, with an average of 54% and a median of 55%. Eight cities lost awareness before Chunyun, while four cities developed greater awareness. Xilingol League in Inner Mongolia ranked 4th, with a retention rate at 103%. Xilingol is far away from Wuhan in terms of social distance, but it was struck by SARS. It is worth noting that a confirmed case of plague was reported in Xilingol on Nov 16th, 2019, only 45 days before the Wuhan outbreak.
Estimation of the social distance and SARS memory effects
The effects of social distance and SARS memory on the lead-time advantage are estimated according to Eq. 4, controlled by Euclidean distances, GDP per capita and the city’s administrative level (Table 1). We found that, in model (3) in Table 1, SARSi exhibits positive effects, while Di shows a negative association with awareness. That means cities of strong SARS memory and which are closer to Wuhan in terms of Social distances develop early awareness. Moreover, the interaction term Di ∗ SARSi exhibits negative effects, indicating that the SARS memory effect becomes stronger where cities are closer to Wuhan in terms of social distances.
While controlling the model with Euclidean distances (model (5) in Table 1), we found that SARS memory effect becomes non-significant, but social distance and its interaction with SARS memory hold. Meanwhile, Euclidean distances effect are non-significant, even though it exhibits negative effect alone in model (4) in Table 1.
We further control the model with GDP per capita and administrative level (model (6) & (7) in Table 1). Model (6) is more favorable than model (7), because the former achieves slightly lower AIC scores (Akaike’s Information Criteria)27 at 1893.132 with fewer degrees of freedom (df = 8) than the latter (AIC = 1894.160, df = 9). In model (6) in Table 1, we found that both social distance and Euclidean distances exhibit negative effects, but the social distance effects decrease almost half compared to model (5) in Table 1. The SARS memory effects hold. Also, the interaction term Di ∗ SARSi is still significant, which means cities of stronger SARS memory will develop more lead-time advantage, particularly when they are closer to Wuhan. GDP per capita and the binary variable SubProvinciali exhibit significant positive effects on the lead-time advantage.
The effects of social distance and SARS memory on the magnitude of awareness are estimated according to Eq. 5 (Table 2). Similar to the findings in Table 1, SARSi memory positively affects public awareness in all models. Social distances Di show a significant negative effect only in the models without controlling variables (model (1), (2) & (3) in Table 2). However, the interaction term between social distances with SARS memory show a significant negative effect. When we control by Euclidean distances, GDP per capita and the administrative level (model (5) and (6) in Table 2), those effects still hold. Moreover, the effects of administrative level and development level both exhibit positive effects on the magnitude of awareness. We hypothesize that residents with better education (proxied by GDP per capita) better understand the danger of deadly infectious diseases and, accordingly, tend to seek up-to-date information online.
The effects of social distance and SARS memory on retention of awareness are estimated according to Eq. 6. Unlike the results in Tables 1 and 2, we observe no effects from SARS memory (model (6) in Table 3). When we control Euclidean distances, the development level and the administrative level, the explanatory power of the model is still relatively weak (Adj. R2 = 0.104). It seems the decreasing awareness is a collective behavior that occurred simultaneously. Interestingly, social distances have a significant effect while the Euclidean distances do not. Development level exhibits positive effects, which suggests residents of better educated cities could be more alert during the epidemic onset. However, administrative level shows a negative effect. It seems residents living in important cities (in terms of administrative power) lost interest in the infectious diseases before Chunyun.
Discussion
The Wuhan coronavirus outbreak is still striking China, with thousands of cases and dozens of deaths reported every day. From this study, we found that the spread of public awareness varies markedly across Chinese cities. Through controlling for development, administrative levels, and Euclidean distances, we observe cities that were struck by SARS and have more migration to the epicentre, Wuhan, showed earlier, stronger and more durable public awareness of the outbreak. Moreover, 48 cities had developed public awareness as early as Dec 31st, 2019, with up to 19 days of lead-time advantage against some other 255 cities. The study suggests that memory of previous events, as well as social links to an emerging threat, may influence public behaviour. Greater awareness could help slow the spread of a disease, for example through increased attention to hygiene, mask-wearing and reduced interpersonal contact. It might also facilitate collective responses such as enforced quarantine measures. However, in some circumstances enhanced awareness could have negative impacts, such as unnecessary panic or ostracism of groups perceived as being at greater risk of infection
Due to the lack of infection statistics, we cannot yet statistically estimate the effect of public awareness on the subsequent seriousness of the outbreak. We note that Xilingol League in Inner Mongolia, which had relatively stronger and more durable public awareness, had fewer cases (two cases as reported at Feb 7th, 202010) than other cities in the same province (totally 50 cases, with an average of 4.55 cases per city10).
To the best of our knowledge, this study is the first to investigate how memory of previous catastrophic events, e.g., SARS, and social distances could affect the spread of public awareness. Further studies will be needed to understand whether this holds in other context, beyond the unusual and tragic circumstances of the noel coronavirus in Wuhan.
Data Availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
Funding
X.L. and W.X. were supported by the National Natural Science Foundation of China, No. 41571118.
Competing interests
The authors declare that they have no competing interests.
Author contributions
X.L., H.C., C.P. and A.R. conceived of the research question. H.C. and W.X. processed the Baidu Search Index data. X.L. and W.X. collected the other meta data including SARS cases and GDP per capita. H.C., X.L., W.X., A.R. and C.P. interpreted the result. H.C., X.L., W.X., A.R. and C.P. drafted the manuscript and compiled supplementary information. All authors edited the manuscript and supplementary information and aided in concept development.
Data and materials availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.
Supplementary Materials
File S1: Baidu Search Index data
File S2: The data necessary to replicate the analysis of this study.
Acknowledgments
We thank Dr. Ross Sparks from Data61, CSIRO for helpful suggestions and advice on the methodologies. We also thank the anonymous reviewers for their valuable suggestions.