Tracking the spread of novel coronavirus (2019-nCoV) based on big data

The novel coronavirus (2019-nCoV) that appeared in Wuhan in late 2019 had infected 34,598 people and killed 723 of them by 8th February 2020. The new virus had spread to at least 316 cities in China by 1st February 2020. We used traffic flow data from Baidu Map and the number of air passengers who left Wuhan from 1st January to 26th January to quantify the number of potentially infectious people. We developed multiple linear models with local population and air passengers as predictor variables to explain the variance of confirmed cases in every city across China. We found that the contribution of air passengers from Wuhan decreased gradually while the effect of local population increased, indicating a trend toward local transmission. However, the increase in local transmission was slow during the early stage of the outbreak, owing to the very strict control measures carried out by government agencies and communities.

preventing and controlling the ongoing pandemic disease. The value of R0 (basic reproduction number) was estimated as 2.2, implying a medium-sized outbreak (3). However, based on the epidemic transmission model, the actual number of infections would be much larger than the number of confirmed cases (4).

At present, tracking the passengers who left Wuhan in January 2020 is still the top task for preventing the further spread of the novel coronavirus (2019-nCoV). To accurately estimate the risk of the novel coronavirus, we compiled detailed daily data on traffic outbound from Wuhan from a big-data source, Baidu Map, before the lockdown of Hubei Province, in order to provide information for risk assessment of 2019-nCoV at the province level and the city level (Supp. Fig. 1).

The traffic flow data outbound from Wuhan from 1st January to 26th January 2020 were downloaded from the Baidu Map Huiyan platform (5). The number of air passengers from Wuhan from 30th December 2019 to 20th January 2020 was released by Aviationtalk (6). The time series data of confirmed 2019-nCoV cases from 10th January to 30th January 2020 were obtained from People's Daily-Dingxiangyuan (1), as released by the China National Health Commission.

We performed Spearman correlation analysis between the daily traffic from Wuhan (from 1st January to 26th January), as well as the total traffic in this period, and the number of confirmed cases (from 25th January to 30th January). To explain the variance of confirmed cases in all infected provinces, we developed multiple linear models including population, GDP, population density, and mean temperature as independent variables. All analyses were performed using R (version 3.6.2).
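
The day-by-day Spearman analysis described above can be sketched as follows. The study itself used R (version 3.6.2); this illustration uses plain Python instead, and the function names (`average_ranks`, `spearman`, `best_day`) as well as the five-city numbers are hypothetical, not taken from the study's data or code.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def best_day(daily_traffic, cases):
    """Return (day, rho) for the travel day whose traffic correlates best with cases."""
    rho_by_day = {day: spearman(flow, cases) for day, flow in daily_traffic.items()}
    day = max(rho_by_day, key=rho_by_day.get)
    return day, rho_by_day[day]


# Hypothetical numbers for five cities (NOT data from the study).
cases = [120, 45, 300, 80, 10]
daily_traffic = {
    "01-05": [900, 400, 2500, 700, 100],  # same city ordering as the case counts
    "01-20": [500, 600, 2000, 300, 350],  # partly different ordering
}
day, rho = best_day(daily_traffic, cases)
print(day, round(rho, 2))
```

Scanning every travel day in this way is one plausible reading of how a "highest correlation" day could be identified; the source does not spell out the exact procedure.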

From 20th December 2019 to 20th January 2020, 854,424 air passengers left Wuhan Tianhe Airport for 49 cities in China (Fig. 1). From 1st to 26th January, about three million domestic passengers travelled from Wuhan to other cities. Among these passengers, a few thousand have been confirmed to be infected by the novel coronavirus (Fig. 2). The distribution of air passengers from Wuhan to other cities in China had a high correlation coefficient (0.71) with the number of confirmed infection cases in those cities on 22nd January. The correlation coefficient dropped to 0.56 on 24th January. Thereafter, the number of confirmed infection cases was positively correlated with local population.

The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted February 11, 2020.

We used a multiple regression model to explain the variance in the number of cases in the infected cities. After model selection, only two variables remained: the number of passengers and the local population (Fig. 3). Overall, the population of the provinces explains nearly half of the variance in the number of confirmed cases across 34 provinces and province-level municipalities, whereas the number of passengers from Wuhan explains around 10% (Fig. 3).
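
To make the regression step concrete, here is a minimal sketch of a two-predictor least-squares fit and of one way to read off "variance explained". It is written in plain Python rather than the R used in the study. The helper names (`solve`, `fit_ols`), the six-region numbers, and the choice to measure a predictor's share as the gain in R² when it is added are all assumptions for illustration; the source does not state how the variance shares were computed.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(columns, y):
    """Least-squares fit of y on the predictor columns plus an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical values for six regions (NOT data from the study).
population = [10, 25, 8, 40, 15, 30]    # millions
passengers = [5, 2, 9, 4, 1, 7]         # thousands of arrivals from Wuhan
cases = [60, 110, 70, 190, 55, 160]     # confirmed cases

_, r2_full = fit_ols([population, passengers], cases)
_, r2_population = fit_ols([population], cases)
print(f"R^2, population only:        {r2_population:.3f}")
print(f"R^2, population + passengers: {r2_full:.3f}")
print(f"gain from adding passengers:  {r2_full - r2_population:.3f}")
```

Because the two models are nested, the full model's R² can never be lower than the population-only R², so the difference gives a non-negative share attributable to the added predictor.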
Correlation coefficients show that the number of confirmed cases in the cities (n=97) from 25th January to 30th January matches the number of passengers from Wuhan during the period from 1st January to 26th January. The highest correlation appears on 5th January, implying a long incubation period of up to two weeks (Supp.

In the beginning of the spread of the pneumonia, there was a high correlation (0.71) between the number of confirmed infection cases and air passengers from Wuhan, supporting Wuhan as the source of the pneumonia (7). As time went on, local population played a more dominant role, because local spread of 2019-nCoV became likely. The basic reproductive number of the infection (R0), estimated as 3.8, means that 72-75% of transmissions must be prevented to stop the outbreak (4). Fortunately, the transmission of the virus has been effectively controlled owing to strict prevention measures carried out by the Chinese government. Restrictions on population movement have been enforced in 16 cities in Hubei Province since 23rd January 2020 (8), resulting in a significant decrease in passengers from Wuhan and adjacent cities, which effectively reduces the spread of the pneumonia. However, 3-5 million people had left Wuhan for numerous cities in China before the province lockdown (Supp. Fig. 2), and among them a number of infected people have no clinical symptoms yet are infectious to others. We believe this is the greatest challenge facing the current national antivirus campaign.

The first-level response to major public health emergencies was initiated in 30 provinces, municipalities and autonomous regions in China on 25th January 2020 (9), so that strict control procedures are now carried out to prevent the spread of the virus. According to an infectious disease model, it was estimated that the actual
February, all suspected cases in Wuhan will be taken into medical care, and the spread of the virus in Wuhan can be controlled gradually.

The correlation coefficients between the number of confirmed cases from 26th January to 30th January and the number of passengers from Wuhan were higher than that on 25th January (Supp. Table 1). We think the reason is that the number of confirmed cases on 25th January was too low owing to a shortage of virus detection kits. After 25th January the supply of detection kits was sufficient and the number of confirmed cases reflected the real situation, showing a very high correlation with the number of passengers from Wuhan to these cities from 1st January to 26th January. We note that the highest correlation appears on 5th January, two to three weeks ahead of the confirmed cases in those cities (confirmation also took several days to complete at that time), which suggests a long latent period for 2019-nCoV.

Combining the two variables (local population and passengers from Wuhan) to interpret the risk of a virus outbreak, we recommend that prevention and control measures be divided into two stages. In the early stage, checking passengers from Wuhan is more important. Some cities (e.g., Wenzhou in Zhejiang Province) with a large number of returnees from Wuhan have many cases even though they are geographically far from Wuhan. A database of the travel routes of confirmed patients has been developed and published for free use (10), in order to track and warn close contacts. In the late stage, local population becomes the most important factor in predicting the number of confirmed infection cases. This suggests that densely populated metropolitan areas, such as Shanghai, Beijing, Guangzhou, and Shenzhen, should pay special attention to preventing second-generation infections. Moreover, the densely populated rural areas around Wuhan may face the double threat of an influx of infectious people from Wuhan and a large number of locally susceptible people. These areas, including Hubei Province outside Wuhan and the neighbouring areas of Henan (Nanyang, Xinyang) and Chongqing, need to prepare for a surge in infections. Medical facilities in these rural areas are scarcer than in cities, and clustered cases are more likely to occur there.
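
The discussion's statement that an R0 of 3.8 requires preventing 72-75% of transmissions is consistent with the standard threshold relation 1 - 1/R0, under which the effective reproduction number R0 × (1 - f) falls below 1. The short sketch below evaluates that relation for the two R0 estimates quoted in the text. Treating the quoted range as arising from this relation is an assumption here, and the function name `critical_fraction` is hypothetical.

```python
def critical_fraction(r0):
    """Fraction f of transmissions to block so that r0 * (1 - f) drops to 1."""
    if r0 <= 1:
        return 0.0  # an outbreak with R0 <= 1 does not sustain itself
    return 1.0 - 1.0 / r0


for r0 in (2.2, 3.8):  # the two estimates cited in the text
    print(f"R0 = {r0}: block more than {critical_fraction(r0):.1%} of transmissions")
# R0 = 2.2: block more than 54.5% of transmissions
# R0 = 3.8: block more than 73.7% of transmissions
```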

medRxiv preprint doi: https://doi.org/10.1101/2020.02.07.20021196

Supp. Fig. 1 The daily numbers of passengers (all vehicles included) from Wuhan to the top 50 cities in China from 1st January to 26th January are shown in an animated GIF file, 2019-nCoV_spread_1-26 Jan.gif.

Supp. Fig. 2 Estimated daily number of passengers from Wuhan using a logistic curve y = 60000 + 10000/exp(1-0.2*x) from 1st January to 22nd January (the day before the city lockdown). The assumption is that people tend to leave Wuhan just before the Chinese New Year on 25th January.
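
As a rough check on the curve in this caption, the sketch below evaluates it day by day and sums the result. It assumes that x is the day of January (x = 1 for 1st January, x = 22 for 22nd January), which the caption does not state explicitly; the function name `daily_passengers` is hypothetical.

```python
import math


def daily_passengers(x):
    """Curve from the Supp. Fig. 2 caption: y = 60000 + 10000/exp(1 - 0.2*x)."""
    return 60000 + 10000 / math.exp(1 - 0.2 * x)


# Assumption: x = 1 is 1st January and x = 22 is 22nd January.
days = range(1, 23)
total = sum(daily_passengers(x) for x in days)

print(f"day 1:  {daily_passengers(1):,.0f} passengers")
print(f"day 22: {daily_passengers(22):,.0f} passengers")
print(f"total for 1st-22nd January: {total:,.0f} passengers")
```

Under that assumption the total comes to a little under three million, in line with the figure of about three million passengers given in the text. As written, the expression grows exponentially in x over this window rather than levelling off.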