medRxiv

Open-source analytics tools for studying the COVID-19 coronavirus outbreak

Tianzhi Wu 1, Erqiang Hu 1, Xijin Ge 2, Guangchuang Yu 1
doi: https://doi.org/10.1101/2020.02.25.20027433

1 Department of Bioinformatics, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
2 Department of Mathematics and Statistics, South Dakota State University, Brookings, SD 57007, USA

For correspondence: gcyu1@smu.edu.cn Xijin.Ge@sdsate.edu

Abstract

The COVID-19 outbreak originated at the end of 2019 from Wuhan [1, 2], a city in Hubei province in central China. According to the World Health Organization (WHO), there were 88,948 confirmed cases and 3,043 deaths from 65 countries as of March 2, 2020. In China, the outbreak has effectively confined over 1 billion people to their apartments and homes since the end of January 2020 and continues to disrupt healthcare, wellbeing, and the economy. As the situation in China appears to be stabilizing, sharp increases in confirmed cases are being reported in South Korea, Italy, Japan, and Iran.

Access to real-time data is essential to the public, scientists, and public health officials. The interactive dashboard developed by Dong et al. [3] is a powerful tool to assess the current situation. To provide direct access to real-time epidemiological data on this outbreak, we developed an R package, nCov2019 [4]. This open-source software aggregates data from four different sources. We retrieve the current numbers of confirmed cases and deaths in geographical locations using API (application programming interface) calls to the Tencent SARS-COV-2 website [5]. Updated several times a day, the Tencent website relies on official data obtained from the Chinese provincial health agencies, the China National Health Commission (CNHC), the World Health Organization (WHO), and public health agencies in other countries. More importantly, our R package offers access to three data sources with detailed daily statistics from December 1, 2019, for 43 countries and more than 500 Chinese cities. The first source, obtained directly from the CNHC, provides official historical statistics for the 34 Chinese provinces and special districts. The second source is the non-governmental organization Dingxiangyuan [6], which has been continuously aggregating official data from provincial and city health agencies and the CNHC. The third source is a public GitHub repository [7], which derives data from the literature [8] for December 1, 2019, to January 10, 2020, after which it relies on the API of the Chinese news aggregator Toutiao. This GitHub repository includes historical data for Chinese cities as well as 43 countries. All datasets are updated daily, and they are consistent with each other (see Suppl. Doc. 1).

As demonstrated in Suppl. Doc. 1, this new package also contains functionalities to facilitate data visualization. For example, with one command, users can easily plot the distribution of cases on the maps of the world, China, and even individual provinces (Figure 1). With historical data, we can incorporate temporal and spatial information to create an animation to help us understand disease transmission and examine the spread of the COVID-19 outbreak.

Figure 1. Number of confirmed COVID-19 cases across different countries (a), Chinese provinces (b), and cities (c).

To enable users to access these datasets without coding, we also developed interactive web apps in both English [9] and Chinese [10]. As demonstrated in Suppl. Doc. 1, these apps can also be run locally from RStudio. Using these apps, users can gain insights by quickly generating all 23 plots in Suppl. Doc. 2 based on daily updated data. Complementing the dashboard by Dong et al. [3], our web app enables users to select their regions of interest and check both the historical and real-time data. Generated by the app on February 25, 2020, Figure 2 shows that the total confirmed cases in the provinces outside Hubei are stabilizing, following a similar trend. The extreme measures that the Chinese government has taken since January 23 seem to be working.

Figure 2. The number of confirmed COVID-19 cases in Chinese provinces except for Hubei. Through the interactive website, users can get detailed statistics for all provinces and cities in China.

Built with the RStudio Shiny framework, these apps contain a simple forecast module. We first treated the log-transformed numbers of cases or deaths as a time series, then used the exponential smoothing method (ets) in the R package forecast [11] with default settings to forecast the total cases. On February 7, 2020, this simple model predicted that the death toll would reach 2000 in ten days, a staggering number at the time that unfortunately later materialized. We also converted the raw numbers of cases into percent daily changes and conducted a similar forecast. Interestingly, daily percent changes in both confirmed cases and deaths in China are decreasing linearly except for a few outliers (see Figures 16 and 18 in Suppl. Doc. 2).
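
The forecasting idea described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not the code used in the apps: the ets function selects and fits a model automatically, whereas this sketch substitutes Holt's additive-trend exponential smoothing with fixed weights. The function names, the smoothing weights alpha and beta, and the synthetic input series are all assumptions made for illustration.

```python
import math


def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Forecast `horizon` steps ahead with Holt's additive-trend smoothing.

    alpha and beta are illustrative smoothing weights, not fitted values.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Initialise the level and trend from the first two observations.
    level = series[1]
    trend = series[1] - series[0]
    for value in series[2:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


def forecast_counts(counts, horizon):
    """Smooth on the log scale, then transform forecasts back to counts."""
    logged = [math.log(count) for count in counts]  # counts must be positive
    return [math.exp(value) for value in holt_forecast(logged, horizon)]


def percent_daily_change(counts):
    """Day-over-day change in cumulative counts, in percent."""
    return [100.0 * (today - yesterday) / yesterday
            for yesterday, today in zip(counts, counts[1:])]


# Synthetic cumulative counts growing 10% per day (not real data).
synthetic_counts = [100 * 1.1 ** day for day in range(10)]
next_three_days = forecast_counts(synthetic_counts, 3)
```

Working on the log scale turns exponential growth into a straight line, so a linear-trend smoother can extrapolate it. The same holt_forecast function could be applied to the output of percent_daily_change to mimic the second forecast described above.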

Even though not all data sources are official statistics, this kind of detailed data offers a unique opportunity to study this novel pathogen. The hundreds of cities could even be considered as semi-independent outbreaks, as many of them are far from the epicenter and have been effectively on lockdown since the end of January 2020. As shown in Figures 5 and 6 in Suppl. Doc. 2, the death rate in Wuhan, estimated by dividing current total deaths by total confirmed cases, is 4.47%. Probably due to an overwhelmed healthcare system, this death rate is higher than the average of 2.92% (95% confidence interval [2.35% - 3.38%]) observed in 22 Chinese cities with 200 or more confirmed cases. Cities in Hubei province have higher fatality rates than cities in other regions (Figure 6 in Suppl. Doc. 2). Internationally, the death rate in Japan (2.50%) is close to that of Italy (2.60%), and both are lower than the 3.67% observed in China overall (Figure 17 in Suppl. Doc. 2). The death rate in Iran is 9.63%, probably due to underreported cases.
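
As a small worked example of the death-rate estimate defined above (current total deaths divided by total confirmed cases), the sketch below applies it to the global WHO totals quoted in the abstract. The function name is hypothetical, and the ratio is only a crude estimate: it rises when cases are undercounted, which is the explanation suggested above for Iran.

```python
def crude_death_rate(total_deaths, total_confirmed):
    """Crude death rate: cumulative deaths divided by cumulative confirmed cases."""
    if total_confirmed <= 0:
        raise ValueError("total_confirmed must be positive")
    return total_deaths / total_confirmed


# WHO totals as of March 2, 2020, as quoted in the abstract.
global_rate = crude_death_rate(3043, 88948)
print(f"{global_rate:.2%}")  # prints 3.42%
```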

The rapid, exponential growth phase in China spans roughly from January 15 to February 15, 2020, when the number of confirmed cases skyrocketed 1670-fold from 41 to 68,500. Such rapid growth is now evident in South Korea, Italy, and Iran (Figure 3). Other countries with a smaller number of cases but showing a sharp upward trend include Germany, Spain, and France. If the outbreaks are not managed well, each of these and other countries could see tens of thousands of cases within weeks. Public health officials need to grasp the power of exponential growth.
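
To make the scale of this growth concrete, the sketch below converts the figures quoted above (41 cases on January 15 and 68,500 on February 15, 2020) into an average daily growth rate and a doubling time. It assumes a constant growth rate over the whole window, which is a simplification, and the function name is hypothetical.

```python
import math
from datetime import date


def growth_summary(start_count, end_count, start_day, end_day):
    """Return (fold change, average daily growth rate, doubling time in days).

    Assumes a constant exponential growth rate between the two dates.
    """
    days = (end_day - start_day).days
    fold = end_count / start_count
    daily_rate = fold ** (1 / days) - 1
    doubling_time = days * math.log(2) / math.log(fold)
    return fold, daily_rate, doubling_time


fold, daily_rate, doubling_time = growth_summary(
    41, 68500, date(2020, 1, 15), date(2020, 2, 15)
)
print(f"{fold:.0f}-fold, {daily_rate:.1%} per day, "
      f"doubling every {doubling_time:.1f} days")
```

Under this assumption, the 1670-fold increase over 31 days corresponds to roughly 27% growth per day, or a doubling about every 2.9 days.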

Figure 3. Countries with rapidly growing COVID-19 cases. This plot is obtained using our interactive app.

Currently, city-level historical data is only available for China. The data sources occasionally change their data formats, which requires us to monitor them. If the APIs stop providing data, the real-time data will no longer be updated, but the historical data will remain accessible to researchers. We will maintain the web apps during this outbreak.

Our nCov2019 package reduces the barrier for researchers and public health officials in obtaining comprehensive, up-to-date data on this ongoing outbreak. With this package, epidemiologists and other scientists can directly access data from four sources, facilitating mathematical modeling and forecasting of the COVID-19 outbreak. The interactive web apps are accessible to the general public and could also be easily customized by researchers to produce other dashboards or track other countries. We hope these analytics tools could be useful in studying and managing this pathogen on a global scale.

Data Availability

Source code and data are publicly available on GitHub.

https://github.com/GuangchuangYu/nCov2019

Conflicts of Interest

None.

Supplementary Document 1: Detailed tutorial and example of how to use the R package.

Supplementary Document 2: Example of plots obtained from our web app.

References

  1. Wang, C., et al., A novel coronavirus outbreak of global health concern. Lancet, 2020. 395(10223): p. 470–473.
  2. Zhu, N., et al., A Novel Coronavirus from Patients with Pneumonia in China, 2019. N Engl J Med, 2020. 382(8): p. 727–733.
  3. Dong, E., H. Du, and L. Gardner, An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis, 2020.
  4. Yu, G. nCov2019: An R package for accessing coronavirus statistics. 2020 [cited 2020 February 25]; Available from: https://github.com/GuangchuangYu/nCov2019.
  5. Tencent. Real-time tracking of the coronavirus infection [in Chinese]. [cited 2020 February 25]; Available from: https://news.qq.com/zt2020/page/feiyan.htm.
  6. DingXiangYuan. Real-time data on the novel coronavirus [in Chinese]. [cited 2020 February 25]; Available from: https://ncov.dxy.cn/ncovh5/view/pneumonia.
  7. GitHub Repo: Daily statistics of 2019-nCov. [cited 2020 March 2]; Available from: https://github.com/canghailan/Wuhan-2019-nCoV.
  8. Huang, C., et al., Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet, 2020. 395(10223): p. 497–506.
  9. Ge, X. Coronavirus COVID-19 outbreak statistics and forecast. [cited 2020 February 25]; Available from: http://www.bcloud.org/e/.
  10. Ge, X. Coronavirus COVID-19 outbreak statistics and forecast [in Chinese]. Available from: http://www.bcloud.org/v/.
  11. Hyndman, R.J., Forecasting with exponential smoothing: the state space approach. Springer series in statistics. 2008, Berlin: Springer. xiii, 359 p.
Posted March 05, 2020.