Early epidemiological analysis of the coronavirus disease 2019 outbreak based on crowdsourced data: a population-level observational study

Lancet Digit Health. 2020 Apr;2(4):e201-e208. doi: 10.1016/S2589-7500(20)30026-1. Epub 2020 Feb 20.

Abstract

Background: As the outbreak of coronavirus disease 2019 (COVID-19) progresses, epidemiological data are needed to guide situational awareness and intervention strategies. Here we describe efforts to compile and disseminate epidemiological information on COVID-19 from news media and social networks.

Methods: In this population-level observational study, we searched DXY.cn, a health-care-oriented social network that is currently streaming news reports on COVID-19 from local and national Chinese health agencies. We compiled a list of individual patients with COVID-19 and daily province-level case counts between Jan 13 and Jan 31, 2020, in China. We also compiled a list of internationally exported cases of COVID-19 from global news media sources (Kyodo News, The Straits Times, and CNN), national governments, and health authorities. We assessed trends in the epidemiology of COVID-19 and studied the outbreak progression across China, assessing delays between symptom onset, seeking care at a hospital or clinic, and reporting, before and after Jan 18, 2020, as awareness of the outbreak increased. All data were made publicly available in real time.
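The delay analysis described above can be illustrated with a minimal sketch. It assumes each line-list record holds a symptom-onset date and a first care-seeking date, and that records are split into periods by onset date relative to Jan 18, 2020; the abstract does not state which date defines the split. The records and the helper name `median_delay_by_period` are hypothetical, chosen only for illustration, and are not data or code from the study.

```python
from datetime import date
from statistics import median

# Cutoff named in the abstract; splitting by onset date is an assumption.
CUTOFF = date(2020, 1, 18)

# Hypothetical (onset date, first care-seeking date) pairs, NOT study data.
records = [
    (date(2020, 1, 9), date(2020, 1, 16)),
    (date(2020, 1, 12), date(2020, 1, 17)),
    (date(2020, 1, 12), date(2020, 1, 18)),
    (date(2020, 1, 19), date(2020, 1, 21)),
    (date(2020, 1, 22), date(2020, 1, 23)),
    (date(2020, 1, 25), date(2020, 1, 28)),
]


def median_delay_by_period(records, cutoff=CUTOFF):
    """Return (median delay before cutoff, median delay from cutoff on), in days."""
    before, after = [], []
    for onset, visit in records:
        delay = (visit - onset).days  # days from symptom onset to seeking care
        (before if onset < cutoff else after).append(delay)
    return median(before), median(after)


before_days, after_days = median_delay_by_period(records)
print(f"median delay before cutoff: {before_days} days; after: {after_days} days")
```

The sketch reports medians only. The abstract gives a p value for the change in delay but does not name the test used, so no significance test is shown here.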

Findings: We collected data for 507 patients with COVID-19 reported between Jan 13 and Jan 31, 2020, including 364 from mainland China and 143 from outside of China. 281 (55%) patients were male and the median age was 46 years (IQR 35-60). Few patients (13 [3%]) were younger than 15 years, and the age profile of Chinese patients adjusted for baseline demographics confirmed a deficit of infections among children. Across the analysed period, delays between symptom onset and seeking care at a hospital or clinic were longer in Hubei province than in other provinces in mainland China and internationally. In mainland China, these delays decreased from 5 days before Jan 18, 2020, to 2 days thereafter until Jan 31, 2020 (p=0·0009). Although our sample captures only 507 (5·2%) of 9826 patients with COVID-19 reported by official sources during the analysed period, our data align with an official report published by Chinese authorities on Jan 28, 2020.
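The proportions quoted above follow directly from the reported counts. The short check below recomputes them; the counts are taken from the abstract, while the helper `pct` and the constant names are introduced here only for illustration.

```python
def pct(part, whole, digits=0):
    """Percentage of `part` in `whole`, rounded to `digits` decimal places."""
    return round(100 * part / whole, digits)


# Counts as reported in the abstract.
TOTAL = 507           # patients in the crowdsourced line list
MAINLAND = 364        # patients from mainland China
INTERNATIONAL = 143   # patients from outside of China
MALE = 281            # male patients
UNDER_15 = 13         # patients younger than 15 years
OFFICIAL = 9826       # cases reported by official sources in the same period

# The two subgroups should account for the whole sample.
assert MAINLAND + INTERNATIONAL == TOTAL

male_pct = pct(MALE, TOTAL)             # share of male patients
under15_pct = pct(UNDER_15, TOTAL)      # share of patients under 15 years
coverage_pct = pct(TOTAL, OFFICIAL, 1)  # share of official cases captured

print(male_pct, under15_pct, coverage_pct)
```

The results round to the 55%, 3%, and 5·2% given in the text.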

Interpretation: News reports and social media can help reconstruct the progression of an outbreak and provide detailed patient-level data in the context of a health emergency. The availability of a central physician-oriented social network facilitated the compilation of publicly available COVID-19 data in China. As the outbreak progresses, social media and news reports will probably capture a diminishing fraction of COVID-19 cases globally due to reporting fatigue and overwhelmed health-care systems. In the early stages of an outbreak, availability of public datasets is important to encourage analytical efforts by independent teams and provide robust evidence to guide interventions.

Funding: Fogarty International Center, US National Institutes of Health.

Publication types

  • Observational Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Age Factors
  • Aged
  • Aged, 80 and over
  • COVID-19 / epidemiology*
  • COVID-19 / mortality
  • Child
  • Child, Preschool
  • China / epidemiology
  • Crowdsourcing / methods
  • Crowdsourcing / statistics & numerical data*
  • Disease Outbreaks / statistics & numerical data
  • Female
  • Humans
  • Infant
  • Male
  • Middle Aged
  • Models, Statistical
  • Sex Factors
  • Young Adult