PT - JOURNAL ARTICLE
AU - Juliana C. Taube
AU - Paige B. Miller
AU - John M. Drake
TI - An open-access database of infectious disease transmission trees to explore superspreader epidemiology
AID - 10.1101/2021.01.11.21249622
DP - 2021 Jan 01
TA - medRxiv
PG - 2021.01.11.21249622
4099 - http://medrxiv.org/content/early/2021/01/13/2021.01.11.21249622.short
4100 - http://medrxiv.org/content/early/2021/01/13/2021.01.11.21249622.full
AB - Historically, emerging and re-emerging infectious diseases have caused large, deadly, and expensive multi-national outbreaks. Outbreak investigations often aim to identify who infected whom by reconstructing the outbreak transmission tree, which visualizes transmission between individuals as a network with nodes representing individuals and branches representing transmission from person to person. We compiled a database of 383 published, standardized transmission trees for 16 directly-transmitted diseases, ranging in size from 2 to 286 cases. For each tree and disease we calculated several key statistics, such as outbreak size, average number of secondary infections, the dispersion parameter, and the number of superspreaders. We demonstrated the potential utility of the database through short analyses addressing questions about superspreader epidemiology for a variety of diseases, including COVID-19. First, we compared the frequency of superspreaders and their contribution to onward transmission across diseases. COVID-19 outbreaks had significantly fewer superspreaders than outbreaks of SARS and MERS and a dispersion parameter between those of SARS and MERS. Across diseases, the presence of more superspreaders was associated with greater outbreak size. Second, we examined how early spread impacts tree size. Generally, trees sparked by a superspreader had larger outbreak sizes than trees not sparked by a superspreader, and this trend was significant for COVID-19 trees.
Third, we investigated patterns in how superspreaders are infected. Across trees with more than one superspreader, we found support for the theory that superspreaders generate other superspreaders, even when controlling for number of secondary infections. In sum, our findings put the role of superspreading in COVID-19 transmission in perspective with that of SARS and MERS and suggest an avenue for further research on the generation of superspreaders. These data have been made openly available to encourage reuse and further scientific inquiry.
Author Summary
Public health investigations often aim to identify who infected whom, or the transmission tree, during outbreaks of infectious diseases. These investigations tend to be resource intensive but valuable, as they contain epidemiological information, including the average number of infections caused by each individual and the variation in this number. To date, there is neither a standardized format nor a comprehensive database of infectious disease transmission trees. To fill this gap, we standardized and compiled more than 350 published transmission trees for 16 directly-transmitted diseases into a publicly available database. In this paper, we give an overview of the database construction process and demonstrate the types of questions about superspreader epidemiology that the database can be used to answer. For example, we show that COVID-19 outbreaks have fewer superspreaders than outbreaks of SARS and MERS. We also find support for the theory that superspreaders generate other superspreaders. In the future, this database can be used to answer other outstanding questions in the field of epidemiology.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by the Population Biology of Infectious Diseases REU Site and National Science Foundation grants DBI-1659683 and DGE-1545433.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: We are using publicly available data.
All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
All data used in this manuscript are available for download at the following link (https://outbreaktrees.ecology.uga.edu/) and all code used to compile the database is available on GitHub (https://github.com/DrakeLab/taube-transmission-trees).
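The abstract names four per-tree statistics: outbreak size, average number of secondary infections, the dispersion parameter, and the number of superspreaders. As a rough illustration of what such a calculation might look like, the Python sketch below computes them from a list of (infector, infectee) pairs. It is not the authors' code (that is in the GitHub repository above), and several choices are assumptions made for illustration only: the edge-list input format, the function names `poisson_quantile` and `tree_statistics`, the method-of-moments estimate of the negative binomial dispersion parameter, and the superspreader cutoff (more secondary infections than the 99th percentile of a Poisson distribution with mean `reference_r`, defaulting to the tree's own mean).

```python
import math
from collections import Counter
from statistics import mean, variance


def poisson_quantile(rate, q=0.99):
    """Smallest n such that P(X <= n) >= q for X ~ Poisson(rate)."""
    n = 0
    pmf = math.exp(-rate)
    cdf = pmf
    while cdf < q:
        n += 1
        pmf *= rate / n
        cdf += pmf
    return n


def tree_statistics(edges, reference_r=None):
    """Summarize one transmission tree given as (infector, infectee) pairs.

    Hypothetical helper: the input format and the superspreader cutoff
    are assumptions of this sketch, not taken from the database.
    """
    # Every case appears at least once as an infector or an infectee.
    cases = {person for edge in edges for person in edge}
    out_degree = Counter(infector for infector, _ in edges)
    secondary = [out_degree.get(case, 0) for case in cases]

    mean_secondary = mean(secondary)
    var_secondary = variance(secondary)  # sample variance

    # Method-of-moments negative binomial dispersion: k = m^2 / (v - m).
    # With no overdispersion (v <= m) the estimate is reported as infinite.
    if var_secondary > mean_secondary:
        dispersion_k = mean_secondary ** 2 / (var_secondary - mean_secondary)
    else:
        dispersion_k = math.inf

    # Assumed cutoff: more secondary infections than the 99th percentile
    # of a Poisson distribution with mean reference_r.
    rate = mean_secondary if reference_r is None else reference_r
    cutoff = poisson_quantile(rate, 0.99)
    superspreaders = sum(1 for count in secondary if count > cutoff)

    return {
        "size": len(cases),
        "mean_secondary": mean_secondary,
        "dispersion_k": dispersion_k,
        "superspreaders": superspreaders,
    }


# Toy tree: case A infects six people, and one of them (B) infects one more.
toy_edges = [("A", x) for x in "BCDEFG"] + [("B", "H")]
stats = tree_statistics(toy_edges)
print(stats)
```

Because a single tree with n cases has exactly n - 1 transmission links, its own mean number of secondary infections is always below 1; passing a disease-level estimate as `reference_r` may therefore be the more meaningful choice when comparing trees across diseases.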