medRxiv

COVID-19 Preprints and Their Publishing Rate: An Improved Method

Francois Lachapelle
doi: https://doi.org/10.1101/2020.09.04.20188771
Francois Lachapelle, PhD
Department of Sociology, University of British Columbia
  • For correspondence: f.lachapelle{at}alumni.ubc.ca

Abstract

Context As the COVID-19 pandemic persists around the world, the scientific community continues to produce and circulate knowledge on the deadly disease at an unprecedented rate. During the early stage of the pandemic, preprints represented nearly 40% of the English-language COVID-19 scientific corpus (6,000+ preprints | 16,000+ articles). As of mid-August 2020, that proportion had dropped to around 28% (13,000+ preprints | 49,000+ articles). Nevertheless, preprint servers remain a key engine in the efficient dissemination of scientific work on this infectious disease. But given the ‘uncertified’ nature of the scientific manuscripts curated on preprint repositories, their integration into the global ecosystem of scientific communication creates serious tensions. This is especially the case for biomedical knowledge, since the dissemination of bad science can have widespread societal consequences.

Scope In this paper, I propose a robust method that allows the repeated monitoring and measuring of the publication rate of COVID-19 preprints. I also introduce a new API called Upload-or-Perish. It is a micro-API service that enables a client to query a specific preprint manuscript’s publication status and associated metadata using a unique ID. This tool is in active development.

Data I use the COVID-19 Open Research Dataset (CORD-19) to calculate the conversion rate of the COVID-19 preprint corpus to peer-reviewed articles. The CORD-19 dataset includes preprints from arXiv, bioRxiv, and medRxiv.

Methods I use conditional fuzzy logic on article titles to determine whether a preprint has a published counterpart in the database. My approach is an important departure from previous studies that rely exclusively on the bioRxiv API to ascertain preprints’ publication status. Such reliance is problematic since the level of false positives in bioRxiv metadata could be as high as 37%.
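
The title-matching idea can be illustrated with a minimal sketch. It is not the author's implementation: the normalization steps, the use of Python's standard-library `difflib.SequenceMatcher`, the single threshold condition, the 0.9 threshold, and all function names are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_published_match(preprint_title, article_titles, threshold=0.9):
    """Return (best_title, score) if the best score meets the threshold, else None.

    The threshold is a hypothetical condition chosen for this sketch.
    """
    best_title, best_score = None, 0.0
    for candidate in article_titles:
        score = title_similarity(preprint_title, candidate)
        if score > best_score:
            best_title, best_score = candidate, score
    if best_score >= threshold:
        return best_title, best_score
    return None


def conversion_rate(preprint_titles, article_titles, threshold=0.9):
    """Share of preprints with at least one sufficiently similar article title."""
    if not preprint_titles:
        return 0.0
    matched = sum(
        1
        for title in preprint_titles
        if find_published_match(title, article_titles, threshold) is not None
    )
    return matched / len(preprint_titles)
```

A stricter variant could add further conditions before accepting a match, for example requiring overlapping author names or a publication date later than the preprint's posting date. Those fields are not described in the abstract, so they are left out here.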

Findings My analysis reveals that around 15% of the COVID-19 preprint manuscripts in the CORD-19 dataset that were uploaded to arXiv, bioRxiv, and medRxiv between January and early August 2020 were published in a peer-reviewed venue. Compared to the most recent measure available, this represents a two-fold increase over a period of two months. My discussion reviews and theorizes about potential explanations for the low conversion rate of COVID-19 preprints.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

The author received no funding for this work.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

This research does not require any approval or exemption from any IRB/oversight body at my home institution.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

I use the COVID-19 Open Research Dataset (CORD-19) to calculate the conversion rate of the COVID-19 preprint corpus to peer-reviewed articles. Arguably the most ambitious bibliometric COVID-19 project, CORD-19 is a collaborative effort between the Allen Institute for AI and half a dozen organizations including NIH and the White House (for more details, see Wang et al., 2020). This is an open-source dataset. I also used a bioRxiv API pipeline to determine whether COVID-19 preprints were associated with a final peer-reviewed counterpart. I also scraped the NIH's PubMed and PMC websites for the same purpose. Finally, I used the Python 'wrapper' package "arxiv" to query the arXiv API to, again, determine whether certain COVID-19 arXiv preprints had also been published in a peer-reviewed journal.

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge

https://api.biorxiv.org/details/biorxiv/

https://pubmed.ncbi.nlm.nih.gov/

https://www.ncbi.nlm.nih.gov/pmc/articles/

https://github.com/titipata/arxivpy
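
As a rough illustration of the bioRxiv lookup step, the sketch below builds a request URL from the endpoint listed above and reads a publication link out of an already-decoded JSON response. It makes no network call. The assumptions are: that the preprint DOI is appended to the endpoint, that the response carries a `collection` list whose records have a `published` field, and that this field is `"NA"` when no journal version is recorded. These should be checked against the bioRxiv API documentation. The function names are hypothetical.

```python
from typing import Optional

# Base endpoint as listed in the data availability statement.
BIORXIV_DETAILS_BASE = "https://api.biorxiv.org/details/biorxiv/"


def details_url(preprint_doi: str) -> str:
    """Build a details URL for one preprint (assumes the DOI is appended to the base)."""
    return BIORXIV_DETAILS_BASE + preprint_doi.strip()


def published_doi(payload: dict) -> Optional[str]:
    """Return the journal DOI recorded for a preprint, or None if there is none.

    Assumes the payload has a "collection" list of version records, each with a
    "published" field that is "NA" (or empty) when no journal version is known.
    """
    for record in payload.get("collection", []):
        value = str(record.get("published") or "").strip()
        if value and value.upper() != "NA":
            return value
    return None
```

Given the false-positive rate reported in the abstract, a value returned by `published_doi` is better treated as a candidate link to be verified, for instance by comparing titles, than as confirmed publication status.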

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted September 07, 2020.

Subject Area

  • Infectious Diseases (except HIV/AIDS)