PT - JOURNAL ARTICLE AU - Francois Lachapelle TI - COVID-19 Preprints and Their Publishing Rate: An Improved Method AID - 10.1101/2020.09.04.20188771 DP - 2020 Jan 01 TA - medRxiv PG - 2020.09.04.20188771 4099 - http://medrxiv.org/content/early/2020/10/13/2020.09.04.20188771.short 4100 - http://medrxiv.org/content/early/2020/10/13/2020.09.04.20188771.full AB - Context As the COVID-19 pandemic persists around the world, the scientific community continues to produce and circulate knowledge on the deadly disease at an unprecedented rate. During the early stage of the pandemic, preprints represented nearly 40% of the English-language COVID-19 scientific corpus (6,000+ preprints | 16,000+ articles). As of mid-August 2020, that proportion had dropped to around 28% (13,000+ preprints | 49,000+ articles). Nevertheless, preprint servers remain a key engine in the efficient dissemination of scientific work on this infectious disease. But given the ‘uncertified’ nature of the scientific manuscripts curated on preprint repositories, their integration into the global ecosystem of scientific communication creates serious tensions. This is especially the case for biomedical knowledge, since the dissemination of bad science can have widespread societal consequences. Objective In this paper, I propose a robust method that allows the repeated monitoring and measuring of COVID-19 preprints’ publication rate. I also introduce a new API called Upload-or-Publish. It is a free micro-API service that enables a client to query a specific preprint manuscript’s publication status and associated metadata using a unique ID. The beta version is currently deployed and working. Data I use the COVID-19 Open Research Dataset (CORD-19) to calculate the COVID-19 preprint corpus’ conversion rate to peer-reviewed articles. The CORD-19 dataset includes 10,454 preprints from arXiv, bioRxiv, and medRxiv. Methods I utilize conditional fuzzy logic to link preprints with their published counterparts. 
My approach is an important departure from previous studies that rely exclusively on the bio/medRxiv API to ascertain preprints’ publication status. Results As expected, the findings suggest a positive relationship between the time elapsed since a preprint’s first server upload and its likelihood of having a published status. For instance, as of mid-September, close to 50% of preprints uploaded in January were published in peer-reviewed venues. That figure is 29% for preprints uploaded in April, and 5% for preprints uploaded in August. As this is an ongoing project, it will continue to track the publication rates of preprints over time. Competing Interest Statement The authors have declared no competing interest. Funding Statement The author received no funding for this work. Author Declarations I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes The details of the IRB/oversight body that provided approval or exemption for the research described are given below: This research does not require any approval or exemption from any IRB/oversight body at my home institution. All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived. Yes I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. 
I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes I use the COVID-19 Open Research Dataset (CORD-19) to calculate the COVID-19 preprint corpus' conversion rate to peer-reviewed articles. Arguably the most ambitious bibliometric COVID-19 project, CORD-19 is a collaborative effort between the Allen Institute for AI and half a dozen organizations including NIH and the White House (for more details, see Wang et al., 2020). This is an open-source dataset. I also use the bioRxiv API to determine whether COVID-19 preprints are associated with a final peer-reviewed counterpart. I also scraped NIH's PubMed and PMC websites for the same purpose. Finally, I use the Python 'wrapper' package "arxiv" to query the arXiv API to, again, determine whether certain COVID-19 arXiv preprints had also been published in a peer-reviewed journal. https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge https://api.biorxiv.org/details/biorxiv/ https://pubmed.ncbi.nlm.nih.gov/ https://www.ncbi.nlm.nih.gov/pmc/articles/ https://github.com/titipata/arxivpy
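The abstract describes linking preprints to their published counterparts with conditional fuzzy logic but does not spell out the rules. The sketch below is one possible reading, not the author's implementation: it accepts a pair when the normalized titles are nearly identical, and accepts a looser title match only on the condition that the first author's surname also agrees. The function names, the record fields (`title`, `first_author`), the two thresholds, and the example records are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_probable_match(preprint: dict, article: dict,
                      strict: float = 0.95, relaxed: float = 0.80) -> bool:
    """Hypothetical conditional rule (thresholds are illustrative assumptions).

    - Accept outright when the titles are near-identical (score >= strict).
    - Otherwise accept only if the titles are reasonably close
      (score >= relaxed) AND the first-author surnames agree.
    """
    score = title_similarity(preprint["title"], article["title"])
    if score >= strict:
        return True
    same_author = (preprint["first_author"].strip().lower()
                   == article["first_author"].strip().lower())
    return score >= relaxed and same_author


# Hypothetical records, invented for illustration only.
preprint = {
    "title": "Estimating the publication rate of COVID-19 preprints",
    "first_author": "Doe",
}
article = {
    "title": "Estimating the publication rate of COVID-19 preprints: a cohort study",
    "first_author": "Doe",
}
linked = is_probable_match(preprint, article)
```

Conditioning the looser threshold on a second field reflects a common trade-off in record linkage: titles often change between the preprint and the journal version, so a single strict cutoff would miss true pairs, while a single loose cutoff would link unrelated papers on similar topics.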