Evidence on the role of journal editors in the COVID19 infodemic: metascientific study analyzing COVID19 publication rates and patterns

ABSTRACT Objective: Infodemic, a neologism characterizing an excess of fast-tracked, low-quality publications, has been employed to depict the scientific research response to the COVID19 crisis. The concept relies on the presumed exponential growth of research output. This study aimed to test the COVID19 infodemic claim by assessing the publication rates and patterns of COVID19-related research against a control from a year prior. Design: A Reproduction Number of Publications (Rp) was conceived, defined as the weekly incidence of publications divided by the average number of publications in the previous weeks. The publication growth rates of preprint and MEDLINE-indexed peer-reviewed literature on COVID19 were compared using the corresponding Influenza output, a year prior, as control. Rp values for COVID19 and Influenza papers and preprints were generated, compared, and then analyzed in light of the respective growth patterns of their papers and preprints. Main outcomes: Output growth rates and Reproduction Number of Publications (Rp). Results: COVID19 peer-reviewed papers showed a fourteen-fold increase compared to Influenza papers. COVID19 papers and preprints displayed an exponential growth curve until the 20th week. COVID19 papers displayed Rp=3.17±0.72, while the control group presented Rp=0.97±0.12. The corresponding preprints exhibited Rp=2.18±0.54 and Rp=0.97±0.27, respectively, with no evidence of exponential growth in the control group, whose Rp remained approximately one. Conclusions: COVID19 publications displayed an epidemic pattern. As the growth patterns of COVID19 peer-reviewed articles and preprints were similar, and the majority of the COVID19 output came from indexed journals, not only authors but also editors appear to have played a significant part in the infodemic. Review protocol: https://osf.io/q3zkw/?view_only=ff540dc4630b4c6e9a2639d732047324 Ethical aspects: No ethical clearance was required, as all analyzed data were publicly available.


What are the new findings?
No previous study has pushed the infodemic metaphor far enough to analyze not only the volume of publications but also publication rates against a control group, so as to pinpoint, through a mathematical analysis of the growth patterns and rates of those publications, an exponential phase of contagion in the infodemic (as would take place in a real epidemic). In this paper, we were able to demonstrate that there has indeed been an infodemic and that the editor population was as susceptible to the infodemic bug as the author population, because the exponential phase was shaped not only by authors but mainly by editors from PubMed-indexed journals.

How might it impact clinical practice in the foreseeable future?
These results and conclusions are relevant to subsequent studies on the rigor and depth of post-publication peer review and on editorial practices within the life and health sciences research community.

INTRODUCTION
The global scientific output has been doubling every nine years 1 , leaving the scientific community scrambling to keep up to date and to separate the wheat from the chaff [2][3][4] . Excessive publication is a long-established complaint of the scientific community, an issue the scientific journal was created to overcome by organizing and validating research [5][6][7][8] . Complaints about quality, encompassed in the concept of avoidable research waste, are a more recent issue. Awareness of this topic has been increasing for the past thirty years [9][10][11] , giving rise to the notion that the incentives of the scientific ecosystem may have been skewing publication output towards novelty and quantity and away from reliability and quality 2,3,11 . If pressure to publish novel findings is indeed a strong driver of current scientific output, and the average methodological quality of biomedical research was already considered poor before the pandemic [2][3][4][9][10][11] , what would happen should the scholarly publishing ecosystem find itself under even more pressure from a worldwide health crisis?
Mounting evidence from independent studies suggests that the publication ecosystem has been rushing the publication of lower-quality COVID19-related papers in journals in comparison to pre-pandemic times: the time elapsed from article submission to publication of COVID19-related articles in journals has shortened [12][13][14][15][16][17] , while there is evidence that the overall methodological quality of studies has decreased 15,18 . Independent studies found that COVID-related peer-reviewed articles reached the final acceptance stage eight times faster than papers on other topics 15,16,19 . In a sample of PubMed-indexed journals, 10% of COVID-related studies were found to have been accepted within two days of submission 16 . A study of open peer review in two high-tier journals found qualitative differences between COVID19-related papers and unrelated ones. The former usually underwent a single round of peer review, and their reviews were less likely to request further data and more likely to ask authors to tone down claims and conclusions; before the pandemic, papers usually underwent two or more rounds of review 19 .

medRxiv preprint doi: https://doi.org/10.1101/2022.01.23.22269716; this version posted January 24, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
It has also been found that, in 2020, 27% of the active authors in the SCOPUS database published COVID19 research in a subfield that was not among the three subfields in which they had most commonly published during their career. Approximately one in seven active scientists publishing in English-language high- and middle-tier journals rapidly adjusted their portfolio to produce publications on COVID19. Those authors were found across twenty-one major SCOPUS fields and included experts in fisheries, ornithology, entomology and architecture who had published on COVID19 in 2020 20 . Furthermore, COVID19 manuscripts were uploaded as preprints concurrently with their submission to journals, implying that their authors generally did not specifically seek pre-submission feedback when posting their research as preprints 21 . It is noteworthy, however, that there is no evidence that COVID19 preprint authors published outside of their expertise 22 . These findings point to an informational phenomenon that could be contagious, as other informational phenomena have proved to be. A famous example is the Matthew Effect, in which citations are associated with more citations over time 23,24 , implying that citations attract citations, as attention attracts attention. The ongoing publishing phenomenon has been referred to by the scholarly community as an infodemic 12,25 . But could the deluge of papers from journals and preprint servers indeed be framed as an infodemic, in which COVID19 publications led to even more COVID19 publications exponentially? Compared to what control group, and over which period of time? And, if so, what part did journals and preprint servers play in the phenomenon? Have COVID19-related output rates been following an epidemic pattern, or was the term infodemic just a catchy metaphor? Were preprints to blame, and if so, to what extent?
In view of this rationale, this study aimed to evaluate whether COVID19-related publications followed an epidemic pattern, displaying exponential growth and a Reproduction Number of Publications (Rp) above one, Rp being defined as the weekly incidence of publications divided by the average number of publications in the previous weeks. Additionally, it aimed to assess the behavior of preprint publication rates compared to their journal counterparts. To the best of our knowledge, no such comparison has yet been made. Preprints have proved to be a relevant part of the scientific publication ecosystem, informing policy and being increasingly discussed in the lay

Incidence of Publications
From January 2020, the month of the first COVID19 publication, to February 6th, 2021, the COVID19 output increased weekly to a total of 97,781 documents, a fourteen-fold increase compared to the Influenza output, which accumulated 6,936 documents over the equivalent period.
The incidence of COVID19 journal articles increased weekly until the 20th week of 2020 (January to June), while the Influenza output had remained stable over time a year prior (Figure 1). From the 20th to the 52nd week, the weekly number of COVID19 journal articles stabilized at a mean of 1,820 (SD ± 233.8) (Figures 1 and 2 in Appendix).
From the 1st to the 30th week of the 2020 timeframe, COVID19 preprints accounted for 6,960 documents and Influenza preprints for 614 (Figure 1a). Incidence of COVID19 preprints also

For both COVID19 journal articles and preprints, the log-transformed weekly cumulative number of publications increased linearly from weeks 6 to 20, indicating exponential growth (Figure 2). Over this interval, linear regressions against week $t$ described the data with R² = 99% for journal articles and R² = 96% for preprints: $\ln N(t) = 0.28t + 3.9$ and $\ln N(t) = 0.26t + 3.3$, respectively, where $N(t)$ is the cumulative number of publications.
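As a worked reading of these fits, and assuming each line regresses the natural log of the cumulative number of publications $N(t)$ on week $t$ (as the LN step in the Appendix spreadsheet suggests), the fitted slopes of 0.28 and 0.26 imply a doubling time $t_d$. This is a derived illustration, not a figure reported by the study:

$$\ln N(t) = r\,t + b \;\Longrightarrow\; N(t) = e^{b}\,e^{rt}, \qquad N(t + t_d) = 2N(t) \;\Longrightarrow\; t_d = \frac{\ln 2}{r}$$

$$r = 0.28:\; t_d \approx \frac{0.693}{0.28} \approx 2.5 \text{ weeks}, \qquad r = 0.26:\; t_d \approx \frac{0.693}{0.26} \approx 2.7 \text{ weeks}$$

for COVID19 journal articles and preprints, respectively, over weeks 6 to 20.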
The Influenza control group displayed a stable, linear growth pattern. On the semi-log scale, cumulative Influenza journal articles and preprints presented a slope of 0.08 (95% CI, 0.07-0.09; R² = 96%). Accordingly, the untransformed cumulative incidence over time followed a linear pattern, with a slope of 75.8 (95% CI, 74-78) for journal articles and 18.8 (95% CI, 18-20) for preprints, each with R² = 99.8%. Consequently, the control group was best described by a linear rather than an exponential function.


Publication Reproduction Analysis
From the 6th to the 20th week, the Rp of journal articles differed significantly between COVID19 (3.14±0.71) and Influenza (0.96±0.12), p<0.01. A difference was also found for preprints: COVID19 (2.78±0.53) versus Influenza (0.99±0.27), p<0.01. Within the COVID19 and Influenza groups, the Rp of preprints and journal articles did not differ (p=0.13 for COVID19 publications and p=0.64 for Influenza; Figure 3).

DISCUSSION
Epidemics may result from a combination of the transmissibility of an etiological agent, the environmental context and the susceptibility of a population. Our findings suggest that, compared to the control group, the deluge of COVID19-related papers and preprints could indeed be framed as an infodemic resulting from (a) a highly transmissible bug, (b) exposed and susceptible individuals, and (c) an environment that enhanced transmission due to the natural urgency to find solutions to the crisis and, possibly, pressure for publication and recognition. The present findings make it likely that, in the early months of the pandemic, the more papers were published, the more authors fed into the cycle of transmission by publishing yet more papers, both in traditional journals and on preprint servers, exposing themselves and others.

It is interesting to note, however, that preprint publication and journal publication displayed equivalent growth patterns, suggesting that the publishing venues may have reacted to community pressures. In other words, Influenza-related articles were published on preprint servers in proportion to their publication in journals, as shown by the two superimposed growth curves with analogous patterns (Figure 4). COVID19-related articles were likewise published in journals and on preprint servers in the same fashion, although following a steeper growth curve in both venues.
In view of that, we would do well to keep in mind that, just as a contaminated doorknob is no more responsible for transmission than the unwashed hand that touched the knob and then the face, the publication venues appear to have behaved as the scientific community shaped them to.
However, because journals have positioned themselves over the past two hundred years as the official adjudicators of science and, according to our findings, were responsible for the bulk of COVID19 publications, they could be interpreted as the main culprit behind the infodemic.
Thus, it would appear that the infodemic bug contaminated not only authors, but also editors.
It could be speculated that journal editors decided to accelerate the publication process, publish more and expect the community to judge later, placing a great deal of pressure on post-publication peer review. Whether such review actually took place is a topic for further investigation in other studies, outside the scope of this thought experiment.
By contrast, mainstream life and medical science preprint servers played only a very small part in the infodemic, while journals acted as the main superspreading venues, as they remain the main infrastructure dominating the scientific publication ecosystem.
However, a greater volume of journal articles does not necessarily imply that they received greater attention, in lay media or in scientific circles, such that they alone would account for the disinformation, misinformation and everything in between in the COVID19 era. Likewise, it is not possible, within the scope of this experiment, to argue that preprints did not account for more disinformation than their journal counterparts. To the best of our knowledge, there is no strong empirical evidence swaying the argument either way. Any future analysis should consider the attention received by published papers and preprints in the form of citations, social media mentions and lay media coverage. A future infodemic investigation would do well to assess and compare Altmetrics and citations for journal articles and preprints, also employing a control group.
Considering that preprints are readily accessible, it is plausible that they played an important part in the spread of the infodemic bug by means of social media and lay media coverage. However, the debate is far from settled and in need of further evidence.
By now, it is widely recognized that more publication does not mean better, more relevant or more useful publication. The lower quality of journal articles during the pandemic has been identified by independent studies and, consequently, the quality of peer review may be presumed to have been lacking. The infodemic may therefore have brought about a secondary wave of hazards, such as cognitive fatigue and the possibility of good, useful and relevant COVID19 research being buried under a deluge of low-quality research reports. This is not to imply that all COVID19-era research resulted from a desire for recognition at the expense of quality, but our findings, along with previous findings on the quality of the pandemic publication output, make a strong case that recognition-seeking played a major part. This strengthens the claim that the infodemic is an actual contagious informational (and thus behavioral) phenomenon among authors, not merely a metaphor.
The infodemic bug is still active; there is no immunization against publishing urges and pressures yet. It is simply no longer exponential, as it was in the early months of 2020, which is a good sign. This is not the first time the scientific community has struggled with the amount of research output it has spawned. Journals themselves were established in the 17th century so that scientists could cope with the amount of literature they had to keep up with to remain up to date. Thus, one could argue that the cognitive overload following that bibliographic explosion exerted an adaptive pressure on the scientific publication ecosystem towards professionalization. The novelty in the COVID19 infodemic is the widespread use of preprint servers, which may act as another adaptive pressure on the scientific publication ecosystem towards more openness and post-publication peer review. As the evidence showed, if journals published so much more while presenting significantly worse quality than in pre-pandemic times, perhaps more skepticism towards pre-publication peer review is in order as well.
In light of the present findings, it seems the publication venues will be what authors and editors shape them to be, even more so than reviewers. Thus, there is no point in placing the blame on structures and technologies without tackling the real issue: pressure for publication and recognition in order to achieve career advancement, at the expense of methodological rigor and quality. This is the underlying problem that the debate around research waste has been pointing to for the past thirty years. So, as we hope our thought experiment has shown, for both journals and preprint servers to foster fruitful, high-quality public scientific debate by means of reliable research, both authors and editors need to put quality and transparency before speed and novelty and actively resist the infodemic bug themselves.

Query
For articles from PubMed-indexed journals, the NIH/NLM Entrez Direct (EDirect) application was employed to retrieve publication metadata from MEDLINE via the E-utilities API 29,30 . MEDLINE metadata retrieval included papers from December 1st, 2019 to February 6th, 2021 for COVID19. The same interval one year prior was used for Influenza papers, because publications on that topic were presumed to have been suppressed in 2020 by the experts' focus on the ongoing COVID19 pandemic. Thus, this study took a more conservative approach, so as not to overestimate the journal article output findings.
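EDirect is a command-line wrapper around NCBI's E-utilities. As a minimal sketch of the kind of date-bounded request involved, the Python function below only builds an ESearch URL for PubMed. The search terms, the choice of `datetype=pdat` and the `retmax` value are assumptions for illustration; the study's actual search strategies are given in its Appendix and were run through EDirect, not through this code.

```python
from urllib.parse import urlencode

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, mindate: str, maxdate: str, retmax: int = 10000) -> str:
    """Build a PubMed ESearch URL restricted to a date window.

    Dates use the YYYY/MM/DD format accepted by E-utilities.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # assumption: filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical terms; the COVID19 window and the Influenza window one year prior.
    print(build_esearch_url("covid-19[tiab]", "2019/12/01", "2021/02/06"))
    print(build_esearch_url("influenza[tiab]", "2018/12/01", "2020/02/06"))
```

PubMed's ESearch returns at most 10,000 IDs per query, so a result set the size of the COVID19 output would need the E-utilities history server (`usehistory=y`) and paged EFetch calls; EDirect handles that plumbing internally, which is one practical reason to prefer it.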
Preprint metadata were retrieved from the bioRxiv and medRxiv servers 31 , presumed to be the most prominent life sciences preprint servers to date 21,22,28 . The rationale was that, if PubMed was to be considered a collection of high-quality biomedical output, the equivalent collection among preprint servers would be bioRxiv and medRxiv together. The native advanced search tools connecting the medRxiv and bioRxiv databases were employed for querying preprints 31,32 . The preprint metadata retrieval interval for COVID19 ranged from January 13th, 2020 (first occurrence) to July 28th, 2020. The same interval, one year prior, was used for Influenza preprints.
To rule out missing metadata in the EDirect output files, the COVID19 EDirect query results were compared to the PubMed query results 33 . The comparison revealed no loss of articles from automatic extraction with EDirect. Thus, to reduce the possibility of manual metadata collection inducing bias or loss of metadata, the EDirect software package was adopted. Search strategies, data extraction and technical software aspects are detailed in the Appendix.

Eligibility
All journal articles from EDirect queries were included. Preprints that appeared among those search results, and articles that lacked a full publication date upon extraction, were excluded. All bioRxiv and medRxiv search results for preprints were included, as all of them have a publication date. No metadata curation was performed at this point for any group.
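The eligibility rules above, together with the duplicate and "Preprint" exclusions listed in the Appendix, amount to a simple record filter. The sketch below assumes each PubMed record has already been reduced to a hypothetical `Record` holding a PMID, its publication types and a possibly incomplete date; it illustrates the rules and is not the study's own procedure, which was carried out in Excel.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical, simplified view of one PubMed record."""
    pmid: str
    publication_types: List[str] = field(default_factory=list)
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None


def is_eligible(rec: Record) -> bool:
    """Exclude PubMed-indexed preprints and records without a full date."""
    if "Preprint" in rec.publication_types:
        return False
    return None not in (rec.year, rec.month, rec.day)


def apply_eligibility(records: List[Record]) -> List[Record]:
    """Drop duplicate PMIDs (first occurrence kept), then ineligible records."""
    seen = set()
    kept = []
    for rec in records:
        if rec.pmid in seen:
            continue  # duplicate of a record already processed
        seen.add(rec.pmid)
        if is_eligible(rec):
            kept.append(rec)
    return kept


if __name__ == "__main__":
    sample = [
        Record("1", ["Journal Article"], 2020, 3, 1),
        Record("1", ["Journal Article"], 2020, 3, 1),  # duplicate
        Record("2", ["Preprint"], 2020, 3, 2),         # preprint
        Record("3", ["Journal Article"], 2020),        # no month or day
    ]
    print([r.pmid for r in apply_eligibility(sample)])  # ['1']
```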
A simulation excluding letters and opinion pieces from the published papers output produced no substantial change in the number of published pieces.

Extraction and internal validity assessment
EDirect metadata for COVID19 and Influenza journal articles were downloaded into a Microsoft Excel-readable document (Microsoft Corporation, United States of America). Influenza preprint metadata were exported into an Extensible Markup Language (XML) file from the preprint search result pages. COVID19 preprints were collected from the COVID19-dedicated page connecting the servers' databases, accessed on July 28th, 2020 31 .
To assess the validity of a non-curated, automatic method of metadata extraction, the search strategy for journal articles was repeated at different timepoints, and the PubMed output was compared to the NCBI LitCovid output 34 . This assessment showed that PubMed had 1.17 times as many COVID19 publications as LitCovid at the time, indicating a possible delay in LitCovid updates. Thus, PubMed was considered more reliable for accurate coverage. Influenza metadata were not validated or curated, as that output was eight times smaller than its COVID19 counterpart.
The attribute variable PubmedPubDate@pubmed, a date that reflects inclusion in the PubMed database, and PubDate, a date that reflects the issue date of publication, were extracted from PubMed as quality controls for ArticleDate, a variable that corresponds to the date the journal publisher made an electronic version of the article available online. PubmedPubDate@pubmed and PubDate were found to lose accuracy, as papers seemed to have been added to the NLM collection in bulk, days or weeks after their first online publication on journal websites. Thus, ArticleDate was chosen as the variable of interest because it proved equivalent to, or slightly more precise than, PubmedPubDate@pubmed and PubDate, making it the most accurate choice 35 (see Figure 4 in Appendix).
To build a timeline of publication incidence, weeks were numbered automatically in Excel, with week 1 placed in January for all groups.
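The text does not state which Excel week-numbering function was used. Assuming Excel's `WEEKNUM` with its default return type (week 1 is the week containing January 1, and weeks start on Sunday), the sketch below reproduces that numbering in Python and tallies weekly incidence. How dates from December 2019 or 2021 were mapped onto the 2020 week scale is not stated, so this hypothetical helper keys counts by (year, week) rather than guessing.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable


def excel_weeknum(d: date) -> int:
    """Mimic Excel WEEKNUM(d) with the default return_type 1.

    Week 1 is the week containing January 1; each new week starts on Sunday.
    """
    jan1 = date(d.year, 1, 1)
    # Python's weekday(): Monday=0 ... Sunday=6, so this counts days back to Sunday.
    days_since_sunday = (jan1.weekday() + 1) % 7
    week1_start = jan1 - timedelta(days=days_since_sunday)
    return (d - week1_start).days // 7 + 1


def weekly_incidence(dates: Iterable[date]) -> Counter:
    """Count publications per (year, week number)."""
    return Counter((d.year, excel_weeknum(d)) for d in dates)


if __name__ == "__main__":
    print(excel_weeknum(date(2020, 1, 1)))    # 1 (a Wednesday)
    print(excel_weeknum(date(2020, 1, 5)))    # 2 (the following Sunday)
    print(excel_weeknum(date(2020, 12, 31)))  # 53
```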

Function appraisal
Because an exponential function becomes linear when log-transformed, the hypothesis of exponential growth was tested by log-transforming the cumulative incidence variable. The linearity of the data was then appraised by linear regression through the slope of the function, or growth rate ($r$), which is also the rate in the exponential model $N(t) = N_0 e^{rt}$, and through the coefficient of determination (R²). The exponential model function was described using the trendline equation in Excel. The growth pattern of the curve was analyzed: the steeper the slope, the more pronounced the growth, with $r$ acting as the multiplier of the time variable in the exponent of the exponential function. Time intervals for analysis were defined after evaluating curve behavior and best function fit.
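The log-linear test described above can be sketched in plain Python. The function names are hypothetical, and the study fitted its trendlines in Excel rather than with this code; the sketch fits ordinary least squares to the cumulative counts on both the raw and the natural-log scale, so the two R² values can be compared in the way the Influenza results contrast linear and exponential fits.

```python
import math
from typing import Dict, List, Tuple


def ols(x: List[float], y: List[float]) -> Tuple[float, float, float]:
    """Least-squares line y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


def compare_growth_models(
    weeks: List[float], cumulative: List[float]
) -> Dict[str, Tuple[float, float, float]]:
    """Fit N(t) on a linear scale and ln N(t) on a semi-log scale.

    A near-perfect semi-log fit points to N(t) = N0 * exp(r t): the semi-log
    slope estimates r and the intercept estimates ln N0.
    """
    return {
        "linear": ols(weeks, cumulative),
        "semilog": ols(weeks, [math.log(c) for c in cumulative]),
    }


if __name__ == "__main__":
    # Synthetic, noise-free exponential series for weeks 6-20 (illustration only).
    weeks = list(range(6, 21))
    cumulative = [math.exp(3.9 + 0.28 * t) for t in weeks]
    fits = compare_growth_models(weeks, cumulative)
    print(fits["semilog"])    # slope about 0.28, intercept about 3.9, R^2 about 1
    print(fits["linear"][2])  # a straight line fits this series less well
```

Comparing R² on both scales is a simple heuristic; it does not replace inspecting residuals, which is presumably why the text also mentions evaluating curve behavior.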

Publication Reproduction Number
The well-known effective reproduction number (Rt) helps project epidemic growth patterns because it indicates the contagiousness of an etiologic agent within a specific timeframe and environmental context. It is defined as the number of secondary cases a single case at a given time would generate 36 . Based on that rationale, a parameter conceptually inspired by the effective reproduction number was conceived for this thought experiment: the Publication Reproduction Number (Rp). This parameter consisted of the weekly incidence of publications divided by the average of the previous weekly cases, where i is the first week considered for a given time-lapse and n is the total number of weeks within the analyzed timeframe, in which s stands for the week incidence and c for the current week incidence.
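The Rp formula itself did not survive in this version of the text. One reading of the verbal definition, assumed here with the indices relabelled, is that for the current week $t$, $R_p(t) = s_t \big/ \bigl(\tfrac{1}{t-i}\sum_{k=i}^{t-1} s_k\bigr)$, i.e. the current weekly incidence over the mean incidence of all earlier weeks in the time-lapse starting at week $i$. The sketch below implements that assumed reading, plus the mean±SD summary used in the Results; the function names are hypothetical and this is not the authors' code.

```python
from statistics import mean, stdev
from typing import Dict, List


def publication_reproduction_numbers(incidence: List[float]) -> List[float]:
    """Rp for every week after the first, under an assumed reading of the definition.

    incidence[0] is week i (the first week of the analysed time-lapse);
    Rp for week t is incidence[t] / mean(incidence[0:t]).
    """
    rps = []
    for t in range(1, len(incidence)):
        baseline = mean(incidence[:t])
        if baseline == 0:
            continue  # Rp is undefined while no earlier publications exist
        rps.append(incidence[t] / baseline)
    return rps


def summarize(rps: List[float]) -> Dict[str, float]:
    """Mean and standard deviation, the form in which Rp is reported (mean±SD)."""
    return {"mean": mean(rps), "sd": stdev(rps)}


if __name__ == "__main__":
    stable = [100, 98, 103, 99, 100]  # flat, control-like weekly incidence
    doubling = [10, 20, 40, 80, 160]  # weekly incidence doubling each week
    print(summarize(publication_reproduction_numbers(stable)))
    print(summarize(publication_reproduction_numbers(doubling)))
```

Under this reading, a flat series gives Rp close to one, as reported for Influenza, while sustained exponential growth keeps Rp above one.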
In the context of an infodemic, new cases were understood as additional published papers or preprints. This concept was based on the assumption that the desire to publish COVID19-related studies could have been amplified by the amount of COVID19 literature already published and the attention it received, an assumption supported by mounting independent evidence on author publication behavior in the COVID19 era 20,22 and on the faulty reward system of scientific publication 2-4, 9, 10 .
Following the infodemic metaphor, the Rp was meant to define an epidemic of published papers and preprints more clearly than a simple comparison of publication counts could. Within a putative model where each article had perpetual influence over the subsequent generation of articles, authors would influence (or, metaphorically, contaminate) other authors with a misplaced urge to produce additional COVID19 papers. Following this rationale, the way the Rp had grown in the early months of the COVID19 pandemic could show whether the average incidence of new weekly publications was increasing or decreasing, supporting or refuting the infodemic claim.

Data Analysis
Journal articles and preprints were compared by theme of publication (COVID19 or Influenza) and by type of publication (preprint or journal article). Numerical variables were described as mean and standard deviation after appraisal of the data distribution. To compare the COVID19 and Influenza Rp means, the two-tailed independent-samples Student's t-test was employed.
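For reference, the pooled-variance (Student's) two-sample t statistic behind this comparison is sketched below with the standard library only; the function name is hypothetical. The two-sided p-value would then be read from a t distribution with `df` degrees of freedom, for example via `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` performs this same Student's test.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def student_t(a: List[float], b: List[float]) -> Tuple[float, int]:
    """Two-sample Student's t statistic with pooled variance (equal variances assumed).

    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance from the two sample variances.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


if __name__ == "__main__":
    # Invented weekly Rp values, for illustration only.
    covid_rp = [3.9, 3.5, 3.1, 2.8, 2.6]
    flu_rp = [1.1, 0.9, 0.95, 1.0, 0.85]
    print(student_t(covid_rp, flu_rp))
```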
The exponential function growth rate was obtained through linear regression and is reported with its confidence interval. For all analyses, a p-value of less than 0.05 was taken as statistically significant.

• Duplicates and references with PublicationsType variable equal to "Preprint" were automatically excluded.
• Several formulas were used to extract and plot the number of publications per week, as follows (reference example: the cell containing date information is Gx, where x is a row):
  • R = cumulative value of incidence per week: =(SUM(Q2:Qx))*
  • S = Neperian (natural) log of Rx: =LN(Rx)
  *The sheet has headers.
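The spreadsheet columns R and S can be mirrored in Python as follows. This is a sketch: the function name is hypothetical, and the weekly counts are assumed to be the column Q values that the SUM formula references. Excel's LN returns an error for zero, so weeks with no cumulative output are given NaN here.

```python
import math
from itertools import accumulate
from typing import List, Tuple


def cumulative_and_log(weekly: List[int]) -> List[Tuple[int, float]]:
    """Return (R, S) per row: R = running sum of weekly counts, S = ln(R).

    Mirrors the spreadsheet columns R (=SUM(Q2:Qx)) and S (=LN(Rx)).
    """
    rows = []
    for total in accumulate(weekly):
        rows.append((total, math.log(total) if total > 0 else float("nan")))
    return rows


if __name__ == "__main__":
    for r, s in cumulative_and_log([1, 2, 3]):
        print(r, round(s, 3))
```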