Abstract
Background Prospectively registering study plans in a permanent time-stamped and publicly accessible document is becoming more common across disciplines and aims to improve the trustworthiness of research findings. Selective reporting persists, however, when researchers deviate from their registered plans without disclosure. This systematic review aims to estimate the prevalence of undisclosed discrepancies between prospectively registered study plans and their associated publication. We further aim to identify the research disciplines where these discrepancies have been observed, whether interventions to reduce discrepancies have been conducted, and gaps in the literature.
Methods On 15 December 2019, we searched Scopus and Web of Science for articles that included quantitative data about discrepancies between registrations or study protocols and their associated publications. We used random-effects meta-analyses to synthesize the results.
Results We reviewed k = 89 articles, including k = 70 that report on primary outcome discrepancies from n = 6314 studies and k = 22 that report on secondary outcome discrepancies from n = 1436 studies. Meta-analyses indicate that between 10% and 68% (95% prediction interval) of studies contain at least one primary outcome discrepancy and between 13% and 95% (95% prediction interval) contain at least one secondary outcome discrepancy. Almost all articles assessed clinical literature, and there was considerable heterogeneity, resulting in wide prediction intervals. We identified only one article that attempted to correct discrepancies.
Discussion Many articles did not include information on whether discrepancies were disclosed, which version of a registration they compared publications to, and whether the registration was prospective. Thus, our estimates represent discrepancies broadly, rather than our target of undisclosed discrepancies between prospectively registered study plans and their associated publications. Discrepancies are common and reduce the trustworthiness of medical research. Interventions to reduce discrepancies could prove valuable.
Registration osf.io/ktmdg. Protocol amendments are listed in Supplementary Material A.
Introduction
In 2000, ClinicalTrials.gov and the ISRCTN Registry were launched with several aims, including aiding participant recruitment, facilitating knowledge synthesis, and reducing duplication, publication bias and selective reporting (Zarin et al., 2017). In 2005, the International Committee of Medical Journal Editors (ICMJE) made prospective registration a condition of consideration for publication (De Angelis et al., 2004). Thousands of journals now claim to follow this policy (ICMJE, 2021). In parallel, the World Health Organization International Clinical Trials Registry Platform established a minimum set of required information for a trial to be considered fully registered, including experimental design elements such as the conditions being studied, intervention, key inclusion and exclusion criteria, sample size, primary outcomes, and key secondary outcomes (Sim et al., 2006). While the relatively widespread uptake of clinical trial registration has substantially improved transparency, many trials remain unregistered, are registered after enrollment of participants begins or analyses are complete (i.e., retrospective registration), are never published, or publish outcomes discrepant with those in the registration without disclosing the discrepancy (e.g., Chan et al., 2017; Scott et al., 2015). Nevertheless, the existence of registries allows researchers to identify and quantify these issues.
Here we systematically review articles that quantify the prevalence of discrepancies between registrations or study protocols and their associated publications (e.g., in primary outcome measures). Our analysis extends beyond the three systematic reviews already published on this topic in several ways (Dwan, Gamble, et al., 2013; C. W. Jones et al., 2015; G. Li et al., 2018). First, registration has expanded beyond clinical trials; we included all research disciplines and registries in our search, including psychology and the social sciences (the Open Science Framework), economics (American Economic Association RCT Trial Registry), and systematic reviews (PROSPERO). Second, we extracted more fine-grained information about a wide range of discrepancies (e.g., outcomes, analysis, sample size), as well as which version of the registration was surveyed and whether discrepancies were disclosed (we believe disclosed discrepancies present little reason for concern). Third, our review includes over twice as many studies as previous systematic reviews on this topic, provides meta-analytic estimates, and uses meta-regression and additional analyses to attempt to identify predictors of discrepancies.
Methods
Terminology
We present a systematic review of k = 89 articles that assessed a wide range of outcome discrepancies and non-outcome discrepancies across over n = 7,000 studies. To avoid confusion, this report consistently uses the term studies to refer to the over n = 7,000 individual studies that were assessed, and the term articles to refer to the k = 89 articles that assessed these studies and that we reviewed. We restrict our usage of the term publication to the publications stemming from the studies (not the articles). We use the term discrepancy to refer to any incongruity between the content of a publication and its associated registration (e.g., on clinicaltrials.gov) or study protocol (e.g., submitted to an ethics review board or funding agency). We use the term prospective registration broadly to include terms used in different research disciplines, such as prospective trial registration, preregistration, and pre-analysis plans. All these terms indicate the registration of study details before commencing a study or, in some cases, before viewing the data or removing the blind. They contrast with retrospective registration, which occurs after participant enrollment begins or analyses are complete.
Search Strategy
We searched Scopus and Web of Science on 15 December 2019 using the queries in Appendix A and Appendix B of our preregistered protocol (osf.io/ktmdg). Briefly, our queries included (1) variations of the terms preregistration, pre-analysis plans, and prospective registration in the title or keyword fields; (2) terms indicating discrepancies such as “outcome switching” in the title, keywords or abstract; (3) names of registration or protocol repositories such as “clinicaltrials.gov” in the title or keywords; and excluded overlapping but irrelevant terms (e.g., “nursing preregistration”). To limit the number of irrelevant articles, we did not search for variations of the term preregistration or for repository names in the abstract field.
Our search returned 4,283 articles after duplicates were removed (see Figure 1 for a PRISMA flowchart). Articles were screened independently by two reviewers in two stages. In Stage One, reviewers screened titles and, if necessary, briefly examined abstracts of articles to determine inclusion in the systematic review or in a scoping review on prospective registration, which we will report separately. If at least one of the reviewers deemed an article potentially relevant, it was included in Stage Two screening. In Stage Two, the reviewers independently examined the remaining 464 abstracts in greater detail for eligibility. Disagreements were resolved through discussion between the two reviewers until consensus was reached. Inter-rater reliability for the 464 articles was Cohen’s κ = 0.67 for inclusion in the systematic review (the list of articles and coding is available at osf.io/wa62f). Inter-rater reliability for all 4,283 articles was Cohen’s κ = 0.72. We included articles that reported quantitative data about discrepancies between registrations or study protocols and their associated publication. We excluded conference proceedings and articles written in a language other than English (for full inclusion and exclusion criteria, see our preregistered protocol at osf.io/ktmdg). We used a snowball method and identified 33 additional articles that met our inclusion criteria, mostly through citations in G. Li et al. (2018) and C. W. Jones et al. (2015). After full-text review, we included 89 articles in our systematic review.
Figure 1. PRISMA flowchart of article inclusion
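As an illustration of the inter-rater reliability statistic reported above, the following minimal sketch computes Cohen's kappa for two reviewers' screening decisions. The function name and the example decisions are hypothetical and are not taken from our dataset.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical screening decisions for 100 abstracts (illustration only).
reviewer_1 = ["include"] * 25 + ["exclude"] * 75
reviewer_2 = ["include"] * 20 + ["exclude"] * 5 + ["include"] * 10 + ["exclude"] * 65
kappa = cohens_kappa(reviewer_1, reviewer_2)
```

In this hypothetical example the reviewers agree on 85 of 100 abstracts, but 60% agreement would be expected by chance given how often each reviewer chose each label, so kappa is lower than the raw agreement rate.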
Coding items
Each included article was independently coded by two of four reviewers (RTT, RC, OvdA, SW) using a coding form designed for this review. The form details the operationalization of each variable we coded, and is available at osf.io/728ys. The complete dataset, including the coding of each reviewer and the resolved coding, is available at osf.io/ue2c6. A cleaned dataset with only the resolved coding is available at osf.io/6cn9m. We chose items to code based on a pilot test of our protocol, as well as the categories used in a seminal paper (Chan, Hróbjartsson, et al., 2004) and a recent systematic review on discrepancies (G. Li et al., 2018). Missing data were coded as missing and not included in analyses.
Statistical analyses
We performed two random effects meta-analyses: one on the proportion of studies with at least one primary outcome discrepancy, and another on the proportion of studies with at least one secondary outcome discrepancy. We used random effects models because they allow for the true effect to vary across the populations the articles sampled from, and the articles we reviewed differ in their methodologies and the research disciplines that they assess. We performed meta-regressions to test whether article characteristics are associated with the proportion of studies with at least one primary or secondary outcome discrepancy.
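To make the random effects approach concrete, the sketch below pools proportions on the logit scale using the DerSimonian-Laird estimator of between-article variance. This is one common choice and is an assumption made for illustration; it is not necessarily the transformation or estimator used in our analysis code. The function names and counts are hypothetical.

```python
import math


def pool_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Assumes at least two articles and 0 < events < total for every article.
    """
    k = len(events)
    if k < 2:
        raise ValueError("need at least two articles")
    # Logit-transformed proportion and its approximate within-article variance.
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]
    # Fixed-effect weights and Cochran's Q.
    w = [1 / vi for vi in v]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, y))
    # Method-of-moments estimate of between-article variance, truncated at 0.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights add tau2 to each within-article variance.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {"mu": mu, "se": se, "tau2": tau2, "q": q, "k": k}


def inv_logit(x):
    """Map a logit back to a proportion."""
    return 1 / (1 + math.exp(-x))


# Hypothetical counts: studies with a discrepancy / studies assessed, per article.
events = [12, 30, 8, 45, 20]
totals = [40, 60, 50, 90, 100]
fit = pool_proportions(events, totals)
pooled_proportion = inv_logit(fit["mu"])
```

Because the between-article variance is added to every article's weight, large articles dominate the pooled estimate less than they would under a fixed-effect model.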
For pooled estimates, we report both confidence intervals and prediction intervals. Whereas researchers are likely more familiar with confidence intervals, interpreting them can be unintuitive (Hoekstra et al., 2014), and a confidence interval around a pooled estimate does not incorporate uncertainty due to between-article heterogeneity. If we assume that we could repeatedly resample from our population, about 95% of the 95% confidence intervals from the resampled meta-analyses would contain the true pooled value. Alternatively, if we are interested in the results that would come from another article assessing discrepancies, we would want a 95% prediction interval. In other words, of 100 new articles drawn from the same population, we could expect the results from about 95 of them to fall within the 95% prediction interval. While prediction intervals are not commonly reported, methodologists recommend reporting them for random effects meta-analysis, particularly when few articles are included or, as in our case, included articles are highly heterogeneous (Higgins et al., 2009; Riley et al., 2011).
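The difference between the two intervals can be sketched as follows, assuming a pooled estimate, its standard error, and a between-article variance on the logit scale. The inputs are hypothetical. For simplicity the sketch uses a normal critical value for both intervals; a t distribution with k - 2 degrees of freedom is commonly recommended for the prediction interval and would widen it slightly.

```python
import math
from statistics import NormalDist


def inv_logit(x):
    """Map a logit back to a proportion."""
    return 1 / (1 + math.exp(-x))


def intervals(mu, se, tau2, level=0.95):
    """Confidence and prediction intervals for a pooled logit proportion.

    Returns (ci, pi), each a (lower, upper) pair on the proportion scale.
    """
    # Normal critical value (about 1.96 for 95%); an approximation.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Confidence interval: uncertainty about the pooled mean only.
    ci_half = z * se
    # Prediction interval: adds the between-article variance tau2.
    pi_half = z * math.sqrt(tau2 + se ** 2)
    ci = (inv_logit(mu - ci_half), inv_logit(mu + ci_half))
    pi = (inv_logit(mu - pi_half), inv_logit(mu + pi_half))
    return ci, pi


# Hypothetical inputs on the logit scale (not estimates from this review).
ci, pi = intervals(mu=-0.7, se=0.1, tau2=0.5)
```

Adding more articles shrinks the standard error and therefore the confidence interval, but the prediction interval stays wide for as long as the between-article variance is large.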
Whereas we did not perform a formal risk of bias assessment—because our review differed substantially from the purpose these tools were built for—we shed light on a few potential sources of bias with additional analyses that consider the funding source, statistical significance, and the timing of registration of included studies. These additional analyses were not prospectively registered. We made a few amendments to our preregistered study protocol which are listed in Supplementary Material A.
Results
Article characteristics
We identified and reviewed k = 89 articles that report at least one type of discrepancy. Article characteristics are outlined in Table 1. All articles except for two, one preprint in economics (Ofosu & Posner, 2019) and one preprint in psychology (Claesen et al., 2019), focused on clinical trials or systematic reviews. All but k = 10 articles were solely observational. Only one article attempted to correct published discrepancies: its authors sent letters to the editor within weeks of a study being published with discrepant outcomes (Goldacre et al., 2019).
Table 1. Article characteristics
Registration timing
Articles varied in the level of detail they provided about whether and when studies were registered. For example, whereas some articles presented their sample only after selecting for prospectively registered studies, other articles detailed their selection process including how many studies were registered and if so, when they were registered. Using the terminology in the articles we reviewed, articles identified studies that were registered retrospectively (k = 29), registered during participant enrollment (k = 17), registered after participant enrollment was complete (k = 14), and studies that were not registered (k = 36). Many articles were ambiguous regarding when some studies were registered (k = 47) and whether or not some studies were registered at all (k = 24). While these data do not provide fine-grained detail, they highlight two overarching issues: many studies are not registered, and many registered studies are registered retrospectively. These studies fail to meet the Declaration of Helsinki (World Medical Association, 2013) (item 35) requirement that “Every research study involving human subjects must be registered in a publicly accessible database before recruitment of the first subject” and the equivalent International Committee of Medical Journal Editors (ICMJE) policy (ICMJE, 2019), which thousands of journals claim to follow (ICMJE, 2021).
Eighty of the k = 89 articles we reviewed report at least one type of outcome discrepancy. Of these, 23 report only on studies that were unambiguously prospectively registered, 51 do not unambiguously distinguish between prospectively and retrospectively registered studies, and 6 report outcome discrepancies separately for each of prospectively and retrospectively registered studies. Separate meta-analyses for unambiguously prospectively registered studies and studies with unclear timing of registration are presented in Supplementary Material H.
Forty-six of the k = 89 articles report at least one non-outcome discrepancy (e.g., in sample size or analyses). Of these, 12 report only on studies that were unambiguously prospectively registered, 33 do not unambiguously distinguish between prospectively and retrospectively registered studies, and one reports non-outcome discrepancies separately for each of prospectively and retrospectively registered studies. Non-outcome analyses are presented in Supplementary Material F.
Primary outcome discrepancies
An estimated 10-68% (95% prediction interval) of the population of studies contain at least one primary outcome discrepancy (Figure 2). The corresponding 95% confidence interval for the pooled estimate is 29-37%.
Figure 2. Forest plot of articles reporting the proportion of assessed studies with at least one primary outcome discrepancy.
This meta-analysis had high heterogeneity (I² = 86%), suggesting that the broad range of estimates across the articles stems largely from differences in the methodology of the articles or the populations they sample from, rather than from chance. Heterogeneity could not be explained by meta-regression on any of the following article-level characteristics: discipline (p = 0.28), whether the publications were compared to registry entries versus other protocol formats (e.g., ethics applications) (p = 0.46), sources searched to identify studies (p = 0.65), version of the registry analyzed (p = 0.77), whether discrepancies were disclosed (p = 0.97), and year of article publication (p = 0.83). The meta-regression on discipline had low power because 63 articles assessed medical research and only 7 assessed studies across dentistry, psychology, physical therapy, and economics. To increase statistical power, we reran this meta-regression after dichotomizing discipline and found that articles assessing disciplines other than medicine may report a greater proportion of studies with at least one primary outcome discrepancy (p = 0.09; OR 95% CI: 0.91-3.19). We ran another meta-regression after dichotomizing the source that publications were compared to into registrations versus other protocols and did not find evidence that this moderator plays a role (p = 0.42). All meta-regression model summaries are presented in Supplementary Material I.
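For readers unfamiliar with the I² statistic, the sketch below shows its usual definition in terms of Cochran's Q: the share of total variability in article-level estimates that is attributable to heterogeneity rather than sampling error. The function name and input values are hypothetical and do not correspond to the Q statistics from our meta-analyses.

```python
def i_squared(q, k):
    """I-squared (in percent) from Cochran's Q across k articles."""
    df = k - 1  # degrees of freedom: Q's expected value under homogeneity
    if q <= 0:
        return 0.0
    # Excess of Q over its expectation, as a share of Q, truncated at zero.
    return max(0.0, (q - df) / q) * 100


# Hypothetical values (illustration only).
example_high = i_squared(q=40.0, k=11)  # Q well above its degrees of freedom
example_none = i_squared(q=5.0, k=11)   # Q below its degrees of freedom
```

When Q is no larger than its degrees of freedom, the observed spread is no greater than sampling error alone would produce, and I² is set to zero.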
The high heterogeneity in this meta-analysis may stem from genuine differences among the articles, including the sub-disciplines surveyed, specific sources searched, definition of a discrepancy (e.g., whereas some articles considered a change in the timing of an outcome as a discrepancy, others did not), and other article characteristics that may or may not have been reported. Our dataset contains more fine-grained information about the specific sub-discipline surveyed and specific sources searched. While we do not further explore these potential moderators in the present report, we note that, whereas some sub-disciplines and sources were highly specific (e.g., cystic fibrosis, lung cancer immunotherapy, Global Resource of Eczema Trials database), others were broad (e.g., medicine, clinicaltrials.gov, core clinical MEDLINE journals). We did not collect information on the exact definitions an article used to identify a primary outcome discrepancy. However, we did collect information on the proportion of articles with sub-categories of outcome discrepancies, which are more strictly defined and listed in Table S1 (e.g., promoting a secondary outcome to a primary outcome). We ran meta-analyses on these sub-categories of outcome discrepancies and found they also had high heterogeneity (see Supplementary Material E). Thus, varying definitions are unlikely to be the main driver of the high heterogeneity in the present analysis on primary outcome discrepancies.
Secondary outcome discrepancies
An estimated 13-95% (95% prediction interval) of the population of studies contain at least one secondary outcome discrepancy (Figure 3). The corresponding 95% confidence interval for the pooled estimate is 50-75%.
Figure 3. Forest plot of articles reporting the proportion of assessed studies with at least one secondary outcome discrepancy.
This meta-analysis also had high heterogeneity (I² = 90%), which could not be explained by meta-regression on the version of the registry analyzed (p = 0.8) or the year of article publication (p = 0.72). Meta-regression on the sources searched to identify studies explained some heterogeneity: articles that identified studies through journals, compared to registries, found a greater proportion of publications with at least one secondary outcome discrepancy (p = 0.03; OR 95% CI: 1.89-13.45). Meta-regressions on discipline (p = 0.29), whether discrepancies were disclosed (p = 0.68), and whether the publications were compared to registry entries versus other protocol formats (p = 0.08) had very low statistical power because almost all articles had the same characteristic. All meta-regression model summaries are included in Supplementary Material I.
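A meta-regression on a dichotomous moderator, such as journal-based versus registry-based searches, can be sketched as a weighted comparison of two groups of articles on the logit scale. In this simplified illustration the residual between-article variance (tau2) is supplied rather than estimated, and a normal approximation is used for the p-value and confidence interval; both are assumptions, so the sketch will not reproduce the model summaries in Supplementary Material I. All names and data are hypothetical.

```python
import math
from statistics import NormalDist


def binary_moderator(y, v, group, tau2):
    """Weighted regression of logit proportions on a 0/1 moderator.

    y, v: logit proportions and their within-article variances.
    group: 0 or 1 for each article.
    tau2: residual between-article variance, assumed known here.
    """
    w = [1 / (vi + tau2) for vi in v]

    def weighted_mean(g):
        num = sum(wi * yi for wi, yi, gi in zip(w, y, group) if gi == g)
        den = sum(wi for wi, gi in zip(w, group) if gi == g)
        return num / den, den

    mean0, weight0 = weighted_mean(0)
    mean1, weight1 = weighted_mean(1)
    # With one 0/1 covariate, the slope is the difference in weighted means.
    slope = mean1 - mean0  # difference in log odds
    se = math.sqrt(1 / weight0 + 1 / weight1)
    p = 2 * (1 - NormalDist().cdf(abs(slope / se)))
    crit = NormalDist().inv_cdf(0.975)
    return {
        "odds_ratio": math.exp(slope),
        "ci": (math.exp(slope - crit * se), math.exp(slope + crit * se)),
        "p": p,
    }


# Hypothetical articles: two per group, equal variances (illustration only).
result = binary_moderator(
    y=[0.0, 0.2, 1.0, 1.2], v=[0.1, 0.1, 0.1, 0.1], group=[0, 0, 1, 1], tau2=0.1
)
```

The exponentiated slope is the odds ratio comparing the two groups of articles, which is the quantity reported alongside the p-values above.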
Descriptively, omitting secondary outcomes and adding secondary outcomes appear to occur more frequently than omitting primary outcomes, adding primary outcomes, or demoting primary outcomes, which in turn appear to occur more frequently than promoting a secondary outcome (see Table S1).
Parameters potentially related to discrepancies
A subset of articles contained information on parameters potentially related to the proportion of outcome discrepancies. These include the disclosure of discrepancies, presence of a ‘statistically significant’ result, funding source, and timing of registration (Table 2; Supplementary Material D).
Table 2. Additional analyses regarding discrepancies
Discussion
We find that outcome measures in registrations and study protocols often differ from published outcome measures, that the prevalence of discrepancies varies substantially across the articles we reviewed, and that this heterogeneity is not easily assigned to specific article characteristics. Given the wide range of discrepancy prevalence across individual articles, point estimates and confidence intervals may provide false precision when extrapolating our findings to the registered literature at large. Moreover, because heterogeneity could not be explained by meta-regression of article characteristics, more precise estimates cannot be derived for subsets of the literature. The prediction intervals can reasonably be used to extrapolate to the registered medical literature at large, although the included studies do not necessarily form a representative sample.
Our main findings are in line with previous systematic reviews. These reviews included 27 articles each and found that 31% of studies had a primary outcome discrepancy in the median article they reviewed (C. W. Jones et al., 2015) and 54% of studies had any outcome discrepancy in the median article they reviewed (G. Li et al., 2018). The latter review did not distinguish between primary and secondary outcomes, and many articles they reviewed only assessed primary outcomes. Our review included all the articles contained in these systematic reviews, except for a few that did not meet our inclusion criteria (e.g., a PhD thesis, an abstract).
We identified several gaps in the literature on discrepancies. There exists little research on: (1) the prevalence of discrepancies in fields other than clinical research, (2) the prevalence of discrepancies in a representative sample across clinical disciplines, (3) the level of specificity in registrations, and (4) interventions to reduce undisclosed discrepancies (see Supplementary Material G for more depth regarding these gaps). We also identified several themes from surveying the conclusions of the articles we reviewed. These include the need for awareness surrounding discrepancies, the need for mandates, enforcement, and/or new initiatives to address discrepancies, and the benefit of registering additional information such as analysis plans (Table S3).
Our review raises broader issues regarding the efficiency of the research ecosystem and the trustworthiness of research outputs. We identified articles that documented discrepancies between publications and all of registrations, protocols, ethics applications, funding applications, and marketing approval applications. The existence of multiple documents outlining the same study raises the likelihood of discrepancies and, in the absence of a clearly demarcated ‘master’ document, leaves ambiguity regarding which document is ‘correct.’ Rehashing the same study details for different audiences may also be an inefficient use of researchers’ time. Identifying a single publicly accessible document as the version of record (this could be the registration) and having all other documents point to this version of record for key information could reduce ambiguity and improve efficiency.
As for trustworthiness, registration has had a clearly positive influence on medical research. At the same time, some registration policies have poor adherence (e.g., many trials are registered retrospectively, and many trial results are never reported (DeVito et al., 2019)). The existence of research policies that are regularly overlooked, rarely monitored, and come with no consequence for non-compliance, can be damaging in at least two ways. They risk devaluing research policies altogether and they can reduce the trustworthiness of research outputs by creating a false impression that rigorous research practices were employed. Conceiving research as a complex ecosystem comprised of various agents with diverse incentives (e.g., funders, publishers, institutions, individual researchers) can help to comprehend why some policies have poor adherence and to develop and implement effective research infrastructure.
In conclusion, registrations provide the evidence to detect selective reporting and outcome switching, which we found to be common. Nearly all articles we reviewed focused on documenting issues. Future efforts regarding discrepancies—and research improvement broadly—could prove more fruitful by shifting focus towards developing and testing solutions to these now well-documented issues.
Data Availability
All data and analysis code will be openly shared on the University of Bristol Data Repository upon acceptance for publication. Before acceptance, these documents will be available at https://osf.io/5gfty/
Supplementary material
osf.io/byqhp
Funding
Robert Thibault is supported by a postdoctoral fellowship from the Fonds de la recherche en santé du Québec. Hugo Pedder was funded by the NIHR Biomedical Research Centre at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol. The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. Olmo van den Akker is supported by a Consolidator Grant (IMPROVE) from the European Research Council (ERC; grant no. 726361). Robbie Clark is supported by a SWDTP ESRC +3 PhD studentship. Jacqueline Thompson was funded by a grant from Jisc during the course of this research. Marcus Munafò, Robert Thibault, Jacqueline Thompson, and Robbie Clark are part of the MRC Integrative Epidemiology Unit (MC_UU_00011/7). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Transparency statement
Robert Thibault, the manuscript’s guarantor, affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as originally planned have been explained. All data and analysis code will be openly shared on the University of Bristol Data Repository upon acceptance for publication.
Competing interests
All authors have a current interest in improving research practice and research quality. Our prior belief is that discrepancies between registrations and publications are common, and that they reduce the trustworthiness of research, which motivated this review.
Contributors
Contributions according to Contributor Roles Taxonomy (CRediT) (casrai.org/credit/)
Footnotes
(a contributorship statement is included before the references)
There was an error in the authorship list, which is now corrected. No other changes have been made to the manuscript since the first version uploaded to medRxiv.