medRxiv

Almost significant: trends and P values in the use of phrases describing marginally significant results in 567,758 randomized controlled trials published between 1990 and 2020

Willem M Otte, Christiaan H Vinkers, Philippe Habets, David G P van IJzendoorn, Joeri K Tijdink
doi: https://doi.org/10.1101/2021.03.01.21252701
Willem M Otte
1Biomedical MR Imaging and Spectroscopy, Center for Image Sciences, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
2Department of Child Neurology, UMC Utrecht Brain Center, University Medical Center Utrecht, 3584 CX Utrecht, The Netherlands
PhD
Roles: associate professor
Christiaan H Vinkers
3Amsterdam UMC, Department of Psychiatry, Department of Anatomy and Neurosciences, 1081 HZ Amsterdam, The Netherlands
MD, PhD
Roles: associate professor
Philippe Habets
3Amsterdam UMC, Department of Psychiatry, Department of Anatomy and Neurosciences, 1081 HZ Amsterdam, The Netherlands
MD, PhD
Roles: candidate
David G P van IJzendoorn
4Department of Pathology, Leiden University Medical Center, Leiden, The Netherlands
PhD
Roles: postdoctoral researcher
Joeri K Tijdink
5Department of Ethics, Law and Humanities, Amsterdam UMC, location VUmc, 1081 HZ Amsterdam, The Netherlands
6Department of Philosophy, VU University, 1081 HE Amsterdam
MD, PhD
Roles: assistant professor
  • For correspondence: j.tijdink@amsterdamumc.nl

Abstract

Objective To quantitatively map how non-significant outcomes are reported in randomised controlled trials (RCTs) over the last thirty years.

Design Quantitative analysis of the English full texts of 567,758 RCTs recorded in PubMed (81.5% of all published RCTs).

Methods We determined the exact presence of 505 pre-defined phrases denoting results that do not reach formal statistical significance (P<0.05) in 567,758 RCT full texts published between 1990 and 2020. Phrase data were modeled with Bayesian linear regression. Evidence for temporal change was obtained through Bayes-factor analysis. In a randomly sampled subset, the associated P values were manually extracted.
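
As an illustration only, the exact-phrase search described in the Methods could be sketched as follows. This is not the authors' processing script (those are in the repository listed under Data Availability); the function name `count_phrases`, the three-phrase list, and the choices of case-insensitive matching and whitespace normalisation are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical short list; the study used 505 pre-defined phrases.
PHRASES = [
    "marginally significant",
    "a nonsignificant trend",
    "failed to reach statistical significance",
]

def count_phrases(text, phrases=PHRASES):
    """Count exact occurrences of each phrase in one full text.

    Assumptions of this sketch: matching is case-insensitive and runs
    of whitespace (including line breaks) are collapsed to one space.
    """
    normalized = " ".join(text.lower().split())
    counts = Counter()
    for phrase in phrases:
        # Word boundaries stop a phrase from matching inside longer words.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = len(re.findall(pattern, normalized))
        if hits:
            counts[phrase] = hits
    return counts

example = "The effect was Marginally  significant (P = 0.06), with a nonsignificant trend."
print(count_phrases(example))
```

Summing such per-text counts over all full texts would give the phrase totals, and counting the texts with at least one hit would give the number of RCTs containing a phrase.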

Results We identified 61,741 phrases indicating close to significant results in 49,134 (8.65%; 95% confidence interval (CI): 8.58–8.73) RCTs. The overall prevalence of these phrases remained stable over time, with the most prevalent phrases being ‘marginally significant’ (in 7,735 RCTs), ‘all but significant’ (7,015), ‘a nonsignificant trend’ (3,442), ‘failed to reach statistical significance’ (2,578) and ‘a strong trend’ (1,700). The strongest evidence for a temporal prevalence increase was found for ‘a numerical trend’, ‘a positive trend’, ‘an increasing trend’ and ‘nominally significant’. The phrases ‘all but significant’, ‘approaches statistical significance’, ‘did not quite reach statistical significance’, ‘difference was apparent’, ‘failed to reach statistical significance’ and ‘not quite significant’ decreased over time. In the randomly sampled subset, most of the 11,926 identified P values ranged between 0.05 and 0.15 (68.1%; CI: 67.3–69.0; median 0.06).
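
The reported prevalence and its interval can be checked from the two counts given above. The sketch below assumes a normal-approximation (Wald) interval; the abstract does not state which interval method the authors used, and `proportion_ci` is a helper name introduced here for illustration.

```python
import math

def proportion_ci(successes, total, z=1.96):
    """Return (proportion, lower, upper) using a normal-approximation
    95% interval. The interval method is an assumption of this sketch."""
    p = successes / total
    se = math.sqrt(p * (1 - p) / total)
    return p, p - z * se, p + z * se

# Counts taken from the Results: 49,134 of 567,758 RCTs contain a phrase.
p, lo, hi = proportion_ci(49_134, 567_758)
print(f"{100 * p:.2f}% (95% CI: {100 * lo:.2f}-{100 * hi:.2f})")
# prints "8.65% (95% CI: 8.58-8.73)"
```

With a sample this large, other common interval methods would be expected to give nearly identical bounds.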

Conclusions Our results demonstrate that phrases describing marginally significant results are regularly used in RCTs to report P values close to but above the dominant 0.05 cut-off. The phrase prevalence remained stable over time, despite all efforts to change the focus from P < 0.05 to reporting effect sizes and corresponding confidence intervals. To improve transparency and enhance responsible interpretation of RCT results, researchers, clinicians, reviewers, and editors need to abandon the focus on formal statistical significance thresholds and stimulate reporting of exact P values with corresponding effect sizes and confidence intervals.

Significance statement The power of language to modify the reader’s perception of how to interpret biomedical results should not be underestimated. Misreporting and misinterpretation are urgent problems in RCT output. This may be at least partially related to the statistical paradigm of the 0.05 significance threshold. Clinical researchers sometimes use creative and inventive strategies, such as describing their results as ‘almost significant’, to get their data published. This phrasing may convince readers of the value of their work. Since 2005 there has been increasing concern that most current published research findings are false, and it has been generally advised to switch from null hypothesis significance testing to using effect sizes, estimation, and cumulation of evidence. Whether this ‘new statistics’ approach has worked out well should be reflected in the phrases describing non-significant results of RCTs, in particular in changing patterns in the phrases describing P values just above 0.05.

More than five hundred phrases potentially suited to report or discuss non-significant results were searched in over half a million published RCTs. A stable overall prevalence of these phrases (10.87%, CI: 10.79–10.96; N: 61,741), with associated P values close to 0.05, was found in the last three decades, with strong increases or decreases in individual phrases describing these near-significant results. The pressure to pass the scientific peer-review barrier may function as an incentive to use effective phrases to mask non-significant results in RCTs. However, this keeps researchers preoccupied with hypothesis testing rather than presenting outcome estimations with uncertainty. The effect of language on getting RCT results published should ideally be minimal, to steer evidence-based medicine away from overselling of research results and unsubstantiated claims about the efficacy of certain RCTs, and to prevent an over-reliance on P value cutoffs. Our exhaustive search suggests that presenting RCT findings remains a struggle when P values approach the carved-in-stone threshold of 0.05.

Competing Interest Statement

The authors have declared no competing interest.

Clinical Trial

n/a

Funding Statement

Funding source: No funding source supported this study.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethics approval: None required.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Data sharing: All used PubMed IDs, detected phrases, co-text extractions, manually identified P values, and processing scripts are openly shared at: https://github.com/wmotte/almost_significant (v1.0; http://doi.org/10.5281/zenodo.4313162).

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted March 03, 2021.
Willem M Otte, Christiaan H Vinkers, Philippe Habets, David G P van IJzendoorn, Joeri K Tijdink
medRxiv 2021.03.01.21252701; doi: https://doi.org/10.1101/2021.03.01.21252701

Subject Area

  • Epidemiology