Use of causal claims in observational studies: a research on research study

Objective: To evaluate the consistency of causal statements in the abstracts of observational studies published in The BMJ. Design: Research on research study. Data source: All cohort or longitudinal studies describing an exposure-outcome relationship published in The BMJ during 2018. We also had access to the submitted papers and reviewer reports. Main outcome measures: Proportion of published research papers with 'inconsistent' use of causal language in the abstract. Papers where language was consistently causal or non-causal were classified as 'consistently causal' or 'consistently not causal', respectively; those where causality may be inferred were classified as 'suggests causal'. For the 'inconsistent' papers, we then compared the published and submitted versions. Results: Of 151 published research papers, 60 described eligible studies. Of these 60, we classified the causal language used as 'consistently causal' (13%), 'suggests causal' (35%), 'inconsistent' (20%) and 'consistently not causal' (32%). The majority of the 'inconsistent' papers (92%) were already inconsistent on submission. The inconsistencies found in both submitted and published versions were mainly due to mismatches between objectives and conclusions: one section might be carefully phrased in terms of association while the other presented causal language. When identifying only an association, some authors jumped to recommending action on the findings as if it were motivated by the evidence presented. Conclusion: Further guidance is necessary for authors on what constitutes a causal statement and how to justify or discuss the assumptions involved. Based on screening these abstracts, we provide a list of expressions beyond the obvious word 'cause', which may inspire a useful, more comprehensive compendium on causal language.


Introduction
Many researchers remain tempted to draw causal conclusions from observational data, despite acknowledging that mere association is not causation, because causal inference is the ultimate goal of most clinical and public health research (1,2). Gold-standard answers are typically sought through randomized controlled trials (RCTs). The unique ability of RCTs to avoid confounding bias (3) has led to demands that empirical research must be drawn from randomized studies to justify causal statements (4-6). RCTs are mainly used to assess the effect of a treatment or intervention and are not easily adapted to evaluate prognostic or risk factors.
There are, however, good reasons to look beyond RCTs for evidence on treatment effects. In many settings, RCTs are not feasible, ethical or timely, and thus observational data are all that is available for some time, as in the recent COVID-19 crisis. Furthermore, observational studies typically involve broader real-world contexts than RCTs, where the costs and risks of experimentation suggest studying high-risk patients without major comorbidities (7). This selection challenges generalization to the target population. Trials further suffer from treatment non-compliance, which complicates analysis, as treatment-specific populations lose the benefit of randomization. Recent ICH9 guidelines therefore emphasize the importance of causal estimands beyond intention-to-treat, such as per-protocol and as-treated analyses (8,9).
Deliberately avoiding causal statements on a hoped-for causal answer brings ambiguity and contrived reporting (10,11). Instead, authors should openly discuss the likely distance in meaning and magnitude between the data-based measure they are able to estimate and the desired targeted causal effect. Arguments would consider the study design together with additional assumptions in context (12). Owing to decades of progress in statistical science (involving potential outcomes, directed acyclic graphs, propensity scores and more) (13), this allows for results with a justified causal interpretation that are often unreachable by randomized trials (14).
In 2010, Cofield et al (5) assessed the use of causal language in observational studies in nutrition but deemed causal language inappropriate for all observational studies. From a different angle, Haber et al (15) examined whether the tone and strength of causal claims made in a given paper matched the language describing the findings in social media. Not surprisingly they found stronger causal statements in the media in half of the cases, emphasizing the importance of clear scientific messages.
To promote this, Lederer et al (16) recently published a guide for authors and editors on how to report causal studies in Respiratory, Sleep and Critical Care Journals. Rather than circumventing the problem by asking to avoid causal language, they provide key elements that ensure valid causal claims (17). Besides briefly explaining causal inference, they provide a definition of a confounder, outline how to identify confounding through so-called directed acyclic graphs and discuss how p-values are often misinterpreted and how their value does not reflect the magnitude, direction or clinical importance of a given association. All these elements empower their target audience to critically assess observational studies.
To find out whether and how statements in study reports present confusing use of causal language (or lack thereof), we examined abstracts of research papers concerned with exposures and outcomes published in The BMJ in 2018. Our focus was on the causal message that readers of The BMJ receive from the abstracts. We evaluated the consistency of causal statements in the published abstracts of observational studies and whether any changes had been made as a result of the peer review process.

Sampling and inclusion criteria
COP identified all original research articles published in The BMJ in 2018 described as either cohort or longitudinal studies in the study design section of the abstract. The eligible studies were identified by statements in this section of the abstract such as "cohort", "longitudinal" or "registry-based". Those identified as "observational" were included if they suggested a period of follow-up rather than being cross-sectional. Articles described as case-cohorts were excluded, as their interpretation and analysis differ from other studies with follow-up assessing the exposure-outcome relationship.

Assessment of published abstracts
Two reviewers (COP, LB) independently screened all the abstracts of the eligible papers. For the text included under each of the subheadings in the abstract (objective, design, setting, participants, outcome, results, conclusion), the reviewers assessed whether there was an (implicit) causal claim using a yes/no/unclear response. After assessing each separate subheading, each reviewer then gave an overall assessment of the main claims in the paper's abstract as either 'consistently causal', 'suggests causal', 'inconsistent' or 'consistently not causal'. After the independent assessments, the overall rating of the abstract was compared between both reviewers; where there was disagreement, a third reviewer (EG) was consulted and a consensus reached.
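The comparison step of this double-review procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract identifiers, the ratings and the helper name `find_disagreements` are hypothetical and are not taken from the study data or from any tool the reviewers used.

```python
# Hypothetical sketch: flag abstracts whose overall rating differs between
# two independent reviewers, so they can be referred to a third reviewer.
# Identifiers and ratings below are invented for illustration only.
CATEGORIES = {
    "consistently causal",
    "suggests causal",
    "inconsistent",
    "consistently not causal",
}


def find_disagreements(ratings_a, ratings_b):
    """Return the sorted abstract ids on which the two reviewers disagree."""
    if ratings_a.keys() != ratings_b.keys():
        raise ValueError("both reviewers must rate the same abstracts")
    for rating in list(ratings_a.values()) + list(ratings_b.values()):
        if rating not in CATEGORIES:
            raise ValueError(f"unknown category: {rating!r}")
    return sorted(k for k in ratings_a if ratings_a[k] != ratings_b[k])


reviewer_1 = {
    "A01": "suggests causal",
    "A02": "inconsistent",
    "A03": "consistently not causal",
}
reviewer_2 = {
    "A01": "suggests causal",
    "A02": "consistently causal",
    "A03": "consistently not causal",
}

# Abstracts listed here would go to the third reviewer for consensus.
to_third_reviewer = find_disagreements(reviewer_1, reviewer_2)
print(to_third_reviewer)
```

The sketch covers only the mechanical comparison; the consensus discussion itself is a matter of judgment and is not modelled.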

Assessment of submitted versions
As the focus of this paper is the avoidance of misleading and ambiguous messages, we further assessed those articles judged as 'inconsistent' to see if there were changes introduced to the manuscript between submission and publication. For this subset, we obtained the submitted version of the manuscripts and the associated peer reviewers' comments from The BMJ's manuscript tracking system. We then compared the published version with the first submitted version to identify whether the wording related to causal claims appeared in the submitted version of the abstract and whether changes occurred as a result of comments from peer reviewers and editors.
The same reviewers (COP, LB) independently evaluated the submitted versions of the papers. The reviewers assessed whether the content under each subheading of the submitted abstract differed from the published version. Where there were discrepancies between versions, each reviewer indicated the presence of a causal claim as yes/no/unclear for each abstract subheading (title, objective, design, setting, participants, outcome, results, conclusion) and made an overall assessment of the submitted abstract as either 'consistently causal', 'suggests causal', 'inconsistent' or 'consistently not causal'. As before, the assessments were compared and, in cases of disagreement, a third reviewer (EG) was consulted and consensus reached.

Assessment of full text
For the published papers classified as 'inconsistent', we further evaluated the full published text to identify the statistical method applied. We looked for statements that would support a causal aim, including confounding adjustment, sources of bias and issues of generalizability.

Ethics and consent
This study used routinely collected data. When authors and reviewers submit manuscripts and reviews to The BMJ, they are notified that their paper or review may be entered into research projects for quality improvement purposes. COP was given access to The BMJ's data under a confidentiality agreement.

Patient and public involvement
Patients were not involved in the design, analysis or interpretation of the study. Patients were not participants in this study; it was a methodological study (research on research). Patients' opinions of causal statements and of the use of ambiguous language in research papers are important, and further work in this area in partnership with patients is needed.

Assessment of published abstracts
In 2018, 151 research papers were published in The BMJ, of which 60 (40%) were eligible for inclusion in our study (Figure 1). We identified 29 studies (48%) using causal language ('consistently causal' and 'suggests causal'). A further twelve (20%) abstracts were considered inconsistent, mainly because the objective stated evaluating an association while the conclusion presented a causal finding (9/12) or the opposite (3/12). Finally, abstracts describing studies that aimed for prediction, or that reported associations without (implicitly) suggesting a causal nature, were considered consistently not causal (n=19, 32%).
Table 1 shows excerpts from the abstracts that were evaluated. Each row corresponds to statements from the same study. The first column indicates the assigned category, based on the type of association it describes. The last column explains why a given abstract was considered to belong to the assigned category. As the assessment pertains to causal claims in general, the words referring to the particular topic of the corresponding study were removed from the statements. The examples shown are not an exhaustive list, but were chosen to illustrate the different phrasing of statements belonging to the different categories. It is worth noting that the statements presented correspond to the objective and conclusion subheadings of the abstract. When assessing the abstracts, we found that these were the subheadings under which the information needed to classify the abstract was mainly located. Other subheadings such as Design, Setting and Participants were not as relevant for this purpose.
To further illustrate how statements in these two sections can be misleading, we tabulated a few examples in a 2 by 2 table showing (mis)matches between what was reported in the objectives and the conclusion, resulting in the paper being categorised as either 'consistently (not) causal' or 'inconsistent' (Table 2).
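The logic of such a 2 by 2 cross-classification can be written down as a small Python sketch. It is a simplification made for illustration: it assumes that each of the objective and the conclusion has already been judged as causal or not, and it leaves out the 'suggests causal' category, which requires a reviewer's judgment. The function name `classify_pair` is hypothetical.

```python
# Simplified sketch of the 2 by 2 logic: objective (causal or not) crossed
# with conclusion (causal or not). The 'suggests causal' category is
# deliberately not modelled here.
def classify_pair(objective_causal: bool, conclusion_causal: bool) -> str:
    """Map a pair of yes/no causal judgments to an overall category."""
    if objective_causal and conclusion_causal:
        return "consistently causal"
    if not objective_causal and not conclusion_causal:
        return "consistently not causal"
    # The two sections point in different directions.
    return "inconsistent"


# Print all four cells of the 2 by 2 table.
for objective in (True, False):
    for conclusion in (True, False):
        label = classify_pair(objective, conclusion)
        print(f"objective causal={objective}, conclusion causal={conclusion}: {label}")
```

Both off-diagonal cells map to 'inconsistent', which mirrors the two directions of mismatch reported above (9/12 and 3/12).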

Assessment of submitted versions
After evaluating the first submitted version of the 12 abstracts classified as 'inconsistent', we classified 11/12 (92%) as also inconsistent on submission. There was only one study where the submitted version described a different type of association. In this case, the conclusion of both the submitted and published versions was rather conservative, stating that the intervention was "independently associated" with the outcome. The submission expressed a causal objective, stating the aim of evaluating the "impact" of a particular intervention, with corresponding methods: providing adjusted estimated effects and including a sensitivity analysis using propensity score matching. However, in the published version the term "impact" was replaced by "association", making the abstract less clear about a causal aim because both the objectives and the conclusion described an association, while the authors still provided adjusted hazard ratios and resorted to propensity score matching.

Assessment of full text
Looking at the methods used in the papers classified as 'inconsistent', we found that 11 of the 12 provided adjusted estimates. Most of the studies (8/12, 67%) used outcome regression models, mainly Cox proportional hazards models; a further 3/12 (25%) used (propensity score) matching. Table 3 presents statements found in both the published abstract and full text of each of these papers regarding the method used and considerations suggesting a causal aim.
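As a quick arithmetic check, the percentages reported in the results can be re-derived from the stated counts. The short Python sketch below uses only counts that appear in the text; the helper name `pct` and the row labels are our own, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
# Re-derive the reported percentages from the counts stated in the text.
def pct(numerator: int, denominator: int) -> int:
    """Percentage rounded to the nearest whole number (assumed convention)."""
    return round(100 * numerator / denominator)


# (label, numerator, denominator, percentage reported in the text)
reported = [
    ("eligible among published papers", 60, 151, 40),
    ("using causal language", 29, 60, 48),
    ("inconsistent", 12, 60, 20),
    ("consistently not causal", 19, 60, 32),
    ("inconsistent already on submission", 11, 12, 92),
    ("outcome regression models", 8, 12, 67),
    ("matching", 3, 12, 25),
]

for label, num, den, stated in reported:
    status = "matches" if pct(num, den) == stated else "differs from"
    print(f"{label}: {num}/{den} = {pct(num, den)}% ({status} the reported {stated}%)")

# The three groups of abstracts should add up to the 60 eligible studies.
print("total classified:", 29 + 12 + 19)
```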

Statement of principal findings
We found that the majority of the published research paper abstracts of observational studies used causal language consistently. Still, 20% of them contained inconsistent messages on the causal nature of the key "effect". Inconsistencies showed up in two directions: an intentional quest for causality ending in uncriticized non-causal conclusions, or carefully phrased mere associations ending with recommendations to act and intervene based on the exposure-outcome association.
Beyond the wording in the abstract, readers can learn much about the sought-after interpretation from the statistical methods described and the assumptions made explicit in the paper. On a case-by-case basis, one could then assess whether additional assumptions, e.g. involving 'no unmeasured confounders', would justify the causal assessment derived from these approaches. Identifying key elements like the ones presented in Table 3 would help to assess whether causal inference is possible. If in doubt, a sensitivity analysis may be in order. It seems better to be transparent about the ultimate aim of drawing a causal conclusion, and to acknowledge falling short of it, than to generate confusion.

Comparison with other studies
This is not the first study to evaluate the use of causal language in the medical literature. Cofield et al (5) assessed the use of causal language in observational studies in nutrition. However, they reduced the problem to assessing whether authors included causal language or not, as it was deemed inappropriate due to the observational nature of the study. We have made the case that merely avoiding explicit causal terms is not a real solution. Even without them, a causal conclusion is implicit when the take-home message encourages interventions based on the presented findings. Avoiding inconsistency is important, but equally one should be able to trust that the use of consistent causal language is not in vain. This requires a more in-depth look at the methods and assumptions validating the causal claims.

Strengths and weaknesses of the study
Accurate abstracts are important. In just a few brief paragraphs, the author summarizes key elements of design, methods and results, and comes to a conclusion. Many readers only read the abstract. However, a powerful abstract opens the door to readers and sets the scene for any study. It serves the different roles of informing the audience about its main findings while motivating the reader to further explore the full text, all within the constraints of brevity. This demands from authors special attention to ensure that every word in the abstract is required. All of the above makes the assessment of the abstract relevant but also challenging.
Further research is needed to explore how causal claims presented in the abstract are supported by the full article, which entails assessing the methods used and evaluating whether the underlying assumptions were met (18). The ultimate conclusion should not simply label a study as black or white in causal terms. In the present study we used a convenient limited number of classifications for short statements. In practice a continuous degree of confidence in a potential causal relationship is likely to emerge based on observed association.
We are aware that by limiting our assessment to the abstract, we may have missed the discussion of the extent to which the underlying assumptions that enable causal inference were met. Indeed, when there was a clear causal aim but the authors considered that these assumptions were not fulfilled, they may have decided that a causal claim was inappropriate and phrased their conclusion in terms of association rather than causation. If this is the case, the apparent inconsistency found in the abstract would no longer hold. Conversely, any undue causal claims can be viewed as a form of spin (19,20).

Conclusions and policy implications
As observational data resources abound, methods for causal inference from observational data have surged in tandem with the call for real-world evidence. The new opportunities bring new challenges and the responsibility for clear and well-supported statements on the evidence. In this spirit, and motivated by novel guidelines as proposed by ICH9 and FDA, Miguel Hernan and collaborators have embarked on a project entitled "Developing Guidelines for the Analysis of Randomized Controlled Trials in Real-World Settings" (21). The importance of such initiatives supports a shift towards being explicit and discussing the assumptions underlying causal methods that allow for causal interpretations in context, with or without an RCT (13). In the meantime, uncritical ambiguous phrasing in observational studies remains prevalent (14). Those searching for the best possible evidence supporting future treatment decisions are best served by transparent reports on observational studies.
Faced with uncertainty when concluding on the nature of the observed exposure-outcome relationship, a justifiable balance between the type I and type II error rates is a natural guide for action. The cost of errors must be weighed in context, for instance as in clinical trials, which emphasize control of the type I error to avoid introducing new unhelpful drugs at a potentially large cost. Alternative weights are typical in screening programs, where false positives will be caught in follow-up examinations but false negatives are lost forever. In a crisis, such as the current COVID-19 pandemic, we must act before long-term randomised trials have materialised. It becomes undeniably important to learn as much as we can from observational data and to be aware of the types of risk when acting or not, as displayed in the schematic Table 4.
A prerequisite for good causal language practice is awareness of which language implies a causal statement and which does not. To support correct phrasing and raise awareness, we have compiled a short list of words and expressions with dedicated (non-)causal meaning (Box). The list draws on phrases found in our study and in the references cited, particularly Hernan et al (10) and Thapa et al (6). We consider that a definition of causal language that is generally recognized by the research community is needed (22,23).
Words such as "effect", "impact" or "determinant of" inevitably point in the causal direction, and their use should come with the requirement of at least stating, and ideally critically evaluating, the necessary assumptions (6). Uncertainty about the causal nature of the conclusion should tone down any suggestion for intervening on the studied exposure. We recommend specifying the corresponding level of evidence rather than hiding the ultimate causal aim of a study (19), while acknowledging a margin of error in any empirical study (20).
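To make the idea of screening for causal wording concrete, a naive keyword check might look like the Python sketch below. It is not the authors' Box and not a validated instrument: the term list contains only the words named in the text ("cause", "effect", "impact", "determinant of"), the regular expressions and the function name `flag_causal_terms` are our own assumptions, and such a check cannot detect implicit causal messages like recommendations to intervene.

```python
import re

# Naive sketch: flag explicit causal terms in a statement. The patterns cover
# only the words named in the text and their simple plural forms.
CAUSAL_PATTERNS = {
    "cause": r"\bcauses?\b",
    "effect": r"\beffects?\b",
    "impact": r"\bimpacts?\b",
    "determinant of": r"\bdeterminants? of\b",
}


def flag_causal_terms(statement: str) -> list:
    """Return the listed causal terms that occur in the statement."""
    return [
        term
        for term, pattern in CAUSAL_PATTERNS.items()
        if re.search(pattern, statement, flags=re.IGNORECASE)
    ]


# Hypothetical objective and conclusion statements, for illustration only.
print(flag_causal_terms("To evaluate the impact of the intervention on the outcome"))
print(flag_causal_terms("The exposure was independently associated with the outcome"))
```

A match only signals that the assumptions behind a causal reading deserve to be stated; the absence of a match does not make a statement non-causal.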
In summary, we have found that causal messages are embedded in studies otherwise carefully phrased in terms of association. Further guidance for authors on what constitutes a causal statement appears to be needed, similar to that published by Lederer et al (13) for Respiratory, Sleep and Critical Care Journals. We look forward to similar guidance for other disease groups. From the screened BMJ abstracts, we provided a list of expressions with clear interpretation, which may inspire a useful, more comprehensive compendium. We argue that such awareness and special attention amongst authors and reviewers would serve our communication on the best available evidence for conceived interventions.
medRxiv preprint doi: https://doi.org/10.1101/2020.09.17.20194530; this version posted September 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Ethics Approval
This study used routinely collected data. When authors and reviewers submit manuscripts and reviews to The BMJ, they are notified that their paper or review may be entered into research projects for quality improvement purposes.

Transparency declaration
COP confirms that this manuscript is an honest, accurate, and transparent account of the study being reported and that no important aspects of the study have been omitted.

Data sharing statement
The reviews and published versions of the papers included in the study are publicly available at BMJ.com. No further data will be made available as it is confidential submission data.

Tables

Table 1. (Only one row of the table is recoverable.)
Reason for assigned category: Phrasing the objective and conclusion as if just to assess an association but then suggesting to take action given the findings.
Example statements:
"…Targeting ... prevention strategies among these patients should be considered."
"Systematically addressing ... may be an important public health strategy to reduce the incidence of"
"...present findings encourage the downward revision of such guidelines ..."

Table 2. Examples of (mis)matching causal and non-causal statements found respectively in the objectives and conclusions of abstracts of observational studies published in The BMJ in 2018. (Only two objective and conclusion pairs are recoverable.)
Objective: "...association with..." Conclusion: "Systematically addressing ... may be an important public health strategy to reduce the incidence of"
Objective: "To develop and validate a set of practical prediction tools that reliably estimate the outcome of..." Conclusion: "...prediction models reliably estimate the outcome..."