Abstract
We aimed to explore, in a sample of systematic reviews with meta-analyses of the association between food/diet and health-related outcomes: (i) whether systematic reviewers selectively included study effect estimates in meta-analyses when multiple effect estimates were available, and (ii) what impact selective inclusion of study effect estimates may have on meta-analytic effects. We randomly selected systematic reviews of food/diet and health-related outcomes published between January 2018 and June 2019. We selected the first presented meta-analysis in each review (index meta-analysis), and extracted from study reports all study effect estimates that were eligible for inclusion in the meta-analysis. We calculated the Potential Bias Index (PBI) to quantify and test for evidence of selective inclusion. The PBI ranges from 0 to 1; values above or below 0.5 suggest selective inclusion of effect estimates more or less favourable to the intervention, respectively. We investigated the impact of any potential selective inclusion by comparing the index meta-analytic estimate to the median of a randomly constructed distribution of meta-analytic estimates. Thirty-nine systematic reviews with 312 studies were included. The estimated PBI was 0.49 (95% CI 0.42 to 0.55), suggesting the selection of study effect estimates was consistent with a process of random selection. In addition, the impact of any potential selective inclusion on the meta-analytic effects was negligible. Despite this, we recommend that systematic reviewers report the methods used to select effect estimates to include in meta-analyses, which can help readers understand the risk of selective inclusion bias in the systematic reviews.
1. BACKGROUND
Systematic reviews (SRs) synthesise the findings of studies that address a specific research question. By attempting to include all available evidence meeting pre-specified eligibility criteria, SRs can provide more valid evidence for healthcare decision making than non-systematic literature reviews or ad hoc findings of single studies1. It is now commonplace for SRs to underpin clinical practice guidelines and policy decisions2. Furthermore, SRs play a critical role in identifying research gaps and directing future research. However, the validity of the findings from SRs is, in part, dependent on the use of methods that minimise bias in the SR process.
Systematic reviewers commonly encounter multiple effect estimates from a primary study that are eligible for inclusion in a particular meta-analysis (which we refer to as ‘multiplicity of results within studies’)3-5. For example, in a particular primary study, mean differences between intervention groups for three different quality of life questionnaires measured at 1-, 2- and 3-months follow-up may be reported, all of which are eligible for a planned meta-analysis of quality of life at 0-3 months follow-up. Naïve inclusion of all nine estimates (three questionnaires at three time points) in the meta-analysis introduces statistical dependency in the meta-analytic dataset, which can lead to misleading results6.
The two general approaches for dealing with multiplicity of study results in a meta-analysis are the integrative and reductionist approaches6. Integrative approaches aim to include multiple effect estimates per study in the meta-analysis, using statistical methods such as multilevel modelling7 and robust variance estimation8. These approaches are not the focus of this paper. Reductionist approaches, which are the focus of this paper, involve reducing the data so that only one effect estimate per study is included in a particular meta-analysis. Best practice for reductionist approaches is to pre-specify “eligibility criteria”, indicating which results are eligible for inclusion in the meta-analysis, along with “decision rules” that specify the methods used to select the result for inclusion when multiple results are available5,9. These rules articulate a preference for one estimate over another, typically based on the perceived credibility of the estimate (e.g. selecting the most valid and reliable scales), or on clinical (e.g. selecting clinically relevant measures) or methodological (e.g. selecting the most consistently used measure) reasons. In practice, however, such rules are not always specified in advance9, leading to the potential for selection of study effect estimates based on the observed result itself (e.g. selecting the largest estimate, or the one with the smallest P-value). This is known as ‘selective inclusion of results’4,9, and has the potential to bias meta-analysis results.
To our knowledge, only one study has investigated bias due to selective inclusion of results in meta-analyses10. This study focused on SRs (published between 2010 and 2012) of randomised trials examining the effects of interventions for arthritis or depressive or anxiety disorders. The authors constrained their investigation to the first presented meta-analysis of a continuous outcome in each review. They concluded that in this random sample of SRs, there was no clear evidence of selective inclusion of effect estimates, and that any potential selective inclusion did not have a meaningful impact on the meta-analytic effects. However, it is unclear whether these findings apply to other clinical conditions, outcome types (e.g. binary), study designs, and more recently published reviews.
Systematic reviews of nutrition research provide an opportunity to extend the previous investigation of bias due to selective inclusion, because these reviews often include study designs beyond randomised trials, different outcome types, and results from multiple analyses that attempt to adjust for potential confounding variables (in the case of non-randomised studies). Furthermore, given the importance of SRs of nutrition research in informing recommendations in dietary guidelines and public health policy, assessment of whether there is evidence of selective inclusion bias is required. Therefore, the objectives of this study were to explore, in a sample of SRs with meta-analyses of the association between food/diet and health-related outcomes: (i) whether systematic reviewers selectively included study effect estimates in meta-analyses when multiple effect estimates were available, and (ii) what impact selective inclusion of study effect estimates may have on meta-analytic effects.
2. METHODS
The study protocol for the ROBUST study has been published11. The ROBUST study, in addition to objectives (i) and (ii) above, aims to investigate the risk of bias due to missing results (objective (iii) in the study protocol). The results of this latter investigation will be reported in a subsequent manuscript. An investigation of the extent of multiplicity of results in the primary studies of the sample of SRs included in the ROBUST study has been published5. Here, we provide an overview of the methods relevant for our investigation of bias due to selective inclusion of results, with deviations from the planned methods presented in Supplementary Table S1.
2.1 Eligibility criteria, search and selection of SRs
We included reviews that satisfied the definition of an SR, according to the 2019 edition of the Cochrane Handbook for Systematic Reviews of Interventions12, that had explicitly stated methods of study identification (e.g. a search strategy) and of study selection (e.g. eligibility criteria and selection process), and included a meta-analysis. We included such SRs with a meta-analysis that:
included studies that enrolled, regardless of their age and background, (a) people who were generally healthy, or (b) a mixture of generally healthy people and people with diet-related risk factors (e.g. overweight, high blood pressure) or people with a particular health condition (e.g. type II diabetes or cardiovascular disease), or (c) people with non-specified health status;
included randomised trials or non-randomised studies that assessed the effects of at least one type of food (e.g. nuts, red meat) or at least one dietary pattern (e.g. vegan) on any continuous (e.g. weight) or non-continuous (e.g. cardiovascular disease incidence) health-related outcome;
were published in English between 1 January 2018 and 30 June 2019 (i.e. within 18 months prior to the drafting of our study protocol);
provided citations for all included studies in the SR, and;
presented the summary statistics or an effect estimate and its precision (e.g. standard error or 95% confidence interval) for each study included in a meta-analysis, and the meta-analytic summary effect estimate and its precision in the text or forest plot.
The exclusion criteria can be found in our published protocol11.
We searched for eligible SRs indexed in the PubMed and Epistemonikos databases from 1 January 2018 to 30 June 2019 (search strategies reported in Supplementary Table S2), exported unique records to Microsoft Excel and randomly sorted them. Following piloting of the eligibility criteria, two investigators (MJP and one of CMK, ZD or SM) independently screened titles and abstracts (in batches of 500 records) and their potentially eligible full-text reports. This screening process was repeated until we reached the target sample of 50 SRs, including 25 meta-analyses of continuous and 25 meta-analyses of non-continuous outcomes.
If the total number of eligible SRs exceeded this target at the end of a batch, we planned to sample 25 SRs of each type randomly. Our target of 50 SRs was primarily selected for feasibility reasons given our available resources to conduct all components of the ROBUST study, which was informed by the time taken to conduct a previous similar study10. Any discrepancies in screening decisions at each stage were resolved via discussion between investigators or by consultation with another investigator (JM) where necessary.
One investigator (MJP) selected from each included SR one pairwise meta-analysis of aggregate data for inclusion in the present study. The selected meta-analysis was the first meta-analytic result mentioned in the review (regardless of its placement in the manuscript). Henceforth, we refer to the selected meta-analysis as the “index meta-analysis”. Initially, the index meta-analysis was selected irrespective of the outcome domain (e.g. quality of life, prostate cancer), effect measure (e.g. risk ratio, mean difference), meta-analytic model (fixed-effect, random-effects) or the number and type of included studies (i.e. randomised or non-randomised study). However, following the selection of 50 index meta-analyses, we determined that they included 553 studies (range 2-55 studies per meta-analysis), which is more than twice what we had anticipated based on our previous study10. For feasibility reasons, we therefore chose to restrict inclusion to only those SRs with an index meta-analysis including fewer than 20 studies. For each included index meta-analysis, we retrieved the reports of all included studies (see the protocol for further details11).
2.2 Data collection and management
A data collection form was developed in REDCap13 (see supplementary Table S3). Following piloting, two investigators (RK and one of ZD, SM, CMK, EK, LB and MJP) independently collected data from a random sample of half of the index meta-analyses and their included studies. For the remaining index meta-analyses and their included studies, data were collected by one investigator (RK) and verified for accuracy by another investigator (MJP). Any discrepancies were resolved through discussions between the two investigators or through adjudication by a third investigator (JM) if necessary.
An overview of the data items and the sources these were obtained from (i.e. systematic review protocol, systematic review or study report) is presented in Supplementary Table S4; further details are available in the protocol11. For data extracted from the reports of studies included in the index meta-analysis, we extracted all outcome data eligible for inclusion in the index meta-analysis. This was determined by the eligibility criteria and decision rules stated in the SR protocol when available. For example, if the systematic reviewers pre-defined in the SR protocol that for the meta-analysis of weight gain, they would only compare the “highest versus lowest intake of fish”, and would consider only data at 8 weeks follow-up when data were available at multiple time points, we only extracted data for that comparison and time point, regardless of whether data for other time points and other comparisons for the same outcome were available in the study reports. If there was no SR protocol available, or if there were no eligibility criteria or decision rules to select results specified in the SR protocol, we extracted all study outcome data based on the comparison and outcome specified in the SR. For example, if the meta-analysis in the SR was described as “fish intake versus none on weight at 6 months”, we extracted all data on weight at 6 months for comparisons of any level of intake of fish versus none, results that were unadjusted and adjusted (for potential confounding variables), and results from all analysis samples (e.g. per-protocol, intention-to-treat). We ignored any eligibility criteria or decision rules to select results that appeared in the SR, as we could not determine whether they were pre-specified. Additional rules for determining which data to collect are outlined in the protocol11.
2.3 Data analysis
2.3.1 Descriptive analysis
We summarised the characteristics of SR protocols, SRs, index meta-analyses and included studies using descriptive statistics. For categorical variables, we present frequencies and percentages. For continuous variables, we report medians with interquartile ranges (IQRs).
2.3.2 Quantification and testing for evidence of selective inclusion of results
We used the ‘Potential Bias Index’ (PBI) to quantify and test for evidence of selective inclusion14. In brief, this index is based on ordering the effect estimates in each primary study according to their magnitude and direction of effect, and then determining the position within that order where the effect estimate included in the index meta-analysis sits. The PBI is then calculated as the weighted average rank position of the selected effect estimates across all studies, with the weights being the number of effect estimates available per study. This weighting system, therefore, attributes greater priority to the rank positions of effect estimates where there are a larger number of effect estimates to choose from.
The PBI ranges from 0 to 1. The PBI has the value 1 when the effect estimate that is most favourable to the experimental intervention/exposure is always selected for inclusion from each study. By “most favourable” we mean the effect estimate that suggested the most benefit or least harm of the intervention/exposure. Conversely, the PBI would have a value of 0 when the effect estimate that is least favourable to the experimental intervention/exposure is always selected for inclusion from each study.
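The calculation described above can be sketched as follows. This is a minimal illustration only, not the authors' Stata implementation: it assumes each rank is rescaled to the interval 0 to 1 as (rank - 1)/(n - 1), with rank 1 denoting the least favourable estimate, and that each study is weighted by its number of available effect estimates n. The exact definition is given in Page et al.14; the function name and the example data are hypothetical.

```python
from typing import List, Tuple


def potential_bias_index(studies: List[Tuple[int, int]]) -> float:
    """Weighted average of rescaled rank positions (illustrative sketch).

    Each tuple is (n, rank): n is the number of eligible effect estimates in
    a study (n >= 2) and rank is the position of the estimate the reviewers
    selected, where 1 is least favourable and n is most favourable to the
    experimental intervention/exposure. The rescaling is an assumption.
    """
    if not studies:
        raise ValueError("need at least one study with multiplicity")
    weighted_sum = 0.0
    total_weight = 0
    for n, rank in studies:
        if n < 2 or not 1 <= rank <= n:
            raise ValueError(f"invalid (n, rank) pair: {(n, rank)}")
        scaled = (rank - 1) / (n - 1)  # 0 = least favourable, 1 = most favourable
        weighted_sum += n * scaled     # weight = number of available estimates
        total_weight += n
    return weighted_sum / total_weight


# Hypothetical data: three studies with 2, 4 and 3 eligible estimates.
always_most_favourable = [(2, 2), (4, 4), (3, 3)]
always_least_favourable = [(2, 1), (4, 1), (3, 1)]
always_middle = [(3, 2), (5, 3)]
```

Under these assumptions, always choosing the most favourable estimate gives a PBI of 1, always choosing the least favourable gives 0, and always choosing the middle-ranked estimate gives 0.5.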
For meta-analyses comparing different levels or patterns of intake of the same food (e.g. red meat consumed 5 days per week versus red meat consumed once a week), we determined from the text of the review whether the systematic reviewers hypothesised that the higher or lower category would have the most benefit or least harm, and ranked study effect estimates based on their favourability to the category of consumption considered to be most beneficial/least harmful. For meta-analyses comparing different foods/diets (e.g. vegan versus vegetarian diet), we determined from the text of the review which intervention/exposure the systematic reviewers were most interested in evaluating (which we considered the experimental intervention/exposure), and ranked the study effect estimates based on their favourability to the experimental intervention/exposure. Given there was more uncertainty in determining the experimental intervention/exposure in reviews comparing different food/diets, we performed an a-priori specified sensitivity analysis where we excluded meta-analyses comparing different foods/diets to examine the impact on the PBI. One investigator (RK) assigned a ranking to each effect estimate extracted from each study included in the index meta-analyses, and these rankings were verified by another investigator (MJP).
Several methods for selecting effect estimates are acceptable in terms of not introducing bias, including (i) randomly selecting effect estimates, (ii) selecting effect estimates based on some clinical or methodological rationale or (iii) selecting the median effect estimate. If systematic reviewers employed selection methods (ii) or (iii) across the studies, we expected that the distribution of the selected effect estimates would be consistent with what we would observe under purely random selection, that is, akin to selection method (i), so on average, the selected effect estimates would be at the middle-rank position, and the PBI would take the value of 0.5. A PBI of 0.5, therefore, suggests that there is no selective inclusion of the most or least favourable effect estimates. We ran a statistical test based on the PBI that has been constructed to test whether the observed selection of effect estimates was consistent with randomness of selection14. Confidence intervals (95%) for the PBI were obtained by bootstrap resampling15. Index meta-analyses that included no studies with multiplicity of effect estimates were excluded from all PBI analyses, given there is no potential for selective inclusion of results in such meta-analyses.
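To illustrate the idea of a bootstrap confidence interval for the PBI, the sketch below resamples studies with replacement and takes percentiles of the resulting PBI values. Several choices here are assumptions rather than the authors' method: the resampling unit (individual studies), the percentile interval, the number of replicates, the rank rescaling (rank - 1)/(n - 1), and all of the example data.

```python
import random
from typing import List, Tuple

Study = Tuple[int, int]  # (number of eligible estimates, rank of the selected one)


def pbi(studies: List[Study]) -> float:
    """PBI as a weighted mean of rescaled ranks (assumed form, see lead-in)."""
    total = sum(n for n, _ in studies)
    return sum(n * (rank - 1) / (n - 1) for n, rank in studies) / total


def bootstrap_ci(studies: List[Study], reps: int = 2000,
                 level: float = 0.95, seed: int = 1) -> Tuple[float, float]:
    """Percentile bootstrap interval, resampling studies with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    stats = sorted(
        pbi([rng.choice(studies) for _ in studies]) for _ in range(reps)
    )
    alpha = (1 - level) / 2
    lower = stats[int(alpha * reps)]
    upper = stats[int((1 - alpha) * reps) - 1]
    return lower, upper


# Hypothetical ranks for eight studies, each with multiplicity of estimates.
example = [(2, 1), (2, 2), (3, 2), (4, 1), (4, 4), (5, 3), (2, 2), (3, 1)]
point_estimate = pbi(example)
ci_lower, ci_upper = bootstrap_ci(example)
```

An interval that contains 0.5 would be read, as in the text, as compatible with random selection of effect estimates.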
We conducted subgroup analyses to explore whether the PBI was modified by the availability of an SR protocol or register entry, food industry funding of the SR, the outcome type of the index meta-analysis, or at least one SR author disclosing a financial conflict of interest of any type. The confidence intervals and p-value for the difference in PBI between subgroups were constructed using bootstrap methods15. A similar approach was used to examine whether the PBI was modified by the number of available effect estimates, where a regression of PBI on the number of available effect estimates was fitted (for details, see the study protocol11).
We undertook a series of sensitivity analyses to investigate whether the PBI was robust to certain assumptions. For SRs without protocols or register entries, our primary calculation of the PBI was based on the set of study effect estimates that were compatible with the assumption of ‘no pre-specified eligibility criteria or decision rules’. However, we also performed a sensitivity analysis where study effect estimates that were compatible only with the eligibility criteria and decision rules in the methods sections of the SR were included, to examine if this restriction affected the PBI.
We anticipated that in some study reports, only an effect estimate and its standard error or 95% confidence interval would be presented (that is, the number of events or means and standard deviations per group would not be available). In this circumstance, algebraic manipulation was required to include the result in a meta-analysis. Algebraic manipulation may be considered challenging by some systematic reviewers, so effect estimates requiring algebraic manipulation may not have been considered by reviewers in the set of effect estimates to potentially include in the meta-analysis. For the primary calculation of the PBI, we excluded study effect estimates that required algebraic manipulation; however, we performed a sensitivity analysis to explore whether the PBI was modified when we included these study effect estimates.
We conducted a fixed-effect meta-analysis of the PBI obtained in the current study with that estimated in the previous study by Page et al.10. We synthesised estimates of the PBI using a fixed-effect meta-analysis model because the number of included studies (n=2) was too small to adequately estimate the between-study variance.
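A fixed-effect (inverse-variance) synthesis of two estimates can be sketched as below. This is an illustration under stated assumptions: standard errors are back-calculated from 95% confidence limits as if the intervals were symmetric and normal-based, which is only an approximation for bootstrap intervals, and the input values are made up for the example rather than taken from either study.

```python
import math
from typing import List, Tuple

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def fixed_effect_pool(
    estimates: List[Tuple[float, float, float]]
) -> Tuple[float, float, float]:
    """Pool (estimate, ci_lower, ci_upper) triples by inverse-variance weights."""
    weights = []
    weighted = []
    for est, lower, upper in estimates:
        se = (upper - lower) / (2 * Z_95)  # assumes a symmetric 95% interval
        w = 1.0 / se ** 2
        weights.append(w)
        weighted.append(w * est)
    pooled = sum(weighted) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled - Z_95 * pooled_se, pooled + Z_95 * pooled_se


# Illustrative inputs only (not the PBI estimates of either study).
pooled, pooled_lower, pooled_upper = fixed_effect_pool(
    [(0.50, 0.40, 0.60), (0.56, 0.50, 0.62)]
)
```

Because the model has no between-study variance term, the more precise estimate dominates the pooled value, which is why the approach suits a synthesis of only two estimates.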
2.3.3 Investigation of the impact of any selective inclusion of study effect estimates on meta-analysis effects
We investigated the impact of any potential selective inclusion of study effect estimates on the magnitude of the resulting meta-analytic effect estimates. For continuous outcomes, we expressed all study effect estimates as standardised mean differences (SMDs). For non-continuous outcomes, we expressed all study effect estimates as odds ratios (ORs). When study authors reported risk ratios or hazard ratios for non-continuous outcomes, we extracted these estimates and made the assumption that these would provide a reasonable approximation to an odds ratio16, given the outcomes were rare in the included reviews. We standardised the direction of effects so that ORs < 1 or SMDs < 0 represented effects that are more favourable to the experimental intervention/exposure.
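The rare-outcome assumption can be checked numerically. The sketch below converts a risk ratio to the odds ratio implied by a given baseline (unexposed) risk, using only the definitions of risk and odds; the function name and the example risks are hypothetical and are not drawn from the included reviews. Hazard ratios are not covered by this sketch.

```python
def rr_to_or(rr: float, baseline_risk: float) -> float:
    """Odds ratio implied by a risk ratio at a given baseline risk."""
    exposed_risk = rr * baseline_risk
    if not 0 < baseline_risk < 1 or not 0 < exposed_risk < 1:
        raise ValueError("risks must lie strictly between 0 and 1")
    exposed_odds = exposed_risk / (1 - exposed_risk)
    baseline_odds = baseline_risk / (1 - baseline_risk)
    return exposed_odds / baseline_odds


# Same risk ratio, two hypothetical baseline risks.
or_rare = rr_to_or(1.5, 0.01)    # rare outcome: OR stays close to the RR
or_common = rr_to_or(1.5, 0.30)  # common outcome: OR moves away from the RR
```

With a baseline risk of 1% the implied OR differs from the risk ratio of 1.5 by less than 0.01, whereas at a baseline risk of 30% it exceeds 1.9, which is why the approximation is reasonable only when outcomes are rare.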
For each of the meta-analyses of continuous outcomes, we calculated all possible meta-analytic SMDs from all combinations of available study effect estimates. All possible meta-analytic SMDs were generated using a random-effects meta-analysis model, with the between-study variability estimated using the restricted maximum likelihood estimator. When the number of possible combinations was prohibitively large to calculate all combinations (i.e. >30,000), we generated a random sampling distribution of 5,000 meta-analytic SMDs.
This was achieved by randomly selecting (with equal probability) an effect estimate for inclusion from each study comparison within a meta-analysis, meta-analysing the chosen effects to yield one meta-analytic result, and repeating this process 5,000 times. For each distribution of generated meta-analyses, we calculated (i) the median of all possible meta-analytic SMDs, which represents the median of a distribution where study effect estimates were not selectively included, and (ii) the difference between the index meta-analytic SMD and the median meta-analytic SMD.
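The enumeration and sampling procedure described above can be sketched as follows. This is an illustration, not the authors' Stata code: for brevity it swaps in the DerSimonian-Laird moment estimator of between-study variance in place of restricted maximum likelihood, and the function names, seed and example data are hypothetical. The thresholds of 30,000 combinations and 5,000 draws follow the text.

```python
import itertools
import math
import random
import statistics
from typing import List, Sequence, Tuple

Estimate = Tuple[float, float]  # (effect estimate, variance)


def dl_random_effects(effects: Sequence[Estimate]) -> float:
    """Random-effects pooled estimate (DerSimonian-Laird, not REML)."""
    y = [e for e, _ in effects]
    w = [1.0 / v for _, v in effects]
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for _, v in effects]
    return sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)


def pooled_distribution(studies: List[List[Estimate]],
                        max_enumerate: int = 30_000,
                        n_draws: int = 5_000,
                        seed: int = 1) -> List[float]:
    """All possible pooled estimates, or a random sample if there are too many."""
    n_combos = math.prod(len(s) for s in studies)
    if n_combos <= max_enumerate:
        combos = itertools.product(*studies)  # one estimate per study
    else:
        rng = random.Random(seed)
        # equal-probability choice of one estimate per study, repeated
        combos = ([rng.choice(s) for s in studies] for _ in range(n_draws))
    return [dl_random_effects(c) for c in combos]


# Hypothetical meta-analysis: three studies with 2, 1 and 3 eligible SMDs.
studies = [
    [(-0.30, 0.04), (-0.10, 0.04)],
    [(-0.20, 0.02)],
    [(0.05, 0.03), (-0.15, 0.03), (-0.25, 0.03)],
]
distribution = pooled_distribution(studies)          # 2 x 1 x 3 = 6 combinations
index_smd = dl_random_effects([studies[0][0], studies[1][0], studies[2][2]])
difference = index_smd - statistics.median(distribution)
```

Here the hypothetical index meta-analysis uses the most favourable estimate from every study, so `difference` is negative, which under the direction convention above (SMDs < 0 favour the experimental intervention/exposure) is the pattern selective inclusion of favourable results would produce.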
We used non-parametric statistics to describe the differences between the index and median meta-analytic SMDs. We also synthesised these differences using a random-effects meta-analysis model, with the meta-analytic weights based on the variance of the index meta-analytic SMD estimate and the between-study variability estimated using the restricted maximum likelihood estimator. The Hartung-Knapp-Sidik-Jonkman confidence interval method was used to calculate uncertainty in the combined differences17,18, and we quantified statistical inconsistency using the I2 statistic19. When the difference between the index and median meta-analytic SMD was minimal, we concluded that any potential selective inclusion had a limited impact on the meta-analytic effect (a worked example is provided in Page et al.14). We repeated the above analyses for each of the meta-analyses of non-continuous outcomes (i.e. binary, count, time-to-event) by calculating meta-analytic ORs. Finally, we undertook an analysis that included all meta-analyses (i.e. including meta-analyses of all outcome types) by first converting log ORs to SMDs by dividing the log ORs by π/√3 (= 1.814)20. For this analysis, we also calculated all possible meta-analytic estimates using a fixed-effect model as a sensitivity analysis to examine whether the meta-analysed difference and its confidence interval were affected by the meta-analysis model. We also performed a sensitivity analysis where the meta-analyses of risk ratios and hazard ratios were excluded because of the assumption that such estimates provide a good approximation to the OR. All analyses were undertaken using Stata version 1621.
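The conversion from log ORs to SMDs mentioned above amounts to a single rescaling. A minimal sketch is given below; the variance conversion (multiplying by 3/π²) follows from the same scaling factor and is included for completeness, and the function names and example value are hypothetical.

```python
import math

SCALE = math.pi / math.sqrt(3)  # approximately 1.814


def log_or_to_smd(log_or: float) -> float:
    """Convert a log odds ratio to a standardised mean difference."""
    return log_or / SCALE


def log_or_var_to_smd_var(var_log_or: float) -> float:
    """Convert the variance of a log odds ratio to the variance of the SMD."""
    return var_log_or / SCALE ** 2  # equivalent to multiplying by 3 / pi**2


# Hypothetical example: an odds ratio of 0.80 expressed on the SMD scale.
smd = log_or_to_smd(math.log(0.80))
```

Because the conversion only divides by a positive constant, the direction of effect is preserved: ORs below 1 map to SMDs below 0.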
3. RESULTS
Our search yielded a total of 7,167 references from PubMed and Epistemonikos. After removing 908 duplicates, the remaining 6,259 references were randomly sorted, and we screened the titles and abstracts of the first 3,013 references. Of these, 2,777 were excluded, leaving 236 for full-text screening. Of these, 99 SRs met our eligibility criteria, including 25 SRs with a meta-analysis of a continuous outcome and 74 with a meta-analysis of a non-continuous outcome. Initially, all SRs with a continuous outcome were included, and 25 of the 74 SRs with a non-continuous outcome were randomly selected. However, from these 50 SRs, eight were excluded (6 continuous, 2 non-continuous), because the index meta-analysis had 20 or more studies, leaving 42 included SRs22-41,42-46,47-51,52,53,54-63. After extracting data from the included studies of these 42 SRs, we excluded a further three SRs because we were unable to identify the relevant outcome data in any of the included primary study reports61, or none of the included studies had multiplicity of effect estimates62,63. This left us with 39 SRs included in the analysis of selective inclusion (Figure 1).
3.1 Characteristics of included meta-analyses
Of the 39 SRs, two had published protocols, and 11 were registered in PROSPERO (Table 1). Most index meta-analyses included only non-randomised studies (64%, 25/39), 31% (12/39) included only randomised trials, and the remaining two (5%) included studies of both designs (Table 1). Only two reviews (5%) were funded by food industry and authors of six (15%) reviews had a financial conflict of interest. The types of included study designs differed by outcome type. In the 17 meta-analyses of a continuous outcome, 12 included randomised trials only, three included non-randomised studies only, and two included both designs. Of the 22 meta-analyses of a non-continuous outcome, all included non-randomised studies only. Nearly all index meta-analyses were fitted using a random-effects model (92%, 36/39). The 39 index meta-analyses included a total of 312 studies with 386 comparisons (since some studies contributed multiple comparisons to the same meta-analysis when, for example, multiple foods or diets were compared), with a median of seven studies (IQR 5-11; range 2-17) per meta-analysis.
3.2 Multiplicity of effect estimates in study reports and methods to select effect estimates
Of the 386 comparisons, 224 (58%) had multiple effect estimates that were eligible for inclusion in a particular meta-analysis. In these 224 comparisons, there was a median of 2 (IQR 2-4; range 2-21) eligible effect estimates per comparison. Descriptive statistics at the meta-analysis level reflected those at the study level; per meta-analysis, a median of 63% of comparisons had multiplicity (IQR 50-83%; range 6-100%). The most common types of multiplicity arose from the availability of unadjusted and one or more covariate-adjusted analyses (39% of studies) and multiple intervention/exposure groups (22% of studies) (Table 2).
The methods for selecting effect estimates to include in meta-analyses that were documented in the SR protocols and SRs varied considerably (Table 3 and Supplementary Table S5). For example, a decision rule for any type of analysis was provided in 1/13 protocols, while such a rule appeared in 14/39 reviews. In all cases where a decision rule was reported, the rule articulated selection of a specific eligible effect estimate (Table 3). In no cases did review authors articulate non-specific selection of an effect estimate (e.g. through random selection, or calculating an average of the eligible effect estimates).
3.3 Evidence of selective inclusion of study effect estimates
The estimated PBI was 0.49 (95% CI 0.42 to 0.55; two-tailed p-value of 0.64) (Table 4). This suggests that the selection of study effect estimates was consistent with a process of random selection. The PBI for SRs with a protocol/register entry was 0.16 units higher (95% CI 0.02 to 0.29) than the PBI for SRs without a protocol/register entry, suggesting that selective inclusion of the most favourable effect estimates was more likely to occur in SRs with a protocol/register entry. The PBI was not modified by the outcome type of the index meta-analysis, funding of the SR or financial conflicts of interest of SR authors (Table 4), and was robust to the series of sensitivity analyses (Supplementary Table S6).
The linear regression exploring the relationship between the number of available effect estimates and the PBI suggested that for every unit increase in the number of effect estimates, the PBI was predicted to change by -0.014 (95% CI -0.018 to -0.010; see Supplementary Figure S1). For example, the predicted PBI was 0.50 (95% CI 0.42 to 0.59) when there were two available effect estimates, and 0.41 (95% CI 0.25 to 0.57) when there were four effect estimates.
Meta-analysis of the PBI estimate of the previous study by Page et al.10 and our study resulted in a combined PBI of 0.53 (95% CI 0.48 to 0.58) (Supplementary Figure S2).
3.4 Impact of potential selective inclusion of study effect estimates on meta-analytic SMDs
The median number of possible meta-analytic SMDs arising from meta-analysing all combinations of study effect estimates was 288 (across the 39 meta-analyses); however, there was wide variation in this number (IQR 12 to 2,880; range 2 to 20,155,392; Table 5). For most meta-analyses, the range of possible meta-analytic SMDs which could be calculated (each fitted using a random-effects model) was narrow (Table 5, column 5). The median difference between the largest and smallest possible meta-analytic SMD was 0.09 SD units (IQR 0.04 to 0.15; range 0.01 to 1.37; Table 5 and Figure 2).
The median of the differences between the index meta-analytic SMD and the median of all its possible meta-analytic SMDs (i.e. the SMD expected when there is no selective inclusion) was -0.003 SD units (IQR -0.01 to 0.02; range -0.59 to 0.07). Meta-analysing these differences using a random-effects model yielded a pooled difference of 0.002 SD units (95% CI -0.01 to 0.01; I2=0%; Figure 3). Recalculating all possible meta-analytic SMDs using a fixed-effect model yielded nearly identical results (see Supplementary Figure S3; for results of all differences converted to ratios of odds ratios and sensitivity analyses, see Supplementary Figures S4-S9).
4. DISCUSSION
There was no evidence of selective inclusion of study effect estimates by systematic reviewers in our sample of meta-analyses of food/diet-outcome associations. The PBI was not modified by outcome type for the index meta-analysis (continuous or non-continuous), funding source for the SR, or conflicts of interest of systematic reviewers, but did differ depending on whether the review authors worked from an SR protocol/register entry. In addition, the PBI was robust to several assumptions in sensitivity analyses. The impact of any potential selective inclusion on the meta-analytic effects was negligible.
4.1 Comparison with other studies
Our study results are largely similar to those of the only other study investigating bias due to selective inclusion of results10. In both studies, there was no clear evidence of selective inclusion of study estimates in the sample of meta-analyses evaluated, and importantly, the impact of any potential selective inclusion on the meta-analytic estimates was negligible. The latter finding is explained by the fact that the range of possible meta-analytic SMDs calculated from the different combinations of study effect estimates was narrow; in the current study, the median difference between the largest and smallest possible meta-analytic SMD was 0.09 SD units (IQR 0.04 to 0.15), and in Page et al.10 it was similar (0.11 SD units; IQR 0.03 to 0.19). Therefore, even if reviewers did selectively include results, the potential for this to meaningfully affect the meta-analysis effect estimate was limited.
However, larger differences between the largest and smallest possible meta-analytic SMDs have been observed elsewhere. In a study evaluating multiplicity of effect estimates in study reports and its impact on meta-analysis results, Mayo-Wilson et al. observed a median difference between the largest and smallest possible meta-analytic SMD of 0.34 SD units64. In this circumstance, selective inclusion of results could impact meta-analysis effects; however, the authors did not explore selective inclusion of results in that study.
4.2 Explanations for study findings
There are several possible reasons why no evidence of selective inclusion of results was observed in this study. Despite infrequently pre-specifying or reporting methods for selecting results for inclusion in meta-analyses, the majority of systematic reviewers might have been following undisclosed decision rules that were based on some clinical or methodological rationale (i.e. not based on the statistical significance, magnitude or direction of effects). For example, systematic reviewers confronted with multiple effect estimates for the association between cheese intake and risk of prostate cancer, each adjusted for different sets of covariates, might have considered it reasonable to always select the most adjusted estimate available, without declaring in the Methods section that they followed this decision rule. Recommendations to report such selection methods only became available in sources such as the Cochrane Handbook12 and PRISMA 202065 after all the SRs in this sample were published, which might explain the infrequent reporting of such methods.
Our observation that the PBI was more indicative of selective inclusion of results in SRs with a protocol/register entry than in SRs without a protocol/register entry was unanticipated. We assumed that review authors who publicly record their pre-specified methods would be more likely to specify methods for selecting results for inclusion in meta-analyses, and to be aware of the bias that can be introduced if this selection is made based on the nature of the result itself. However, of the 13 protocols/register entries, 11 were PROSPERO entries, and PROSPERO does not currently request that authors pre-specify methods for selecting results for inclusion in meta-analyses. Therefore, simply registering a SR might not protect against selective inclusion at this stage. Furthermore, while the median number of eligible effect estimates was the same for studies included in SRs with a protocol/register entry and SRs without (median 2), the range of eligible effect estimates was narrower for the former (range 2-9 versus 2-21), which might have made it easier for SRs with a protocol/register entry to select the largest effect.
The negligible impact of potential selective inclusion on the meta-analytic effects may have been influenced by several factors. These include the percentage of studies with multiple effect estimates per meta-analysis, the extent to which the multiple effect estimates in a study varied in magnitude and direction, the weights that study effect estimates received in the meta-analysis, or any combination of the above10. For example, if one adjusted estimate yields an OR of 1.87 and another yields an OR of 1.59, but the precision of the selected estimate is low and hence the study contributes little weight to the meta-analysis, selective inclusion of the most favourable OR is unlikely to affect the meta-analysis in an important way.
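The weighting argument above can be made concrete with a small numerical sketch, assuming fixed-effect inverse-variance pooling on the log odds ratio scale. The ORs of 1.87 and 1.59 come from the example in the text; the three other studies, all standard errors, and the function name pooled_or are hypothetical and chosen only for illustration. For simplicity, both candidate estimates are given the same standard error.

```python
import math


def pooled_or(odds_ratios, std_errors):
    """Fixed-effect inverse-variance pooled OR, pooling on the log scale."""
    weights = [1.0 / se**2 for se in std_errors]
    log_ors = [math.log(value) for value in odds_ratios]
    pooled_log = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    return math.exp(pooled_log)


# Hypothetical: three precise studies plus one imprecise study that
# reports two eligible adjusted ORs (1.87 and 1.59, as in the text).
other_ors = [1.20, 1.10, 1.30]
other_ses = [0.10, 0.12, 0.11]   # standard errors of the log ORs
imprecise_se = 0.60              # assumed equal for both candidate estimates

pooled_high = pooled_or(other_ors + [1.87], other_ses + [imprecise_se])
pooled_low = pooled_or(other_ors + [1.59], other_ses + [imprecise_se])

# Share of the total weight carried by the imprecise study.
imprecise_weight = 1.0 / imprecise_se**2
total_weight = sum(1.0 / se**2 for se in other_ses) + imprecise_weight
weight_share = imprecise_weight / total_weight

print(f"weight share of imprecise study: {weight_share:.3f}")
print(f"pooled OR with 1.87: {pooled_high:.3f}; with 1.59: {pooled_low:.3f}")
```

Under these assumed values the imprecise study carries roughly 1% of the total weight, so choosing the larger OR changes the pooled OR by well under 1%.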
4.3 Strengths and weaknesses of the study
Our study has several strengths. We pre-specified our methods in a published protocol11, and have also specified any deviations from our protocol in Supplementary Table S1. We conducted a comprehensive search for SRs, and all records were screened by two authors independently to minimise errors in SR selection. Ranking of effect estimates for the PBI calculation was conducted by one author and verified by another.
There are also some limitations to our study. Data collection was conducted by two authors independently for only half of the SRs and their included studies. However, for the other half in which only one author collected data, we expect the number of data collection errors to be low because the collected data were verified for accuracy by another author. Our sample size calculation was based on estimating the extent of multiplicity in studies rather than the magnitude of selective inclusion bias. Our findings may be influenced by the set of study effect estimates that we considered eligible for inclusion in the meta-analyses, which might differ from what the systematic reviewers considered eligible. We used the methods reported in the SR protocols to select effect estimates; however, other selection methods may have been assumed but not documented at the protocol stage. Finally, we only extracted data from the studies cited by the systematic reviewers. If other reports of studies (e.g. regulatory reports, conference abstracts) had been consulted but not cited by the systematic reviewers, our estimate of the true extent of multiplicity of results and the PBI might differ.
4.4 Future research
To date, only two studies (Page et al.10 and the current study) have investigated whether selective inclusion of study effect estimates occurs in meta-analyses, covering three areas (musculoskeletal conditions, mental health, nutrition). Extending this type of investigation to other specialities would add to the evidence base on the likelihood and impact of selective inclusion bias. Moreover, SRs that are funded by industry commonly report results favourable to the sponsor66, yet no study to date has examined the issue of selective inclusion in such SRs. Furthermore, it would be worthwhile to explore bias due to selective inclusion of results in meta-analyses of harm outcomes, for which multiplicity of results is also a challenge67,68. In addition, selective inclusion might be more likely to occur in reviews that have broad outcome domains, such as ‘patient health’, since within any study there could be a large number of outcomes and effects that could be selected, likely with a wider range of effect estimates.
5. CONCLUSION
There was no evidence that systematic reviewers selectively included study effect estimates in this sample of meta-analyses of the association between food/diet and health-related outcomes. The impact of any potential selective inclusion on the meta-analytic effects was negligible. Despite this, we encourage systematic reviewers to report whether they encountered multiplicity of results in the included studies, the methods they used to select effect estimates when multiple estimates were eligible for inclusion in a particular meta-analysis, and whether these selection methods were pre-specified. Doing so should help readers understand the risk of selective inclusion bias in the systematic review.
Data Availability
Data and analytic code are available on the Open Science Framework (https://osf.io/umk62/).
DECLARATIONS
Funding
This project was funded by an Australian National Health and Medical Research Council (NHMRC) project grant (APP1139997). RK is supported by a Monash Graduate Scholarship and a Monash International Tuition Scholarship. MJP is supported by an Australian Research Council Discovery Early Career Researcher Award (DE200101618). JEM is supported by an Australian NHMRC Investigator Grant (GNT2009612). SM is supported by the Country Women’s Association (NSW) and Edna Winifred Blackman Postgraduate Research Scholarship. The funders had no role in the study design, data collection and analysis, or preparation of the manuscript.
Author contributions
All authors declare that they meet the ICMJE conditions for authorship. MJP and JEM conceived the project. MJP, JEM, LB, ZD, SM, CMK and AF contributed to the design of the project. MJP, ZD, SM, and CMK screened articles for inclusion. RK, MJP, LB, ZD, SM, CMK, and EK collected data. RK and MJP analysed the data, adapting the analytic code written by JEM and AF for an earlier project. RK wrote the first draft of the manuscript, which was revised in conjunction with MJP and JEM. MJP and JEM drafted sections of the manuscript. All authors were involved in revising the article critically for important intellectual content. All authors approved the final version of the article. MJP is the guarantor of this work.
Competing interests
We have no competing interests in relation to this study.
HIGHLIGHTS
Selective inclusion of results occurs when, for any particular primary study, multiple results are eligible for inclusion in a specific meta-analysis, and the result chosen for inclusion by the systematic reviewer is based on the nature of the result itself (e.g. the result’s P-value, magnitude or direction). Only one other study has investigated bias due to selective inclusion of results, in a sample of meta-analyses published between 2010 and 2012 and including randomised trials examining the effects of interventions for arthritis or depressive or anxiety disorders. It is unclear whether the findings of this prior study apply to other clinical conditions, outcome types (e.g. binary), study designs (e.g. non-randomised), and more recently published reviews.
In a sample of 39 meta-analyses including 312 studies of nutrition research, there was no evidence of selective inclusion of study effect estimates by systematic reviewers. In addition, the impact of any potential selective inclusion on the meta-analytic effects was negligible. The systematic reviews in our sample provided an opportunity to extend the previous investigation of bias due to selective inclusion of results, because these reviews often include study designs beyond randomised trials, different outcome types, and results from multiple analyses that attempt to adjust for potential confounding variables (in the case of non-randomised studies).
We recommend that systematic reviewers report the methods used to select effect estimates to include in meta-analyses, which can help readers understand the risk of selective inclusion bias in the systematic reviews.