Participant characteristics and exclusion from trials: a meta-analysis of individual participant-level data from phase 3/4 industry-funded trials in chronic medical conditions

ObjectivesTrials often do not represent their target populations, threatening external validity. The aim was to assess whether age, sex, comorbidity count and/or race/ethnicity are associated with likelihood of screen failure (i.e., failure to be randomised to the trial for any reason) among potential trial participants.

DesignBayesian meta-analysis of individual participant-level data (IPD).

SettingIndustry-funded phase 3/4 trials in chronic medical conditions. Participants were identified as "randomised" or "screen failure" using trial IPD.

ParticipantsData were available for 52 trials involving 72,178 screened individuals of whom 24,733 (34%) failed screening.

Main outcome measuresFor each trial, logistic regression models were constructed to assess likelihood of screen failure, regressed on age (per 10-year increment), sex (male versus female), comorbidity count (per one additional comorbidity) and race/ethnicity. Trial-level analyses were combined in Bayesian hierarchical models with pooling across condition.

ResultsIn age- and sex-adjusted models, neither age nor sex was associated with increased odds of screen failure, though weak associations were detected after additionally adjusting for comorbidity (age, per 10-year increment: odds ratio [OR] 1.02; 95% credibility interval [CI] 1.01 to 1.04 and male sex: OR 0.95; 95% CI 0.91 to 1.00). Comorbidity count was weakly associated with screen failure, but in an unexpected direction (OR 0.97 per additional comorbidity, 95% CI 0.94 to 1.00, adjusted for age and sex). Those who self-reported as Black were slightly more likely to fail screening (OR 1.04; 95% CI 0.99 to 1.09); an effect which persisted after adjustment for age, sex and comorbidity count (OR 1.05; 95% CI 0.98 to 1.12).

ConclusionsAge, sex, comorbidity count and Black race/ethnicity were not strongly associated with increased likelihood of screen failure. Proportionate increases in screening these underserved populations may improve representation in trials.

Trial registrationRelevant trials in chronic medical conditions were identified according to pre-specified criteria (PROSPERO CRD42018048202).

Despite commitments from funders, journal editors, trialists and policymakers to improve the recruitment and retention of people from underserved populations 2 , there has not been any evident change over the last decade 1,4,5,[12][13][14] .
To become a trial participant, individuals undergo two rounds of selectionan invitation to screening phase and a screening phase proper ( Figure 1). In the invitation phase, individuals receive and accept an invitation to attend screening. Such invitees are identified from clinical encounters (or occasionally from patient registers), usually by members of healthcare staff rather than the research trial team. In the screening phase, trial staff apply formal inclusion and exclusion criteria (based on demographic and clinical characteristics) to individuals, and eligible people are invited to participate. The number or characteristics of individuals invited and screened is not a reporting requirement of clinical trial registries such as ClinicalTrials.gov, nor are these items in the influential Consolidated Standards of Reporting Trials (CONSORT) checklist for trial publications 15 . As such, the contribution of invitation and/or screening related factors to underrepresentation is not well described. This is an important gap as it is remains unclear whether changing trial eligibility criteriain line with recommendations from organisations such as the United States Food and Drug Administration (FDA)are likely to improve representation.
We have previously studied age, sex and comorbidity in 116 phase 3/4 industry-funded trials for which we had access to individual participant-level data (IPD). We found that trial participants were younger and had lower comorbidity counts than members of the community with the same index condition 6 . For a subset of these trials, we have access to data on individuals screened for participation. Therefore, we now examine whether age, sex, comorbidity counts, and self-reported race/ethnicity, predict failure to progress to randomisation among individuals who have undergone trial screening.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint 6 race/ethnicity (number and %). Race/ethnicity categories used in this analysis were largely driven by those recorded in the trial IPD, which included White, Black or African descent ("Black"), Asian, American Indian or Alaska Native and Native American or Other Pacific Islander ("Indigenous") and multiple or other "Other". Of these the first four were as per the FDA recommendations 18 ; the fifth "Other" was formed by collapsing all other categories because of small numbers.
Full details of the modelling have been published previously 16 . In brief, for each trial, logistic regression models were constructed to assess likelihood of screen failure, regressed on age (per 10-year increment), sex (male versus female), comorbidity count (per 1 additional comorbidity) and race/ethnicity. Coefficients, standard errors and variance/covariance matrices were exported for each model from the Vivli secure environment. The estimates from each trial were then meta-analysed in Bayesian hierarchical models. Each model had a multivariate normal likelihood, where for each trial the exported coefficients supplied a vector of means, and the exported variancecovariance matrices for these coefficients was the covariance matrix of the multivariate normal. In the primary analysis, models were fitted with trial nested within index condition. In secondary analyses, we then explored different structures for the model hierarchy, where trial was nested within both index condition and treatment comparison. For all models, trial was treated as a random effect.
We fitted models with five main sets of covariates: i) age and sex; ii) comorbidity count; iii) race/ethnicity; iv) age, sex and comorbidity count, and; v) age, sex, comorbidity count and race/ethnicity. We fitted additional models with interaction terms for selected two-way and three-way interactions. For sex, female was the reference category, while for race/ethnicity, White was the reference category (as this was the largest group and present in all trials) with dummy (indicator) variables for the remaining levels. In sensitivity analyses, we included only participants with one or more, or two or more comorbidities (other than the index condition).
For each model we report point estimate and 95% credible interval odds ratios for the association between each characteristic (age, sex, comorbidity count and race/ethnicity) and screen failure. These were obtained by exponentiating the posterior distributions and obtaining the mean, 2.5 th and 97.5 th percentiles. We additionally report between trial, between index condition and between treatment comparison variation for each parameter as the standard deviations. Finally, for model v) we present condition-specific odds ratios.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint

Baseline data
There were 52 trials involving 72,178 screened individuals, of whom 24,733 (34%) failed screening ( Table 1). The number of trials included in the sequential models reflected the data availability in the trials. Age and sex data were available for all 52 trials. Comorbidity count data (including for individuals who did not pass screening) were available for 31 trials and race/ethnicity data were available for 45 trials. Both race/ethnicity and comorbidity count data were available for 27 trials.

Trial-level factors and screen failure
On visual inspection, there were no identifiable associations between year of trial conduct, trial size or trial phase and proportion of individuals who did not pass screening ( Figure 2).
Trial-level factors were not explored further in meta-analysis models.

Individual participant-level factors associated with screen failure: primary analysis with trial nested within index condition
On modelling age and sex (n=52 trials), neither was associated with increased odds of screen failure (odds ratio [OR] 1.01; 95% CI 0.99 to 1.03 for age per 10-year increment; OR 0.97; 95% CI 0.93 to 1.01 for male versus female sex; Table 2). After additional adjustment for comorbidity count (n=31 trials), there was a weak association between reduced odds of screen failure for older age and for male sex, though the latter just crossed the null ( Table 2).
Comorbidity count was weakly associated with screen failure, but in an unexpected direction (n=31 trials). In a model including solely comorbidity count the OR was 0.93 per additional comorbidity (95% CI 0.87 to 1.00). On additionally adjusting for age and sex (n=31 trials), and age, sex and race/ethnicity (n=27 trials) the odds ratios were similar ( Table 2). In sensitivity analyses (restricting analyses to participants with one or more, or two or more comorbidities), the association between comorbidity count and screen failure was considerably weaker, the credible intervals included the null and overall were consistent with no association (OR 0.99; 95% CI 0.97 to 1.02 for both sensitivity analyses).
On modelling race/ethnicity in a univariate analysis (n=45 trials), all the credible intervals included the null (Table 2); however, self-reported Black race/ethnicity appeared to be weakly associated with higher likelihood of failing screening (OR 1.04; 95% CI 0.99 to 1.09). After . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint adjustment for age, sex and comorbidity count (n=27 trials), the point estimate and credible interval was similar (OR 1.05; 95% CI 0.98 to 1.12).
There was no evidence of an interaction between age and sex, sex and comorbidity count, or age, sex and comorbidity count (Supplementary Table S1). On modelling an interaction between male sex and Black race/ethnicity, there was a slightly stronger association in women (OR 1.09; 95% CI 1.00 to 1.19) than men (OR 0.97; 0.75 to 1.19). However, the credible interval for the odds ratio for the interaction included the null (OR 0.90; 0.70 to 1.08) and this comparison should be interpreted circumspectly.

Variation across trials, index conditions and treatment comparisons: secondary analyses
The associations between individual participant-level factors and screen failure were similar in models where index condition and treatment comparison were ignored, and where trial was nested within both index condition and treatment comparison (Table 3 and Supplementary   Table S2). There was no evidence of substantial variation across index conditions or treatment comparisons for associations between age, race/ethnicity and screen failure (Table 3 and Figure 3). There was a clear association between male sex and reduced odds of screen failure in trials in hypertension and chronic obstructive pulmonary disease (Figures 3 and 4). For comorbidity count, although the point estimates were in the same direction (below one) for all the index conditions, the associations were more markedly negative for asthma and for rhinitis, and to a lesser extent, for osteoporosis and diabetes (Figures 3 and 4).

Model diagnostics
The analysis code, model outputs from the trial-level logistic regression models fit within the trial safe havens, and the model outputs from the Bayesian hierarchical models, are available in the project GitHub repository (https://github.com/ChronicDiseaseEpi/screenfail_public). For the latter we also provide model diagnostics in terms of the number of divergent transitions, the Rhat and the bulk and tail effective sample sizes. There were no divergent transitions for any of the models. Rhat (a convergence diagnostic which compares the between-chain and within-chain estimates) was always <= 1.02 and for most models and terms was < 1.01, indicating satisfactory convergence. For some of the models where index condition was ignored, and those where trial was nested within index condition and treatment comparison the effective sample size was less than 400. However, for all the models presented in the main manuscript the effective sample sizes (bulk and tail) were >400.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion
Aggregate data analyses of randomised trial participants show underrepresentation of women, those with multi-morbidity and non-white ethnic groups 1-3 . However, it is not clear whether this under-representation arises at the invitation or screening phase. In this IPD analysis of 52 trials in chronic medical conditions, there was a weak association between higher age and increased likelihood of screen failure, with a lower likelihood of screen failure in individuals of male sex, although the credible interval for the latter included the null. Considering the detected associations between participant characteristics and screen failure were small, under-representation may be more driven by selection at the invitation to screening phase, rather than by application of trial eligibility criteria by the trial team during screening.

Strengths and limitations
The strengths of this study are in the use of IPD across diverse trials conducted in chronic medical conditions, while limited previous comparisons have used aggregate trial data, questionnaires or been limited to particular index conditions. However, we acknowledge some limitations. Firstly, data describing screened populations are not routinely reported either in clinical trial repositories (such as ClinicalTrials.gov) nor in published trials: we cannot be certain of the accuracy of data collection within potential trial participants who have not consented to complete data collection 19 . Nevertheless, this limitation also demonstrates the scarcity and value of our data. Secondly, there were inadequate data available within the IPD to which we had access to allow us to examine the reasons for failing screening. While underlying associations for screen failure were weak overall, specific reasons for failing screening may be for other reasons, such as frailty, which we have not investigated. Thirdly, reflecting our initial trial selection, trials conducted in cancer, infections, psychiatric and developmental disorders were excluded. Similarly, we were only able to obtain IPD for trials contained within the Vivli trial data-sharing repository, and for sponsors who share data using this repository. It is possible that these findings are not representative of trials. Due to incomplete reporting of data on screen failures in trial registries such as ClinicalTrials.gov, we cannot measure the representativeness of these included trials for assessment of screen failure across other disease groups, sponsors or in non-industry funded trials; however, we previously illustrated that IPD trials are broadly representative of trials registered on ClinicalTrials.gov for assessment of trial attrition 16 . Finally, our measure of comorbidity was crudean overall count, and it is plausible that either specific comorbidities, interactions between comorbidities, and/or interactions between comorbidities and the index condition are predictive of screen failure.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Comparison with the previous literature
This is the first exploration examining participant characteristics associated with failing screening using trial IPD, across a wide variety of phase 3 and 4 industry-funded trials conducted for chronic medical conditions. However, a few studies have examined trial selection using other methodologies. A nationally representative survey found that women and men were equally likely to be invited to participate in trials 3 . We demonstrate that women are slightly more likely to fail trial screening than men across most index conditions, and clearly more likely to fail screening in trials of hypertension and chronic obstructive pulmonary disease. In a previous analysis of trial IPD, we showed that attrition after randomisation is not more likely in women 16 . Together, these studies suggest that enhancing the proportion of women invited to screening may improve female representation in trials.
We identified a weak and inconsistent association between Black race/ethnicity and increased likelihood of failing screening. Our findings are in keeping with the literature, which shows that people from racial and ethnic minorities are not substantially more likely to decline trial participation if offered 20,21 , but remain systematically under-represented in trials [22][23][24] . This has prompted the development of guidelines to recruit and retain participants from ethnic minority groups (Trial Forge Guidance 3) 2 . The guidelines point to unintended exclusions of ethnic minorities because of restrictive eligibility criteria and recruitment pathways (certain comorbidities are more common among ethnic minorities 25,26 ); provision of trial materials and information in poorly accessible forms (e.g., failure to consider language support, differences in literacy or cultural differences in the nature of communication); lack of cultural competence among trial staff; and an absence of trusting relationships between trialists and ethnic minority people. Ethnic minority groups may also have different motivations for trial participation, particularly in countries where universal healthcare is not provided 3 , and may stem from historical events (e.g., Tuskegee syphilis study), as well as discrimination that persists 27 . Since we found at most a weak association between Black race/ethnicity and screen failure among screened participants, this suggests that under-representation is more likely to have arisen at the invitation rather than the screening phase. Furthermore, our findings show important heterogeneity in patterns across groups, highlighting the importance of studying specific racial and ethnic groups. Consequently, approaches to improve representation may also be more effective if targeted at the invitation phase.
We identified a paradoxical association such that lower comorbidity count was associated with increased likelihood of failing screening; however, in sensitivity analysesexcluding people with very low levels of comorbiditythere was no apparent association between comorbidity count and screen failure. The most likely explanation for this observation is reporting or . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint recording bias: potential participants may be more likely to recall medications and conditions when they have decided to participate in a trial, or investigators may make greater efforts to record such information in individuals they think are unlikely to fail screening.

Implications for practice and policy
Ours is the first of which we are aware to meta-analyse associations between individual-level characteristics and failing to pass trial screening, and it was only possible due to our access to trial individual-level participant data. To better understand and improve trial representativeness, reporting guideline groups (such as CONSORT), representatives of journals (such as the International Committee of Medical Journal Editors), and trial registries which mandate results reporting (such as ClincialTrials.gov) may wish to consider requiring reporting of invited and screened participants as part of trial dissemination.
In lieu of more widespread reporting, our own findingswhile limited to a relatively small and selected set of phase 3 and 4 industry-funded trials for which IPD were availablesuggest that processes during the invitation to screening phase may be more important as regards trial representativeness.

Conclusion
Age, sex, comorbidity count and race/ethnicity were not strongly associated with increased likelihood of screen failure. Proportionate increases in screening these underserved populations may improve representation in trials.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Transparency declaration
The lead author affirms that the manuscript is an honest, accurate and transparent account of the study being reported.

Public and Patient Involvement statement
Patients and the public were involved directly in the trial data used for this analysis, but were not directly involved in the design or planning of the current analysis. This paper has important messages for patients and the public, and support from Patient and Public Involvement and Engagement (PPIE) groups at the University of Glasgow will be sought to assist with dissemination of the research message.

Ethical approval
Ethical approval was not required for this study.

Declaration
This manuscript is based in part on research using data from data contributors, Boehringer Ingelheim, Eli Lilly, Roche, Takeda and UCB, that has been made available through Vivli, Inc.
Vivli has not contributed to or approved, and Vivli, Boehringer Ingelheim, Eli Lilly, Roche, Takeda and UCB, are not in any way responsible for, the contents of this manuscript. SVK acknowledges funding from the Medical Research Council (MC_UU_00022/2) and the Scottish Government Chief Scientist Office (SPHSU17). Outside the submitted work, JSL . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Data sharing statement
Individual patient-level data are available from the Vivli Centre for Global Clinical Research Data platform (https://vivli.org). Trial-level results, model outputs and analysis code are provided on the project GitHub repository: -https://github.com/ChronicDiseaseEpi/screenfail_public . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 17, 2023 Characteristics of randomised participants (Randomised = "Yes") and screen failures (Randomised = "No") for included studies. Race/ethnicity categories included White ("White"), Black or African descent ("Black"), Asian ("Asian"), American Indian or Alaska Native and Native American or Other Pacific Islander ("Indigenous") and Multiple or Other ("Other").

Models for the log odds ratio (standard error) [95% credible interval] for screen failure examining variation in estimates between trial,
between condition and between trial and condition. Models, adjusted for age, sex, comorbidity count and race/ethnicity, were conducted at three levels: trial (where condition and treatment were ignored); trial nested within condition; and trial nested within condition and treatment. Figure 1 Barriers to trial completion Invitation to screening Screening Randomisation Trial completion

Screen failure
Attrition Screening declined Not invited to screening . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint  . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted April 17, 2023. ; https://doi.org/10.1101/2023.04.14.23288549 doi: medRxiv preprint Figure 3 Forest plots showing mean odds ratio (OR) and 95% credible intervals of likelihood of screen failure for any reason by index condition. Results are displayed for age (per 10-year increase), sex (male versus female), comorbidity count (per 1 additional comorbidity) and self-reported /ethnicity. The black dotted line is the reference line (no effect OR = 1).

Models for the log odds ratio (standard error) [95% credible interval] for screen failure examining variation in estimates between trial,
between condition and between trial and condition. Model 1: adjusted for age and sex; Model 2: comorbidity count only; Model 3: race/ethnicity only; Model 4: adjusted for age, sex and comorbidity count; Model 5: adjusted for age, sex, comorbidity count and race/ethnicity. Models were conducted at three levels: trial (where condition and treatment were ignored); trial nested within condition; and trial nested within condition and treatment.