Abstract
The time and resources invested in bringing novel therapeutics to market have increased year over year, while fewer successful treatments reach patients. Within the drug development lifecycle, the clinical phase is a major contributor to this declining efficiency. One major barrier to the successful execution of a randomized controlled trial (RCT) is the attrition of patients who stop participating in a trial after enrollment or randomization. To address this problem, we integrated multiple public databases, including ClinicalTrials.gov and the Aggregate Analysis of ClinicalTrials.gov (AACT), into a unique, trial sponsor-independent dataset. This dataset spans 20 years of clinical trials in the respiratory domain and over 1 million patients (3,175 cohorts comprising 1,020,085 patients, with 79 curated features), and it enabled a data-driven approach to identifying the top features influencing patient attrition in a trial. Top features included Duration of Trial, Duration of Treatment, Indication, and Number of Adverse Events. We evaluated multiple machine learning models and found that Random Forest performed best on the test set (test subset: n=637 cohorts; RMSE 6.64). We envisage that our work will enable clinical trial sponsors to optimize trial run time by better anticipating and correcting for potential patient attrition with patient-centric strategies that improve patient engagement, thus enabling new therapies to reach patients more quickly.
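The RMSE reported for the Random Forest model is the square root of the mean squared difference between observed and predicted attrition across the held-out cohorts. The following is a minimal sketch of that metric, assuming the per-cohort target is a single numeric attrition value (the abstract does not state its units). The function name `rmse` and the sample values are hypothetical and for illustration only; they are not taken from the dataset or the authors' code.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values.

    Hypothetical helper: each element is assumed to be one cohort's
    attrition value (units not specified in the source).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and the same length")
    # Squared residual for each cohort
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    # Average the squared residuals, then take the square root
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative values only (not from the study's test set)
observed = [10.0, 20.0, 5.0]
predicted = [12.0, 16.0, 5.0]
score = rmse(observed, predicted)
```

Here the residuals are 2, 4, and 0, so the score is sqrt(20 / 3), roughly 2.58. Because errors are squared before averaging, RMSE penalizes a few large misses more heavily than many small ones, and it is expressed in the same units as the target.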
Competing Interest Statement
All authors were employees of AstraZeneca at the time of the execution of this work.
Funding Statement
AstraZeneca
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Data Availability
The dataset, data dictionary, and code for model development and feature selection are available at the GitHub URL: https://github.com/AstraZeneca/CTELC-Patient-Attrition-Model