Evaluation of Clinical Trial Data Sharing Policy in Leading Medical Journals

Background. The benefits of responsible sharing of individual-participant data (IPD) from clinical studies are well recognized, but stakeholders often disagree on how to align those benefits with privacy risks, costs, and incentives for clinical trialists and sponsors. The International Committee of Medical Journal Editors (ICMJE) recently required a data sharing statement (DSS) from submissions reporting clinical trials, effective July 1, 2018. We set out to evaluate the implementation of the policy in three leading medical journals (JAMA, Lancet, and New England Journal of Medicine (NEJM)).
Methods. A MEDLINE/PubMed search of clinical trials published in the three journals between July 1, 2018 and April 4, 2020 identified 487 eligible trials (JAMA n = 112, Lancet n = 147, NEJM n = 228). Two reviewers evaluated each of the 487 articles independently. Captured outcomes were declared data availability, data type, access, conditions and reasons for data (un)availability, and funding sources.
Findings. 334 (68.6%, 95% confidence interval (CI), 64.1%-72.5%) articles declared data sharing, with non-industry NIH-funded trials exhibiting the highest rates of declared data sharing (88.9%, 95% CI, 80.0%-97.8%) and industry-funded trials the lowest (61.3%, 95% CI, 54.3%-68.3%). However, only two IPD datasets were actually deidentified and publicly available as of April 10, 2020. The remaining datasets were declared accessible via request to authors (42.8%, 143/334), repository (26.6%, 89/334), and company (23.4%, 78/334). Among the 89 articles declaring to store IPD in repositories, only 17 articles (19.1%) had deposited data; the rest had not, mostly because of embargo and pending regulatory approval. Embargo was set in 47.3% (158/334) of data-sharing articles, and in half of them the period exceeded 1 year or was unspecified.
Interpretation. Most trials published in JAMA, Lancet, and NEJM after the implementation of the ICMJE policy declared their intent to make data available. However, a wide gap between declared and actual data sharing exists. To improve transparency and data reuse, journals should promote the use of unique pointers to dataset location and standardized choices for embargo periods and access requirements. All data, code, and materials used in this analysis are available on OSF at https://osf.io/s5vbg/.

Introduction
Prevalent disincentives and incentives (e.g., data authorship 23,24) for clinical trial data sharing have only recently entered the public realm 3,23,25,26, in part accelerated by discussions surrounding the ICMJE's data sharing policy 11,27 when many points of agreement and disagreement among stakeholders were articulated. 3,5,6,27,28 The ICMJE policy requires investigators to state whether they will share data (or not) while simultaneously providing an opportunity for them to place multiple restrictions and conditions regarding data access. Specifically, the DSS provides an opportunity for authors and sponsors to specify periods of data exclusivity or embargo. In addition, authors can specify in the DSS how the data will be made available, reasons for data (un)availability, and related preferences. Thus, the data sharing statements, required by the ICMJE's policy, provide a window into data sharing norms, practices, and perceived risks among trialists and sponsors.
We set out to evaluate how the ICMJE's data sharing policy has been implemented in three leading medical journals that are also member journals of ICMJE (JAMA 9 , Lancet 10 , New England Journal of Medicine 11 (NEJM)).

Methods
A MEDLINE/PubMed search of clinical trials published in the three journals between July 1, 2018 and April 4, 2020 identified 629 potentially eligible articles. Of these, 486 included a DSS; the others were either submitted before July 2018 or were letters. One article, published in 2020, met all other inclusion criteria but contained no DSS, and was included in the study sample as not sharing data on the grounds that articles published in 2020 were likely submitted after July 1, 2018, and are therefore required to contain a DSS. We conducted a cross-sectional observational study of all 487 articles (JAMA n = 112, Lancet n = 147, NEJM n = 228). Two reviewers evaluated each article independently. Discrepancies were resolved unanimously or by a third reviewer.
Data were classified as available when authors answered "Yes" (JAMA, NEJM) or gave an unstructured positive response (Lancet) to the data availability question. Information about data type, access, conditions and reasons for data (un)availability was taken from the DSS. We also compared declared to actual data availability in repositories by examining whether information about the data and the data themselves were available in the respective repository.
Funding sources were classified as industry, non-industry NIH, non-industry non-NIH, and mixed. Industry refers to research funding from companies. Non-industry NIH refers to research funding from the U.S. National Institutes of Health (NIH). Non-industry non-NIH refers to research funding from foundations, trusts, associations, national institutes outside the USA, etc. Mixed refers to any combination of the other research-funding categories.
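The four-way funding classification can be illustrated with a minimal Python sketch. It assumes each trial has already been annotated with the set of funder types named in its funding statement; the function name `classify_funding` and the label strings are hypothetical and are not taken from the study's codebook.

```python
# Hypothetical labels for the three single-source funder types described above.
SINGLE_SOURCE_TYPES = {"industry", "non-industry NIH", "non-industry non-NIH"}


def classify_funding(funder_types):
    """Return the funding category for one trial.

    funder_types: iterable of single-source labels found in the funding
    statement. Any combination of two or more distinct labels is "mixed".
    """
    distinct = set(funder_types)
    if not distinct:
        raise ValueError("no funding source recorded")
    unknown = distinct - SINGLE_SOURCE_TYPES
    if unknown:
        raise ValueError(f"unrecognized funder type(s): {sorted(unknown)}")
    if len(distinct) > 1:
        return "mixed"
    return next(iter(distinct))


print(classify_funding(["industry"]))
print(classify_funding(["industry", "non-industry NIH"]))
```

Treating the input as a set means that several funders of the same type (for example, two companies) still map to a single-source category, which is one plausible reading of the definitions above.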
We conducted a descriptive analysis of variables associated with data sharing by type of funding and publication journal. For the primary outcome variable, declared data sharing, we report 95% confidence intervals determined by bootstrapping (100,000 iterations). We used the Python programming language (Python Software Foundation, available at https://www.python.org/) and Jupyter Notebook to perform data analysis and to generate summary statistics and graphs. All data, code, and materials used in this analysis are available on OSF at https://osf.io/s5vbg/.
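As an illustration of the bootstrapped confidence interval, the sketch below applies a percentile bootstrap to the overall proportion of articles declaring data sharing (334 of 487). It is not the authors' code: the percentile method, the function name `bootstrap_proportion_ci`, and the seed are assumptions, and the demonstration call uses 10,000 resamples rather than the 100,000 reported above, only to keep the run short.

```python
import random


def bootstrap_proportion_ci(successes, total, iterations=100_000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for a proportion.

    Resamples the binary outcomes with replacement, recomputes the
    proportion each time, and reads the interval off the sorted results.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    outcomes = [1] * successes + [0] * (total - successes)
    proportions = sorted(
        sum(rng.choices(outcomes, k=total)) / total for _ in range(iterations)
    )
    tail = (1.0 - level) / 2.0
    lower = proportions[int(tail * iterations)]
    upper = proportions[int((1.0 - tail) * iterations) - 1]
    return successes / total, lower, upper


# 334 of 487 articles declared data sharing (counts taken from the text).
estimate, lower, upper = bootstrap_proportion_ci(334, 487, iterations=10_000)
print(f"{estimate:.1%} (95% CI, {lower:.1%}-{upper:.1%})")
```

The exact bounds depend on the random seed and the number of resamples, so small differences from the published interval are expected.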

Results
Overall, 334 (68.6%, 95% confidence interval (CI), 64.1%-72.5%) articles declared data sharing (Table 1). Prevalence of declared data sharing varied by journal and funder type. Non-industry NIH-funded trials had the highest rates of declared data sharing (88.9%, 95% CI, 80.0%-97.8%) and industry-funded trials the lowest (61.3%, 95% CI, 54.3%-68.3%) (Figure 1A). The highest rate of declared data sharing of NIH-funded trials is consistent across the three journals (Figure 1B). No substantial changes in the prevalence of declared data sharing were observed over the span of the first seven quarters of policy implementation (Figure 1C).
Data repositories have a central role in improving sharing, security, discoverability, and reuse of research data, 29,30 and in particular of individual-level participant data from clinical trials. [31][32][33] Among the 89 articles proposing to make IPD available through repositories, many planned to store data in general-purpose repositories, including the Clinical Study Data Request (n = 31), the Yale Open Data Access (YODA) Project (n = 7), and Vivli (n = 7). Another 30 articles planned to store IPD in NIH-supported domain-specific data repositories such as NCTN/NCORP Data Archive (n = 10), the NHLBI Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC) (n = 9), and the NICHD Data and Specimen Hub (DASH) (n = 5) ( Figure 2 and Table S1).
We compared declared to actual data availability in repositories (Table 2). Among the 89 articles, information about the data was rarely found in the repository (22.5%, 20/89) and the data themselves were even less frequently available there (19.1%, 17/89). Although data of NIH-funded trials (31.8%) were somewhat more likely than data of industry-funded trials (15.2%) to be available in repositories, most trials provided neither information nor data in the respective repositories, mostly due to embargo and pending regulatory approval. Specifically, among the 72 articles that declared their intent but did not store data in a repository, 37 (51.4%) made data access conditional on embargo or product approval.
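The repository percentages follow directly from the reported counts. The short Python sketch below recomputes them; the variable names and the `percent` helper are illustrative only, and all counts are taken from the paragraph above.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


declared_repository = 89   # articles declaring IPD storage in a repository
info_in_repository = 20    # information about the data found in the repository
data_in_repository = 17    # data themselves found in the repository
conditional_access = 37    # access conditional on embargo or product approval

# Articles that declared intent but had no data in the repository.
not_deposited = declared_repository - data_in_repository

print(percent(info_in_repository, declared_repository))
print(percent(data_in_repository, declared_repository))
print(not_deposited, percent(conditional_access, not_deposited))
```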

Discussion
Most trials published in JAMA, Lancet, and NEJM after the endorsement of the ICMJE policy declared their intent to make clinical data available. Non-industry funded trials communicated greater intent to share data than industry-funded trials. However, the commitment to data sharing substantially decreases when we consider indicators of actual versus declared data sharing: out of 334 articles declaring to share data, only two IPD datasets were actually deidentified and publicly available on a journal website, and among the 89 articles declaring to store IPD in repositories, data from only 17 articles were found in the respective repository (Figure 3).
Consistent with prior research of clinical trial data registries 34,35 and data sharing statements, 7 DSS language was often ambivalent. Offering of aggregate data, collaboration demands, lengthy or unspecified embargo periods, and the use of legacy methods for access such as author or company request communicate only lukewarm commitment. Repositories can be instrumental for sharing but real practices may diverge from intent.
Major funders are in the midst of designing or updating data sharing policies. Recently, for example, the National Institutes of Health (NIH) has drafted and requested public comments on a data sharing policy. 25 The draft data sharing policy was discussed in the context of clinical trial data sharing. 21 Our findings highlighted inefficiencies in clinical trial data sharing practices, and addressing those in NIH and other major funders' policies could narrow the wide gap we identified between declared and actual availability of clinical trial IPD.
Our study has limitations that should be acknowledged. First, only three journals were considered. Moreover, we could readily investigate declared versus actual data sharing practices only for repositories. Finally, only two IPD datasets were deidentified and available on a journal website, so we could not meaningfully examine the usability of shared data or the reproducibility 8 of the clinical trial studies. As more IPD datasets become available, it would be interesting to assess whether they are easy to use and how complete the provided information is.
To promote transparency and data reuse, journals and funders should work towards incentivizing data sharing via funding mechanisms 21 and data authorship 23, while discouraging ambivalent wording in DSS and possibly mandating data sharing. They can promote the use of unique pointers to dataset location in repositories and to data request forms. Standardized choices for embargo periods, access requirements, and conditions for data use as part of the data sharing process could also reduce unnecessary data withholding and turn declarative data sharing into actual transparency in clinical trial data.

Table 2
See Table S1 for details about the clinical trial data repositories.

Funding/Support: METRICS is supported by a grant from the Laura and John Arnold Foundation.
Role of the Funder/Sponsor: The funder had no role in the design, data collection, analysis, and interpretation of data, or preparation, review and approval of the manuscript, or decision to submit the manuscript for publication.
Data Sharing Statement: All the data, computer code, and materials for this study are publicly available from the Open Science Framework at https://osf.io/s5vbg/.

Inclusion and exclusion criteria
To be eligible for the study, a published paper must meet the following inclusion criteria:
1. Publication reports clinical trial results
2. Published in JAMA, NEJM, Lancet
3. Published since July 1, 2018
4. Type of publication is Article
5. Contains a Data Sharing Statement (DSS)*
*Articles published in 2020 that meet criteria 1 to 4 are eligible even if no DSS is present, as these were likely submitted after July 1, 2018, and therefore fall under the requirements of the ICMJE data sharing policy.
Excluded from the study were publications that contain no Data Sharing Statement due to:
1. Submission prior to July 1, 2018
2. Study not a clinical trial (e.g., observational)
3. Type of publication is letter/correspondence
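The inclusion rules above, including the exception for 2020 articles without a DSS, can be expressed as a small Python predicate. This is a sketch under stated assumptions: the `Article` record, its field names, and the `is_eligible` function are hypothetical and do not come from the study's codebook or data files.

```python
from dataclasses import dataclass
from datetime import date

JOURNALS = {"JAMA", "NEJM", "Lancet"}
POLICY_EFFECTIVE = date(2018, 7, 1)  # ICMJE policy effective date


@dataclass
class Article:
    """Hypothetical record for one screened publication."""
    reports_trial_results: bool
    journal: str
    published: date
    publication_type: str
    has_dss: bool


def is_eligible(article):
    """Apply inclusion criteria 1 to 5, with the footnoted 2020 exception."""
    meets_1_to_4 = (
        article.reports_trial_results
        and article.journal in JOURNALS
        and article.published >= POLICY_EFFECTIVE
        and article.publication_type == "Article"
    )
    # Criterion 5 is waived for articles published in 2020, which were
    # likely submitted after the policy took effect.
    return meets_1_to_4 and (article.has_dss or article.published.year == 2020)


no_dss_2020 = Article(True, "Lancet", date(2020, 2, 1), "Article", False)
print(is_eligible(no_dss_2020))
```

An article admitted through the 2020 exception would then be coded as not sharing data, as described in the Methods.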

Search strategy
A MEDLINE/PubMed search was performed on April 4, 2020 using the following search strategy: As of April 4, 2020, the search yielded 629 results. Of those, 486 publications were clinical trials and contained a DSS. One 2020 publication met inclusion criteria 1 to 4 but contained no DSS, and was included in the study sample as not sharing data because articles published in 2020 were likely submitted after the policy's effective date, July 1, 2018. We conducted a cross-sectional observational study of the resulting sample of 487 articles.

Independent review of articles
For each of the 487 articles, two reviewers independently evaluated the Data Sharing Statement and funding statement, using procedures described in the Codebook. Discrepancies were resolved unanimously or by a third reviewer.
medRxiv preprint doi: https://doi.org/10.1101/2020.05.07.20094656; this version posted May 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.