Abstract
Objective Crafting high-quality code sets is time-consuming and requires a range of clinical, terminological, and informatics expertise. Despite widespread agreement on the importance of reusing code sets, code set repositories suffer from clutter and redundancy, which greatly complicates reuse. When users encounter multiple code sets with the same name or ostensibly representing the same clinical condition, it can be difficult to choose among them or to determine whether the differences between them reflect error or deliberate choices.
Methods This paper offers a view of code set development and reuse based on a field study of researchers and informaticists. The results emerge from an analysis of relevant literature, reflective practice, and the field research data.
Results Qualitative analysis of our study data, the relevant literature, and our own professional experience led us to three dichotomous concepts that frame an understanding of diverse practices and perspectives surrounding code set development:
Permissible values versus analytic code sets;
Prescriptive versus descriptive approaches to controlled medical vocabulary use; and
Semantic versus empirical types of code set development and evaluation practices, and the data they rely on.
This three-fold framework unpacks the redundancy problem, explaining why multiple code sets for the same condition may or may not be needed, and advances academic understanding of code set development.
Conclusion The paper catalogues the methods and practices used in code set development and identifies which are appropriate in different contexts. It provides practical aid in managing the code set development process. It also exposes opportunities for software innovation, both to support the recommendations made here and in prior literature and to help users navigate thickets of ostensibly redundant code sets: not just to choose between them, but to use their differences in crafting code sets appropriate to researchers’ needs.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
The first author's work was supported in part by NSF award DGE-1632976.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
University of Maryland IRB #1405794-8
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability
Analytic results of survey and interview data are contained in the manuscript. Raw data is no longer available.