medRxiv

Practices, norms, and aspirations regarding the construction, validation, and reuse of code sets in the analysis of real-world data

Sigfried Gold, Harold Lehmann, Lisa Schilling, Wayne Lutters
doi: https://doi.org/10.1101/2021.10.14.21264917
Sigfried Gold
(a) University of Maryland, College of Information Studies, 4130 Campus Dr., 0201 Hornbake, College Park, MD 20742 USA
(b) Johns Hopkins, 2024 East Monument St., Suite 1-200, Baltimore, Maryland 21205 USA
  • For correspondence: sigfried@sigfried.org
Harold Lehmann
(b) Johns Hopkins, 2024 East Monument St., Suite 1-200, Baltimore, Maryland 21205 USA
Lisa Schilling
(c) University of Colorado Denver - Anschutz Medical Campus, 1635 Aurora Ct, Aurora, CO 80045 USA
Wayne Lutters
(a) University of Maryland, College of Information Studies, 4130 Campus Dr., 0201 Hornbake, College Park, MD 20742 USA

Abstract

Objective: Code sets play a central role in analytic work with clinical data warehouses, as components of phenotype, cohort, or analytic variable algorithms representing specific clinical phenomena. Code set quality has received critical attention, and repositories for sharing and reusing code sets have been seen as a way to improve quality and reduce redundant effort. Nonetheless, concerns regarding code set quality persist. To better understand ongoing challenges in code set quality and reuse, and to address them with software and infrastructure recommendations, we first needed to learn how code sets are constructed and validated in real-world settings.

Methods: We conducted a survey and a field study using semi-structured interviews with a purposive sample of code set practitioners, and applied open coding and thematic analysis to interview transcripts, interview notes, and answers to open-ended survey questions.

Results: Thirty-six respondents completed the survey, of whom 15 participated in follow-up interviews. We found great variability in the methods, degree of formality, tools, expertise, and data used in code set construction and validation. Respondents universally agreed that crafting high-quality code sets is difficult, but held very different ideas about how this can be achieved and validated. A primary divide exists between those who rely on empirical techniques using patient-level data and those who rely only on expertise and semantic data. We formulated a method- and process-based model able to account for the observed variability in formality, thoroughness, resources, and techniques.

Conclusion: Our model provides a structure for organizing a set of recommendations to facilitate reuse based on metadata capture during the code set development process. It classifies validation methods by the data they depend on (semantic, empirical, or derived) as they are applied over a sequence of phases: (1) code collection; (2) code evaluation; (3) code set evaluation; (4) code set acceptance; and, optionally, (5) reporting of methods used and validation results. This schematization of real-world practices informs our analysis of, and response to, persistent challenges in code set development. Potential re-users of existing code sets can find little evidence to support trust in their quality and fitness for use, particularly when reusing a code set in a new study or database context. Rather than remaining separate activities that occur before and after the main work of code set development, sharing and reuse must permeate every step of the process in order to produce reliable evidence of quality and fitness for use.
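The phase-and-data-type model described in the Conclusion lends itself to structured metadata capture. What follows is a minimal Python sketch, assuming a simple in-memory record; the class names, fields, and the rule that phases 1 to 4 must each have at least one logged step are illustrative assumptions, not the authors' schema or software. Each logged validation step is tagged with its phase and with the kind of data it depends on, so a potential re-user could see which phases were covered and whether patient-level (empirical) data were ever used.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    """Phases of code set development named in the abstract."""
    CODE_COLLECTION = 1
    CODE_EVALUATION = 2
    CODE_SET_EVALUATION = 3
    CODE_SET_ACCEPTANCE = 4
    REPORTING = 5  # optional in the model


class DataBasis(Enum):
    """Kind of data a validation method depends on."""
    SEMANTIC = "semantic"
    EMPIRICAL = "empirical"  # patient-level data
    DERIVED = "derived"


@dataclass
class ValidationStep:
    phase: Phase
    method: str  # hypothetical free-text description of the method applied
    data_basis: DataBasis


@dataclass
class CodeSetRecord:
    """Hypothetical metadata record built up alongside a code set."""
    name: str
    codes: set[str] = field(default_factory=set)
    steps: list[ValidationStep] = field(default_factory=list)

    def log(self, phase: Phase, method: str, data_basis: DataBasis) -> None:
        """Append one validation step to the record."""
        self.steps.append(ValidationStep(phase, method, data_basis))

    def phases_covered(self) -> list[Phase]:
        """Distinct phases with at least one logged step, in phase order."""
        return sorted({s.phase for s in self.steps}, key=lambda p: p.value)

    def data_bases_used(self) -> set[DataBasis]:
        """Kinds of data the logged methods relied on."""
        return {s.data_basis for s in self.steps}

    def missing_required_phases(self) -> list[Phase]:
        """Phases 1-4 with no logged step; REPORTING is treated as optional."""
        covered = set(self.phases_covered())
        return [p for p in Phase if p is not Phase.REPORTING and p not in covered]


# Usage with placeholder codes and method descriptions.
record = CodeSetRecord("example phenotype", codes={"code-001", "code-002"})
record.log(Phase.CODE_COLLECTION, "expert-proposed seed codes", DataBasis.SEMANTIC)
record.log(Phase.CODE_EVALUATION, "reviewed record counts per code", DataBasis.EMPIRICAL)
print([p.name for p in record.missing_required_phases()])
# prints ['CODE_SET_EVALUATION', 'CODE_SET_ACCEPTANCE']
```

The codes and method descriptions above are placeholders. Keeping the data basis as its own field, rather than folding it into the free-text method description, is what would let a re-user ask directly whether a code set was ever checked against patient-level data.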

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

Sigfried Gold's contribution to this research was supported in part by NSF award DGE-1632976.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

IRB of University of Maryland College Park gave ethical approval for this work. IRB #1405794-8.

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • lehmann{at}jhmi.edu, lisa.schilling{at}cuanschutz.edu, lutters{at}umd.edu

Data Availability

Data produced in the present study are identifiable and private according to the protocol. On reasonable request to the authors, we may be able to produce a deidentified extract.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted October 25, 2021.
medRxiv 2021.10.14.21264917; doi: https://doi.org/10.1101/2021.10.14.21264917
Subject Area

  • Health Informatics