Objective: This study compares the yield and characteristics of diabetes cohorts identified using heterogeneous phenotype definitions.
Materials and methods: Inclusion criteria from seven diabetes phenotype definitions were translated into query algorithms and applied to a population (n=173 503) of adult patients from Duke University Health System. The numbers of patients meeting criteria for each definition and component (diagnosis, diabetes-associated medications, and laboratory results) were compared.
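As a minimal sketch of how inclusion criteria might be expressed as query rules over the three components (diagnosis, diabetes-associated medications, laboratory results), the following assumes each patient has already been reduced to three boolean flags. The `PatientFlags` type, the rule names, and the example patients are hypothetical illustrations; they are not the seven definitions evaluated in the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass(frozen=True)
class PatientFlags:
    """Hypothetical per-patient summary of the three criteria components."""
    patient_id: str
    has_diagnosis: bool      # e.g. a qualifying ICD-9-CM code on record
    has_medication: bool     # a diabetes-associated medication on record
    has_abnormal_lab: bool   # an abnormal laboratory result on record


Rule = Callable[[PatientFlags], bool]

# Illustrative rules only (assumptions, not the published definitions).
ILLUSTRATIVE_DEFINITIONS: Dict[str, Rule] = {
    "diagnosis_based": lambda p: p.has_diagnosis,
    "medication_based": lambda p: p.has_medication,
    "any_component": lambda p: (
        p.has_diagnosis or p.has_medication or p.has_abnormal_lab
    ),
}


def cohort_for(patients: Iterable[PatientFlags], rule: Rule) -> Set[str]:
    """Return the IDs of patients who satisfy the rule."""
    return {p.patient_id for p in patients if rule(p)}


def yield_fraction(patients: Iterable[PatientFlags], rule: Rule) -> float:
    """Fraction of the reference population identified by the rule."""
    population = list(patients)
    if not population:
        return 0.0
    return len(cohort_for(population, rule)) / len(population)


# Made-up example population of four patients.
EXAMPLE_PATIENTS = [
    PatientFlags("p1", True, True, True),
    PatientFlags("p2", False, False, True),
    PatientFlags("p3", False, True, False),
    PatientFlags("p4", False, False, False),
]
```

Keeping each definition as a separate rule over shared component flags makes it straightforward to count patients per definition and per component against the same reference population.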
Results: Three phenotype definitions based heavily on ICD-9-CM codes identified 9-11% of the patient population. A broad definition for the Durham Diabetes Coalition included additional criteria and identified 13%. The electronic medical records and genomics, NYC A1c Registry, and diabetes-associated medications definitions, which have restricted or no ICD-9-CM criteria, identified the smallest proportions of patients (7%). The demographic characteristics for all seven phenotype definitions were similar (56-57% women, mean age range 56-57 years). The NYC A1c Registry definition had a higher mean number of patient encounters (54) than the other definitions (range 44-48) and the reference population (20) over the 5-year observation period. The concordance between populations returned by different phenotype definitions ranged from 50 to 86%. Overall, more patients met ICD-9-CM and laboratory criteria than medication criteria, but the number of patients who met abnormal laboratory criteria exclusively was greater than the numbers meeting diagnostic or medication criteria exclusively.
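The abstract does not state how concordance was calculated. As a hedged illustration, the sketch below assumes concordance between two cohorts is the size of their intersection divided by the size of their union (a Jaccard-style overlap), and shows how patients meeting exactly one component could be isolated. The function names and the example IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def concordance(a: Set[str], b: Set[str]) -> float:
    """Overlap of two cohorts as |A and B| / |A or B| (an assumed metric)."""
    union = a | b
    if not union:
        return 1.0  # two empty cohorts agree trivially
    return len(a & b) / len(union)


def pairwise_concordance(
    cohorts: Dict[str, Set[str]]
) -> Dict[Tuple[str, str], float]:
    """Concordance for every unordered pair of named cohorts."""
    return {
        (x, y): concordance(cohorts[x], cohorts[y])
        for x, y in combinations(sorted(cohorts), 2)
    }


def exclusive_members(components: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Patients who meet one component and none of the others."""
    result: Dict[str, Set[str]] = {}
    for name, members in components.items():
        others: Set[str] = set().union(
            *(m for n, m in components.items() if n != name)
        )
        result[name] = members - others
    return result
```

For example, with made-up component sets `{"dx": {"1", "2"}, "rx": {"2"}, "lab": {"2", "3", "4"}}`, `exclusive_members` separates the patients who appear under the laboratory component only from those who also carry a diagnosis or medication record.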
Discussion: Differences across phenotype definitions can potentially affect their application in healthcare organizations and the subsequent interpretation of data.
Conclusions: Further research focused on defining the clinical characteristics of standard diabetes cohorts is important to identify appropriate phenotype definitions for health, policy, and research.
Keywords: Clinical Research; Diabetes; Electronic Health Records; Patient Registries; Phenotypes; Secondary Data Use.