Elsevier

Biological Psychiatry

Volume 83, Issue 12, 15 June 2018, Pages 997-1004

Techniques and Methods
High Throughput Phenotyping for Dimensional Psychopathology in Electronic Health Records

https://doi.org/10.1016/j.biopsych.2018.01.011
Under a Creative Commons license
open access

Abstract

Background

Relying on diagnostic categories of neuropsychiatric illness obscures the complexity of these disorders. Capturing multiple dimensional measures of psychopathology could facilitate the clinical and neurobiological investigation of cognitive and behavioral phenotypes.

Methods

We developed a natural language processing–based approach to extract five symptom dimensions, based on the National Institute of Mental Health Research Domain Criteria definitions, from narrative clinical notes. Estimates of Research Domain Criteria loading were derived from a cohort of 3619 individuals with 4623 hospital admissions. We applied this tool to a large corpus of psychiatric inpatient admission and discharge notes (2010–2015), and using the same cohort we examined face validity, predictive validity, and convergent validity with gold standard annotations.
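As a rough illustration of what scoring a narrative note along the five domains could look like, the sketch below computes, for each domain, the fraction of a note's tokens that match a small term list. The term lists, the `DOMAIN_TERMS` name, and the `score_note` function are hypothetical and chosen only for this example; they are not the study's lexicon or its scoring procedure.

```python
import re
from collections import Counter

# Hypothetical seed terms per domain, for illustration only.
# These are NOT the term lists used in the study.
DOMAIN_TERMS = {
    "negative_valence": {"anxious", "depressed", "hopeless"},
    "positive_valence": {"euphoric", "craving", "impulsive"},
    "cognitive": {"confused", "disoriented", "forgetful"},
    "social": {"isolated", "withdrawn", "estranged"},
    "arousal": {"agitated", "insomnia", "restless"},
}


def score_note(text):
    """Return, per domain, the fraction of tokens that match that domain's terms."""
    # Lowercase the note and keep alphabetic tokens only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {
        domain: (sum(counts[term] for term in terms) / total if total else 0.0)
        for domain, terms in DOMAIN_TERMS.items()
    }


note = "Patient is anxious and hopeless, appears withdrawn, reports insomnia."
scores = score_note(note)
for domain, value in scores.items():
    print(f"{domain}: {value:.3f}")
```

Normalizing by note length keeps long and short notes comparable. A transparent term-matching score of this kind can be audited term by term, which is one plausible reading of "transparently" in the abstract.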

Results

In mixed-effect models adjusted for sociodemographic and clinical features, greater negative and positive symptom domain scores were associated with a shorter length of stay (β = −.88, p = .001 and β = −1.22, p < .001, respectively), while greater social and arousal domain scores were associated with a longer length of stay (β = .93, p < .001 and β = .81, p = .007, respectively). In fully adjusted Cox regression models, a greater positive domain score at discharge was also associated with a significant increase in readmission risk (hazard ratio = 1.22, p < .001). Positive and negative valence domains were correlated with expert annotation (by analysis of variance [df = 3], R² = .13 and .19, respectively). Likewise, in a subset of patients, neurocognitive testing was correlated with cognitive performance scores (p < .008 for three of six measures).
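For readers unfamiliar with an R² reported from an analysis of variance, the sketch below shows the usual definition: the between-group sum of squares divided by the total sum of squares. The data are made up, and grouping the scores into four annotation levels (which would give df = 3) is an assumption for illustration; the abstract does not describe the annotation scale.

```python
def anova_r_squared(groups):
    """R^2 for a one-way ANOVA: between-group sum of squares over total sum of squares."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(group) * (sum(group) / len(group) - grand_mean) ** 2
        for group in groups
    )
    return ss_between / ss_total


# Made-up domain scores, grouped by four hypothetical annotation levels (df = 4 - 1 = 3).
groups = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5]]
r2 = anova_r_squared(groups)
print(f"R^2 = {r2:.2f}")
```

Read this way, the reported values of .13 and .19 would mean that the annotation category accounts for 13% and 19% of the variance in the respective domain scores.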

Conclusions

These results show that natural language processing can be used to efficiently and transparently score clinical notes in terms of cognitive and psychopathologic domains.

Keywords

Computed phenotype
Electronic health record
Natural language processing
Research Domain Criteria
Topic modeling
Transdiagnostic

1 THM and SY contributed equally to this work. TC and RHP contributed equally to this work.