Abstract
Objective Electronic health record (EHR) databases enable scalable investigations of serious mental illness (SMI), including bipolar disorder (BD), severe or recurrent major depressive disorder (MDD), schizophrenia (SCZ), and other chronic psychoses. The authors analyzed structured and unstructured EHR data from a large mental health facility to characterize SMI clinical features and trajectories.
Methods Diagnostic codes, information from clinical notes, and healthcare use data were extracted from the EHR database of Clínica San Juan de Dios in Manizales, Colombia, for the years 2005-2022, covering 22,447 individuals (ages 4-90, 60% female) treated for SMI. The reliability of diagnostic codes was assessed against diagnoses obtained from manual chart review (n=105). A natural language processing (NLP) pipeline was developed to extract features from clinical notes. Diagnostic stability was quantified in patients with ≥3 visits (n=12,962). Finally, mixed-effects logistic regression models were used to identify factors associated with diagnostic stability.
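As a reading aid, one common form of a mixed-effects logistic regression is sketched here. The abstract does not state the random-effects structure or how the outcome was coded, so the per-patient random intercept u_i, the outcome coding y_ij (1 if the EHR diagnosis changes at visit j of patient i), and the covariate vector x_ij are illustrative assumptions, not the authors' specification.

```latex
\[
\operatorname{logit}\,\Pr(y_{ij}=1)
  = \beta_0 + \boldsymbol{\beta}^{\top}\mathbf{x}_{ij} + u_i,
\qquad u_i \sim \mathcal{N}(0,\sigma_u^2),
\qquad \mathrm{OR}_k = e^{\beta_k}
\]
```

Under this form, each reported odds ratio corresponds to an exponentiated fixed-effect coefficient: an OR above 1 means the covariate is associated with higher odds of the outcome, holding the other covariates and the patient-level intercept fixed.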
Results Assigned EHR diagnoses showed very good agreement with those obtained from manual chart review (Cohen’s kappa=0.78). The NLP algorithm, which balanced precision and recall well (average F1=0.88), identified high frequencies of suicidality and psychosis across diagnoses. Most SMI patients (64%) had multiple EHR diagnoses, including switches between primary diagnoses (19%), comorbidities (30%), and combinations of both (15%). Predictors of changes in EHR diagnoses included delusions in clinical notes (OR=1.50, p=2e-18) and a history of previous diagnostic changes (OR=4.02, p=3e-250).
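For readers unfamiliar with the two agreement metrics reported above, the following is a minimal sketch of how Cohen's kappa and F1 are computed in general. It is not the authors' pipeline: the function names, the toy diagnosis labels, and the toy counts are hypothetical and are not study data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both sources agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each source's label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use a single identical label
    return (observed - expected) / (1.0 - expected)


def f1_score(tp, fp, fn):
    """F1: harmonic mean of precision and recall, from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical labels, not study data).
ehr_dx = ["BD", "BD", "MDD", "SCZ", "MDD", "BD"]
chart_dx = ["BD", "MDD", "MDD", "SCZ", "MDD", "BD"]
kappa = cohen_kappa(ehr_dx, chart_dx)
f1 = f1_score(tp=8, fp=2, fn=2)
print(round(kappa, 3))  # 0.739
print(round(f1, 3))     # 0.8
```

Kappa discounts the agreement expected by chance given how often each source assigns each label, which is why it is lower than the raw agreement of 5/6 in the toy data.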
Conclusions Longitudinal EHR databases enable scalable investigation of transdiagnostic clinical features and delineation of granular SMI trajectories through the integration of information from clinical notes and diagnostic codes.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
Research reported here was supported by R01MH123157 (to LMOL, CLJ, and NBF), R01MH113078 (to CEB, CLJ, and NBF), R00MH116115 (to LMOL), T32MH073526 (to JFDLH), and the Fulbright Commission in Colombia through a Fulbright-Colciencias grant (to JFDLH). The content is solely the responsibility of the authors and does not necessarily represent the official views of Fulbright or the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
All procedures involving human subjects/patients were approved by the Institutional Review Boards at Clínica San Juan de Dios Manizales and the University of California, Los Angeles.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.
Yes
Footnotes
This version emphasizes the overlaps of clinical features with diagnostic categories. We highlight the utility of NLP in identifying transdiagnostic features like suicidality and psychosis, as well as the integration of structured and unstructured EHR data to delineate comprehensive trajectories of severe mental illness.
Data Availability
NA