Detecting Problematic Opioid Use in the Electronic Health Record: Automation of the Addiction Behaviors Checklist in a Chronic Pain Population

Importance: Individuals whose chronic pain is managed with opioids are at high risk of developing an opioid use disorder. Large data sets, such as electronic health records, are required for conducting studies that assist with identification and management of problematic opioid use. Objective: Determine whether regular expressions, a highly interpretable natural language processing technique, could automate a validated clinical tool (Addiction Behaviors Checklist1) to expedite the identification of problematic opioid use in the electronic health record. Design: This cross-sectional study reports on a retrospective cohort with data analyzed from 2021 through 2023. The approach was evaluated against a blinded, manually reviewed holdout test set of 100 patients. Setting: The study used data from Vanderbilt University Medical Center’s Synthetic Derivative, a de-identified version of the electronic health record for research purposes. Participants: This cohort comprised 8,063 individuals with chronic pain. Chronic pain was defined by International Classification of Disease codes occurring on at least two different days.18 We collected demographic, billing code, and free-text notes from patients’ electronic health records. Main Outcomes and Measures: The primary outcome was the evaluation of the automated method in identifying patients demonstrating problematic opioid use and its comparison to opioid use disorder diagnostic codes. We evaluated the methods with F1 scores and areas under the curve - indicators of sensitivity, specificity, and positive and negative predictive value. Results: The cohort comprised 8,063 individuals with chronic pain (mean [SD] age at earliest chronic pain diagnosis, 56.2 [16.3] years; 5081 [63.0%] females; 2982 [37.0%] male patients; 76 [1.0%] Asian, 1336 [16.6%] Black, 56 [1.0%] other, 30 [0.4%] unknown race patients, and 6499 [80.6%] White; 135 [1.7%] Hispanic/Latino, 7898 [98.0%] Non-Hispanic/Latino, and 30 [0.4%] unknown ethnicity patients). The automated approach identified individuals with problematic opioid use that were missed by diagnostic codes and outperformed diagnostic codes in F1 scores (0.74 vs. 0.08) and areas under the curve (0.82 vs 0.52). Conclusions and Relevance: This automated data extraction technique can facilitate earlier identification of people at-risk for, and suffering from, problematic opioid use, and create new opportunities for studying long-term sequelae of opioid pain management.


Introduction
Chronic pain affects more than 40 million individuals in the United States, of which approximately 10 million experience high-impact chronic pain affecting daily activities .2 Prescription opioids are a primary treatment for chronic pain management. Given the highly addictive nature of opioids, the risk of developing an opioid use disorder (OUD) is estimated to be high at approximately 18%. 3 OUD is associated with a financial burden of more than $1 trillion when accounting for health care, lost work productivity, and criminal justice costs. 4 In order to address this problem from a healthcare perspective, we must first be able to identify which patients suffer from OUD and/or are at risk for developing OUD.
The magnitude of this particular problem necessitates large-scale data sources for identifying individuals across the continuum of problematic opioid use. Currently, the largest source of health data is the electronic health record (EHR) used for routine clinical care. The standard method for detecting a clinical problem in the EHR is through diagnostic indicators, such as problem lists or International Classification of Diseases (ICD) codes used for billing purposes. 5, 6 However, ICD codes are not a reliable source of OUD diagnosis because the codes are under-utilized, which has been attributed to OUD-related stigma and provider concerns about barriers to future pain management. 3,[7][8][9][10] Expanding the search to additional areas that contain clinical notes has been explored. For example, Palmer et al. used natural language processing (NLP) based on matching terms in a customized dictionary of 1,248 problematic opioid use keywords developed by subject matter experts. They discovered NLP techniques could identify many individuals with problematic opioid use that did not have relevant ICD codes; however, they also found many patients with relevant ICD codes who were not flagged by NLP methods. 11 Similar results were reported by Carrell et al. who developed a customized dictionary (of 1,288 unique terms) based on recommendations from subject matter experts and iterative reviews of example text. 12 The results from these papers would suggest limitations to a customized dictionary approach to NLP (or an inadequacy of EHR documentation).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint The Addiction Behaviors Checklist (ABC) is a valid and reliable instrument that can be used for identifying OUD risk among chronic pain patients. 1 The ABC collects risk information provided by clinicians, making it a particularly suitable tool to adapt to EHR data. We used the ABC instrument instead of other OUD risk assessment tools (e.g., the Revised Opioid Risk Tool 13 ) because the ABC collects risk information from language patterns used by clinicians, who are the authors of the notes used in this study. Use of such an assessment tool could guide the NLP methods for automatically searching the clinicians' notes within the EHR. NLP can be leveraged for automating information extraction from text documents and includes techniques ranging from pattern matching to concept extraction to advanced numerical vector embeddings. 14 If an NLP method that is highly interpretable by clinicians, such as one that facilitates review of matching text, could perform comparably to manual chart reviews, there is potential to expedite and scale the identification of addictive behaviors within EHRs.
Therefore, the objective of this study was to leverage an interpretable NLP technique to automate the ABC instrument for purposes of expediting research or clinical chart reviews.

Cohort Definition and Data Collection
We conducted a retrospective, observational cohort study of individuals with chronic pain. We selected this phenotype because individuals with chronic pain are known to have a higher incidence of opioid use and OUD than the general population. 15 We studied this cohort in our prior publication. 16 We derived the study data from Vanderbilt University Medical Center's Synthetic Derivative, a de-identified version of the electronic health record for research purposes. 17 Chronic pain was defined based on ICD codes (e.g., 338.2, 338.21, G89.2, G89.21 -see Appendix A for a full list of ICD codes) occurring on at least two different days to decrease false positives when including patients with only a spurious code. 18 We collected all electronic health record free-text notes (e.g., progress notes, patient communication, history and physical) for a patient restricted to the window 30 days before the patient's first chronic pain ICD code through 30 days after the last chronic pain ICD code. We also collected demographic information . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint (age, gender, ethnicity, and race) from the electronic health record to characterize our population. We limited the cohort to individuals aged 13 years or older at time of their first chronic pain diagnosis.

Instrument Development
We used regular expressions to identify text patterns in the clinical notes. Regular expressions are an easily implementable and interpretable form of NLP that can be used to search for pattern matches in a corpus of text. 14  Following a review of matches, we examined whether additional filtering for matches near 133 opioid-related terms (Appendix B) and/or at least 7 negation detection terms (Appendix C) improved performance. For example, with ABC item 2 ("Patient has hoarded meds."), we used regular expressions to search for "hoard" and then filtered to include only those variations of "hoard", "stash", "left over", "storing", and "stockpil"[sic] that were followed by variations of "pain med," "opioid", "opiod"[sic], "narc", "analges" [sic] or an opioid drug name. Then, of those sentences, any that included a negating term preceding "hoard" (or one of the related verbs) were excluded. We added a final step for some expressions where we included common false positive matches. For example, opioids were frequently mentioned in the Discharge Instructions of a patient's chart. We removed opioid matches if they were preceded by the phrase "Discharge Instructions." is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint expressions (see Appendix D) representing the ABC items. We implemented the regular expressions in python (version 3.10). We applied each regular expression to every clinical note. If 1 or more matches for a given ABC item were discovered in any of the patient's notes within the time window that defined the cohort, that patient received 1 point toward an overall total score.

Statistical Analysis
We evaluated our methods against a gold-standard manual review in a holdout test set of 100 patients that had been adjudicated in our previous study 16 where we classified individuals as having no, some, or high evidence of OUD and substance use disorders (SUD). These 100 patients were randomly selected from the chronic pain cohort and were only used for evaluating our phenotyping methods. We reviewed patients' records guided by a keyword template based on the Diagnostic and Statistical Manual of Mental Disorders, 5 th Ed. (DSM V) criteria for OUD, 19 the ABC instrument, 1 and others' studies focused on detecting problematic opioid use within EHR data. 11,12,20 We calculated the sensitivity (i.e., recall -proportion of cases with a match), specificity (proportion of controls without a match), precision (i.e., positive predictive value [PPV] -proportion of matches that were cases), negative predictive value ([NPV] proportion of non-matches that were controls), and F1-score (harmonic mean of sensitivity and positive predictive value) of our regular expression scoring system against the manually adjudicated labels (n=100). We examined the pairwise phi coefficients (for binary variables) between each item of the ABC instrument in the entire dataset (n=8,063). We compared the presence of OUD and SUD ICD codes (see Appendix A for full list) present in individuals' records to serve as another source of validation. We used recall-precision curves and area under the receiver operating characteristic curves (AUC) to evaluate performance of both the total ABC score and the OUD ICD codes against the manual reviews.
This study follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. 21 We acquired ethics approval under Institutional Review Board study #181443 and #201918. All code is publicly available at [unblinded URL after review].
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Cohort Description
Our cohort comprised 8,063 chronic pain patients with 3,485,348 accompanying notes (see Table   1 for demographic data). We identified 150,270 total regular expression pattern matches based on the ABC criteria. 52 patients (0.64%) had no associated notes based on search criteria. 1,329 patients (16.5%) had an SUD ICD code on least 2 days while 714 patients (8.9%) had OUD ICD code on at least 2 days.

ABC Item Performance
Ninety-nine out of 100 patients in the hold-out test set had clinical notes that met inclusion criteria. Of the 20 ABC items, 15 items were associated with positive matches in the test set (see Table 2).
When evaluating each regular expression-based ABC item against the test set, sensitivity ranged 0.00-0.90, specificity ranged 0.43-1.00, positive predictive value ranged 0.00-1.00, negative predictive value ranged 0.49-0.60, and F1-scores ranged 0.00-0.73. The ABC item with the highest F1-score performance was "Patient used illicit drugs or evidences problem drinking. The proportion of patients with an ABC item match was similar between the entire cohort and the Test Set with one exception. The "Patient reports minimal/inadequate relief from narcotic analgesic" item had a greater proportion of matches in the Test Set than the entire cohort. The item-level pairwise phi correlation coefficients from the ABC instrument yielded values between -0.01 through 0.32, indicating low item-level correlation in the entire dataset. The sensitivity (recall) and precision (positive predictive value) of the total ABC score (as compared to manual review) were higher than that of the ICD codes (see Figure 1). The total ABC score achieved F1 values of 0.74 (total score >= 1), 0.73 (total score >= 2), 0.56 (total score >= 3), and an AUC of 0.82 (see Figure 2) compared to the manual review, all of which were equal to or better than the ICDbased system (see below).

ICD Code Performance
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint The prevalence of OUD ICD codes (on at least 2 separate days) among individuals with some or high evidence of OUD was small (3.0% and 5.9%, respectively; see eTable 1). The prevalence of ICD codes for the more generic condition of substance use disorder was higher than that of OUD (15.2% and 47.1%, respectively; see eTable 1). Figure 1 illustrates the sensitivity (recall) versus precision (PPV) of the OUD ICD codes as compared to the manual review and were lower than that of the ABC score. The OUD ICD score achieved an F1 value of 0.08 and an AUC of 0.52 (see Figure 2) compared to the manual review. As the total ABC score increased, the proportion of individuals with an OUD ICD code on at least 2 separate days increased (see Table 3).

Discussion
In this study of patients with chronic pain, we demonstrated the feasibility of using regular expressions (an NLP technique) to automate the ABC instrument for identifying OUD problematic opioid use in clinical notes. To our knowledge, this is the first attempt to automate the ABC instrument. Even with a limited set of the patients' clinical notes for review, the automated approach identified individuals with problematic opioid use that were missed by ICD codes.
Our finding that ICD codes have limitations in identifying problematic opioid use in EHRs aligns with prior work. Clinical text and administrative billing codes have each been consistently shown to provide unique information that can assist with identification of problematic opioid use. 11,12,16 The idea that a single domain of the EHR (e.g., ICD codes, laboratory values, clinical notes) can adequately yield a valid phenotype is increasingly gaining scrutiny. 20,22,23 The ABC instrument has historically been completed by clinicians to assess for risk of aberrant opioid use, with a threshold of 3 indicating potential inappropriate use; 1,24,25 however, some studies have used a threshold of 2 positive items to identify concerning opioid behaviors. 26,27 In this study, we found the best performance (based on F1 measures) was a threshold of 1. While a low cut-off score on a 20-item measure might not be intuitive, it could be a reflection of the highly-diagnostic nature of many of the ABC items for opioid misuse (e.g. buying drugs on the street).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint In previous research, the most frequently endorsed items of the ABC instrument that have been associated with global clinical judgment include: (a) "difficulty with using medication agreement," (b) "increased use of narcotics (since last visit)," (c) "used more narcotics than prescribed," and (d) "patient indicated that s/he 'needs' or 'must have' analgesic meds ."1 In our study, we identified that (a) was associated with no matches in our test set; however, we did identify text matches with (b) and (c), albeit with low F1 scores. We were unable to develop regular expressions that appropriately represented the intent of item (d). This could be due to difficulty in capturing intent using current NLP tools or that some expressions will still necessitate patient self-report. Notably, the best-performing items were "Discussion of analgesic meds was the predominant issue of visit" and "Patient used illicit drugs or evidences problem drinking." The former finding was surprising to us because regular expressions are not typically helpful in understanding a theme or making a generalization about text. For the implementation of this item, we simply required the mention of at least two opioid terms be near each other. The latter finding is perhaps unsurprising because SUD is highly co-morbid with OUD, and while SUD is also rarely reported, SUD behaviors are more frequently documented in the EHR than OUD behaviors. 28 Our study also has its limitations. We used a single medical center's EHR data and in a somewhat homogenous cohort of patients with chronic pain. Keywords determined by OUD subject area experts might not be representative of the variety of language in EHR notes. It is possible additional input from external stakeholders and manual reviews of a larger corpus of notes could generate more expressions that would capture additional examples of representing ABC items in clinical notes, which would also enhance generalizability.

Conclusion
In this study, we leveraged a publicly-available, valid, and reliable instrument for developing our text-based scoring system. Benefits of this method are interpretability (i.e., one can review examples in the chart that match a regular expression), generalizability to other organizations given it can be implemented in multiple software programs, and outperformance of diagnostic codes. Additionally, NLP . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint approaches can serve as one of many approaches to identifying problematic opioid use. For example, a more complex system that also includes laboratory values, coding data, or any other data elements can be constructed. As the regular expressions are improved with input from other investigators and evaluated in larger sets of clinical notes, there is potential for this approach to both automate EHR note reviews and assist in representing problematic opioid use as a continuum rather than a binary condition. Advances in this area will continue to facilitate earlier identification of people at-risk for, and suffering from, problematic opioid use, which will create new opportunities for studying long-term sequelae of opioid pain management.
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint Table 1    . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

(which was not certified by peer review)
The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint Figure 1. Recall-precision curve of combined score from regular expression-based ABC instrument compared to manual review (blue) and ICD codes (orange). Figure 2. Area under the receiver operating characteristic curve of ABC instrument compared to manual review (blue) and ICD codes (orange).
. CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Data Sharing Statement
Although the data are de-identified, our data use agreement prohibits sharing raw text data with external entities. Those who would like to review our aggregated data should contact the corresponding author to request a copy of the aggregated data. All code is publicly available at [unblinded URL after review]. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 12, 2023. . CC-BY 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) The copyright holder for this preprint this version posted June 12, 2023. ; https://doi.org/10.1101/2023.06.08.23290894 doi: medRxiv preprint