Appl Clin Inform 2020; 11(01): 172-181
DOI: 10.1055/s-0040-1702214
Research Article
Georg Thieme Verlag KG Stuttgart · New York

Detecting Social and Behavioral Determinants of Health with Structured and Free-Text Clinical Data

Daniel J. Feller (1), Oliver J. Bear Don't Walk IV (1), Jason Zucker (2), Michael T. Yin (2), Peter Gordon (2), Noémie Elhadad (1)

1 Department of Biomedical Informatics, Columbia University, New York, New York, United States
2 Division of Infectious Diseases, Department of Internal Medicine, Columbia University Irving Medical Center, New York, New York, United States
Funding This study was funded by the following sources:
National Library of Medicine, T15 LM007079: “Training in Biomedical Informatics at Columbia University.”
National Institute of Allergy and Infectious Diseases, T32AI007531: “Training in Pediatric Infectious Diseases.”
National Institute of General Medical Sciences, R01 GM114355.

Publication History: 14 October 2019 · 07 January 2020
Publication Date: 04 March 2020 (online)

Abstract

Background Social and behavioral determinants of health (SBDH) are environmental and behavioral factors that often impede disease management and result in sexually transmitted infections. Despite their importance, SBDH are inconsistently documented in electronic health records (EHRs) and are typically recorded only in unstructured form. Evidence suggests that structured data elements in EHRs can further help identify SBDH in the patient record.

Objective Explore the automated inference of both the presence of SBDH documentation and individual SBDH risk factors in patient records. Compare the relative ability of clinical notes and structured EHR data, such as laboratory measurements and diagnoses, to support inference.

Methods We attempt to infer the presence of SBDH documentation in patient records, as well as patient status of 11 SBDH, including alcohol abuse, homelessness, and sexual orientation. We compare classification performance when considering clinical notes only, structured data only, and notes and structured data together. We perform an error analysis across several SBDH risk factors.
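The three input configurations compared in the Methods (notes only, structured data only, and both) can be illustrated with a minimal sketch. It assumes binary bag-of-words features for the notes, binary indicator features for structured codes, simple concatenation for the combined setting, and a plain logistic-regression classifier; all helper names, the toy records, and the code "F10" are hypothetical and are not the authors' pipeline.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase the note and keep alphanumeric runs as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(items: Sequence[Sequence[str]]) -> Dict[str, int]:
    # Assign each distinct token or code a column, in first-seen order.
    index: Dict[str, int] = {}
    for seq in items:
        for item in seq:
            index.setdefault(item, len(index))
    return index


def binary_vector(seq: Sequence[str], index: Dict[str, int]) -> List[float]:
    # 1.0 if the token/code appears in the record; items missing from the index are ignored.
    vec = [0.0] * len(index)
    for item in seq:
        if item in index:
            vec[index[item]] = 1.0
    return vec


def featurize(notes: Sequence[str], codes: Sequence[Sequence[str]], mode: str,
              note_index: Dict[str, int], code_index: Dict[str, int]) -> List[List[float]]:
    # mode selects the configuration: "notes", "structured", or "both" (concatenated).
    rows = []
    for note, record_codes in zip(notes, codes):
        row: List[float] = []
        if mode in ("notes", "both"):
            row += binary_vector(tokenize(note), note_index)
        if mode in ("structured", "both"):
            row += binary_vector(record_codes, code_index)
        rows.append(row)
    return rows


def train_logistic(X: List[List[float]], y: Sequence[int],
                   epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    # Unregularized logistic regression fit by full-batch gradient descent.
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X: List[List[float]], w: List[float], b: float) -> List[int]:
    # Positive when the linear score exceeds 0, i.e. probability > 0.5.
    return [int(b + sum(wj * xj for wj, xj in zip(w, xi)) > 0) for xi in X]


# Hypothetical toy cohort: label 1 = alcohol abuse documented.
notes = ["reports heavy alcohol use", "drinks alcohol daily",
         "denies alcohol use", "lives with family"]
codes = [["F10"], ["F10"], [], []]  # hypothetical structured codes per patient
labels = [1, 1, 0, 0]

note_index = build_index([tokenize(n) for n in notes])
code_index = build_index(codes)
for mode in ("notes", "structured", "both"):
    X = featurize(notes, codes, mode, note_index, code_index)
    w, b = train_logistic(X, labels)
    # Scored on the training records only, to show the mechanics; not an evaluation.
    print(mode, predict(X, w, b))
```

Concatenating the two feature blocks is the simplest way to let one classifier weigh textual and structured evidence together; a real comparison would score each configuration on held-out patients rather than on the training records.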

Results Classification models inferring the presence of SBDH documentation achieved good performance (F1 score: 78.7–92.7; F1 was the primary evaluation metric). Performance varied among models inferring patient SBDH risk status, ranging from F1 = 82.7 for LGBT (lesbian, gay, bisexual, and transgender) status to F1 = 28.5 for intravenous drug use. Error analysis showed that lexical diversity and documentation of historical (rather than current) SBDH status make inference of patient SBDH status more difficult. Three of five classifiers inferring topic-specific SBDH documentation and 10 of 11 patient SBDH status classifiers achieved their highest performance when trained on both clinical notes and structured data.
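F1, the primary metric above, is the harmonic mean of precision and recall for the positive class. A minimal sketch of the computation, assuming binary labels with 1 as the positive class and scores scaled to 0 to 100 as in the reported figures; the function name is hypothetical.

```python
from typing import Sequence


def f1_score_percent(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Count true positives, false positives, and false negatives for class 1.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # Precision and recall are 0 (or undefined); report F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall, scaled to 0-100.
    return 100 * 2 * precision * recall / (precision + recall)
```

For example, a classifier that predicts "absent" for every patient in a cohort where 1 in 10 has a risk factor reaches 90% accuracy but F1 = 0, which is why F1 is more informative than accuracy for low-prevalence SBDH.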

Conclusion Our findings suggest that combining clinical free-text notes with structured data provides the best approach to classifying patient SBDH status. Inferring patient SBDH status is most challenging for SBDH with low prevalence and high lexical diversity.

Protection of Human and Animal Subjects

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects, and was reviewed by the Institutional Review Board at Columbia University Medical Center.

