Abstract
Background Social determinants of health (SDoH) like socioeconomics and neighborhoods strongly influence outcomes, yet standardized SDoH data is lacking in electronic health records (EHR), limiting research and care quality.
Methods We searched PubMed using the keywords “SDOH” and “EHR”, followed by title/abstract and full-text screening. Included records were analyzed under five domains: 1) SDoH screening and assessment approaches, 2) SDoH data collection and documentation, 3) Use of natural language processing (NLP) for extracting SDoH, 4) SDoH data and health outcomes, and 5) SDoH-driven interventions.
Results We identified 685 articles, of which 324 underwent full review. Key findings include tailored screening instruments implemented across settings, census and claims data linkage providing contextual SDoH profiles, rule-based and neural network NLP systems extracting SDoH from notes, associations between SDoH data and healthcare utilization and chronic disease control, and the implementation of integrated care management programs. However, considerable variability persists across data sources, tools, and outcomes.
Discussion Despite progress identifying patient social needs, further development of standards, predictive models, and coordinated interventions is critical to fulfill the potential of SDoH-EHR integration. Additional database searches could strengthen this scoping review. Ultimately, widespread capture, analysis, and translation of multidimensional SDoH data into clinical care are essential for promoting health equity.
Introduction
The concept of social determinants of health (SDoH) acknowledges that health is not simply a product of biological factors or access to medical care, but is profoundly influenced by the social, economic, and physical conditions that shape people’s lives [1]. Research across disciplines including public health, sociology, economics, and medicine provides clear evidence that circumstances in the environments where people exist have a fundamental impact on shaping patterns of health and well-being [2]. The World Health Organization defines SDoH as “the conditions in which people are born, grow, live, work, and age, along with the wider set of forces and systems shaping the conditions of daily life” [3]. These determinants are broadly categorized into five interdependent domains that form the structural and social hierarchies in society: economic stability, neighborhood and built environment, health care access, education access and quality, and social and community context [4,5].
Specifically, adverse SDoH like poverty, unequal access to education, lack of public resources in neighborhoods, high crime rates, racial segregation, and pollution have all been extensively linked to higher rates of morbidity, mortality, and health risk behaviors across populations [1]. On the other hand, protective and promoting SDoH like higher household income, safe green spaces, strong social support, affordable nutrition options, and accessible transportation demonstrate significant associations with positive health indicators, from self-rated health status to reduced prevalence of diabetes to longer life expectancy.
Health outcomes are significantly influenced by more than just clinical encounters; indeed, research suggests that only about 20% of a person’s health outcomes can be attributed to clinical care [6,7]. The majority is determined by a combination of individual behaviors and various external factors that are collectively referred to as SDoH. These “causes of the causes” of health, or upstream social determinants, are estimated to account for up to 55% of population health variation in high-income countries, though some models suggest they may account for as much as 70-80% [7]. Aspects of physical environment, socioeconomic status, race, and gender contribute to systemic inequities that become embodied as adverse outcomes. This makes social determinants fundamental considerations for achieving health equity and overarching population health improvement [1].
Healthy People 2030 in the United States includes objectives to reduce homelessness, increase educational attainment, and improve access to care, aligning with several SDoH domains [8]. The “Improving Social Determinants of Health Act of 2021” [5] would further direct the Centers for Disease Control and Prevention (CDC) to establish a program addressing SDoH [9].
SDoH-driven Translational Research: Deriving and Translating Health Data to Actionable Knowledge into Clinical Care
Incorporating SDoH into clinical practice is essential for achieving health equity, yet these determinants are seldom consistently recorded or collected within or outside of electronic health records (EHRs). This lack of information is problematic because healthcare professionals can only address SDoH with interventions and programs effectively if they are aware of and can access this information.
Given the impact of SDoH on health outcomes and the limitations of existing EHRs and documentation practices, public health, care providers, and clinical researchers have implemented various approaches to standardize the collection, study, and mobilization of knowledge generated from this information into clinical care (see Figure 1). SDoH data, often collected from community or public health surveys [10], discrete modules within the EHR [11], and patient-reported outcome surveys [12], can be aggregated into a unified SDoH repository, supporting targeted research initiatives. This integration transforms scattered SDoH data into a structured resource for comprehensive study. However, the collection, integration, and utility of this data remain varied and inconsistent across systems. To be useful, this information must be not just integrated, but also standardized using SDoH ontologies [13], common representations [14], and value sets [15]. Once integrated and standardized, SDoH information can be studied in association with positive and negative patient health outcomes and be connected to existing programs that address patients’ needs [16]. Leveraging SDoH data within EHRs can activate embedded tools like alerts and flags, guiding interventions like nutrition assistance based on hunger scores [17], community health worker referrals for those in disadvantaged neighborhoods [18], and creating high-risk patient panels for targeted care [19]. Such integration facilitates personalized care management and supports health equity through patient-centric technologies like self-scheduling apps [20] and digital navigators [21], fostering a learning health system that continuously adapts to emerging patient needs and outcomes.
Challenges and Barriers
Integrating SDoH data into EHRs is essential for enabling better decision-making and research to promote health equity, yet significant barriers exist [3,22]. Primarily, data on key factors like food insecurity, housing, transportation, social isolation, and financial strain are inconsistently and incompletely captured in structured data fields of the EHR [22,23].
Furthermore, healthcare professionals use validated screening tools, but these tools are of inconsistent types with different data standards, which limits interoperability across systems [24]. In many cases, clinical and administrative workflows do not facilitate routine collection and updating of SDoH data over time [25,26].
From a regulatory standpoint, confidentiality rules combined with patients’ mistrust of how private information is disclosed and shared prevent SDoH data from being shared for extended application [27]. A multi-pronged approach involving policy change, system redesign, and community engagement is imperative to fulfill the potential of SDoH [28].
Preliminary results demonstrated that important SDoH factors such as food insecurity, transportation barriers, and unstable housing were commonly discussed in clinical encounters across a range of specialties. However, these social needs were rarely codified in discrete structured fields. This finding aligns with prior literature suggesting substantial gaps in systematic SDoH data capture in EHR systems [29]. While NLP approaches may help unlock SDoH data trapped in free-text notes, a more robust data collection and integration framework is needed to optimize SDoH data capabilities in EHRs.
Once it is known that a patient is experiencing an SDoH and carries risks of associated poor health outcomes, e.g., 30-day readmission, it is critical for the care providers to connect patients to existing programs and services to mitigate the risk by addressing the disparity. However, it can be challenging to know what programs and services are available to the patient and whether the patient meets enrollment criteria. Clinical decision support systems and other digital health technologies could play a critical role in determining patient eligibility criteria and bringing service recommendations to care providers, e.g., case management or discharge planning teams [30].
While prior works have reviewed aspects of SDoH data collection, NLP approaches, and health impacts separately, a comprehensive overview integrating evidence across domains could advance knowledge. This scoping review aimed to map the literature landscape surrounding SDoH-EHR integration to address several key questions:
1) What standardized tools and workflows currently exist for structured SDoH data capture in EHRs?
2) How are external SDoH data sources linked to patient records to enable enriched contextual patient profiles?
3) What NLP solutions show promise for extracting unstructured SDoH insights from clinical notes at scale?
4) What impacts do harmonized SDoH data elements have on predicting health outcomes and targeting interventions?
By systematically searching evidence related to these questions, this review identifies current best practices, remaining gaps, and future directions across the spectrum of SDoH data integration. The goal is to advance standardized frameworks for reliably collecting multidimensional social data within EHRs and translating derived knowledge to improve care delivery and health equity.
Method
Scoping Literature Review
To explore the current landscape of integrating SDoH data into EHRs, we conducted a scoping literature review. This methodological approach allows for a comprehensive overview of a broad field of study, and is particularly suitable for fields that have yet to be comprehensively reviewed or where the literature is large, diverse, and complex.
Search Strategy
The literature search was conducted on May 8, 2023 in PubMed, a widely recognized database for biomedical literature. We utilized MeSH (Medical Subject Headings) terms to refine our search, focusing on articles indexed with the terms “Electronic Health Records” and “Social Determinants of Health.” This combination was chosen to specifically target studies that discuss the intersection of EHRs with SDoH (see Supplement Table 1 for the search strategy).
Screening and Selection Process
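As a rough illustration of the search described above, the two MeSH headings can be combined into a single PubMed query and expressed as a request to the NCBI E-utilities esearch endpoint. The sketch below only builds the request URL and sends nothing. The exact query is given in Supplement Table 1, so the field tags, the date parameters, and the helper names build_mesh_query and build_esearch_url are assumptions made for illustration, not the authors' actual strategy.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (public; no request is sent in this sketch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_mesh_query(headings):
    """Join MeSH headings with AND, tagging each with the [MeSH Terms] field."""
    return " AND ".join(f'"{h}"[MeSH Terms]' for h in headings)


def build_esearch_url(headings, max_date, retmax=1000):
    """Return an esearch URL limited to records published up to max_date."""
    params = {
        "db": "pubmed",
        "term": build_mesh_query(headings),
        "datetype": "pdat",       # filter on publication date (assumed choice)
        "mindate": "1900/01/01",  # esearch expects mindate and maxdate together
        "maxdate": max_date,
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


HEADINGS = ["Electronic Health Records", "Social Determinants of Health"]
query = build_mesh_query(HEADINGS)
url = build_esearch_url(HEADINGS, max_date="2023/05/08")
print(query)
print(url)
```

Keeping the query construction separate from the request makes the search string easy to record alongside the search date, which helps when a scoping review needs to be rerun or audited.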
The inclusion process involved an initial title and abstract screening phase to identify papers related to SDoH capabilities and EHRs.
Selected papers were categorized into five non-mutually exclusive, key topics to structure our analysis (depicted in Figure 2):
● SDoH Screening Tools and Assessments: Papers discussing various tools and methodologies for screening SDoH.
● SDoH Data Collection and Documentation: Studies focusing on how SDoH data is collected and documented within EHR systems.
● Use of Natural Language Processing (NLP) for SDoH: Research exploring the application of NLP techniques to identify and extract SDoH information from unstructured EHR data.
● Associations between SDoH and Health Outcomes: Papers examining the relationship between SDoH and various health outcomes.
● SDoH Interventions: Studies that evaluate the effectiveness of interventions aimed at addressing SDoH within healthcare settings.
In phase two, junior authors were randomly assigned manuscripts for initial meta-data extraction. The full text screening involved detailed examination to confirm studies had substantive focus on the screening tools, data harmonization techniques, text analytics, associations, or interventions sub-topics. Additional irrelevant studies were excluded in this stage.
Phase three then consisted of evidence synthesis and conflict resolution by senior authors, who double-checked and verified the accuracy and completeness of extractions to enhance consistency. Through this additional quality assurance process, senior authors validated phase one and two results. The PRISMA flow diagram [31] was utilized to depict the screening process, detailing the numbers of identified, included, and excluded studies across the systematic search and screening phases, along with reasons for exclusion.
The multi-stage screening and extraction process, with independent categorization, full-text meta-data extraction, and consensus meetings, helped embed quality checks in line with scoping review best practices.
Results
In this scoping review, we present the findings of SDoH in the EHR according to 5 domains of interest.
Data Collection and Synthesis
The initial PubMed query returned 685 articles. Title and abstract screening narrowed these to 415 articles, of which 324 were included in the full-text qualitative synthesis. The reviewed articles were then classified according to the SDoH-in-the-EHR domains. Most articles focused on SDoH and health outcomes, followed by SDoH data collection and documentation, NLP for SDoH, and SDoH screening tools and assessments. In the following sections, we review the major themes and highlight key works for each of the five domains (see Figure 3; for percentages, see Supplement Table 2).
Aligned with PRISMA guidelines (Tricco et al. [31]), the screening process involved an initial title/abstract review phase led by author C.L. to categorize papers into one or more of the 5 topics. Targeted meta-data extraction was performed by assigned reviewers as follows: Screening Assessments (R.Y.), SDoH Data Collection (C.L.), NLP Approaches (C.L.), and Interventions (X.M.). The SDoH and Outcomes papers were randomly assigned to the broader reviewer pool (C.L., R.Y., S.H., D.L.M., U.V., H.K.D.) for meta-data extraction. Additional irrelevant studies were excluded in this second phase (see Table 2 for details). The full-text meta-data extraction phase allowed confirmation of accurate categorization and extraction, with discrepancies resolved through consensus meetings. Evidence synthesis leads included C.L. for SDoH screening tools and SDoH and health outcomes, D.L.M. for SDoH data collection and NLP for SDoH, and X.M. for SDoH interventions, overseen by senior authors M.J.M. and D.L.M.
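The screening counts above, together with the per-domain article counts reported in the subsections that follow, can be cross-checked with simple arithmetic. The sketch below is only a consistency check under the assumption that each article is counted once in the per-domain totals. The domains are described as non-mutually exclusive, so the derived shares are approximate and may differ from the percentages in Supplement Table 2.

```python
# Counts reported in the Results section of this review.
identified = 685            # records returned by the PubMed query
after_title_abstract = 415  # records remaining after title/abstract screening
in_synthesis = 324          # records included in the qualitative synthesis

# Per-domain article counts as reported in the domain subsections.
domain_counts = {
    "SDoH and health outcomes": 164,
    "SDoH data collection and documentation": 76,
    "NLP for SDoH": 36,
    "SDoH screening tools": 29,
    "SDoH interventions": 19,
}

excluded_title_abstract = identified - after_title_abstract
not_in_synthesis = after_title_abstract - in_synthesis


def domain_shares(counts, total):
    """Percentage of the synthesized articles falling in each domain."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


shares = domain_shares(domain_counts, in_synthesis)

print("Excluded at title/abstract:", excluded_title_abstract)
print("Not carried into synthesis:", not_in_synthesis)
print("Sum of domain counts:", sum(domain_counts.values()))
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```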
SDoH Screening Tools
We included 29 papers (see Supplement Table 3 for details) that incorporated SDoH screening tools into EHRs. The majority of the studies utilized home-grown tools for screening SDoH. Some studies [29,32–34] developed their own questionnaires and screening sets, reflecting a trend towards customized tools tailored to specific healthcare settings or populations. Vendor-specific tools, like the two-item screening tool [35] integrated into the Epic SDoH Wheel, were less common but still present. The screened determinants varied but commonly included housing, food insecurity, transportation, and mental health indicators like stress and depression.
Studies targeted a diverse range of populations. For example, children were the focus in some studies [32,36], while adults were the primary subjects in others [33,37]. Various healthcare settings were represented, from primary care clinics [38,39] to emergency departments [40] and school-based clinics [34], underscoring the growing acknowledgment of SDoH’s relevance across the healthcare spectrum.
Active screening methods, where healthcare providers proactively administered questionnaires or interviews, were predominant (n = 28) [29,33]. Passive methods like the analysis of EHR data [41] were less common, but they represent an emerging trend that could streamline screening in the future. While many studies focused on personal health determinants (n = 19), others also assessed structural determinants like housing quality and social networks [32,39,42]. Only a few studies (n = 8) looked at both personal and structural determinants.
Challenges and Opportunities
The reviewed studies collectively highlight the challenges in standardizing SDoH screening across various contexts but also point towards the potential benefits of such screenings to patient care. The diversity in approaches, as seen in studies [29,32–34], reflects the complexity of addressing SDoH in clinical practice. However, it also demonstrates a concerted effort towards more comprehensive patient care. The prevalence of home-grown tools [32,34,39,43] indicates a trend towards customization, tailored to specific patient populations and healthcare settings. This is likely due to the unique needs and circumstances of different patient demographics. The variability in tools and approaches (e.g., the number of questions in studies [29,44] and the use of paper-based vs. EHR-based tools [33,38]) highlights the challenges in standardizing SDoH screening. This variability could impact the comparability of data and the scalability of successful approaches. Despite the challenges, the focus on SDoH screening illustrates a shift towards more comprehensive patient care. Recognizing and addressing social and behavioral factors [34,45] can lead to more effective healthcare interventions and better health outcomes.
SDoH Data Collection and Documentation
In our review, we identified 76 articles (see Supplement Table 4 for details) describing SDoH data collection and documentation practices. Studies focused on engagement with populations and leveraged a variety of technologies to support collection and documentation processes. Few studies (n = 9) included interviews of patients and clinicians, engagement with communities, focus groups, town hall meetings, and the like [46–54]. Other studies (n = 7) collected SDoH information using paper-based entry, iPads/tablets, patient- or clinician-facing web portals, and other web-based toolkits and forms [55–61].
Several studies made use of publicly available, external data resources to infer structural SDoH information for a given population. The most common external SDoH data sources linked to EHRs were US census and community survey data (at both patient/individual and area/neighborhood levels), administrative data/claims records, and disease registries. Commonly linked community surveys and systems include the Behavioral Risk Factor Surveillance System (BRFSS), the National Health and Nutrition Examination Survey (NHANES), the National Health Interview Survey (NHIS) [62], the National Institutes of Health PROMIS® (Patient-Reported Outcomes Measurement Information System) [63], the National Survey of Children’s Health (NSCH) [64], the Centers for Disease Control and Prevention Youth Risk Behavior Surveillance System (YRBSS) [65], the Centers for Medicare & Medicaid Services (CMS) Accountable Health Communities’ Health-Related Social Needs Screening Tool [66], the National Center for Education Statistics, the Uniform Crime Reports, and the American Community Survey [67–70].
Longitudinal study data included the National Longitudinal Study of Adolescent to Adult Health (Add Health) [71]. Other administrative data sources included the HCUP Nationwide Readmissions Database [72], claims data [73], and a Medicaid data warehouse [74]. Few studies described the use of disease-specific registries, e.g., cancer registries such as SEER-CMS, SEER-Medicare, and SEER-Medicaid [75].
Several studies (n = 8) describe methods for inferring structural SDoH using geocoding of patient addresses and linking to public census tract data [60,76–78] to integrate information related to neighborhood and community-level characteristics (e.g., SES, crime incidence, health facility locations) [79] and neighborhood factors (e.g., poverty level, education, employment status, etc.) [68,80,81].
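The linkage step described above can be sketched as a simple join between patient records and an area-level table keyed by census tract. The example assumes that geocoding has already assigned each patient address an 11-digit tract identifier (GEOID). All identifiers, field names, and values below are hypothetical, and the sketch does not reproduce the method of any cited study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TractProfile:
    """Area-level attributes for one census tract (all values hypothetical)."""
    poverty_rate: float       # share of residents below the poverty line
    unemployment_rate: float  # share of the labor force that is unemployed


# Hypothetical lookup keyed by 11-digit census tract GEOID.
TRACT_PROFILES = {
    "49035100100": TractProfile(poverty_rate=0.21, unemployment_rate=0.07),
    "49035100200": TractProfile(poverty_rate=0.08, unemployment_rate=0.03),
}

# Hypothetical patient records; tract_geoid is None when geocoding failed.
PATIENTS = [
    {"patient_id": "P1", "tract_geoid": "49035100100"},
    {"patient_id": "P2", "tract_geoid": "49035100200"},
    {"patient_id": "P3", "tract_geoid": None},
]


def link_patient(patient: dict, profiles: dict) -> dict:
    """Attach tract-level attributes; leave them as None when no match exists."""
    profile: Optional[TractProfile] = profiles.get(patient["tract_geoid"])
    return {
        **patient,
        "tract_poverty_rate": profile.poverty_rate if profile else None,
        "tract_unemployment_rate": profile.unemployment_rate if profile else None,
        "linked": profile is not None,
    }


linked = [link_patient(p, TRACT_PROFILES) for p in PATIENTS]
for row in linked:
    print(row)
```

Keeping an explicit linked flag rather than dropping unmatched patients makes it possible to report how many records could not be geocoded, which matters because missing addresses are unlikely to be random with respect to housing stability.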
External data provided various socioeconomic factors (income, education, employment, poverty level, air quality), neighborhood variables (segregation, safety, walkability), and health behaviors (diet, exercise, smoking) [62,76,82–86]. These complemented and expanded the individual-level SDoH data (food/housing security, transportation, interpersonal violence, etc.) captured directly in EHRs [25,26,28,46,73,82,84,87–93].
Few studies (n = 3) aimed to integrate EHR, genomic, and public health data to study the intersection of lifestyle, genetics, and environmental influences [94–96].
Challenges and Opportunities
Although these works highlight the potential for study of personal and structural SDoH, considerable effort is still required to systematically collect, link, and analyze SDoH data from external sources together with EHR data at the community, state, and national levels [84,97,98]. Few studies have incorporated common data models to improve the standardization and interoperability of collected SDoH data [99–101]. Furthermore, few studies demonstrated how this information could be leveraged to connect individuals with SDoH risk factors to social programs.
NLP in SDoH
In our review, we identified 36 articles (see Supplement Table 5 for details) describing NLP methods for powering SDoH studies. Many SDoH elements are locked within clinical free-text notes, requiring NLP for identification, encoding, and extraction to standardized terminologies such as SNOMED-CT or other representations [102]. Few studies focused on lexicon development using methods such as lexical associations, word embeddings, term similarity, and query expansion. Lexicons and regular expressions have been demonstrated to extract SDoH and psychosocial risk factors [103–105] and to identify distinct social risk factors, mapping them to standard vocabularies and code sets including ICD-9/10, ICD Z codes, the Unified Medical Language System (UMLS), and SNOMED-CT. Most articles (n = 15) describe rule-based approaches using regular expressions and/or hybrid machine learning methods built on existing NLP platforms. Some articles (n = 5) highlighted well-known rule-based toolkits and platforms adapted with lexicons and regular expressions for SDoH extraction, including Moonstone, Easy Clinical Information Extraction System (EasyCIE), Medical Text Extraction, Reasoning and Mapping System (MTERMS), Queriable Patient Inference Dossier (QPID), and Clinical Event Recognizer (CLEVER) [19,106–109]. Other articles (n = 7) describe rule-based systems paired with traditional machine learning approaches (i.e., an ensemble), particularly using NLP systems such as General Architecture for Text Engineering (GATE), Clinical Language Annotation, Modeling, and Processing Toolkit (CLAMP), Extract SDOH from EHRs (EASE), Yale clinical Text Analysis and Knowledge Extraction System (cTAKES), Relative Housing Stability in Electronic Documentation (ReHouSED), and toolkits such as spaCy and medspaCy in conjunction with conditional random fields and support vector machines [110–113].
In contrast, several investigators have leveraged open-source NLP toolkits like spaCy and medspaCy without supervised learners to extract SDoH variables [114–116]. Other studies (n = 19) have solely leveraged traditional supervised and unsupervised learning techniques, including support vector machines (SVM), logistic regression (LR), Naïve Bayes, AdaBoost, Random Forest, XGBoost, Bio-ClinicalBERT, Latent Dirichlet Allocation (LDA), and bidirectional Long Short-Term Memory (Bi-LSTM) [16,117–121], to extract and standardize social and behavioral determinants of health (SBDoH), e.g., alcohol abuse, drug use, sexual orientation, homelessness, substance use, sexual history, HIV status, housing status, transportation needs, housing insecurity, food insecurity, financial insecurity, employment/income insecurity, insurance insecurity, and poor social support. In more recent years, studies (n = 9) have focused on the training and tuning of deep learning approaches, primarily transformer-based models (i.e., Bidirectional Encoder Representations from Transformers (BERT), RoBERTa, and Bio-ClinicalBERT), for extracting SBDoH, including relationship status, social status, family history, employment status, race/ethnicity, gender, social history, sexual orientation, diet, alcohol, smoking, housing insecurity, unemployment, social isolation, and illicit drug use, from clinical notes, PubMed, and other specialized texts, e.g., LitCOVID [122–131]. Figure 4 shows the number of papers by NLP approach for SDoH extraction within EHR systems, highlighting the combinations and intersections of the methodologies used.
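To make the rule-based family of approaches described above more concrete, the following minimal sketch pairs a small lexicon of regular expressions with a crude negation check. It is not the implementation of Moonstone, EasyCIE, medspaCy, or any other cited system. The lexicon entries, category names, and the 30-character negation scope are illustrative assumptions and have not been validated on clinical text.

```python
import re

# Illustrative lexicon: SDoH category -> regular expressions (not validated).
SDOH_LEXICON = {
    "housing_insecurity": [
        r"\bhomeless(?:ness)?\b",
        r"\bliving in (?:a )?shelter\b",
        r"\bevict(?:ed|ion)\b",
    ],
    "food_insecurity": [
        r"\bfood insecur(?:e|ity)\b",
        r"\bfood (?:bank|pantry)\b",
    ],
    "transportation_needs": [
        r"\bno transportation\b",
        r"\black(?:s|ing)? (?:of )?transportation\b",
    ],
}

# A negation cue counts only if it occurs within 30 characters before the
# match and no sentence break ('.' or ';') lies between cue and match.
NEGATION = re.compile(r"\b(?:no|denies|denied|not|without)\b[^.;]{0,30}$", re.IGNORECASE)


def extract_sdoh(note: str) -> list:
    """Return (category, matched_text, negated) tuples found in a note."""
    findings = []
    for category, patterns in SDOH_LEXICON.items():
        for pattern in patterns:
            for match in re.finditer(pattern, note, flags=re.IGNORECASE):
                negated = NEGATION.search(note[: match.start()]) is not None
                findings.append((category, match.group(0), negated))
    return findings


note = "Patient denies homelessness. Reports food insecurity and has no transportation to clinic."
findings = extract_sdoh(note)
for finding in findings:
    print(finding)
```

Rules like these are transparent and easy to audit, which is one reason they remain common, but they miss paraphrases and misspellings that the supervised and transformer-based models discussed above are better placed to handle.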
Challenges and Opportunities
Although these works highlight the potential for extracting SDoH from texts, several challenges remain. Few studies have focused on lexicon development, made use of standard terminologies for encoding SDoH data, or explored deep extraction and representation of SDoH attributes and relationships. Also, many studies focus on extracting and encoding SDoH data from a single site and fail to assess the portability of methods to new textual data sources beyond clinical notes and PubMed articles, such as digital technologies and chatbots. The introduction of shared datasets like the Social History Annotated Corpus (SHAC) is an important step towards demonstrating the generalizability of NLP-powered SDoH extraction systems. Emerging generative models may also improve upon the state of the art demonstrated on common shared task datasets.
SDoH and Health Outcomes
In our review, we identified 164 articles (see Supplement Table 6 for details) describing SDoH and health outcomes. These studies examined a wide variety of health-related events and outcomes in relation to SDoH factors. A predominant focus was on infectious disease outcomes, with 16 studies examining drivers of COVID-19 hospitalization, mortality, treatment disparities, and differences in positivity rates across social groups [132–135]. Another major category included healthcare utilization metrics like preventable hospital readmissions (n = 11) [136–145], emergency department (ED) reliance (n = 16) [146–168], and telehealth adoption [169]. Beyond infectious outcomes and healthcare utilization, studies also assessed chronic disease control across conditions like diabetes (n = 11) [170–180], hypertension [173,181,182], kidney disease [183,184], and obesity (n = 7) [146,175,185–189], along with risk factors like elevated blood pressure and cardiovascular events. Some studies focused on cancer (n = 11) screening, diagnoses, treatment disparities, and survival outcomes [167,190–199], while others addressed mental health (n = 6) indicators [146–151] ranging from dementia incidence [200] to suicide (n = 2) risk factors [201,202]. Additional outcomes evaluated included maternal morbidity (n = 2) [203,204] and pediatric health metrics ranging from vaccine completion rates to epilepsy-related consequences.
The most common quality measures reported were standardized condition control thresholds like HbA1c levels for diabetes control [173,205], blood pressure (n = 5) levels for hypertension control, and established cancer staging guidelines [191]. Some studies used validated risk prediction models for outcomes like hospital readmissions (n = 5), suicide risk [201], or mortality (n = 6). Beyond clinical indicators, several studies incorporated validated SDoH indexes like the Area Deprivation Index [206,207], the Social Deprivation Index [208], and the CDC’s Social Vulnerability Index [209]. In terms of analysis approaches, common methods included multivariate regression models like logistic regression (n = 13) [159,179,180,192,199,210–217] and Cox proportional hazards models (n = 3) [138,196,212] to assess adjusted outcome associations with SDoH factors. Other advanced techniques included machine learning algorithms [215], geospatial analysis for clustering [218], and phenome-wide association studies [150] in select studies.
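As a simplified companion to the regression analyses mentioned above, the sketch below computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table. The reviewed studies generally reported covariate-adjusted estimates from multivariable models. This unadjusted version corresponds to a logistic regression with a single binary exposure and no covariates. The counts and the pairing of food insecurity with 30-day readmission are hypothetical and are not drawn from any included study.

```python
import math


def odds_ratio_with_ci(exposed_events, exposed_no_events,
                       unexposed_events, unexposed_no_events, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table."""
    a, b = exposed_events, exposed_no_events
    c, d = unexposed_events, unexposed_no_events
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimator")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts: 30-day readmission by food-insecurity status.
or_, low, high = odds_ratio_with_ci(
    exposed_events=30, exposed_no_events=70,
    unexposed_events=15, unexposed_no_events=85,
)
print(f"OR = {or_:.2f} (95% CI {low:.2f} to {high:.2f})")
```

An interval that excludes 1 suggests an association in the unadjusted data, but confounders such as age or comorbidity burden could still account for it, which is why the reviewed studies relied on adjusted models.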
SDoH Interventions
A total of 19 papers (see Supplement Table 7 for details) collected supplementary SDoH data to support population health intervention initiatives targeting hospitals/clinics (n = 10) or communities (including primary care, n = 9) at the meso (institution) level. Two articles discussed policy potential and proposed policy reform at the macro (system) level [219,220]. The majority of the selected research (n = 16) focused on implementing a social and healthcare supportive program to address the social needs of the target population. Hospital-based initiatives were implemented in various settings, including post hospital discharge [221], the emergency department [222], and clinics specializing in different medical disciplines [223–227]. On the other hand, community-based initiatives concentrated mainly on integrating interventions into primary care services [228–231]. The social and healthcare supportive programs included a range of initiatives designed to improve community health. These initiatives encompassed the introduction of new healthcare programs [232,233], health education and coaching [225,232,234], the strengthening of medical-legal partnerships [228], the enhancement of integrated care planning [229,231,235], and the improvement of patient navigation [222,236,237]. Only a few papers examined the potential of incorporating a family or social support element into their intervention design [223,232,234]. Meanwhile, 4 studies investigated the potential for enhancing resource allocation through surveying outcomes. The improvement objectives encompassed the allocation of staff and equipment [221], the enhancement of patient navigation [224], and the transformation of health service practices [226,238].
Health outcomes, such as improvements in health metrics, reductions in disease incidence, changes in vital signs, and quality of life, are commonly used as measures to determine the feasibility of initiating an intervention (n = 10). Several studies have also assessed SDoH in relation to patient satisfaction and acceptability [230,235,237,238]. Since most programs were new efforts, it was not possible to determine the effectiveness of the intervention in the short term, nor could the potential generalizability be assessed.
Discussion
This scoping review set out to map the SDoH-EHR integration literature across five key domains: structured data capture tools, external data linkage approaches, NLP-based extraction techniques, applications for outcomes analysis, and health care interventions. Our discussion synthesizes major themes and collective gaps within each domain. Regarding our first question, tailored screening instruments predominate and enable assessment, but standardization barriers persist. For the second, enriching patient profiles via claims and census linkage shows promise, but systematic consolidation is lacking. For the third, rule-based systems offer precision while neural networks improve recognition of unstructured elements, yet reproducibility hurdles remain. Finally, concerning predicting outcomes and targeting programs, consistent evidence of risk contrasts with uncertainty about implementation. Across these questions, the review benchmarks the maturity of each domain and specifies needs for advancing frameworks. Our cross-domain perspective highlights interdependencies among the domains that require a “full-stack” approach. In doing so, we aim to strengthen the capacity for collecting and translating multifaceted social data to guide health equity solutions.
Key Findings by Theme
SDoH Screening Tools
The studies implemented a variety of screening tools to assess patients’ social determinants of health (SDoH) across diverse healthcare settings. The most prevalent SDoH domains screened included housing instability/insecurity, food insecurity, transportation and utility service needs, interpersonal safety/violence, financial resource strain, social isolation, health literacy, and education level. Notably, the majority utilized home-grown screening instruments developed within their health systems rather than relying on standardized validated tools. The most commonly referenced standardized tools were PRAPARE, the National Academy of Medicine’s recommendations, and the Centers for Medicare & Medicaid Services’ Accountable Health Communities screening tool. Researchers tested these tools across primary care clinics including family medicine, internal medicine, pediatrics, obstetrics/gynecology, and community health centers. Some studies also examined screening acceptance in emergency departments and inpatient units. While most relied on active screening conducted by providers during visits, several studies explored passive methods like paper questionnaires or electronic tablets. The instruments largely focused on assessing individual-level SDoH rather than community or structural factors, although a minority attempted to capture both. Across diverse populations and implementation strategies, researchers found SDoH screening feasible to conduct and able to identify unmet social needs among patients. Still, further research is warranted, particularly regarding optimal referral systems and interventions to address identified needs.
SDoH Data Collection and Documentation
The current body of research demonstrates that numerous external data sources have been linked with electronic health records (EHRs) to enhance the capture of social determinants of health (SDoH). The most common linkages have connected individual and area-level census metrics, community surveys, administrative claims data, and disease registries to EHRs. Other integrated sources encompassing geospatial data, crime statistics, built environment factors, education data, and proprietary population health platforms offer additional context. Qualitative interviews and surveys have also supplemented patient SDoH insights not routinely documented in the course of clinical care [239]. These external data elements provide critical information surrounding socioeconomic position, neighborhood landscape, and health behaviors. By uniting individual and ecological variables, researchers can assemble more holistic patient profiles to advance risk stratification, outcomes studies, care coordination, and health equity initiatives [27,240].
Technical approaches underlying external SDoH data integration with EHRs have harnessed geocoding of addresses, aggregation of community measures, and linkage based on unique identifiers. While progress has occurred, further research must promote systematic collection, analysis, and application of integrated data sources in practice. Key steps for the field center on implementing reliable linkage mechanisms for disparate datasets and embedding multidimensional patient social profiles within clinical decision tools and workflows [26,241,242]. Only through purposeful integration and translation efforts can external SDoH data fully support identification of at-risk populations, patient-centered risk assessments, and targeted community-clinical interventions.
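The identifier-based linkage described above can be illustrated with a minimal sketch. It assumes an upstream geocoding step has already assigned each patient a census tract identifier. The `PatientRecord` type, the `AREA_INDEX` table, and its scores are hypothetical and are not drawn from any reviewed study.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical area-level deprivation scores keyed by census tract identifier.
# The values are invented for illustration only.
AREA_INDEX = {
    "36061000100": 0.82,
    "36061000200": 0.35,
}


@dataclass
class PatientRecord:
    patient_id: str
    tract_id: Optional[str]  # assumed output of an upstream geocoding step


def link_area_index(records, area_index):
    """Left-join patient records to area-level scores on tract_id.

    Returns (patient_id, score) pairs; score is None when the address
    could not be geocoded or the tract has no entry in the index.
    """
    linked = []
    for rec in records:
        score = area_index.get(rec.tract_id) if rec.tract_id else None
        linked.append((rec.patient_id, score))
    return linked


def linkage_rate(linked):
    """Share of records that received an area-level score."""
    if not linked:
        return 0.0
    matched = sum(1 for _, score in linked if score is not None)
    return matched / len(linked)


records = [
    PatientRecord("p1", "36061000100"),
    PatientRecord("p2", "36061000200"),
    PatientRecord("p3", None),           # geocoding failed
    PatientRecord("p4", "36061999999"),  # tract missing from the index
]
linked = link_area_index(records, AREA_INDEX)
```

Reporting the linkage rate alongside the linked data makes it visible when unmatched addresses could bias an area-level analysis.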
NLP in SDoH
A range of natural language processing (NLP) approaches have been leveraged to identify critical social determinants from unstructured clinical notes. Common methods include rule-based systems using expert-curated lexicons and regular expressions as well as supervised machine learning models like convolutional neural networks and recurrent neural networks. More advanced studies have also employed contextual embedding models such as BERT and achieved promising performance. Both generic NLP software libraries and custom systems tailored to the social and behavioral health domains have been implemented. Reported accuracy metrics vary substantially by model type and target social determinant, but precision and recall generally exceed 80% for housing insecurity, occupations, and selected social risks. Simpler models boast high precision whereas recent neural networks improve sensitivity in capturing key entities from free-text fields. Importantly, these NLP approaches recognize more patient social factors than structured EHR data alone to enable richer risk assessments and interventions.
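To make the rule-based approach concrete, the following is a minimal sketch of lexicon and regular-expression matching with a crude negation check, plus the precision and recall calculation used to report performance. The lexicon terms, the labels, and the negation window are illustrative assumptions. They do not reproduce any system from the reviewed studies, whose lexicons are expert-curated and far larger.

```python
import re

# Hypothetical mini-lexicon of regular-expression fragments per SDoH label.
LEXICON = {
    "housing_insecurity": [
        r"homeless(?:ness)?",
        r"evict(?:ed|ion)",
        r"living in (?:a )?shelter",
    ],
    "food_insecurity": [
        r"food insecur\w*",
        r"food (?:bank|pantry)",
        r"skipp(?:ed|ing) meals",
    ],
}

# One compiled pattern per label, matching any of its terms on word boundaries.
PATTERNS = {
    label: re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    for label, terms in LEXICON.items()
}

# Assumed negation rule: a cue word within 30 characters before the match,
# with no sentence-ending period in between.
NEGATION = re.compile(r"\b(?:no|denies|not|without)\b[^.]{0,30}$", re.IGNORECASE)


def extract_sdoh(note):
    """Return the set of SDoH labels asserted (not negated) in a note."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(note):
            preceding = note[:match.start()]
            if not NEGATION.search(preceding):
                found.add(label)
    return found


def precision_recall(predicted, gold):
    """Compute set-based precision and recall for one note's labels."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

For example, `extract_sdoh("Patient denies homelessness.")` returns an empty set because the negation cue precedes the match, whereas a note stating that the patient was evicted would yield `housing_insecurity`. Hand-written rules like these tend to favor precision over recall.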
In tandem with advanced informatics solutions, a range of interventions have been tested to address identified social needs and related disparities. These encompass social programs like community health initiatives, resource referrals and patient navigation services, integrated care management models, and group education sessions with peer support. Certain studies have allocated supplementary resources such as medical staff or equipment to vulnerable groups. At times, system-level policy changes are required to promote health equity. Such initiatives have been executed across community, clinic, and health system settings – each with their own merits and challenges. While many efforts underline promising impact on knowledge, self-efficacy, and even select health outcomes, more evidence surrounding sustainability, scalability, and implementation barriers is required.
Moving forward, better standardized corpora are needed to develop and evaluate reusable NLP systems for social domains. Similarly, integration of validated SDoH screening workflows rather than one-off analyses will facilitate routine practice. Ontologies, shareable custom systems, and improved linkages to longitudinal outcomes can further the field. In tandem, more rigorous assessment of multi-sector SDoH interventions and targeting of specific mechanisms of impact will maximize reach across at-risk populations.
SDoH and Health Outcomes
This review provides useful insights into current approaches and gaps in research on SDoH and health outcomes. Most studies were retrospective analyses examining neighborhood disadvantage, food and housing issues, healthcare access barriers, and related problems. They studied connections to healthcare use, chronic illness control, and infectious diseases. Fewer assessed mental health, cancer, or mortality. Among social factors, neighborhood characteristics were studied most, followed by individual/family-level food and housing problems.
COVID-19 has stimulated greater attention to health disparities and social determinants. Nearly 20 studies examined links between deprivation, barriers, instability and infection rates, severity, and outcomes. Consistently higher COVID-19 risks and deaths among minorities, low-income groups, and those with prior conditions underscored existing inequitable distribution of social risks. Researchers leveraged diverse data from records, census indices, and surveys to quantify the disproportionate pandemic burden on disadvantaged groups. Studies also displayed sophisticated applications of predictive analytics and machine learning to model dynamics. This crisis has expanded SDoH data infrastructure and methodology while underscoring long-term disparities. Assessing pandemic response and recovery across social levels is critical, as disruptions threaten to worsen effects on vulnerable groups.
Techniques like regression modeling helped characterize adjusted outcome associations. But prediction and more advanced analytics were less common. There is some consistent evidence tying greater social deprivation to poorer health across conditions. But differences across settings and mixed findings remain. More controlled, computational research could better uncover precise interactions. Clearer reporting on predictive models applied could also help comparisons to guide health equity solutions.
We see more screening for patient social needs in clinics now. But we still need standardized processes and data integration into records for tracking over time. Currently observational analyses dominate. Moving forward, translating findings into community initiatives for at-risk groups remains vital.
Analyzing Real-World Data in healthcare, especially with the integration of SDoH, is challenging due to the lack of standardized frameworks and complex nature of SDoH data. Developing analytic guidelines is crucial for effectively navigating these complexities, enabling the extraction of actionable insights that inform healthcare decisions and strategies. Such advancements are essential for health outcomes research, as they provide a foundation for creating more effective, evidence-based interventions and policies that consider the broad influences of social factors on health.
SDoH Interventions
An intervention in response to health care-related issues may occur at the micro, meso, and macro levels, namely patient care, healthcare institutions, and healthcare policy [243–245]. In the reviewed studies, SDoH recognition most often informed intervention programs at the meso level, such as primary care and specialist referrals, patient navigation services, integrated care management, group education sessions, and medical staff or equipment allocation. A few policy modifications piloted macro-level practice transformations. Only about half of the SDoH-navigated programs emphasized family/peer support and social connections for individual patients (micro-level) as part of the intervention.
The majority of interventions had some benefit, but because they only addressed a few targets in one health system, their generalizability was restricted. This finding holds true in the US, where health delivery systems are fragmented as regional networks. It also demonstrates the challenge of implementing interventions for vulnerable populations [246,247], whose demographics vary across communities on account of cultural and geographical factors. It is imperative for healthcare professionals to bear in mind that identifying SDoH within a community is only the initial stage. Establishing connections with individuals who are grappling with both health and social issues can be challenging due to mental or physical difficulties. The process of establishing trust with vulnerable individuals is iterative and requires ongoing emotional and material input from healthcare professionals in collaboration with community social workers. In addition, forming a regional support network requires stakeholders to forge lasting partnerships with organizations that hold the resources needed to address the challenges identified through SDoH data (e.g., accommodation and transportation).
For more effective intervention, a mature plan with SDoH collection tools embedded in an EHR system should be adopted and modified to reflect the composition and needs of the population being surveyed. The panel should select the platform for survey distribution and storage in order to prevent duplication of effort and data loss and ensure the program’s sustainability. They should ensure that the data utilized to inform interventions is collected directly from the population undergoing the intervention. Utilizing a pilot survey to pre-test a data collection instrument in line with a co-design methodology is suggested [248,249]. Co-designing is a collaborative effort aimed at establishing a conduit for vulnerable individuals to develop a more supportive environment, thereby mitigating the occurrence of unanticipated obstacles.
Cross-Cutting Insights
Despite some progress, barriers like screening variability [250], data silos [251], predictive model opacity [252], and program adoption challenges [246] restrict reliable, equitable SDoH data usage. Presently, expertise resides in silos: screening, linkage, extraction, analyses, and programs. But real optimization depends on an integrated “full-stack” approach recognizing interdependencies. For example, incorporating extensive SDoH variables strains statistical models in outcomes research, demanding guidance on principled variable selection and penalization procedures. Breaking down walls between activities could promote a comprehensive platform spanning the spectrum from collection to application.
This demands rethinking workflows and breaking down antiquated barriers that isolate screening from risk detection or hinder integrating contextual data into real-world utilities [26]. Architectural paradigms promoting modular reuse with transparency safeguards can help reconfigure fragmented efforts into an interoperable ecosystem better serving those with disproportionate risk [253]. Since our searches were conducted, large language models (LLMs) have grown rapidly, presenting new opportunities [254,255]. LLM-based extraction and temporal reasoning models could facilitate reliable SDoH entity recognition across contexts [254]. As research continues, embracing interoperable design principles and controlled evaluation around representative datasets, model transparency, and equitable outcomes remains vital [252].
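As one illustration of what a penalization procedure can look like, consider L1-penalized (lasso) logistic regression. It is offered here as a standard textbook example, not as the method used in the reviewed studies. The notation is assumed for this sketch: $y_i \in \{0,1\}$ is the outcome for patient $i$, $x_i$ is a vector of $p$ clinical and SDoH predictors, and $\lambda \ge 0$ is a tuning parameter, typically chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[ y_i \left( \beta_0 + x_i^{\top}\beta \right)
         - \log\!\left( 1 + e^{\beta_0 + x_i^{\top}\beta} \right) \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Larger values of $\lambda$ shrink more coefficients exactly to zero, so the penalty performs variable selection and estimation in a single step when many correlated SDoH variables are candidates.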
Limitations
This scoping review faces certain limitations in comprehensively capturing the state of SDoH data integration into EHRs. First, while efficient, relying solely on PubMed for literature searches and including only English-language papers may introduce selection bias, omitting potentially relevant research indexed in other databases. Supplementing with sources like SCOPUS or Web of Science could have revealed additional insights and applications.
Second, due to resource constraints, the metadata extraction from the final set of included studies was completed by a single reviewer. Having dual independent extraction with consensus meetings is ideal to ensure accuracy and completeness of scoping review data abstractions. The feasibility and impact of implementing a second reader should be evaluated in future updates to strengthen robustness.
Finally, heterogeneity across settings, populations, tools, and outcomes creates complexity in evaluating SDoH-EHR integration maturity. Varying implementation stages and study designs introduce difficulty benchmarking best practices. The scoping methodology prioritized inclusiveness over appraising integration quality, leaving gaps in assessing real-world effectiveness. Capturing nuanced, multi-dimensional integration processes by diverse healthcare systems persists as a challenge, though framework refinement helps structure insights.
Conclusion
Overall, while collecting patient social contexts shows immense potential to rectify health disparities, realizing possibilities requires ongoing informatics innovation alongside economic investments and policy reforms targeting root societal drivers. This review contributes an evidence base for such continued progress in wisely applying multidimensional SDoH data to promote health equity.
The integration of SDoH data into healthcare practice holds transformative potential for addressing health disparities. Realizing this potential demands continued innovation, strategic investment, and policy evolution, guided by the evidence and insights garnered from comprehensive SDoH research.
Data Availability
Extracted metadata for all articles is available upon reasonable request to the authors. We will release the data online if the article is published by a journal.
Contributorship Statement
M.J.B., C.L., and D.L.M.: Conceptualized the study design and methodology.
C.L., D.L.M., X.M., S.H., and M.J.B.: Led data synthesis and analysis as well as initial manuscript drafting.
C.L., R.Y., S.H., D.L.M., U.V., H.K.D., and P.F.: Contributed to metadata extraction from selected articles.
C.L., Z.A., and H.K.D.: Developed visualizations and tables to summarize key data.
All authors contributed intellectually to the interpretation of findings, critically revised the manuscript, gave final approval of the version to be published, and agree to be accountable for the work.
Supplement
Supplement Table 3 – Meta Data – SDoH Screening and Assessment
Supplement Table 4 – Meta Data – SDoH Data Collection and Documentation
Supplement Table 5 – Meta Data – NLP for SDoH
Supplement Table 6 – Meta Data – SDoH and Health Outcomes
Supplement Table 7 – Meta Data – SDoH Intervention
Acknowledgements
We thank the National Institutes of Health for supporting this research: R01-HL162354, K24-HL167127-01A1, and P50MH127511.
Footnotes
There are no conflicts of interest to declare.
We have to edit the authorship statement and author order.