An informatics approach to profiling patient experiences using electronic health records: constructing and clustering the burden space of individuals under 65 years of age with multiple long-term conditions

Mozhdeh Shiranirad; Zlatko Zlatev; Roberta Chiovoloni; Emilia Holland; Jakub Dylag; Nisreen A. Alwan; Ann Berrington; Michael Boniface; Simon D. S. Fraser; Rebecca B. Hoyle

doi:10.64898/2025.11.27.25341182

Abstract

Living with multiple long-term conditions (MLTC) profoundly affects patients' lives, not only their health but also their financial, emotional, and social well-being, and can impose a significant burden. Here we take a novel approach to exploring the lived experience of individuals with MLTC: we identify patterns of burden spanning physical, emotional, social, and financial domains by applying machine learning techniques to electronic health records (EHR).

We constructed a cohort of 310,990 individuals born between January 1, 1958, and December 31, 1967, all with two or more long-term conditions. Proxy indicators of burden were extracted from EHR data. Using k-means clustering, we identified subgroups of individuals with distinct burden profiles and analysed the distribution of burden indicators within each cluster.
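The clustering step described above can be illustrated with a minimal sketch. It assumes each individual is represented as a row of binary burden indicators (1 = recorded in the EHR); the column names `pain`, `anxiety`, `depression` and `mobility_problems`, the helper names `kmeans` and `cluster_prevalence`, and the toy data are all hypothetical, and the study's actual indicator set, encoding, choice of k and initialisation are not specified here. It uses plain Lloyd's k-means with a single random start; in practice a library implementation such as scikit-learn's `KMeans` with several restarts would more typically be used.

```python
import numpy as np

# Hypothetical burden-indicator columns (not the study's actual indicator set)
INDICATORS = ["pain", "anxiety", "depression", "mobility_problems"]

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on an (n_people, n_indicators) array.

    Returns (labels, centroids). Uses a single random initialisation,
    so results can depend on the seed.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k randomly chosen individuals as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every individual to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments unchanged: converged
        labels = new_labels
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

def cluster_prevalence(X, labels, k):
    """Share of each cluster's members with each indicator present (NaN if empty)."""
    X = np.asarray(X, dtype=float)
    prev = np.full((k, X.shape[1]), np.nan)
    for j in range(k):
        members = X[labels == j]
        if len(members) > 0:
            prev[j] = members.mean(axis=0)
    return prev

# Toy data (hypothetical): one row per individual, 1 = indicator recorded
X = np.array([
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
])
labels, _ = kmeans(X, k=2)
for j, row in enumerate(cluster_prevalence(X, labels, k=2)):
    print(j, dict(zip(INDICATORS, row.round(2))))
```

Reading the per-cluster prevalence table row by row is one way to describe each cluster's burden profile, for example a cluster in which pain and mobility problems are recorded for nearly everyone.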

Several large clusters were characterised by a high prevalence of one or more of pain, anxiety, and depression. Most clusters were predominantly female, with females over-represented relative to the overall burden cohort. Socioeconomic disparities were evident: clusters marked by pain had a higher proportion of individuals from the most deprived areas, while clusters characterised by stress or anxiety alone had a higher proportion of individuals from the least deprived areas. Certain combinations of burden indicators tended to be over-represented in the same clusters, such as pain with mobility problems, and depression with very high numbers of A&E arrivals and with separation/divorce.
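One simple way to make "over-represented relative to the overall cohort" concrete is the ratio of an attribute's prevalence within a cluster to its prevalence in the whole cohort. This is a minimal sketch under that assumption, not necessarily the measure the authors used; the function name `overrepresentation` and the toy values for `female`, `pain` and `mobility` are hypothetical.

```python
import numpy as np

def overrepresentation(attr, labels):
    """Per cluster: prevalence of a binary attribute within the cluster divided
    by its prevalence in the whole cohort. Values > 1 mean over-represented."""
    attr = np.asarray(attr, dtype=float)
    labels = np.asarray(labels)
    overall = attr.mean()
    if overall == 0:
        raise ValueError("attribute never present in the cohort")
    return {int(c): float(attr[labels == c].mean() / overall) for c in np.unique(labels)}

# Toy example (hypothetical values): 8 individuals assigned to 2 clusters
labels = [0, 0, 0, 0, 1, 1, 1, 1]
female = [1, 1, 1, 0, 1, 0, 0, 0]
pain = np.array([1, 1, 0, 1, 0, 0, 1, 0])
mobility = np.array([1, 1, 0, 0, 0, 0, 1, 0])

print(overrepresentation(female, labels))  # {0: 1.5, 1: 0.5}
# A combination of indicators is their element-wise AND
print(overrepresentation(pain & mobility, labels))
```

The same ratio can be applied to socioeconomic group membership, for instance a 0/1 flag for living in one of the most deprived areas, to compare the deprivation mix of each cluster with that of the full cohort.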

This study demonstrates the utility of machine learning for uncovering nuanced, patient-centred patterns in the experience of living with MLTC. The clustering approach reveals how different burdens intersect and vary across demographic and socioeconomic lines, offering insights that could inform more tailored and equitable care strategies.

Author summary

Although a growing number of people are living with multiple long-term conditions (MLTCs), the nature of the burden they face and the common patterns of such person-centred burden remain largely unknown. Previous MLTC studies have often clustered people by their long-term conditions to uncover how these conditions group together in electronic health records (EHRs). However, this approach does not capture the full complexity of MLTCs or their impact on patient experience. In this study, we identified a series of proxy burden indicators, highlighted the challenges of extracting them from EHRs, and developed data-driven methods to uncover important patterns of patient-centred burden within this large, complex space, opening a new research direction for understanding MLTCs. Health systems, policymakers, and clinicians stand to benefit from these findings through clearer insight into the challenges expected to be faced by different groups living with MLTCs, potentially informing more targeted support, smarter resource allocation, and better care outcomes. Researchers, in turn, gain a systematic methodology for clustering patient burden.

Competing Interest Statement

Rebecca B. Hoyle reports a relationship with Smith Institute Ltd that includes: scientific board membership. All other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding Statement

Yes

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The study was conducted in accordance with the UK Policy Framework for Health and Social Care Research. Ethics approval for this study was obtained from the University of Southampton Faculty of Medicine Ethics committee (ERGO II Reference 66810). The SAIL Databank independent Information Governance Review Panel approved this study (SAIL Project: 1377).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Data Availability

Data may be obtained from a third party and are not publicly available. The data used in this study are available in the SAIL Databank at Swansea University, Swansea, UK. Applications to access data via SAIL can be made following their established process https://saildatabank.com/data/apply-to-work-with-the-data/.


The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.