Abstract
Introduction Acute myocardial infarction (AMI) remains a leading cause of mortality, and coexisting conditions (i.e., multimorbidity) complicate its management and outcomes. Caring for patients with a multimorbid profile is a major challenge for healthcare providers, in part because multimorbidity is progressive: diseases evolve over time in complex ways that profoundly affect clinical outcomes.
Methods From the UK Biobank cohort of 502,000 volunteers, 12,701 AMI patients were selected for analysis, and their data were grouped into pre-AMI (up to 1 year prior) and early post-AMI (within 5 years) periods. Participants were clustered by applying Dynamic Time Warping (DTW) clustering to their sequences of ICD-10 diagnoses accumulated over time in the post-AMI period. Topic modelling of cluster-specific diagnoses informed thematic labels for the resulting profiles (clusters) of AMI patients. Four supervised models (Logistic Regression, Random Forest, XGBoost, and CatBoost) were trained on pre-AMI data together with socio-demographic variables (age, IMD score, BMI, and sex) to predict profile membership; CatBoost achieved the highest accuracy. Model interpretability via SHapley Additive exPlanations (SHAP) identified the key diagnostic categories driving profile assignments. Survival analyses then compared SMART (Second Manifestations of Arterial Disease) risk scores across the profiles, adjusting for clinical covariates, to evaluate adverse cardiovascular outcomes (death). Finally, Phenome-Wide Association Studies (PheWAS) were used to link profile-specific diagnostic themes to underlying genetic mechanisms.
Results Three multimorbidity profiles were identified in the post-AMI period: Acute cardio-renal-respiratory instability with chronic metabolic disease (ACUTE-CARD), Cardiometabolic disease with mixed arrhythmic-ischemic burden (CARDIOMIX), and Smoking-related cardiovascular disease with multimorbidity (SMO-CARD). CatBoost predicted profile membership with an AUROC of 0.77. Participants in the SMO-CARD cluster had the highest mortality, while ACUTE-CARD had the most favourable outcomes (SMART risk score = 11.2; 6.8% CVD deaths). SMO-CARD displayed a broad range of cardiopulmonary and systemic associations. PheWAS revealed profile-specific genetic associations and pathway enrichments that were consistent with clinical features; for example, cardiometabolic genes were associated with the CARDIOMIX cluster, and immune-related pathways were associated with SMO-CARD, supporting the biological plausibility of these profiles.
Conclusion Integrating temporal clustering with explainable machine learning reveals distinct multimorbidity patterns in AMI patients. This framework supports personalised risk stratification and outcome prediction in clinical care.
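The DTW step named in the Methods can be illustrated with a minimal sketch of the classic dynamic-programming DTW distance. This is not the authors' implementation: the function name `dtw_distance`, the absolute-difference local cost, and the encoding of each patient as a yearly cumulative count of diagnoses are all assumptions made for illustration, and the patient sequences are hypothetical.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences.

    Uses absolute difference as the local cost (an assumption; the
    study's actual cost function and sequence encoding are not stated).
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical cumulative diagnosis counts per follow-up year (illustrative only)
patient_a = [1, 2, 4, 6, 7]
patient_b = [1, 1, 2, 4, 6, 7]  # same trajectory as patient_a, delayed by one step
patient_c = [0, 0, 1, 1, 2]     # much slower accumulation

print(dtw_distance(patient_a, patient_b))
print(dtw_distance(patient_a, patient_c))
```

Because DTW allows one sequence to be stretched in time against another, `patient_a` and `patient_b` align at zero cost even though their lengths differ, whereas `patient_c` remains distant. A pairwise distance matrix built this way could then be passed to a clustering algorithm; which algorithm the study used is not stated in the abstract.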
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study did not receive any funding.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
UK Biobank
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data Availability Statement
This study was conducted using data from the UK Biobank under approved Application Number 83988. The UK Biobank dataset is not publicly available due to participant privacy protections and data governance restrictions. Researchers may apply for access to the UK Biobank resource through the established application process at: https://www.ukbiobank.ac.uk/enable-your-research/apply-for-access
Derived data products (including cluster labels, aggregated feature tables, and model outputs) generated during this study may be shared upon reasonable request to the corresponding author, subject to UK Biobank’s data sharing policies and ethical approval requirements. No individual-level data can be shared.





