Abstract
Multiple long-term conditions (MLTCs), or multimorbidity (the co-occurrence of multiple chronic conditions), present a growing challenge for primary care. Current predictive models often target single outcomes and overlook the complexities of time-to-event risk in real-world, longitudinal health data. Here, we present SurvivEHR, a generative transformer-based foundation model trained on over 7.6 billion coded events from 23 million patients in UK primary care. SurvivEHR introduces a competing risk time-to-event pretraining objective that enables accurate forecasting of future diagnoses, investigations, medications, and mortality. We demonstrate that SurvivEHR achieves strong risk stratification performance, captures clinically meaningful trajectories, and outperforms benchmark survival models across multiple tasks. The model also transfers effectively to fine-tuned prognostic tasks, particularly in low-resource settings. By learning patient trajectories directly from routine health records, SurvivEHR offers a scalable and privacy-preserving approach for building generalisable clinical risk tools that address the complexity of MLTCs in primary care.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This study/project is funded by the National Institute for Health Research (NIHR) Intelligence under its programme Artificial Intelligence for Multiple Long-Term Conditions under the title "OPTIMising therapies, disease trajectories, and AI assisted clinical management for patients Living with complex multimorbidity" (OPTIMAL study) under Award ID: NIHR202632 (https://fundingawards.nihr.ac.uk/award/NIHR202632). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. Additional funding was provided by the UK Engineering and Physical Sciences Research Council (Ref: EP/Y018192/1). Christopher Yau is supported by a UKRI Turing AI Acceleration Fellowship (Ref: EP/V023233/1).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Ethical approval for this study was obtained from the Independent Scientific Advisory Committee of the Clinical Practice Research Datalink (protocol no. 21_000683).
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Code release policy, data and model availability
The full source code used to pre-process, train, evaluate, and reproduce our experiments is available at http://github.com/cwlgadd/FastEHR and http://github.com/cwlgadd/SurvivEHR under an open-source licence. Because SurvivEHR is a generative model, we cannot guarantee that it would not reproduce exact copies of real records. We are therefore unable to distribute the trained model weights, as doing so could compromise patient privacy and contravene data-sharing agreements. We provide detailed instructions to allow others to retrain the model from scratch on appropriately licensed data. Data for this project were made available by CPRD under study reference ID 21_000683 (https://www.cprd.com/approved-studies/optimising-therapies-and-disease-trajectories-patients-living-complex). Raw data from the study are not publicly available. Data for the study were obtained under licence from CPRD; pseudonymised patient data are available from CPRD subject to Research Data Governance approval; see https://www.cprd.com/how-access-cprd-data for more information. Codelists for CPRD data extraction can be found at http://github.com/THINKINGGroup/phenotypes.





