medRxiv

SurvivEHR: a competing risks, time-to-event foundation model for multiple long-term conditions from primary care electronic health records

Charles Gadd, Krishna Gokhale, Aditya Acharya, Jennifer Cooper, Francesca Crowe, Leah Fitzsimmons, Thomas Jackson, Krishnarajah Nirantharakumar, Christopher Yau, The OPTIMAL collaborative
doi: https://doi.org/10.1101/2025.08.04.25332916
Charles Gadd
1Nuffield Department for Women’s & Reproductive Health, University of Oxford, Oxford, UK
Krishna Gokhale
2University of Birmingham, Birmingham, UK
Aditya Acharya
2University of Birmingham, Birmingham, UK
Jennifer Cooper
2University of Birmingham, Birmingham, UK
Francesca Crowe
2University of Birmingham, Birmingham, UK
Leah Fitzsimmons
2University of Birmingham, Birmingham, UK
Thomas Jackson
2University of Birmingham, Birmingham, UK
Krishnarajah Nirantharakumar
2University of Birmingham, Birmingham, UK
3King's College London, London, UK
Christopher Yau
1Nuffield Department for Women’s & Reproductive Health, University of Oxford, Oxford, UK
4Health Data Research UK, London, UK
For correspondence: christopher.yau{at}wrh.ox.ac.uk

Abstract

Multiple long-term conditions (MLTCs) or multimorbidity – the co-occurrence of multiple chronic conditions – presents a growing challenge for primary care. Current predictive models often target single outcomes and overlook the complexities of time-to-event risk in real-world, longitudinal health data. Here, we present SurvivEHR, a generative transformer-based foundation model trained on over 7.6 billion coded events from 23 million patients in UK primary care. SurvivEHR introduces a competing risk time-to-event pretraining objective that enables accurate forecasting of future diagnoses, investigations, medications, and mortality. We demonstrate that SurvivEHR achieves strong risk stratification performance, captures clinically meaningful trajectories, and outperforms benchmark survival models across multiple tasks. The model also transfers effectively to fine-tuned prognostic tasks, particularly in low-resource settings. By learning patient trajectories directly from routine health records, SurvivEHR offers a scalable and privacy-preserving approach for building generalisable clinical risk tools that address the complexity of MLTCs in primary care.
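The abstract does not give the form of the competing risk objective, so the following is only a generic illustration of what "competing risks" means in time-to-event analysis, not the SurvivEHR method. It is a minimal sketch of the standard nonparametric (Aalen-Johansen) cumulative incidence estimate, in which each patient experiences at most one of several mutually exclusive event types or is censored. The function name, the toy data, and the convention that cause `0` means censored are all assumptions made for this example.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Aalen-Johansen cumulative incidence for one cause (illustrative only).

    times  : observed follow-up time per patient
    causes : 0 if censored, otherwise an integer event-type label
    Returns a list of (time, cumulative incidence) pairs, one per
    distinct time at which any event occurred.
    """
    n_at_risk = len(times)
    surv = 1.0   # probability of being free of every event type so far
    cif = 0.0    # cumulative incidence of the cause of interest
    curve = []
    for t in sorted(set(times)):
        at_t = [c for tt, c in zip(times, causes) if tt == t]
        d_all = sum(1 for c in at_t if c != 0)
        d_k = sum(1 for c in at_t if c == cause_of_interest)
        if d_all > 0:
            # Increment uses survival *before* t, then survival is updated.
            cif += surv * d_k / n_at_risk
            surv *= 1.0 - d_all / n_at_risk
            curve.append((t, cif))
        # Everyone observed at t (event or censored) leaves the risk set.
        n_at_risk -= len(at_t)
    return curve


# Hypothetical toy cohort: two event types (1 and 2) and one censoring (0).
toy_times = [1, 2, 3, 4, 5]
toy_causes = [1, 2, 0, 1, 2]
cif_cause_1 = cumulative_incidence(toy_times, toy_causes, 1)
cif_cause_2 = cumulative_incidence(toy_times, toy_causes, 2)
```

Because the event types compete, the cause-specific curves share a single event-free survival term. Treating the other event types as censoring and fitting one Kaplan-Meier curve per outcome would instead overstate each individual risk.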

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This study/project is funded by the National Institute for Health Research (NIHR) Intelligence under its programme Artificial Intelligence for Multiple Long-Term Conditions under the title "OPTIMising therapies, disease trajectories, and AI assisted clinical management for patients Living with complex multimorbidity" (OPTIMAL study) under Award ID: NIHR202632 (https://fundingawards.nihr.ac.uk/award/NIHR202632). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. Additional funding was provided by the UK Engineering and Physical Sciences Research Council (Ref: EP/Y018192/1). Christopher Yau is supported by a UKRI Turing AI Acceleration Fellowship (Ref: EP/V023233/1).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Ethical approval for this study was obtained from the Independent Scientific Advisory Committee of Clinical Practice Research Datalink (protocol no. 21_000683).

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Code release policy, data and model availability

The full source code used to pre-process, train, evaluate, and reproduce our experiments is available at http://github.com/cwlgadd/FastEHR and http://github.com/cwlgadd/SurvivEHR under an open-source licence. Because the proposed SurvivEHR model uses generative AI, we cannot guarantee that it would not reproduce exact copies of real records. We are therefore unable to distribute the trained model weights, as doing so could compromise patient privacy and contravene data-sharing agreements. We provide detailed instructions to allow others to retrain the model from scratch on appropriately licensed data. Data for this project were made available by CPRD under study reference ID 21_000683 (https://www.cprd.com/approved-studies/optimising-therapies-and-disease-trajectories-patients-living-complex). Raw data from the study are not publicly available. Data for the study were obtained under licence from CPRD; pseudonymised patient data are available from CPRD subject to Research Data Governance approval; see https://www.cprd.com/how-access-cprd-data for more information. Codelists for CPRD data extraction can be found at http://github.com/THINKINGGroup/phenotypes.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Posted August 06, 2025.
Subject Area

  • Health Informatics