medRxiv

Big Data Analysis of Electronic Health Records: Clinically interpretable representations of older adult inpatient trajectories using time-series numerical data and Hidden Markov Models

Maria Herrero-Zazo, Victoria L Keevil, Vince Taylor, Helen Street, Afzal N Chaudhry, Tomas Fitzgerald, John Bradley, Ewan Birney
doi: https://doi.org/10.1101/2021.06.18.21258885
Maria Herrero-Zazo
1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
2Department of Medicine for the Elderly, Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, UK
For correspondence: birney@ebi.ac.uk
Victoria L Keevil
1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
2Department of Medicine for the Elderly, Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, UK
3Department of Medicine, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge, UK
Vince Taylor
4Cambridge Clinical Informatics, Addenbrooke’s Hospital, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, UK
Helen Street
5Research and Development, Cambridge University Hospitals NHS Foundation Trust, Hills Road, Cambridge, UK
Afzal N Chaudhry
3Department of Medicine, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge, UK
6NIHR Cambridge Biomedical Research Centre, Cambridge Biomedical Campus, Cambridge, UK
Tomas Fitzgerald
1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK
John Bradley
3Department of Medicine, University of Cambridge, Addenbrooke’s Hospital, Hills Road, Cambridge, UK
6NIHR Cambridge Biomedical Research Centre, Cambridge Biomedical Campus, Cambridge, UK
Ewan Birney
1European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, UK

Abstract

The implementation of Electronic Health Records (EHR) in UK hospitals provides new opportunities for clinical ‘big data’ analysis. Representing the observations routinely recorded in clinical practice is the first step towards using these data in many research tasks. Anonymised data were extracted from 11 158 first emergency admission episodes (AE) in older adults. Irregularly recorded results for 23 laboratory blood test and vital sign variables were normalised, regularised into daily bins, and represented as numerical multivariate time series (MVTS). Unsupervised Hidden Markov Models (HMM) were trained to represent each day of each AE as one of 17 hidden states. Visual clinical interpretation of these states showed remarkable differences between patients who died at the end of the AE and those who were discharged. Every state had marked features that allowed its clinical interpretation and made it possible to distinguish states associated with the patients’ disease burden, with their physiological response to this burden, or with the stage of admission. Chi-square tests confirmed the most evident relationships with held-out clinical information. Two states were strongly associated with inpatient mortality (IM), and 12 states (71%) were associated with at least one admission diagnosis. The potential of these data representations for predicting hospital outcomes was also explored using Logistic Regression (LR) and Random Forest (RF) models. Prediction performance was higher when models were trained on MVTS data than when they were trained on HMM states. However, the outputs of the generative and discriminative analyses were complementary. For example, the highest-ranking features of the best-performing RF model for IM (ROC-AUC 0.851) resembled the laboratory blood test and vital sign variables characterising the ‘Early Inflammatory Response-like’ state, which was itself strongly associated with IM.
These results provide evidence of the capability of generative models to extract biological signals from routinely collected clinical data and their potential to represent interpretable patients’ trajectories for future research in hypothesis generation or prediction modelling.
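The abstract describes the pipeline only at a high level. The Python below is a minimal sketch of two of its steps:

- Regularising irregular measurements into a daily multivariate time series.
- Assigning each day a hidden state by Viterbi decoding under a Gaussian-emission HMM.

It is not the authors' implementation, and the following are all hypothetical assumptions:

- The record format, `(hours since admission, variable, value)`.
- The binning rule: daily mean, with last observation carried forward.
- The z-score reference means and SDs.
- The variable names and the hand-set two-state model parameters.

Fitting the HMM itself (for example with Baum-Welch) is omitted.

```python
import math
from collections import defaultdict


def daily_bins(records, variables, n_days):
    """Regularise irregular (hours, variable, value) records into daily rows.

    Assumption: each daily value is the mean of that day's measurements, and a
    day without measurements carries the last observed value forward (None if
    the variable has not been observed yet).
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for hours, var, value in records:
        day = int(hours // 24)
        if 0 <= day < n_days:
            sums[(day, var)] += value
            counts[(day, var)] += 1
    series, last = [], {v: None for v in variables}
    for day in range(n_days):
        for v in variables:
            if counts[(day, v)]:
                last[v] = sums[(day, v)] / counts[(day, v)]
        series.append([last[v] for v in variables])
    return series


def normalise(series, means, sds):
    """Z-score each variable; still-missing values become 0.0 (the reference mean)."""
    return [[0.0 if x is None else (x - m) / s for x, m, s in zip(row, means, sds)]
            for row in series]


def gaussian_logpdf(x, mu, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def viterbi(obs, start, trans, means, variances):
    """Most likely hidden-state path under an HMM with diagonal Gaussian emissions."""
    n_states = len(start)

    def emit(k, o):
        return sum(gaussian_logpdf(x, m, v) for x, m, v in zip(o, means[k], variances[k]))

    # Log-domain dynamic programming over days.
    delta = [math.log(start[k]) + emit(k, obs[0]) for k in range(n_states)]
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n_states):
            best = max(range(n_states), key=lambda i: delta[i] + math.log(trans[i][j]))
            ptr.append(best)
            new_delta.append(delta[best] + math.log(trans[best][j]) + emit(j, o))
        backpointers.append(ptr)
        delta = new_delta
    # Trace the best path back from the most likely final state.
    state = max(range(n_states), key=lambda k: delta[k])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical example: two variables over three admission days.
records = [(1, "hr", 80.0), (2, "crp", 10.0), (5, "crp", 20.0),
           (30, "crp", 150.0), (50, "hr", 110.0)]
series = daily_bins(records, ["crp", "hr"], n_days=3)
obs = normalise(series, means=[50.0, 85.0], sds=[50.0, 15.0])

# Hypothetical 2-state model (the preprint used 17 states learned from data).
path = viterbi(obs,
               start=[0.9, 0.1],
               trans=[[0.8, 0.2], [0.2, 0.8]],
               means=[[0.0, 0.0], [2.0, 2.0]],
               variances=[[1.0, 1.0], [1.0, 1.0]])
print(series)  # [[15.0, 80.0], [150.0, 80.0], [150.0, 110.0]]
print(path)    # [0, 0, 1]
```

With these illustrative parameters, the three days decode as `[0, 0, 1]`. The sequence moves into the hypothetical high-value state only once both variables are elevated. The transition matrix favours staying in a state, which smooths day-to-day switching.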

Competing Interest Statement

The authors have declared no competing interest.

Clinical Protocols

https://osf.io/6zp3d

Funding Statement

This research was supported by the NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care. VLK was funded by an MRC/NIHR Clinical Academic Research Partnership Grant (CARP; grant code: MR/T023902/1). VT is supported by Cancer Research UK. EB and TF were funded by the EMBL European Bioinformatics Institute (EMBL-EBI).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Informed consent was not required for this study since the routinely collected healthcare data presented were anonymised. The project was approved by the NHS Health Research Authority (HRA) (IRAS: 253457), North East - Newcastle & North Tyneside 1 Research Ethics Committee (REC) (REC reference: 19/NE/0013) and by the EMBL Scientific Advisory Committee (BIAC).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • * Joint first authorship

Data Availability

The anonymised data that support this research project are available from Cambridge University Hospitals NHS Foundation Trust. Restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Reasonable data requests will be considered by the authors with permission of Cambridge University Hospitals NHS Foundation Trust.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Posted June 23, 2021.
Subject Area

  • Health Informatics