medRxiv

Linked electronic health records for research on a nationwide cohort including over 54 million people in England

CVD-COVID-UK consortium
Manuscript drafting and revising: Angela Wood, Rachel Denholm, Sam Hollings, Jennifer Cooper, Samantha Ip, Venexia Walker, Spiros Denaxas, Ashley Akbari, Jonathan Sterne, Cathie Sudlow
Data wrangling, QA and analysis (including generating phenotype definitions): Angela Wood (chair), Rachel Denholm, Sam Hollings, Jennifer Cooper, Samantha Ip, Venexia Walker, Spiros Denaxas, Amitava Banerjee, William Whiteley
Figures/graphics: Alvina Lai
Consortium coordination (BHF Data Science Centre core team): Rouven Priedon, Cathie Sudlow, Lynn Morrice, Debbie Ringham
Public and patient advisory panel: Suzannah Power, Lynn Laidlaw, Michael Molete, John Walsh
NHS Digital (coordination, data management/provision, data access request and information governance support, Trusted Research Environment support): Garry Coleman, Cath Day, Elizabeth Gaffney, Tim Gentry, Lisa Gray, Sam Hollings, Richard Irvine, Brian Roberts, Estelle Spence, Janet Waterhouse
doi: https://doi.org/10.1101/2021.02.22.21252185
1Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK
Angela Wood
4British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
5British Heart Foundation Centre of Research Excellence, University of Cambridge, Cambridge, UK
8Health Data Research UK Cambridge, Wellcome Genome Campus and University of Cambridge, Cambridge, UK
13National Institute for Health Research Blood and Transplant Research Unit in Donor Health and Genomics, University of Cambridge, Cambridge, UK
19The Alan Turing Institute, London, UK
Roles: (writing committee chair)
  • For correspondence: bhfdsc{at}hdruk.ac.uk
Rachel Denholm
2Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK
10Health Data Research UK, South West Better Care Partnership, Bristol, UK
14National Institute for Health Research Bristol Biomedical Research Centre, University of Bristol, UK
Sam Hollings
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Jennifer Cooper
2Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK
10Health Data Research UK, South West Better Care Partnership, Bristol, UK
14National Institute for Health Research Bristol Biomedical Research Centre, University of Bristol, UK
Samantha Ip
4British Heart Foundation Cardiovascular Epidemiology Unit, Department of Public Health and Primary Care, University of Cambridge, Cambridge, UK
Venexia Walker
7Department of Surgery, University of Pennsylvania Perelman School of Medicine, Philadelphia, USA
12MRC University of Bristol Integrative Epidemiology Unit, Bristol, UK; Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK
Spiros Denaxas
3British Heart Foundation Research Accelerator, University College London, UK
11Institute of Health Informatics, University College London, 222 Euston Road, London, UK
15National Institute for Health Research University College London Hospitals Biomedical Research Centre, University College London, UK
19The Alan Turing Institute, London, UK
Ashley Akbari
18Population Data Science and Health Data Research UK, Swansea University, UK
Jonathan Sterne
2Bristol Medical School: Population Health Sciences, University of Bristol, Bristol, UK
10Health Data Research UK, South West Better Care Partnership, Bristol, UK
14National Institute for Health Research Bristol Biomedical Research Centre, University of Bristol, UK
  • For correspondence: bhfdsc{at}hdruk.ac.uk
Cathie Sudlow
9BHF Data Science Centre, Health Data Research UK, Gibbs Building, London, UK
21Usher Institute, Edinburgh Medical School, The University of Edinburgh, Edinburgh, UK
  • For correspondence: bhfdsc{at}hdruk.ac.uk
Amitava Banerjee
1Barts Health NHS Trust, The Royal London Hospital, Whitechapel Rd, London, UK
11Institute of Health Informatics, University College London, 222 Euston Road, London, UK
20University College London Hospitals NHS Trust, 235 Euston Road, London, UK
William Whiteley
6Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK
17Nuffield Department of Population Health, University of Oxford, Oxford, UK
Alvina Lai
11Institute of Health Informatics, University College London, 222 Euston Road, London, UK
Rouven Priedon
9BHF Data Science Centre, Health Data Research UK, Gibbs Building, London, UK
Lynn Morrice
9BHF Data Science Centre, Health Data Research UK, Gibbs Building, London, UK
Debbie Ringham
9BHF Data Science Centre, Health Data Research UK, Gibbs Building, London, UK
Suzannah Power
Lynn Laidlaw
Michael Molete
John Walsh
Garry Coleman
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Cath Day
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Elizabeth Gaffney
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Tim Gentry
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Lisa Gray
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Richard Irvine
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Brian Roberts
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Estelle Spence
16NHS Digital, 1 Trevelyan Square, Leeds, UK
Janet Waterhouse
16NHS Digital, 1 Trevelyan Square, Leeds, UK

ABSTRACT

Objectives Describe a new England-wide electronic health record (EHR) resource enabling whole population research on Covid-19 and cardiovascular disease whilst ensuring data security and privacy and maintaining public trust.

Design Cohort comprising linked person-level records from national healthcare settings for the English population accessible within NHS Digital’s new Trusted Research Environment.

Setting EHRs from primary care, hospital episodes, death registry, Covid-19 laboratory test results and community dispensing data, with further enrichment planned from specialist intensive care, cardiovascular and Covid-19 vaccination data.

Participants 54.4 million people alive on 1st January 2020 and registered with an NHS general practitioner in England.

Main measures of interest Confirmed and suspected Covid-19 diagnoses, exemplar cardiovascular conditions (incident stroke or transient ischaemic attack (TIA) and incident myocardial infarction (MI)) and all-cause mortality between 1st January and 31st October 2020.

Results The linked cohort includes over 96% of the English population. By combining person-level data across national healthcare settings, data on age, sex and ethnicity are complete for over 95% of the population. Among 53.2M people with no prior diagnosis of stroke/TIA, 98,721 had an incident stroke/TIA, of which 30% were recorded only in primary care and 4% only in death registry records. Among 53.1M people with no prior history of MI, 62,966 had an incident MI, of which 8% were recorded only in primary care and 12% only in death records. A total of 959,067 people had a confirmed or suspected Covid-19 diagnosis (714,162 in primary care data, 126,349 in hospital admission records, 776,503 in Covid-19 laboratory test data and 48,433 participants in death registry records). While 58% of these were recorded in both primary care and Covid-19 laboratory test data, 15% and 18% respectively were recorded in only one.
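The counts in the Results can be turned into shares and approximate rates with simple arithmetic. The sketch below is illustrative only: the counts are copied from the paragraph above, the helper names `percent` and `per_100k` are assumptions introduced here, and the denominators of 53.2M and 53.1M are the rounded figures from the text, so the rates (which cover 1st January to 31st October 2020) are approximate.

```python
# Counts copied from the Results paragraph; the helper names are illustrative.
TOTAL_COVID = 959_067
COVID_BY_SOURCE = {
    "primary care": 714_162,
    "hospital admissions": 126_349,
    "laboratory tests": 776_503,
    "death registry": 48_433,
}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def per_100k(events, population):
    """Return events per 100,000 people."""
    return 1e5 * events / population


for source, n in COVID_BY_SOURCE.items():
    print(f"{source}: {percent(n, TOTAL_COVID):.1f}% of people with a diagnosis")

# The per-source counts sum to more than the total because one person
# can be recorded in several sources.
overlap_excess = sum(COVID_BY_SOURCE.values()) - TOTAL_COVID
print(f"source records beyond the unique total: {overlap_excess:,}")

# Denominators are the rounded figures from the text (53.2M and 53.1M),
# so these rates are approximate.
stroke_rate = per_100k(98_721, 53_200_000)
mi_rate = per_100k(62_966, 53_100_000)
print(f"incident stroke/TIA: about {stroke_rate:.0f} per 100,000")
print(f"incident MI: about {mi_rate:.0f} per 100,000")
```

That the per-source counts add up to well over 959,067 is itself a reminder that the sources overlap, which is why person-level linkage is needed to count each individual once.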

Conclusions This population-wide resource demonstrates the importance of linking person-level data across health settings to maximize completeness of key characteristics and to ascertain cardiovascular events and Covid-19 diagnoses. Although established initially to support research on Covid-19 and cardiovascular disease to benefit clinical care and public health and to inform health care policy, it can broaden further to enable a very wide range of research.
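One way to picture how linkage supports ascertainment is to record, for each person, which sources captured an event, and then count the events that would be missed without a given source. The sketch below is a minimal illustration under stated assumptions, not the consortium's method: the person IDs and source names are invented toy data, and `ascertain` and `only_in_one_source` are hypothetical helpers.

```python
from collections import Counter

# Hypothetical toy data: pseudonymised person IDs with an incident event
# recorded in each source. These IDs are invented for illustration only.
events_by_source = {
    "primary_care": {"p1", "p2", "p3"},
    "hospital": {"p2", "p3", "p4"},
    "death_registry": {"p4", "p5"},
}


def ascertain(events_by_source):
    """Map each person to the sorted tuple of sources that recorded the event."""
    sources_per_person = {}
    for source, people in events_by_source.items():
        for person in people:
            sources_per_person.setdefault(person, set()).add(source)
    return {p: tuple(sorted(s)) for p, s in sources_per_person.items()}


def only_in_one_source(sources_per_person):
    """Count people whose event appears in exactly one source, by source."""
    return Counter(s[0] for s in sources_per_person.values() if len(s) == 1)


linked = ascertain(events_by_source)
single = only_in_one_source(linked)
total = len(linked)
for source in events_by_source:
    print(f"{source}: {single[source]} of {total} people recorded only here")
```

In this toy example, dropping either primary care or the death registry would lose one person each, which mirrors the kind of "recorded only in" proportions reported in the Results.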

Competing Interest Statement

The authors have declared no competing interest.

Clinical Trial

Registered on the Health Data Research UK Innovation Gateway: CVD-COVID-UK TRE asset in Health Data Research Innovation Gateway [Internet]. [cited 2021 Feb 18]. Available from: https://web.www.healthdatagateway.org/dataset/7e5f0247-f033-4f98-aed3-3d7422b9dc6d

Clinical Protocols

https://github.com/BHFDSC

https://portal.caliberresearch.org/collections/bhf-data-science-centre

Funding Statement

The BHF Data Science Centre (BHF Grant no. SP/19/3/34678, awarded to Health Data Research UK) funded co-development (with NHS Digital) of the TRE, provision of linked datasets, data access, user software licences, computational usage and data management and wrangling support, with additional contributions from the HDR UK Data and Connectivity component of the UK Chief Scientific Adviser National Core Studies programme to coordinate national Covid-19 priority research. Consortium partner organisations funded the time of contributing data analysts, biostatisticians, epidemiologists and clinicians. The results described are based on data from patients, collected by the NHS as part of their care and support. We would also like to acknowledge all data providers who make anonymised data available for research. AA is supported by Health Data Research UK [HDR-9006] which receives its funding from the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation (BHF) and the Wellcome Trust; and Administrative Data Research UK which is funded by the Economic and Social Research Council [grant ES/S007393/1]. AB is supported by research funding from NIHR, British Medical Association, Astra-Zeneca and UK Research and Innovation. AB, AW and SD are part of the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement No. 116074.
AW and SI are supported by the BHF-Turing Cardiovascular Data Science Award (BCDSA\100005) and by core funding from: UK Medical Research Council (MR/L003120/1), British Heart Foundation (RG/13/13/30194; RG/18/13/33946) and NIHR Cambridge Biomedical Research Centre (BRC-1215-20014). JC, JS and RD are supported by the HDRUK South West Better Care Partnership and NIHR Bristol Biomedical Research Centre. SD is supported by Health Data Research UK London, which receives its funding from Health Data Research UK Ltd funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), British Heart Foundation, and the Wellcome Trust; Alan Turing Fellowship (EP/N510129/1); National Institute for Health Research (NIHR) Biomedical Research Centre (BRC) at University College London Hospital NHS Trust (UCLH). VW is supported by the University of Bristol Medical Research Council Integrative Epidemiology Unit (MC_UU_00011/4). WW is supported by a Scottish Senior Clinical Fellowship, CSO (SCAF/17/01).

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

The North East - Newcastle & North Tyneside 2 Research Ethics Committee provided ethical approval for the CVD-COVID-UK research programme (REC number: 20/NE/0161).

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Footnotes

  • Consortium membership: https://www.hdruk.ac.uk/wp-content/uploads/2021/01/210128-CVD-COVID-UK-Consortium-Members.pdf and see Annexe 1 – Consortium members, who contributed to discussions leading up to this manuscript and provided very helpful insights and comments

  • Typo in discussion corrected: 200,000 people under 30 replaced with 20 million

Data Availability

Data are available to bona fide researchers and are accessible within the NHS Digital Trusted Research Environment for England. Contact bhfdsc{at}hdruk.ac.uk for information on how to join the CVD-COVID-UK consortium for access.

https://www.hdruk.ac.uk/wp-content/uploads/2021/02/210215-CVD-COVID-UK-TRE-Dataset-Dashboard_CLMS.pdf

https://web.www.healthdatagateway.org/dataset/7e5f0247-f033-4f98-aed3-3d7422b9dc6d

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.
Posted February 26, 2021.
medRxiv 2021.02.22.21252185; doi: https://doi.org/10.1101/2021.02.22.21252185
Subject Area

  • Health Informatics