medRxiv

Statistical inference for association studies using electronic health records: handling both selection bias and outcome misclassification

Lauren J Beesley, Bhramar Mukherjee
doi: https://doi.org/10.1101/2019.12.26.19015859
Lauren J Beesley
University of Michigan
  • For correspondence: lbeesley@umich.edu
Bhramar Mukherjee
University of Michigan
  • For correspondence: bhramar@umich.edu

Abstract

Health research using electronic health records (EHR) has gained popularity, but misclassification of EHR-derived disease status and lack of representativeness of the study sample can result in substantial bias in effect estimates and can impact power and type I error. In this paper, we develop new strategies for handling disease status misclassification and selection bias in EHR-based association studies. We first focus on each type of bias separately. For misclassification, we propose three novel likelihood-based bias correction strategies. A distinguishing feature of the EHR setting is that misclassification may be related to patient-specific factors, and the proposed methods leverage data in the EHR to estimate misclassification rates without gold standard labels. For addressing selection bias, we describe how calibration and inverse probability weighting methods from the survey sampling literature can be extended and applied to the EHR setting. Addressing misclassification and selection biases simultaneously is a more challenging problem than dealing with each on its own, and we propose several new strategies to address this situation. For all methods proposed, we derive valid standard errors and provide software for implementation. We provide a new suite of statistical estimation and inference strategies for addressing misclassification and selection bias simultaneously that is tailored to problems arising in EHR data analysis. We apply these methods to data from The Michigan Genomics Initiative (MGI), a longitudinal EHR-linked biorepository.
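As a rough illustration of the inverse probability weighting idea mentioned in the abstract, the sketch below reweights a non-representative sample by the inverse of each subject's selection probability. It is not the authors' software or estimator: the function name `ipw_prevalence`, the toy data, and the assumption that selection probabilities are known are all hypothetical choices made for this example.

```python
def ipw_prevalence(outcomes, selection_probs):
    """Normalized (Hajek-style) inverse-probability-weighted mean of a binary outcome.

    Assumes each subject's probability of being sampled is known and positive.
    """
    weights = [1.0 / p for p in selection_probs]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# Hypothetical toy sample in which cases were twice as likely to be
# selected as non-cases (0.8 versus 0.4).
outcomes = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
selection_probs = [0.8 if y == 1 else 0.4 for y in outcomes]

# Unweighted sample proportion, which over-represents cases.
naive = sum(outcomes) / len(outcomes)

# Weighted proportion, which down-weights the over-sampled cases.
weighted = ipw_prevalence(outcomes, selection_probs)

print(f"naive prevalence:    {naive:.2f}")
print(f"weighted prevalence: {weighted:.2f}")
```

In this toy setting the unweighted proportion overstates prevalence because cases are over-sampled, and weighting pulls the estimate back toward the value implied by the assumed selection probabilities. In practice, selection probabilities are rarely known and must be estimated, for example from external data.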

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work was supported by The University of Michigan Comprehensive Cancer Center core grant supplement 5P30-CA-046592, NSF DMS award 1712933 and The University of Michigan precision health award U067541.

Author Declarations

All relevant ethical guidelines have been followed; any necessary IRB and/or ethics committee approvals have been obtained and details of the IRB/oversight body are included in the manuscript.

Yes

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

MGI data are available after IRB approval to select researchers.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
Posted December 30, 2019.

Subject Area

  • Epidemiology