medRxiv

Distributed Learning from Multi-Site Observational Health Data for Zero-Inflated Count Outcomes

Mackenzie J. Edmondson, Chongliang Luo, Rui Duan, Mitchell Maltenfort, Zhaoyi Chen, Justine Shults, Jiang Bian, Patrick B. Ryan, Christopher B. Forrest, Yong Chen
doi: https://doi.org/10.1101/2020.12.17.20248194
Mackenzie J. Edmondson
(a) Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Chongliang Luo
(a) Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Rui Duan
(a) Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Mitchell Maltenfort
(b) Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Zhaoyi Chen
(c) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
(d) Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
Justine Shults
(a) Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
Jiang Bian
(c) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
(d) Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
Patrick B. Ryan
(e) Janssen Research and Development, Titusville, NJ, USA
Christopher B. Forrest
(b) Department of Pediatrics, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Yong Chen
(a) Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
  • For correspondence: Ychen123@upenn.edu

Abstract

Background Multi-site studies facilitate the study of rare outcomes or exposures by integrating patient information from several distinct care sites. Due to patient privacy concerns, sharing of patient-level information among collaborating sites is often prohibited, suggesting a need for privacy-preserving data analysis methods. Several such methods exist, but some have been shown to yield biased estimates or to require extensive communication among sites.

Objective We present a communication-efficient, privacy-preserving method for performing distributed regression on Electronic Health Records (EHR) data across multiple sites for zero-inflated count outcomes. Our approach is motivated by two real-world data problems: modeling frequency of serious adverse events and examining risk factors associated with pediatric avoidable hospitalization.

Methods We use hurdle regression, a two-part (logistic-Poisson) regression model, to characterize the effects of risk factors on zero-inflated count outcomes. Further, we develop a one-shot algorithm for performing hurdle regression (ODAH) across multiple sites, using individual patient data at one site and aggregated data from all other sites to approximate the complete data log likelihood. We evaluate ODAH through extensive simulations and an application to EHR data from the Children’s Hospital of Philadelphia (CHOP) and the OneFlorida Clinical Research Consortium. We compare ODAH estimates to those from meta-analysis and pooled analysis (the gold standard in which all patient data are pooled together).
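The two-part structure of the hurdle model can be made concrete with a small sketch of its log-likelihood. This is an illustration only, assuming the count part is a zero-truncated Poisson (the usual choice for a logistic-Poisson hurdle model; the abstract does not spell this out). The function names `log1pexp`, `hurdle_loglik_one`, and `hurdle_loglik` are hypothetical and are not taken from any ODAH software.

```python
import math


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def hurdle_loglik_one(y, x, beta, gamma):
    """Log-likelihood contribution of one patient (illustrative sketch).

    y     : observed count (non-negative integer)
    x     : covariate vector, with a leading 1.0 for the intercept
    beta  : coefficients of the logistic part, modelling P(y > 0)
    gamma : coefficients of the zero-truncated Poisson part (assumed form)
    """
    eta_zero = sum(b * xi for b, xi in zip(beta, x))
    if y == 0:
        # log P(y = 0) = -log(1 + exp(eta_zero))
        return -log1pexp(eta_zero)
    eta_count = sum(g * xi for g, xi in zip(gamma, x))
    lam = math.exp(eta_count)
    # log P(y > 0) = -log(1 + exp(-eta_zero))
    log_p_positive = -log1pexp(-eta_zero)
    # Zero-truncated Poisson: Poisson pmf divided by (1 - exp(-lam)).
    log_truncated = (y * eta_count - lam - math.lgamma(y + 1)
                     - math.log(-math.expm1(-lam)))
    return log_p_positive + log_truncated


def hurdle_loglik(ys, xs, beta, gamma):
    """Sum of per-patient contributions, e.g. over one site's patients."""
    return sum(hurdle_loglik_one(y, x, beta, gamma) for y, x in zip(ys, xs))
```

Under this formulation the log-likelihood splits into a term involving only `beta` and a term involving only `gamma`, so the logistic and count components can be maximized separately.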

Results In simulations, ODAH estimates exhibited bias relative to the gold standard of less than 0.1% across several settings. In contrast, meta-analysis estimates exhibited relative bias up to 12.7%, largely dependent on the event rate. When applying ODAH to CHOP data, relative biases for estimates in both components of the hurdle model were less than 5.1%, while meta-analysis estimates exhibited relative bias as high as 63.6%. When analyzing OneFlorida data, ODAH relative biases were less than 10% for eight of the ten log relative risks estimated, while meta-analysis estimates again showed substantially greater bias.
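The relative biases quoted above compare each distributed estimate with the corresponding pooled-analysis estimate. A minimal sketch of that comparison follows, assuming relative bias is the absolute difference from the pooled estimate expressed as a percentage of the pooled estimate; the abstract does not state the formula, and the function name `relative_bias_pct` is hypothetical.

```python
def relative_bias_pct(estimate, pooled):
    """Relative bias (%) of an estimate against the pooled (gold standard) value.

    Assumed definition: |estimate - pooled| / |pooled| * 100.
    """
    if pooled == 0:
        raise ValueError("relative bias is undefined when the pooled estimate is 0")
    return abs(estimate - pooled) / abs(pooled) * 100.0
```

For example, with made-up numbers that are not study results, an estimate of 0.95 against a pooled value of 1.0 has a relative bias of about 5%.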

Conclusions Our simulations and real-world applications suggest ODAH is a promising method for performing privacy-preserving distributed learning on EHR data when modeling zero-inflated count outcomes.

Competing Interest Statement

The authors have declared no competing interest.

Funding Statement

This work is supported in part by National Institutes of Health Grants 1R01LM012607 (ME, CL, RD and YC) and 1R01AI130460 (ME, CL, RD and YC). Children's Hospital of Philadelphia: This work was supported by a grant from the Commonwealth Universal Research Enhancement (C.U.R.E) program funded by the Pennsylvania Department of Health - 2015 Formula award - SAP #4100072543. OneFlorida: The work at the University of Florida (UF) site is supported in part by the Cancer Informatics Shared Resource at the UF Health Cancer Center, and NIH grants R21AG061431 and R01CA246418.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

Children's Hospital of Philadelphia: CHOP designated the study Not Human Subjects Research. OneFlorida Clinical Research Consortium: The use of OneFlorida data in this study was reviewed and approved by the University of Florida IRB under IRB202003137.

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

Children's Hospital of Philadelphia: The data are not available for deposit into a repository. CHOP can make aggregate results available on request. OneFlorida: OneFlorida data can be requested at https://onefloridaconsortium.org/front-door/. Because OneFlorida data are a HIPAA limited data set, a data use agreement must be established with the OneFlorida network.

  • Abbreviations

    EHR: electronic health record
    ODAH: One-shot Distributed Algorithm for performing Hurdle regression
    CHOP: Children’s Hospital of Philadelphia
    DDN: distributed data network
    OHDSI: Observational Health Data Sciences and Informatics
    PHI: protected health information
    AH: avoidable hospitalization
    OR: odds ratio
    RR: relative risk
  • Copyright 
    The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
    Posted December 19, 2020.
    Subject Area

    • Health Informatics