Abstract
Background Electronic health records (EHRs) are increasingly used for studying multimorbidities. However, concerns about the accuracy and completeness of EHR data, together with the fact that EHRs are designed primarily for billing and administrative purposes, raise questions about the consistency and reproducibility of EHR-based multimorbidity research.
Methods Using phecodes to represent the disease phenome, we analyzed pairwise comorbidity strengths with a dual logistic regression approach and represented multimorbidity as an undirected weighted graph. We assessed the consistency of the multimorbidity networks within and between two major EHR systems at local (nodes and edges), meso (neighboring patterns), and global (network statistics) scales. We present case studies to identify disease clusters and uncover clinically interpretable disease relationships. We provide an interactive web tool and a knowledge base combining data from multiple sources for online multimorbidity analysis.
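As a rough illustration of the graph representation described in Methods, the sketch below stores pairwise comorbidity strengths as an undirected weighted graph in plain Python. The disease labels, strength values, and function names are hypothetical placeholders, not values or code from the study, and the dual logistic regression that would produce the strengths is not reproduced here.

```python
from collections import defaultdict


def build_graph(pair_strengths):
    """Build an undirected weighted graph as a nested dict.

    pair_strengths maps (disease_a, disease_b) -> comorbidity strength.
    Each edge is stored in both directions so lookups are symmetric.
    """
    graph = defaultdict(dict)
    for (a, b), weight in pair_strengths.items():
        if a == b:
            continue  # a disease is not comorbid with itself
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def node_strength(graph, node):
    """Sum of edge weights attached to a node (0 if the node is absent)."""
    return sum(graph.get(node, {}).values())


# Hypothetical strengths for three made-up disease labels.
strengths = {
    ("phe_A", "phe_B"): 2.0,
    ("phe_A", "phe_C"): 0.5,
    ("phe_B", "phe_C"): 1.5,
}
graph = build_graph(strengths)
```

Storing each edge in both directions trades a little memory for constant-time neighbor lookups, which is convenient when comparing neighboring patterns between two networks.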
Findings Analyzing data from 500,000 patients across Vanderbilt University Medical Center and Mass General Brigham health systems, we observed a strong correlation in disease frequencies (Kendall’s τ = 0.643) and comorbidity strengths (Pearson ρ = 0.79). Consistent network statistics across EHRs suggest similar structures of multimorbidity networks at various scales. Comorbidity strengths and similarities of multimorbidity connection patterns align with the disease genetic correlations. Graph-theoretic analyses revealed a consistent core-periphery structure, implying efficient network clustering through threshold graph construction. Using hydronephrosis as a case study, we demonstrated the network’s ability to uncover clinically relevant disease relationships and provide novel insights.
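The rank agreement of disease frequencies reported above can be illustrated with a small Kendall's τ computation. This is a minimal sketch of the tau-a variant (no tie correction) in plain Python; the abstract does not state which variant was used, and the frequency values and variable names below are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical frequencies of the same four diseases in two EHR systems.
freq_site_1 = [0.10, 0.20, 0.30, 0.40]
freq_site_2 = [0.12, 0.31, 0.22, 0.45]
tau = kendall_tau_a(freq_site_1, freq_site_2)
```

The pairwise loop is quadratic in the number of diseases, which is fine for a sketch; a phenome-wide analysis would more likely rely on an established statistics library.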
Interpretation Our findings demonstrate the robustness of large-scale EHR data for studying phenome-wide multimorbidities. The alignment of multimorbidity patterns with genetic data suggests the potential utility for uncovering shared biology of diseases. The consistent core-periphery structure offers analytical insights to discover complex disease interactions. This work also sets the stage for advanced disease modeling, with implications for precision medicine.
Funding VUMC Biostatistics Development Award, the National Institutes of Health, and the VA CSRD
Competing Interest Statement
JWS is a member of the Scientific Advisory Board of Sensorium Therapeutics (with equity) and has received grant support from Biogen, Inc. He is the principal investigator of a collaborative study of the genetics of depression and bipolar disorder sponsored by 23andMe, for which 23andMe provides analysis time as in-kind support but no payments. DMR has served on advisory boards for Illumina and Alkermes and has received research funds unrelated to this work from PTC Therapeutics. All other authors declare no competing interests.
Funding Statement
NS and YX are supported by the Vanderbilt University Department of Biostatistics Development Award; YX, CB and RH are supported by R21DK127075; YX, DE, EP and DR are supported by P50GM115305; JWS is supported in part by R01 MH118233. The Vanderbilt University Medical Center dataset(s) used for the analyses described were obtained from Vanderbilt University Medical Center's SD/BioVU, which is supported by numerous sources: institutional funding, private agencies, and federal grants. These include the NIH-funded Shared Instrumentation Grant S10RR025141 and CTSA grants UL1TR002243, UL1TR000445, and UL1RR024975. Genomic data are also supported by investigator-led projects that include U01HG004798, R01NS032830, RC2GM092618, P50GM115305, U01HG006378, U19HL065962, R01HD074711; and additional funding sources listed at https://victr.vanderbilt.edu/pub/biovu/. This research has been conducted using the UK Biobank Resource under Application Number 43397.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
IRB# 172041 of Vanderbilt University Medical Center (VUMC) gave ethical approval for this work. IRB# 2009P002312 of Mass General Brigham (MGB) gave ethical approval for this work.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
* Contributed equally
Manuscript revised; Figure 7 revised; author and affiliation updated; supplemental file updated.
Data Availability
All coding details associated with the models have been shared. Results have been aggregated and reported within this Article to the maximum extent possible while maintaining the privacy of personal health information as required by law. All dynamic online analysis results are available from the PheMIME app (https://prod.tbilab.org/PheMIME/). All data are archived within TBILab systems in an audited computing environment secured in accordance with the Health Insurance Portability and Accountability Act to facilitate verification of study conclusions. The open-source code for PheMIME is publicly available on our GitHub repository at https://github.com/tbilab/PheMIME.