PT - JOURNAL ARTICLE
AU - Tian Gu
AU - Phil H Lee
AU - Rui Duan
TI - COMMUTE: communication-efficient transfer learning for multi-site risk prediction
AID - 10.1101/2022.03.23.22272834
DP - 2022 Jan 01
TA - medRxiv
PG - 2022.03.23.22272834
4099 - http://medrxiv.org/content/early/2022/06/28/2022.03.23.22272834.short
4100 - http://medrxiv.org/content/early/2022/06/28/2022.03.23.22272834.full
AB - Objectives: We propose a communication-efficient transfer learning approach (COMMUTE) that efficiently and effectively incorporates multi-site healthcare data for training risk prediction models in a target population of interest, accounting for challenges including population heterogeneity and data sharing constraints across sites. Methods: We first train population-specific source models locally within each institution. Using data from a given target population, COMMUTE learns a calibration term for each source model, which adjusts for potential data heterogeneity through flexible distance-based regularizations. In a centralized setting where multi-site data can be directly pooled, all data are combined to train the target model after calibration. When individual-level data are not shareable in some sites, COMMUTE requests only the locally trained models from these sites, with which COMMUTE generates heterogeneity-adjusted synthetic data for training the target model. We evaluate COMMUTE via extensive simulation studies and an application to multi-site data from the electronic Medical Records and Genomics (eMERGE) Network to predict extreme obesity. Results: Simulation studies show that COMMUTE outperforms methods without adjusting for population heterogeneity and methods trained in a single population over a broad spectrum of settings.
Using eMERGE data, COMMUTE achieves an area under the receiver operating characteristic curve (AUC) around 0.80, which outperforms other benchmark methods with AUC ranging from 0.51 to 0.70. Conclusion: COMMUTE improves the risk prediction in the target population and safeguards against negative transfer when some source populations are highly different from the target. In a federated setting, it is highly communication efficient as it only requires each site to share model parameter estimates once, and no iterative communication or higher-order terms are needed.
Competing Interest Statement: The authors have declared no competing interest.
Funding Statement: None.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes
The eMERGE Network data used in this work is publicly available through dbGaP phs000888.v1.p1 (https://ega-archive.org/studies/phs000888).
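The workflow summarized in the abstract (local source model, calibration term learned on target data, synthetic data drawn from the calibrated model, final target model) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the loss, the form of the distance-based regularization, or how synthetic data are generated. The sketch assumes a logistic model, a squared L2 penalty on the calibration term, standard-normal synthetic covariates, and a single source site; the helper names `simulate` and `fit_logistic` and all numeric settings are hypothetical.

```python
import math
import random

def sigmoid(z):
    # Clamp to avoid overflow in math.exp for extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def simulate(beta, n, rng):
    """Draw n rows: intercept plus standard-normal covariates, Bernoulli labels."""
    X, y = [], []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0.0, 1.0) for _ in range(len(beta) - 1)]
        p = sigmoid(sum(b * v for b, v in zip(beta, x)))
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y

def fit_logistic(X, y, base=None, l2=0.0, lr=0.5, steps=300):
    """Gradient descent on mean logistic loss + (l2/2)*||delta||^2.

    The model coefficients are base + delta; only delta is returned.
    With base=None this is an ordinary logistic fit starting from zero.
    """
    p = len(X[0])
    base = base if base is not None else [0.0] * p
    delta = [0.0] * p
    n = len(X)
    for _ in range(steps):
        grad = [l2 * d for d in delta]  # gradient of the assumed L2 penalty
        for xi, yi in zip(X, y):
            score = sum((b + d) * v for b, d, v in zip(base, delta, xi))
            err = sigmoid(score) - yi
            for j in range(p):
                grad[j] += err * xi[j] / n
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta

rng = random.Random(0)
Xs, ys = simulate([-0.5, 1.0, -1.0], 500, rng)  # hypothetical source site
Xt, yt = simulate([-0.2, 1.3, -1.0], 100, rng)  # small hypothetical target sample

# Step 1: source model trained locally; only its coefficients are shared.
w_source = fit_logistic(Xs, ys)

# Step 2: calibration term learned on target data, shrunk toward zero.
delta = fit_logistic(Xt, yt, base=w_source, l2=1.0)
w_calibrated = [w + d for w, d in zip(w_source, delta)]

# Step 3: synthetic data generated from the calibrated source model
# (covariate distribution is an assumption of this sketch).
Xsyn, ysyn = simulate(w_calibrated, 300, rng)

# Step 4: target model trained on target data plus synthetic data.
w_target = fit_logistic(Xt + Xsyn, yt + ysyn)

print("source coefficients:    ", w_source)
print("calibrated coefficients:", w_calibrated)
print("target coefficients:    ", w_target)
```

In this sketch the penalty weight `l2` controls how far the calibrated model may move from the source model: a larger value keeps it closer to the source, while a value of zero lets the target data determine the calibration term entirely.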