PT - JOURNAL ARTICLE
AU - Kolobkov, Dmitry
AU - Sharma, Satyarth Mishra
AU - Medvedev, Aleksandr
AU - Lebedev, Mikhail
AU - Kosaretskiy, Egor
AU - Vakhitov, Ruslan
TI - Efficacy of federated learning on genomic data: a study on the UK Biobank and the 1000 Genomes Project
AID - 10.1101/2023.01.24.23284898
DP - 2023 Jan 01
TA - medRxiv
PG - 2023.01.24.23284898
4099 - http://medrxiv.org/content/early/2023/02/09/2023.01.24.23284898.short
4100 - http://medrxiv.org/content/early/2023/02/09/2023.01.24.23284898.full
AB - Combining training data from multiple sources increases sample size and reduces confounding, leading to more accurate and less biased machine learning models. In healthcare, however, direct pooling of data is often not allowed by data custodians who are accountable for minimizing the exposure of sensitive information. Federated learning offers a promising solution to this problem by training a model in a decentralized manner thus reducing the risks of data leakage. Although there is increasing utilization of federated learning on clinical data, its efficacy on genomic data has not been extensively studied. This study aims to contribute to the adoption of federated learning for genomic data by investigating its applicability in two scenarios: phenotype prediction on the UK Biobank data and ancestry prediction on the 1000 Genomes Project data. We show that federated models trained on data split into independent nodes achieve performance close to centralized models, even in the presence of significant inter-node heterogeneity.
This paper describes the experiments and provides recommendations on strategies that should be used to reduce computational complexity or communication costs.
Competing Interest Statement: All authors had financial support from GENXT LTD for the submitted work.
Funding Statement: The study was funded by GENXT LTD, no external funding was received.
Author Declarations:
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained. Yes.
The details of the IRB/oversight body that provided approval or exemption for the research described are given below: Access to UK Biobank data was granted upon application 43661. UK Biobank has approval from the North West Multi-centre Research Ethics Committee (MREC) to obtain and disseminate data and samples from the participants (http://www.ukbiobank.ac.uk/ethics/), and these ethical regulations cover the work in this study. Written informed consent was obtained from all participants.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals. Yes.
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance). Yes.
I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable. Yes.
All data produced in the present study are available upon reasonable request to the authors. UK Biobank data are available upon request through the UK Biobank website.