medRxiv

Scaling genetic discovery for organ volumes using machine learning-assisted imputation and bias-corrected GWAS

Afreen Naz, Brandon Whitcher, Altayeb Ahmed, Marjola Thanaj, Elena P Sorokin, Jimmy D Bell, E Louise Thomas, Madeleine Cule, Hanieh Yaghootkar
doi: https://doi.org/10.1101/2025.11.07.25339752
Afreen Naz
1School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
Brandon Whitcher
2Research Center for Optimal Health, School of Life Sciences, University of Westminster, London, UK
Altayeb Ahmed
1School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
Marjola Thanaj
2Research Center for Optimal Health, School of Life Sciences, University of Westminster, London, UK
Elena P Sorokin
3Calico Life Sciences LLC, South San Francisco, CA
Jimmy D Bell
2Research Center for Optimal Health, School of Life Sciences, University of Westminster, London, UK
E Louise Thomas
2Research Center for Optimal Health, School of Life Sciences, University of Westminster, London, UK
Madeleine Cule
3Calico Life Sciences LLC, South San Francisco, CA
Hanieh Yaghootkar
1School of Human Development and Health, Faculty of Medicine, University of Southampton, Southampton, UK
  • For correspondence: h.yaghootkar{at}soton.ac.uk

Abstract

Background MRI-derived organ and tissue volumes are powerful endophenotypes for studying complex disease, but their availability is limited by cost and throughput. We present a scalable framework that combines machine learning-based phenotypic imputation with probabilistic GWAS (POP-GWAS) to enable robust genetic discovery for imaging-derived phenotypes (IDPs).

Results Using 37,589 UK Biobank MRI scans and 382 biomarkers, we imputed nine IDPs—including volumes of fat depots, muscle, pancreas, and lung—across ∼450,000 individuals. The POP-GWAS framework integrated measured and imputed traits, correcting for imputation uncertainty and increasing effective sample size by up to 200%. We identified 452 independent loci associated with the nine IDPs.
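
The idea of integrating measured and imputed traits while correcting for imputation error can be illustrated with a small simulation. This is a minimal sketch, not the authors' pipeline: it assumes a single SNP, simple OLS, and a control-variate style combination of three slope estimates (measured trait in the scanned sample, imputed trait in the scanned sample, imputed trait in the unscanned sample). All function names, sample sizes, and effect sizes here are hypothetical and chosen only for illustration.

```python
import numpy as np


def ols_slope(g, y):
    """OLS slope of y on genotype g, plus residuals and the centred sum of squares of g."""
    gc = g - g.mean()
    yc = y - y.mean()
    sxx = gc @ gc
    beta = (gc @ yc) / sxx
    return beta, yc - beta * gc, sxx


def combine_measured_and_imputed(g_lab, y_lab, yhat_lab, g_unlab, yhat_unlab):
    """Combine measured-trait and imputed-trait slopes (illustrative estimator only).

    Returns the combined slope, its estimated variance, and the ratio of the
    measured-only variance to the combined variance (relative efficiency).
    """
    n_lab, n_unlab = len(g_lab), len(g_unlab)
    b_y, r_y, sxx_lab = ols_slope(g_lab, y_lab)            # measured trait, scanned sample
    b_hl, r_hl, _ = ols_slope(g_lab, yhat_lab)             # imputed trait, scanned sample
    b_hu, r_hu, sxx_unlab = ols_slope(g_unlab, yhat_unlab)  # imputed trait, unscanned sample

    v_y = (r_y @ r_y) / (n_lab - 2) / sxx_lab
    v_hl = (r_hl @ r_hl) / (n_lab - 2) / sxx_lab
    v_hu = (r_hu @ r_hu) / (n_unlab - 2) / sxx_unlab
    # Covariance of the two slopes estimated on the same scanned individuals.
    cov = (r_y @ r_hl) / (n_lab - 2) / sxx_lab

    # Weight that minimises the variance of b_y + w * (b_hu - b_hl).
    w = cov / (v_hl + v_hu)
    beta = b_y + w * (b_hu - b_hl)
    var = v_y - cov**2 / (v_hl + v_hu)
    return beta, var, v_y / var


rng = np.random.default_rng(0)
n_lab, n_unlab, true_beta = 2_000, 20_000, 0.1  # hypothetical sizes and effect


def simulate(n):
    g = rng.binomial(2, 0.3, size=n).astype(float)   # additive genotype coding 0/1/2
    z = rng.normal(size=n)                           # non-genetic signal shared by y and yhat
    y = true_beta * g + z + rng.normal(scale=0.5, size=n)
    # The imputed trait deliberately captures only half of the SNP effect,
    # so a GWAS on yhat alone would be biased.
    yhat = 0.5 * true_beta * g + z + rng.normal(scale=0.5, size=n)
    return g, y, yhat


g_lab, y_lab, yhat_lab = simulate(n_lab)
g_unlab, _, yhat_unlab = simulate(n_unlab)

beta_pop, var_pop, gain = combine_measured_and_imputed(
    g_lab, y_lab, yhat_lab, g_unlab, yhat_unlab
)
print(f"combined slope: {beta_pop:.3f} (SE {var_pop ** 0.5:.3f})")
print(f"effective sample size: about {n_lab * gain:.0f} vs {n_lab} measured")
```

In this sketch the difference between the two imputed-trait slopes has expectation zero, so it adds no bias even when the imputation model misses part of the SNP effect; it only reduces variance. The variance ratio is what an "effective sample size" gain refers to: a ratio of 3 would correspond to the 200% increase quoted above.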

This approach uncovered new insights into the architecture and disease relevance of organ volumes. For example, genetically higher abdominal subcutaneous fat was associated with higher risks of diabetes, polycystic ovary syndrome, cardiovascular disease, gout, osteoarthritis, asthma, and psoriasis; higher visceral fat with cholelithiasis and reflux; higher muscle volume with aortic aneurysm, atrial fibrillation, thrombotic events, and osteoarthritis, but a lower risk of depression; higher lung volume with a higher risk of aortic aneurysm, but lower risks of heart disease and reflux; and higher pancreas volume with a lower risk of diabetes. Tissue enrichment analyses revealed organ-specific patterns, e.g., brain tissue for fat traits and pancreatic tissue for pancreas volume.

Conclusions Our study demonstrates that machine learning-assisted GWAS enables scalable discovery in imaging genetics. This framework advances understanding of organ-specific biology and provides a blueprint for leveraging the remaining >60,000 UK Biobank MRI scans to accelerate genetic discovery and uncover mechanisms of disease.

Competing Interest Statement

MC and ES are employees of Calico Life Sciences LLC.

Funding Statement

H.Y. is funded by Diabetes UK (grant 23/0006598) and Calico Life Sciences LLC.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.

Yes

Footnotes

  • ANaz{at}lincoln.ac.uk, B.Whitcher{at}westminster.ac.uk, 28404091{at}students.lincoln.ac.uk, M.Thanaj{at}westminster.ac.uk, sorokin{at}calicolabs.com, J.Bell{at}westminster.ac.uk, l.thomas3{at}westminster.ac.uk, cule{at}calicolabs.com, HYaghootkar{at}lincoln.ac.uk

Data availability

Our research was conducted using UK Biobank data. Under the standard UK Biobank data sharing agreement, we (and other researchers) cannot directly share raw data obtained or derived from the UK Biobank. However, under this agreement, all the data generated and the methodologies used in this paper are returned by us to the UK Biobank, where they will be fully available. Access can be obtained directly from the UK Biobank (https://www.ukbiobank.ac.uk) by all bona fide researchers upon submitting a health-related research proposal.

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
Posted November 09, 2025.
Subject Area

  • Genetic and Genomic Medicine