Abstract
Background MRI-derived organ and tissue volumes are powerful endophenotypes for studying complex disease, but their availability is limited by cost and throughput. We present a scalable framework that combines machine learning-based phenotypic imputation with probabilistic GWAS (POP-GWAS) to enable robust genetic discovery for imaging-derived phenotypes (IDPs).
Results Using 37,589 UK Biobank MRI scans and 382 biomarkers, we imputed nine IDPs—including volumes of fat depots, muscle, pancreas, and lung—across ∼450,000 individuals. The POP-GWAS framework integrated measured and imputed traits, correcting for imputation uncertainty and increasing effective sample size by up to 200%. We identified 452 independent loci associated with the nine IDPs.
This approach uncovered new insights into the architecture and disease relevance of organ volumes. For example, genetically higher abdominal subcutaneous fat was associated with higher risks of diabetes, polycystic ovary syndrome, cardiovascular disease, gout, osteoarthritis, asthma, and psoriasis; higher visceral fat with higher risks of cholelithiasis and reflux; higher muscle volume with higher risks of aortic aneurysm, atrial fibrillation, thrombotic events, and osteoarthritis but a lower risk of depression; higher lung volume with a higher risk of aortic aneurysm but lower risks of heart disease and reflux; and higher pancreas volume with a lower risk of diabetes. Tissue enrichment analyses revealed organ-specific patterns, e.g., brain tissue for fat traits and pancreatic tissue for pancreas volume.
Conclusions Our study demonstrates that machine learning-assisted GWAS enables scalable discovery in imaging genetics. This framework advances understanding of organ-specific biology and provides a blueprint for leveraging the remaining >60,000 UK Biobank MRI scans to accelerate genetic discovery and uncover mechanisms of disease.
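The integration of measured and imputed traits summarised in the Results can be illustrated with a minimal sketch. It assumes a generic prediction-powered estimator for a single variant: the effect estimated on the measured trait is adjusted by the difference between imputed-trait effects in the unscanned and scanned samples, weighted to minimise variance. The class and function names, the correlation `r`, and all numbers below are hypothetical choices for illustration; this is not the authors' implementation and does not reproduce the reported results.

```python
import math
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-variant GWAS effect estimate and its standard error."""
    beta: float
    se: float


def combine_measured_and_imputed(y_lab, yhat_lab, yhat_unlab, r):
    """Combine three GWAS estimates for one variant (illustrative sketch).

    y_lab      : measured trait, scanned (labelled) sample
    yhat_lab   : imputed trait, same scanned sample
    yhat_unlab : imputed trait, unscanned (unlabelled) sample
    r          : assumed correlation between the y_lab and yhat_lab estimates
    """
    # Covariance between the two estimates obtained in the same sample.
    cov = r * y_lab.se * yhat_lab.se
    # Variance of the correction term (independent samples, so variances add).
    var_correction = yhat_lab.se ** 2 + yhat_unlab.se ** 2
    # Variance-minimising weight on the correction term.
    w = cov / var_correction
    beta = y_lab.beta + w * (yhat_unlab.beta - yhat_lab.beta)
    # Variance of the combined estimate at the optimal weight.
    var = y_lab.se ** 2 - cov ** 2 / var_correction
    return SnpStats(beta, math.sqrt(var)), w


def effective_sample_size_ratio(y_lab, combined):
    """Ratio of effective sample size after vs. before combining."""
    return (y_lab.se / combined.se) ** 2


# Hypothetical inputs: standardised traits, so se is roughly 1 / sqrt(n).
n_lab, n_unlab = 37_589, 412_411  # unscanned count is an assumption
measured = SnpStats(beta=0.050, se=1 / math.sqrt(n_lab))
imputed_lab = SnpStats(beta=0.040, se=1 / math.sqrt(n_lab))
imputed_unlab = SnpStats(beta=0.045, se=1 / math.sqrt(n_unlab))

combined, weight = combine_measured_and_imputed(
    measured, imputed_lab, imputed_unlab, r=0.8  # r is a made-up value
)
gain = effective_sample_size_ratio(measured, combined)
print(f"weight={weight:.3f}, beta={combined.beta:.4f}, "
      f"se={combined.se:.5f}, effective-N ratio={gain:.2f}")
```

Under these assumptions the gain depends only on `r` and the share of unscanned individuals: with `r = 0` the combined estimate reduces to the measured-trait GWAS, and the gain grows as imputation accuracy improves.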
Competing Interest Statement
MC and ES are employees of Calico Life Sciences LLC.
Funding Statement
H.Y. is funded by Diabetes UK (grant 23/0006598) and Calico Life Sciences LLC.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
ANaz{at}lincoln.ac.uk, B.Whitcher{at}westminster.ac.uk, 28404091{at}students.lincoln.ac.uk, M.Thanaj{at}westminster.ac.uk, sorokin{at}calicolabs.com, J.Bell{at}westminster.ac.uk, l.thomas3{at}westminster.ac.uk, cule{at}calicolabs.com, HYaghootkar{at}lincoln.ac.uk
Data availability
Our research was conducted using UK Biobank data. Under the standard UK Biobank data sharing agreement, we (and other researchers) cannot directly share raw data obtained or derived from the UK Biobank. However, under this agreement, all data generated and methodologies used in this paper are returned by us to the UK Biobank, where they will be fully available. Bona fide researchers can obtain access directly from the UK Biobank by submitting a health-related research proposal (https://www.ukbiobank.ac.uk).





