Abstract
Understanding gene-disease associations is important for uncovering pathological mechanisms and identifying potential therapeutic targets. Knowledge graphs offer a powerful solution for representing and integrating data from multiple biomedical sources, but lack individual-level information on target organ structure and function. Here we developed CardioKG, a knowledge graph integrating over 200,000 computer vision-derived cardiovascular phenotypes from biomedical images with data extracted from 18 diverse biological databases modelling over a million relationships. A variational graph auto-encoder was used to generate node embeddings from the knowledge graph, which were used as input features to predict gene-disease associations, assess druggability and propose drug repurposing strategies. The model predicted new genetic associations and therapeutic strategies for leading causes of cardiovascular disease which were also associated with improved survival. Candidate therapies included methotrexate for heart failure and gliptins for atrial fibrillation. Imaging enhanced the ability to leverage biological data for pathway discovery. These capabilities represent an important step toward using biomedical imaging to enhance graph-structured models for identifying treatable disease mechanisms.
Competing Interest Statement
D.P.O'R. receives research support from Bayer AG and Calico Labs, and is a paid consultant to Bayer AG and Bristol Myers Squibb. J.S.W. has received research support from Bristol Myers Squibb, has acted as a paid advisor to Health Lumen, Tenaya Therapeutics, and Solid Biosciences, and is a founder with equity in Saturnus Bio.
Funding Statement
The study was supported by the Medical Research Council (MC_UP_1605/13); the British Heart Foundation (RG/F/24/110138, RE/24/130023, CH/F/24/90015, FS/IPBSRF/22/27059); Bayer AG; and the National Institute for Health Research (NIHR) Imperial College Biomedical Research Centre. D.P.O'R. is supported by the British Heart Foundation Big Beat Challenge award to CureHeart (BBC/F/21/220106).
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
The study received ethical approval from the National Research Ethics Service (11/NW/0382), and all participants gave written informed consent.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Data availability
The scripts for data analysis are publicly available at https://github.com/ImperialCollegeLondon/cardioKG (DOI:10.5281/zenodo.16025952). Data from UK Biobank are available for approved research.