Abstract
Fine-mapping and gene-prioritisation techniques applied to the latest Genome-Wide Association Study (GWAS) results have prioritised hundreds of genes as causally associated with disease. Here we leverage these recently compiled lists of high-confidence causal genes to interrogate where in the body disease genes operate, a question that previous studies have mostly investigated by testing for enrichment of GWAS signal among genes with cell- or tissue-specific expression. By integrating GWAS summary statistics, gene prioritisation results, and RNA-seq data from 46 tissues and 204 cell types, we directly analyse the gene expression of putative disease genes across the body in relation to 11 major diseases and cancers. In tissues and cell types with established disease relevance, disease genes show higher and more specific gene expression than control genes. However, we also detect elevated expression in tissues and cell types without previous links to the corresponding disease. While some of these results may be explained by cell types that span multiple tissues, such as macrophages in brain, blood, lung and spleen in relation to Alzheimer’s disease (P-values < 10⁻³), the cause of others is unclear and warrants further investigation. To support functional follow-up studies of disease genes, we identify technical and biological factors influencing their expression, and highlight tissues in which higher expression is associated with increased odds of inclusion in drug development programs. We provide our systematic testing framework as an open-source, publicly available tool that can be used to offer novel insights into the genes, tissues and cell types involved in any disease, with the potential for informing drug development and delivery strategies.
Competing Interest Statement
The authors have declared no competing interest.
Funding Statement
This work was supported by a grant from the National Institutes of Health (R01MH122866) to PFO, by a 2022 NARSAD Young Investigator Grant (Number 30749) from the Brain & Behavior Research Foundation to JGG, and by a grant from the National Human Genome Research Institute (1K99HG013547-01) to JGG. Additionally, this work was supported in part through the computational and data resources and staff expertise provided by Scientific Computing and Data at the Icahn School of Medicine at Mount Sinai and supported by the Clinical and Translational Science Awards (CTSA) grant UL1TR004419 from the National Center for Advancing Translational Sciences. Research reported in this publication was also supported by the Office of Research Infrastructure of the National Institutes of Health under award numbers S10OD026880 and S10OD030463. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Author Declarations
I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.
Yes
The details of the IRB/oversight body that provided approval or exemption for the research described are given below:
Data access to the UK Biobank Resource was approved under application number 18177 to Paul O'Reilly.
I confirm that all necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived, and that any patient/participant/sample identifiers included were not known to anyone (e.g., hospital staff, patients or participants themselves) outside the research group so cannot be used to identify individuals.
Yes
I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).
Yes
I have followed all appropriate research reporting guidelines, such as any relevant EQUATOR Network research reporting checklist(s) and other pertinent material, if applicable.
Yes
Footnotes
We have made several revisions that we hope enhance the overall impact and innovation of this study. Key improvements include:
i. Clearly defined control genes: We have now explicitly defined the control gene groups used in both the main and sensitivity analyses, adding a dedicated subsection in the Methods.
ii. Performed sensitivity analyses at different thresholds: We have now demonstrated that our results are robust to the use of multiple PoPS thresholds and, thus, to variations in the percentage of top-ranked genes. We include a discussion of how PoPS threshold choices influence relative and absolute expression patterns.
iii. Applied all analysis approaches to all diseases: We have now extended our analyses so that all analytical approaches, across all expression databases, are applied to the three major cancers (breast, prostate and colorectal cancer) in addition to the major diseases to which all approaches were previously applied.
iv. Clarified novelty: We have refined key sections in the Introduction and Results to explicitly highlight how our work differs from previous studies and its novel contributions. Importantly, ours is the first study to systematically investigate the gene expression profiles of recently produced candidate disease gene lists. Additionally, we now include new analyses demonstrating the potential of our approach for drug target prioritisation. We show that for some tissues, higher expression of disease genes is associated with increased odds of inclusion in drug development programs.
v. Enhanced reproducibility and accessibility: We have now produced a highly user-friendly analysis pipeline so that our analyses can be easily extended to other diseases and new data. We have:
- Simplified the main workflow while incorporating flags for sensitivity and secondary analyses.
- Replaced the LD panel with the publicly available 1000 Genomes Project, improving accessibility.
- Provided direct links to all gene expression resources.
- Supplied Singularity containers to ensure reproducibility of our pipeline irrespective of OS/environment.
- Made our pipeline publicly available on GitLab: https://gitlab.com/JuditGG/GeneExpressionLandscape
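As a rough illustration of the threshold sensitivity analysis mentioned in item ii, the sketch below selects the top-ranked genes at several cut-offs and compares their expression in one tissue with that of the remaining genes. It is not the authors' pipeline. The function names, the synthetic scores and expression values, the 1%, 5% and 10% cut-offs, and the use of a one-sided permutation test on the median difference are all assumptions made for this example.

```python
import random
from statistics import median


def top_fraction(scores, fraction):
    """Return the gene IDs in the top `fraction` by score (at least one gene)."""
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


def median_shift(expression, disease, control):
    """Median expression of disease genes minus median expression of control genes."""
    return median(expression[g] for g in disease) - median(expression[g] for g in control)


def permutation_test(expression, disease, control, n_perm=1000, seed=0):
    """One-sided permutation test: is the disease-gene median higher than the control median?"""
    rng = random.Random(seed)
    observed = median_shift(expression, disease, control)
    pool = sorted(disease | control)
    k = len(disease)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        if median_shift(expression, pool[:k], pool[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


def threshold_sensitivity(scores, expression, fractions=(0.01, 0.05, 0.10)):
    """Repeat the disease-versus-control comparison at each top-ranked fraction."""
    results = {}
    for fraction in fractions:
        disease = top_fraction(scores, fraction)
        control = set(scores) - disease  # hypothetical control set: all other genes
        results[fraction] = permutation_test(expression, disease, control)
    return results


# Synthetic data (hypothetical): 200 genes, the 20 highest-scoring ones
# are given elevated expression in a single tissue.
_rng = random.Random(42)
scores = {f"g{i}": float(i) for i in range(200)}
expression = {
    f"g{i}": _rng.uniform(4.0, 6.0) + (3.0 if i >= 180 else 0.0) for i in range(200)
}

results = threshold_sensitivity(scores, expression)
for fraction, (shift, p_value) in results.items():
    print(f"top {fraction:.0%}: median shift = {shift:.2f}, permutation p = {p_value:.4f}")
```

If the median shift keeps the same sign and a similar magnitude across cut-offs, the conclusion does not hinge on one particular threshold. A real analysis would need matched control genes and the study's own expression data rather than the synthetic values used here.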
Data Availability
All data produced are available online and in the supplemental materials. The scripts used in the current study are available at https://gitlab.com/JuditGG/GeneExpressionLandscape