Skip to main content
medRxiv
  • Home
  • About
  • Submit
  • ALERTS / RSS
Advanced Search

Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity

View ORCID ProfileSteven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal Dey, Joseph Nasser, Karthik Jagadeesh, Daniel Weiner, Huwenbo Shi, Charles Fulco, Luke O’Connor, Bogdan Pasaniuc, Jesse M. Engreitz, Alkes L. Price
doi: https://doi.org/10.1101/2021.08.02.21261488
Steven Gazal
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
3Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
4Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • ORCID record for Steven Gazal
  • For correspondence: gazal@usc.edu price@hsph.harvard.edu
Omer Weissbrod
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Farhad Hormozdiari
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Kushal Dey
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Joseph Nasser
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Karthik Jagadeesh
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Daniel Weiner
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Huwenbo Shi
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Charles Fulco
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
5Department of Systems Biology, Harvard Medical School, Boston, MA, USA
6Bristol Myers Squibb, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Luke O’Connor
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Bogdan Pasaniuc
7Departments of Computational Medicine, Human Genetics, Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Jesse M. Engreitz
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
8Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
9BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
Alkes L. Price
1Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
2Broad Institute of MIT and Harvard, Cambridge, MA, USA
10Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  • Find this author on Google Scholar
  • Find this author on PubMed
  • Search for this author on this site
  • For correspondence: gazal@usc.edu price@hsph.harvard.edu
  • Abstract
  • Full Text
  • Info/History
  • Metrics
  • Supplementary material
  • Data/Code
  • Preview PDF
Loading

Abstract

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.

Competing Interest Statement

C.P.F. is now an employee of Bristol Myers Squibb.

Funding Statement

S.G. is funded by NIH grant R00 HG010160. A.L.P. is funded by NIH grants U01 HG009379, R01 MH101244, R37 MH107649, R01 MH115676 and R01 MH109978.

Author Declarations

I confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.

Yes

The details of the IRB/oversight body that provided approval or exemption for the research described are given below:

No IRB requested for this study

All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.

Yes

I understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).

Yes

I have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.

Yes

Data Availability

The List of 19,995 genes, summary statistics of the 63 independent traits, training and validation critical gene sets, S2G and cS2G strategies, SNP annotations, predicted causal SNP-disease pairs from UK Biobank fine-mapping analyses and from the NHGRI-EBI GWAS catalog, and heritability causally explained by SNPs linked to each gene have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G.

https://alkesgroup.broadinstitute.org/cS2G

Copyright 
The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.
Back to top
PreviousNext
Posted August 05, 2021.
Download PDF

Supplementary Material

Data/Code
Email

Thank you for your interest in spreading the word about medRxiv.

NOTE: Your email address is requested solely to identify you as the sender of this article.

Enter multiple addresses on separate lines or separate them with commas.
Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity
(Your Name) has forwarded a page to you from medRxiv
(Your Name) thought you would like to see this page from the medRxiv website.
CAPTCHA
This question is for testing whether or not you are a human visitor and to prevent automated spam submissions.
Share
Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity
Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal Dey, Joseph Nasser, Karthik Jagadeesh, Daniel Weiner, Huwenbo Shi, Charles Fulco, Luke O’Connor, Bogdan Pasaniuc, Jesse M. Engreitz, Alkes L. Price
medRxiv 2021.08.02.21261488; doi: https://doi.org/10.1101/2021.08.02.21261488
Digg logo Reddit logo Twitter logo Facebook logo Google logo LinkedIn logo Mendeley logo
Citation Tools
Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity
Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal Dey, Joseph Nasser, Karthik Jagadeesh, Daniel Weiner, Huwenbo Shi, Charles Fulco, Luke O’Connor, Bogdan Pasaniuc, Jesse M. Engreitz, Alkes L. Price
medRxiv 2021.08.02.21261488; doi: https://doi.org/10.1101/2021.08.02.21261488

Citation Manager Formats

  • BibTeX
  • Bookends
  • EasyBib
  • EndNote (tagged)
  • EndNote 8 (xml)
  • Medlars
  • Mendeley
  • Papers
  • RefWorks Tagged
  • Ref Manager
  • RIS
  • Zotero
  • Tweet Widget
  • Facebook Like
  • Google Plus One

Subject Area

  • Genetic and Genomic Medicine
Subject Areas
All Articles
  • Addiction Medicine (174)
  • Allergy and Immunology (421)
  • Anesthesia (97)
  • Cardiovascular Medicine (901)
  • Dentistry and Oral Medicine (170)
  • Dermatology (102)
  • Emergency Medicine (257)
  • Endocrinology (including Diabetes Mellitus and Metabolic Disease) (407)
  • Epidemiology (8789)
  • Forensic Medicine (4)
  • Gastroenterology (405)
  • Genetic and Genomic Medicine (1863)
  • Geriatric Medicine (179)
  • Health Economics (388)
  • Health Informatics (1292)
  • Health Policy (644)
  • Health Systems and Quality Improvement (492)
  • Hematology (207)
  • HIV/AIDS (394)
  • Infectious Diseases (except HIV/AIDS) (10565)
  • Intensive Care and Critical Care Medicine (564)
  • Medical Education (193)
  • Medical Ethics (52)
  • Nephrology (218)
  • Neurology (1756)
  • Nursing (103)
  • Nutrition (266)
  • Obstetrics and Gynecology (343)
  • Occupational and Environmental Health (461)
  • Oncology (965)
  • Ophthalmology (283)
  • Orthopedics (107)
  • Otolaryngology (177)
  • Pain Medicine (118)
  • Palliative Medicine (43)
  • Pathology (264)
  • Pediatrics (557)
  • Pharmacology and Therapeutics (265)
  • Primary Care Research (219)
  • Psychiatry and Clinical Psychology (1845)
  • Public and Global Health (3986)
  • Radiology and Imaging (655)
  • Rehabilitation Medicine and Physical Therapy (344)
  • Respiratory Medicine (535)
  • Rheumatology (215)
  • Sexual and Reproductive Health (178)
  • Sports Medicine (166)
  • Surgery (197)
  • Toxicology (37)
  • Transplantation (106)
  • Urology (80)