Integrative genomics: quantifying significance of phenotype-genotype relationships from multiple sources of high-throughput data

Front Genet. 2013 May 31:3:202. doi: 10.3389/fgene.2012.00202. eCollection 2012.

Abstract

Given recent advances in the generation of high-throughput data such as whole-genome genetic variation and transcriptome expression, it is critical to come up with novel methods to integrate these heterogeneous datasets and to assess the significance of identified phenotype-genotype relationships. Recent studies show that genome-wide association findings are likely to fall in loci with gene regulatory effects such as expression quantitative trait loci (eQTLs), demonstrating the utility of such integrative approaches. When genotype and gene expression data are available on the same individuals, we and others developed methods wherein top phenotype-associated genetic variants are prioritized if they are associated, as eQTLs, with gene expression traits that are themselves associated with the phenotype. Yet there has been no method to determine an overall p-value for the findings that arise specifically from the integrative nature of the approach. We propose a computationally feasible permutation method that accounts for the assimilative nature of the method and the correlation structure among gene expression traits and among genotypes. We apply the method to data from a study of cellular sensitivity to etoposide, one of the most widely used chemotherapeutic drugs. To our knowledge, this study is the first statistically sound quantification of the overall significance of the genotype-phenotype relationships resulting from applying an integrative approach. This method can be easily extended to cases in which gene expression data are replaced by other molecular phenotypes of interest, e.g., microRNA or proteomic data. This study has important implications for studies seeking to expand on genetic association studies by the use of omics data. Finally, we provide an R code to compute the empirical false discovery rate when p-values for the observed and simulated phenotypes are available.

Keywords: FDR; GWAS; eQTLs; gene expression; genomics; integrative genomics; permutation; phenotype.