Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data

Bioinformatics. 2002 Nov;18(11):1462-9. doi: 10.1093/bioinformatics/18.11.1462.

Abstract

Motivation: Recent technological advances such as cDNA microarray technology have made it possible to simultaneously interrogate thousands of genes in a biological specimen. A cDNA microarray experiment produces a gene expression 'profile'. Interest often lies in discovering novel subgroupings, or 'clusters', of specimens based on their profiles, for example to identify new tumor taxonomies. Cluster analysis techniques such as hierarchical clustering and self-organizing maps have frequently been used for investigating structure in microarray data. However, clustering algorithms always detect clusters, even on random data, and it is easy to misinterpret the results without some objective measure of the reproducibility of the clusters.
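
The point that a clustering algorithm returns clusters even on random data can be illustrated with a small sketch. The code below is not from the paper: it is a minimal pure-Python average-linkage routine (the function names, the 20 x 50 data size, and the choice of k = 3 are all assumptions made for illustration) applied to Gaussian noise that has no group structure by construction.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def average_linkage(profiles, k):
    """Agglomerate profiles until k clusters remain.

    Returns a list of k clusters, each a sorted list of specimen indices.
    Illustrative only: the repeated distance computation is fine for a few
    dozen specimens but far too slow for real data sets.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            total = sum(euclidean(profiles[i], profiles[j])
                        for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [sorted(c) for c in clusters]


rng = random.Random(0)
# Hypothetical data: 20 specimens x 50 genes of pure noise (no true groups).
noise = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(20)]

noise_clusters = average_linkage(noise, k=3)
print("cluster sizes on pure noise:", [len(c) for c in noise_clusters])
```

Cutting the tree at k = 3 yields three groups regardless of whether any structure exists, which is why an objective reproducibility measure is needed before interpreting them.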

Results: We present statistical methods for testing for overall clustering of gene expression profiles, and we define easily interpretable measures of cluster-specific reproducibility that facilitate understanding of the clustering structure. We apply these methods to elucidate structure in cDNA microarray gene expression profiles obtained on melanoma tumors and on prostate specimens.
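
The abstract does not spell out how the cluster-specific reproducibility measures are defined. One way such a measure could be built, shown here purely as an assumption and not as the authors' procedure, is to perturb the data with small random noise, re-cluster, and record the fraction of specimen pairs from each original cluster that still fall in the same cluster. The function names, noise level, and toy data below are hypothetical.

```python
import random
from itertools import combinations


def dist(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_labels(profiles, k):
    """Average-linkage clustering cut at k clusters; returns one label per specimen."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = sum(dist(profiles[i], profiles[j])
                    for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(profiles)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pair_retention(profiles, k, noise_sd, n_perturb, seed=0):
    """Per-cluster fraction of within-cluster pairs that stay together
    after re-clustering noise-perturbed copies of the data.

    Returns {cluster label: fraction}, or None for singleton clusters.
    """
    rng = random.Random(seed)
    base = cluster_labels(profiles, k)
    n = len(profiles)
    pairs = {c: [(i, j) for i, j in combinations(range(n), 2)
                 if base[i] == c and base[j] == c] for c in range(k)}
    kept = {c: 0 for c in range(k)}
    for _ in range(n_perturb):
        noisy = [[x + rng.gauss(0.0, noise_sd) for x in p] for p in profiles]
        new = cluster_labels(noisy, k)
        for c, ps in pairs.items():
            kept[c] += sum(1 for i, j in ps if new[i] == new[j])
    return {c: (kept[c] / (n_perturb * len(pairs[c])) if pairs[c] else None)
            for c in range(k)}


# Toy data (hypothetical): two well-separated groups of 5 specimens, 4 genes each.
rng = random.Random(1)
group_a = [[rng.gauss(0.0, 0.5) for _ in range(4)] for _ in range(5)]
group_b = [[rng.gauss(10.0, 0.5) for _ in range(4)] for _ in range(5)]

scores = pair_retention(group_a + group_b, k=2, noise_sd=0.1, n_perturb=20)
print(scores)
```

A score near 1 for a cluster suggests that its membership is stable under perturbation, while a low score flags a grouping that may be an artifact. In practice the noise level would need to reflect the experimental error of the arrays, which this sketch does not attempt to estimate.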

Publication types

  • Comparative Study
  • Evaluation Study
  • Validation Study

MeSH terms

  • Cluster Analysis*
  • DNA / classification
  • DNA / genetics
  • Gene Expression Profiling / methods
  • Gene Expression Regulation, Neoplastic / genetics
  • Humans
  • Male
  • Melanoma / classification
  • Melanoma / genetics*
  • Oligonucleotide Array Sequence Analysis / methods*
  • Pattern Recognition, Automated
  • Prostatic Hyperplasia / classification
  • Prostatic Hyperplasia / genetics*
  • Prostatic Neoplasms / classification
  • Prostatic Neoplasms / genetics*
  • Quality Control
  • Reference Values
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods
  • Stochastic Processes

Substances

  • DNA