Pattern discovery and statistics in gene expression analysis

Yuhai Tu
IBM Research

One practical application of microarray technology is phenotype prediction. In this talk, we propose a solution to this problem based on a supervised learning algorithm. Our method couples a complex, non-linear similarity metric, which maximizes the probability of discovering discriminative gene expression patterns, with a pattern discovery algorithm, SPLASH, which efficiently discovers all significant gene expression patterns. The statistical significance of each pattern is evaluated from the probability that such a pattern occurs by chance in the control experiments. Finally, a greedy set-covering algorithm is used to select an optimal subset of statistically significant patterns, which forms the basis for a standard likelihood ratio classification scheme. The method is applied to microarray data on 60 human cancer cell lines, and the results are compared with those of other supervised learning methods. Our method is shown to perform better, especially for complex phenotypes such as p53 mutation.
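
The pattern selection step can be illustrated with a minimal sketch of generic greedy set covering. This is not the implementation used in the talk: the function name `greedy_set_cover`, the pattern identifiers, and the representation of each pattern as the set of samples it matches are all assumptions made for illustration.

```python
def greedy_set_cover(pattern_support, universe):
    """Greedily pick patterns until every sample in `universe` is covered.

    pattern_support: dict mapping a (hypothetical) pattern id to the set of
    sample ids that the pattern matches.
    Returns the chosen pattern ids and any samples left uncovered.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered and pattern_support:
        # Pick the pattern that covers the most still-uncovered samples.
        best = max(pattern_support, key=lambda p: len(pattern_support[p] & uncovered))
        gain = pattern_support[best] & uncovered
        if not gain:
            break  # the remaining samples are matched by no pattern
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Toy data (made up): four patterns over five samples numbered 0..4.
patterns = {"P1": {0, 1, 2}, "P2": {2, 3}, "P3": {3, 4}, "P4": {1}}
chosen, uncovered = greedy_set_cover(patterns, range(5))
print(chosen, uncovered)
```

On this toy input the sketch selects P1 and then P3, which together cover all five samples. Greedy selection is a heuristic: it adds the locally best pattern at each step and does not guarantee the smallest possible covering subset.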


Back to Expression Arrays, Genetic Networks and Disease