Analysis of gene expression data of the NCI's 60 cancer cell lines

Jae Lee
University of Virginia
Health Evaluation Sciences

For more than a decade, the U.S. National Cancer Institute has been collecting and collating a rich set of anticancer drug data from experiments on a panel of 60 cell lines representing various types of cancer. In parallel with this massive drug database, several large databases of cDNA microarray and oligonucleotide array expression data, along with some molecular targets, are now available for the same 60 cancer cell lines. To investigate these large databases effectively, innovative statistical methods need to be developed. We propose a hierarchical Bayes modeling approach that simultaneously estimates the variability of various biological factors and their effects, and that identifies genes with significant effects, especially gene-cell interaction effects. To this end, we first construct a Hierarchical Effects Model (HEM) and then estimate its parameters using Markov chain Monte Carlo, a recent simulation-based statistical technique. The success of such statistical development on vast amounts of biological data depends both on close collaboration between statistical and biological researchers and on the flexibility of the investigation tools to interpret the data from various perspectives. We have developed a web-based system to provide our statistical analysis tools directly to biological and clinical researchers.

Keywords: gene expression data, NCI 60 cell lines, hierarchical effects model, Markov chain Monte Carlo, anticancer drug potency data
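
As a rough illustration of the kind of estimation the abstract describes, the following is a minimal sketch of a Gibbs sampler (one form of Markov chain Monte Carlo) for a much simpler two-level normal model. It is not the HEM itself: the abstract does not specify the model, so everything here is an assumption made for illustration, including the model form (replicate measurements y[i, j] ~ N(theta[i], sigma2) with gene-level means theta[i] ~ N(mu, tau2)), the inverse-gamma priors, the function name `gibbs_hierarchical`, and the simulated data. Gene, cell-line, and gene-cell interaction effects are not modeled.

```python
import numpy as np

def gibbs_hierarchical(y, n_iter=1500, burn_in=500, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for an assumed two-level normal model (illustrative only).

    y[i, j] ~ N(theta[i], sigma2)   replicate j of gene i
    theta[i] ~ N(mu, tau2)          gene-level mean
    mu has a flat prior; sigma2 and tau2 have inverse-gamma(a, b) priors.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_rep = y.shape
    ybar = y.mean(axis=1)

    # Crude starting values taken from the data.
    theta = ybar.copy()
    mu = theta.mean()
    sigma2 = y.var()
    tau2 = theta.var() + 1e-6

    draws = {"mu": [], "sigma2": [], "tau2": [], "theta": []}
    for it in range(n_iter):
        # theta | rest: precision-weighted mix of the gene mean and mu.
        prec = n_rep / sigma2 + 1.0 / tau2
        mean = (n_rep * ybar / sigma2 + mu / tau2) / prec
        theta = rng.normal(mean, np.sqrt(1.0 / prec))

        # mu | rest: normal around the average of the gene-level means.
        mu = rng.normal(theta.mean(), np.sqrt(tau2 / n_genes))

        # sigma2 | rest: inverse-gamma, drawn as 1 / gamma(shape, 1 / rate).
        sse = np.sum((y - theta[:, None]) ** 2)
        sigma2 = 1.0 / rng.gamma(a + n_genes * n_rep / 2.0, 1.0 / (b + 0.5 * sse))

        # tau2 | rest: inverse-gamma on the between-gene variability.
        sst = np.sum((theta - mu) ** 2)
        tau2 = 1.0 / rng.gamma(a + n_genes / 2.0, 1.0 / (b + 0.5 * sst))

        if it >= burn_in:
            draws["mu"].append(mu)
            draws["sigma2"].append(sigma2)
            draws["tau2"].append(tau2)
            draws["theta"].append(theta.copy())

    return {name: np.array(vals) for name, vals in draws.items()}

# Simulated data (not NCI data): 200 genes, 4 replicates each.
sim = np.random.default_rng(1)
true_theta = sim.normal(5.0, 2.0, size=200)               # mu = 5, tau = 2
y = true_theta[:, None] + sim.normal(0.0, 1.0, (200, 4))  # sigma = 1

post = gibbs_hierarchical(y)
posterior_means = {name: vals.mean(axis=0) for name, vals in post.items()}
```

In this sketch, `posterior_means["sigma2"]` and `posterior_means["tau2"]` summarize the within-gene and between-gene variability, and `posterior_means["theta"]` gives shrunken gene-level estimates. A model with gene-cell interaction terms would add further layers of the same kind.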


Back to Expression Arrays, Genetic Networks and Disease