Variance Stabilization by Data Transformation in Gene Expression Microarrays

David M. Rocke
Department of Applied Science,
Division of Biostatistics, School of Medicine, and Center for Image Processing and Integrated Computing
University of California, Davis

Gene expression microarrays comprise a suite of related technologies for measuring the expression of thousands of genes simultaneously from a single biological sample. There are also numerous other high-throughput biological assays that can measure large numbers of proteins, lipids, and other biologically active compounds. In this talk, I will focus on one of the statistical challenges in the use of such data. The variance of a gene expression measurement changes rapidly with its mean, which makes it difficult to apply standard (additive) statistical analysis techniques to the raw data. Taking logarithms (or, equivalently, working with ratios) is often suggested as a cure, but the log transform makes data for genes with low expression difficult to use: near the background level, measurements can be zero or negative, and on the log scale their variance becomes very large. I will introduce a data transformation that stabilizes the variance across the entire range of expression, so that standard statistical techniques can be applied. This transformation also makes normalization easier.
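As a hedged sketch of the idea (not necessarily the exact method of the talk), the script below assumes a two-component error model, y = ALPHA + mu * exp(eta) + eps, in which additive noise eps dominates at low expression and multiplicative noise eta dominates at high expression. It compares the raw scale with one variance-stabilizing transformation of this kind, the generalized logarithm glog(y) = ln(z + sqrt(z^2 + c)) with z = y - ALPHA. The parameter values (ALPHA, SIGMA_EPS, SIGMA_ETA), the choice c = SIGMA_EPS^2 / Var(exp(eta)), and all function names are illustrative assumptions.

```python
import math
import random
import statistics

# Illustrative (hypothetical) parameters of an assumed two-component error model:
#   y = ALPHA + mu * exp(eta) + eps,  eta ~ N(0, SIGMA_ETA^2),  eps ~ N(0, SIGMA_EPS^2)
ALPHA = 0.0        # background level; assumed known and already subtracted
SIGMA_EPS = 50.0   # additive noise SD: dominates at low expression
SIGMA_ETA = 0.2    # multiplicative noise SD (log scale): dominates at high expression

# Variance of exp(eta) for normal eta; used (by assumption) to set the glog constant.
S_ETA2 = math.exp(SIGMA_ETA**2) * (math.exp(SIGMA_ETA**2) - 1.0)
C = SIGMA_EPS**2 / S_ETA2


def simulate(mu, n, rng):
    """Draw n replicate measurements of a gene with true expression mu."""
    return [ALPHA + mu * math.exp(rng.gauss(0.0, SIGMA_ETA)) + rng.gauss(0.0, SIGMA_EPS)
            for _ in range(n)]


def glog(y, alpha=ALPHA, c=C):
    """Generalized log: roughly linear near background, roughly log(2 * (y - alpha)) for large y.

    Defined for every real y because z + sqrt(z^2 + c) > 0 whenever c > 0.
    """
    z = y - alpha
    return math.log(z + math.sqrt(z * z + c))


def summarize(mus, n=2000, seed=0):
    """For each mean level, return (mu, raw SD, glog SD, fraction of values with y <= 0)."""
    rng = random.Random(seed)
    rows = []
    for mu in mus:
        ys = simulate(mu, n, rng)
        raw_sd = statistics.stdev(ys)
        glog_sd = statistics.stdev([glog(y) for y in ys])
        # log(y) is undefined for these values, so a plain log transform must drop them.
        nonpos = sum(y <= 0 for y in ys) / n
        rows.append((mu, raw_sd, glog_sd, nonpos))
    return rows


if __name__ == "__main__":
    for mu, raw_sd, glog_sd, nonpos in summarize([10, 100, 1_000, 10_000, 100_000]):
        print(f"mu={mu:>7}  raw SD={raw_sd:10.1f}  glog SD={glog_sd:.3f}  y<=0: {nonpos:.0%}")
```

Under these assumed parameters, the raw-scale SD should grow by orders of magnitude from the lowest to the highest mean, while the glog SD should stay roughly constant (near SIGMA_ETA) across the whole range. At the lowest mean, a sizable fraction of the simulated measurements falls at or below zero, which is exactly where a plain logarithm breaks down.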