Identification of regulatory motifs from a set of potentially coregulated genes

Mark Van Der Laan
UC Berkeley

Identification of regulatory motifs from a set of potentially coregulated genes is an important area of research in computational biology. We develop a motif-finding method, CMEME, that searches for motifs of a specified family in the upstream control regions of coregulated genes. Motif families are characterized by their entropy curves, and we refer to motifs with nonrandom entropies as regular motifs. We model the upstream control regions with a two-component multinomial mixture model and impose motif-family-specific constraints on the entropy curve. This corresponds to an extension of the MEME framework (Bailey and Elkan, 1995) that limits the shapes of the motifs that are searched for. Model parameters are estimated with the EM algorithm, using an M-step that employs constrained maximization. We show in preliminary simulations that CMEME is less variable than MEME. Moreover, in the scenario where all of the sequences to be analyzed contain a random-entropy motif and only a subset of them contains a regular motif, CMEME outperforms MEME in identifying the regular motif.
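As a rough illustration of what an entropy curve might look like, the sketch below computes the Shannon entropy of each position of a motif given as a matrix of per-position nucleotide probabilities. The abstract does not define the entropy curve precisely, so the per-position Shannon entropy in bits is an assumption made here, and the function name `entropy_curve` and the example matrix are hypothetical.

```python
import math


def entropy_curve(pwm):
    """Return the Shannon entropy (in bits) of each motif position.

    `pwm` is a list of columns; each column holds the probabilities of
    A, C, G, T at that position and is assumed to sum to 1.
    """
    curve = []
    for column in pwm:
        # Zero-probability letters contribute nothing to the entropy.
        h = sum(p * math.log2(1.0 / p) for p in column if p > 0)
        curve.append(h)
    return curve


# Hypothetical 4-position motif: uninformative flanks, conserved core.
pwm = [
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximal entropy
    [1.0, 0.0, 0.0, 0.0],      # fully conserved: zero entropy
    [0.5, 0.5, 0.0, 0.0],      # two equally likely letters
    [0.25, 0.25, 0.25, 0.25],
]
curve = entropy_curve(pwm)
print(curve)
```

Under this reading, a constraint on the entropy curve would restrict which sequences of per-position entropies the estimated motif may take, for example low entropy in a conserved core and high entropy at the flanks.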

Joint work with Sunduz Keles, Sandrine Dudoit, Mike Eisen, Biao Xing.