Dictionary Models for Motif Finding and Haplotyping

Kenneth Lange
UCLA
Biomathematics

In this talk I will survey applications of the dictionary model proposed by Bussemaker, Li, and Siggia for binding site recognition. The model involves a dictionary of motifs (words) and their usage probabilities. A DNA sequence is constructed by random concatenation of the dictionary's words. Usage probabilities can be estimated by an MM algorithm, and different dictionaries can be compared by a combination of maximum likelihood and minimum description length criteria. These capabilities enable a heuristic solution of the inverse problem of constructing a dictionary from observed sequence data.
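
To make the estimation step concrete, here is a minimal sketch in Python, offered as an illustration rather than as the algorithm presented in the talk. It assumes a fixed dictionary of non-empty words, sums over all ways of segmenting the sequence into dictionary words with forward and backward recursions, and then renormalizes the expected word counts. That renormalization is one standard MM (equivalently, EM-type) update for this kind of model. The function names and the toy dictionary are hypothetical.

```python
def forward(seq, probs):
    """f[i] = total probability of all segmentations of seq[:i] into dictionary words."""
    n = len(seq)
    f = [0.0] * (n + 1)
    f[0] = 1.0
    for i in range(1, n + 1):
        for word, p in probs.items():
            k = len(word)
            # Extend a segmentation of seq[:i-k] by a word ending at position i.
            if k <= i and seq[i - k:i] == word:
                f[i] += p * f[i - k]
    return f


def backward(seq, probs):
    """b[i] = total probability of all segmentations of seq[i:] into dictionary words."""
    n = len(seq)
    b = [0.0] * (n + 1)
    b[n] = 1.0
    for i in range(n - 1, -1, -1):
        for word, p in probs.items():
            if seq.startswith(word, i):
                b[i] += p * b[i + len(word)]
    return b


def likelihood(seq, probs):
    """Sum over all segmentations of the whole sequence."""
    return forward(seq, probs)[-1]


def mm_update(seq, probs):
    """One update: new usage probabilities proportional to expected word counts."""
    f = forward(seq, probs)
    b = backward(seq, probs)
    total = f[-1]
    if total == 0.0:
        raise ValueError("sequence cannot be segmented with this dictionary")
    counts = {word: 0.0 for word in probs}
    for i in range(len(seq)):
        for word, p in probs.items():
            if seq.startswith(word, i):
                # Posterior probability that this word occupies seq[i:i+len(word)].
                counts[word] += f[i] * p * b[i + len(word)] / total
    norm = sum(counts.values())
    return {word: c / norm for word, c in counts.items()}


# Toy example (hypothetical dictionary and sequence).
probs = {"A": 1 / 3, "C": 1 / 3, "AC": 1 / 3}
seq = "ACAC"
new_probs = mm_update(seq, probs)
```

On this toy input the update shifts weight toward the longer word "AC", and the likelihood does not decrease, as expected of an MM step. The sketch uses plain floating-point products, so a long sequence would need rescaling or log-space arithmetic to avoid underflow.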



Dictionary models are also useful in defining haplotype blocks. In this setting, a haplotype is constructed by randomly concatenating haplotype segments from a given dictionary of segments. Estimation of segment probabilities, comparison of alternative dictionaries, and construction of the best dictionary parallel the procedures designed for motif recognition. A major advantage of the dictionary model is that it does not entail sharp block boundaries. Once a haplotype dictionary is available, it can be used for haplotyping and for discovering tagging SNPs.

