Sequence and Structure Matching: Applications of Probability Theory

Amir Dembo
Stanford University
Mathematics

The problem of assessing the significance of rare phenomena involving scoring schemes is an example in which probability theory has been quite useful for bio-molecular data analysis.


A key reason for the success of this line of research is the ability to focus on questions and models that retain generality and relevance for applications while introducing enough structure to be of theoretical interest and beauty.


I shall explore this interplay while reviewing a few contributions made in this direction.


For example, gapless local alignment is linked to the asymptotics of large exceedances in random sequences, which are closely related to queueing theory and sequential statistics. Under somewhat different assumptions it leads instead to asymptotics of waiting times, which are highly relevant for information theory. The assessment of significance of approximate local matching for 3D protein structures leads to an asymptotic theory for maxima of partial sums indexed by geometrical structures. Finally, theoretical considerations of local optimality yield, for a certain parameter regime, both logarithmic growth of the gapped local alignment score and a bound on its p-value.
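
As a rough illustration of the first link, the gapless local alignment score can be written as a largest segment sum along each diagonal, computed with the same reset-at-zero recursion that describes waiting times in a single-server queue. The sketch below is only a minimal example under stated assumptions: the function names and the match/mismatch scores of +1 and -1 are hypothetical choices made for illustration, not a scoring scheme taken from the talk.

```python
from typing import Sequence


def max_segment_score(scores: Sequence[float]) -> float:
    """Largest sum over contiguous segments; the empty segment scores 0."""
    best = 0.0
    running = 0.0
    for s in scores:
        # Reset-at-zero recursion (the same form as the Lindley recursion
        # for queue waiting times): drop the prefix once it turns negative.
        running = max(running + s, 0.0)
        best = max(best, running)
    return best


def gapless_local_score(
    x: str,
    y: str,
    match: float = 1.0,      # hypothetical score for equal letters
    mismatch: float = -1.0,  # hypothetical score for unequal letters
) -> float:
    """Best gapless local alignment score over all relative offsets."""
    best = 0.0
    # offset = (index in x) - (index in y) identifies one diagonal.
    for offset in range(-(len(y) - 1), len(x)):
        i, j = max(offset, 0), max(-offset, 0)
        n = min(len(x) - i, len(y) - j)
        diagonal = [
            match if x[i + k] == y[j + k] else mismatch for k in range(n)
        ]
        best = max(best, max_segment_score(diagonal))
    return best
```

With a scoring scheme of negative mean, the running sum returns to zero repeatedly, so a high score corresponds to a rare large excursion of the underlying random walk; this is the sense in which significance assessment becomes a question about large exceedances.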


Back to Sequence Analysis Toward System Biology