Molecular Information Theory: Molecular Efficiency and Flip-Flops

Thomas Schneider
National Cancer Institute

Information theory was introduced by Claude Shannon in 1948 to
precisely characterize data flows in communications systems.
The same mathematics can also be fruitfully applied to molecular
biology problems. We start with the problem of understanding
how proteins interact with DNA at specific sequences called
binding sites. Information theory allows us to build an average
picture of the binding sites, which can be displayed with a
computer graphic called a sequence logo.
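
As a rough illustration of what a sequence logo summarizes, the sketch below computes the conservation at each position of a set of aligned binding sites as 2 bits minus the Shannon uncertainty of the bases observed there. It assumes DNA with four equally likely bases in the background and omits the small-sample correction that a careful analysis would apply. The function name and the example sites are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def information_per_position(sites):
    """Return the conservation (in bits) at each position of aligned DNA sites.

    Assumption: four equiprobable bases, so the maximum is log2(4) = 2 bits.
    No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")

    bits = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        total = sum(counts.values())
        # Shannon uncertainty of the bases seen at this position.
        uncertainty = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        bits.append(2.0 - uncertainty)
    return bits


# Hypothetical aligned sites, not real binding-site data.
example_sites = ["ACGT", "ACGA", "ACCT", "ACTG"]
print(information_per_position(example_sites))
```

In a logo, the height of the letter stack at each position would be proportional to the value computed here, so fully conserved positions reach 2 bits and unconserved positions approach 0.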

Sequence logos show how strongly parts of a binding site are
conserved, on a scale in bits of information. They have been
used to study a variety of genetic control systems. More
recently the same mathematics has been used to look at
individual binding sites using another computer graphic called a
sequence walker. Sequence walkers are being used to predict
whether changes in human genes are disease-causing mutations or
neutral polymorphisms. It may soon be possible to predict the
degree of colon cancer by this method.
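
The idea of scoring an individual site can be sketched as follows: each base of a candidate sequence contributes 2 bits plus the base-2 logarithm of how often that base appears at that position in known sites, and the contributions are summed. This is a minimal sketch under stated assumptions, not the published procedure. The pseudocount used to avoid taking the logarithm of zero is an assumption of this example, and the function names and sequences are hypothetical.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=1.0):
    """Build per-position weights (bits) from aligned example sites.

    Assumption: a pseudocount stands in for a proper small-sample
    correction so that unseen bases get a finite negative weight.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all aligned sites must have the same length")
    n = len(sites)
    matrix = []
    for pos in range(length):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[pos] == base)
            freq = (count + pseudocount) / (n + 4 * pseudocount)
            column[base] = 2.0 + math.log2(freq)
        matrix.append(column)
    return matrix


def walker(matrix, sequence):
    """Return the per-position contributions (bits) for one sequence."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the weight matrix")
    return [column[base] for column, base in zip(matrix, sequence)]


# Hypothetical data for illustration only.
known_sites = ["ACGT", "ACGA", "ACCT", "ACTG"]
matrix = weight_matrix(known_sites)
for candidate in ("ACGT", "TTTT"):
    contributions = walker(matrix, candidate)
    print(candidate, [round(b, 2) for b in contributions], round(sum(contributions), 2))
```

Positive contributions correspond to bases that resemble the known sites and negative ones to bases that do not, which is how a single base change can be judged as likely damaging or likely neutral.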

Information theory can also be used to understand the
relationship between the binding energy dissipated when two
molecules stick together and the amount of sequence conservation
of the molecules measured in bits. Using the Second Law of
Thermodynamics, this relationship can be expressed as the
efficiency of the molecular interaction. Surprisingly, many
molecular systems including genetic systems, visual pigments and
motility proteins have efficiencies near 70%. A purely
geometrical explanation of this result shows that although
biological systems are selected for the highest possible
efficiency, that efficiency is limited to about 70% because
having precisely distinguishable molecular states is more
important.
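
The efficiency described above can be sketched numerically. The Second Law sets a minimum energy cost of k_B T ln 2 per bit (R T ln 2 per mole), so the energy dissipated on binding fixes the largest number of bits that could have been gained, and the efficiency is the measured information divided by that maximum. The function names, the default temperature, and the example numbers below are assumptions for illustration, not measurements from any real system.

```python
import math

GAS_CONSTANT = 8.314462618  # molar gas constant, J / (mol K)


def min_energy_per_bit(temperature_k):
    """Minimum energy (J/mol) that must be dissipated per bit gained."""
    return GAS_CONSTANT * temperature_k * math.log(2)


def efficiency(information_bits, dissipated_j_per_mol, temperature_k=298.15):
    """Ratio of bits actually gained to the maximum the energy allows."""
    if dissipated_j_per_mol <= 0:
        raise ValueError("dissipated energy must be positive")
    max_bits = dissipated_j_per_mol / min_energy_per_bit(temperature_k)
    return information_bits / max_bits


# Hypothetical example: a site conserving 14 bits, with enough energy
# dissipated on binding to pay for 20 bits at the same temperature.
energy = 20 * min_energy_per_bit(298.15)
print(round(efficiency(14, energy), 2))
```

An efficiency of 1.0 would mean every joule dissipated was converted into sequence information at the Second Law limit; the systems described above sit near 0.7.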

We discovered that the Fis protein (which controls many genes in
E. coli) frequently uses pairs of sites 7 or 11 base pairs
apart on DNA. Two overlapping Fis sites separated by 11 base
pairs are found in the E. coli origin of chromosomal replication
(the place where replication begins). We found that only one of
the two overlapping Fis sites is bound by Fis at a time, so the
structure is a molecular flip-flop. Since the two sites are
precisely positioned between two DnaA sites, and these determine
the orientation of the DnaB helicase, we suggest that the
flip-flop directs alternative firing of replication complexes in
opposite directions. Since they can implement Boolean logic,
molecular flip-flops could be used to build molecular computers.

Back to NANO2002 Workshop I: Alternative Computing