Modern biology is characterized by an unprecedented wealth of large-scale sequence data. Sophisticated computational approaches are needed to extract information from these raw data and to infer, from observations, the rules governing complex biological systems. As an example of this general idea, I will discuss the recently developed Direct-Coupling Analysis (DCA), a statistical-inference approach for detecting direct residue coevolution in large multiple-sequence alignments of homologous proteins. Based on sequence information alone, this analysis yields accurate residue-residue contact predictions, which in turn help to predict tertiary and quaternary protein structures, to reconstruct protein-protein interaction networks, and to infer quantitative mutational landscapes.