Understanding the structure, function, and interactions of proteins is one of the great post-genome challenges of biology and molecular medicine. The increasing rate of depositions into the Protein Data Bank and the advent of structural genomics projects promise a growing body of structural data with which to address these questions. Software tools and statistical methods for analyzing these data will play a key role. However, much of the work in structural bioinformatics has focused on algorithmic and computational solutions, with less attention paid to the fundamental statistical ideas of uncertainty and significance.
We present a statistical framework for analysis, prediction, and discovery in protein structural data using methods adapted from the statistical theory of shape. Our approach provides natural solutions to a variety of problems in the field, including the study of conservation and variability, examination of uncertainty in database searches, algorithms for flexible matching, detection of motions and disorder, and clustering and classification techniques. Several of these will be discussed.