A major obstacle to exploiting the large volume of genomic sequence data is the functional characterization of the gene products. At the time of writing, there are 146 published completely sequenced genomes, 344 ongoing prokaryotic sequencing projects and 243 ongoing eukaryotic sequencing projects. A large proportion of the predicted protein-coding regions of these genomes, typically 30-40%, code for proteins of unknown function. Annotation is normally inherited from database matches to similar sequences whose function is known. New algorithms that use the information contained in multiple sequence alignments are very effective at identifying distant sequence relationships. However, the definition of a match is parameter-dependent, and the procedure risks propagating annotation errors. Moreover, even with sensitive sequence-similarity detection methods, a significant proportion of gene products cannot reliably be assigned a function.
Recently, large-scale protein structure determination projects have gotten under way. These initiatives are variously referred to as ‘structural genomics’ or ‘structural proteomics’. One goal is to sample protein sequence space comprehensively and to determine structures representative of each sequence neighborhood; the structures of the other sequences in a neighborhood could then be obtained by comparative (homology) modeling. Because three-dimensional protein structure is more conserved than sequence, these initiatives also open up the possibility of characterizing biochemical or biophysical function via structure.
In parallel with these experimental programs, there are ongoing efforts to address these questions using computational techniques. This workshop will focus on these computational methods, discussing the state of the art and current challenges in a range of structural proteomics topics, such as protein structure prediction, inferring function from sequence and structure, comparative modeling, and protein-protein interactions.