Generating Peptide Candidates from Protein Sequence Databases for Protein Identification via Mass Spectrometry

Nathan Edwards
Applied Biosystems
Informatics Research

Protein identification via mass spectrometry forms the foundation of high-throughput proteomics. Tandem mass spectrometry, when applied to a complex mixture of peptides, selects and fragments each peptide to reveal its primary structure, its amino-acid sequence, as a mass spectrum. The high-throughput identification of peptides from these mass spectra relies on the comparison of each tandem mass spectrum with peptide candidates from a protein sequence database. The efficient generation of peptide candidates from protein sequence databases is the corner-stone upon which all search engines for tandem mass spectrometry are built. The peptide candidate generation problem will be defined and a number of solution strategies discussed. We demonstrate that to solve this problem efficiently, we must deal with substring redundancy in the amino-acid sequence database and focus our attention on looking up the observed peptide masses quickly. We show that it is possible to solve the peptide candidate generation problem in time asymptotically proportional to the number of distinct peptide candidates output.

Presentation (PowerPoint File)

Back to Workshop I: High Throughput Technologies and Methods of Analysis