Computational approaches for serum marker discovery
Parag Mallick, Institute for Systems Biology
Whole cell, genome-wide analysis of gene expression, protein
expression, protein state (e.g. glycosylation, phosphorylation) and
related differential analyses in differentially perturbed cells have
been widely applied to study biological processes and disease states.
Such analysis has also been attempted on blood serum to detect and
identify “fingerprints” or prognostic and diagnostic markers; blood
serum is highly accessible and contains enormous information about
physiologic state. A technology capable of detecting cancer early could
have a significant impact on cancer mortality when the cancers it finds
are treated with existing therapies. A key problem with the proteomic
analysis of serum, as of many other body fluids, is its highly skewed
composition: serum is dominated by a few highly abundant proteins, and
albumin alone represents over 50% of total serum protein content.
Noting that many clinical biomarkers and therapeutic proteins, such as
Her2/neu, human chorionic gonadotropin, alpha-fetoprotein, PSA, and
CA125, are glycosylated, a technique for the enrichment of serum
glycoproteins was developed.
In addition to experimental challenges, numerous computational
challenges arise in the quantitative proteomics of enriched LC-MS samples.
We can describe the problem of identifying fingerprints to determine if
a serum sample came from a healthy patient or from a cancer patient as
a pattern analysis or discrimination problem. Having framed the problem
in these terms, we have begun developing and testing different
algorithms and software tools to treat the three common subproblems of
transduction, feature extraction, and classification.
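
The three subproblems can be illustrated with a deliberately small sketch. It is not the software described in this abstract: the m/z range, the bin count, the normalization, the nearest-centroid classifier, and all function names and peak values below are assumptions chosen only to show how transduction, feature extraction, and classification fit together.

```python
import math

def transduce(peaks, mz_min=400.0, mz_max=1600.0, n_bins=4):
    """Transduction: map raw (m/z, intensity) peaks to a fixed-length vector.

    The m/z range and bin count are hypothetical illustration values.
    """
    width = (mz_max - mz_min) / n_bins
    vec = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            idx = min(int((mz - mz_min) / width), n_bins - 1)
            vec[idx] += intensity
    return vec

def extract_features(vec):
    """Feature extraction: normalize to total signal, then log-scale."""
    total = sum(vec) or 1.0
    return [math.log1p(1000.0 * v / total) for v in vec]

def train_centroids(samples, labels):
    """Classification (training): mean feature vector per class label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def classify(x, centroids):
    """Classification (prediction): label of the closest class centroid."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(x, centroids[y]))

def predict(peaks, centroids):
    """Run one sample through all three stages."""
    return classify(extract_features(transduce(peaks)), centroids)

# Synthetic toy peak lists, invented for this sketch (not real serum data).
training = [
    ([(500.0, 90.0), (1100.0, 10.0)], "healthy"),
    ([(520.0, 85.0), (1150.0, 15.0)], "healthy"),
    ([(500.0, 50.0), (1100.0, 50.0)], "cancer"),
    ([(510.0, 45.0), (1120.0, 55.0)], "cancer"),
]
features = [extract_features(transduce(p)) for p, _ in training]
centroids = train_centroids(features, [label for _, label in training])

print(predict([(505.0, 48.0), (1105.0, 52.0)], centroids))
```

In practice each stage would be considerably more involved (for example, peak alignment across LC-MS runs during transduction, and a classifier evaluated on held-out samples), but the separation of concerns is the same.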