Statistical Issues in the Analysis of Mass Spectrometry Data

Hongyu Zhao
Yale University
Public Health & Genetics

Mass spectrometry (MS) holds great promise for biomarker identification and
genome-wide protein profiling. Published studies have shown that MS data can
yield biomarkers that distinguish normal individuals from cancer patients, an
encouraging result for disease diagnosis and prognosis. Although various
statistical methods have been used to identify biomarkers from MS data, every
analysis has two essential components: data pre-processing and biomarker
identification. In this presentation, we
will discuss various issues involved in MS data pre-processing, including noise
removal, spectrum alignment, peak identification, and normalization. In
addition, we will compare the performance of several classes of statistical
methods for classifying cancer from MS spectra: linear discriminant analysis,
quadratic discriminant analysis, k-nearest neighbor classifiers, bagged and
boosted classification trees, support vector machines, and random forests.
The methods are applied to ovarian cancer and
control serum samples from the National Ovarian Cancer Early Detection Program
clinic at Northwestern University Hospital. This is joint work with Baolin Wu,
Yinglei Lai, Weichuan Yu, Kenneth Williams, and other members of the Yale NHLBI
Proteomics Center.
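
As a rough illustration of three of the pre-processing steps named above (noise
removal, normalization, and peak identification), the following minimal sketch
may help. It is not the method used in the talk: the moving-average smoother,
the total-intensity normalization, the mean-based peak threshold, and all
function and parameter names (smooth, normalize, find_peaks, preprocess,
window, threshold_factor) are assumptions chosen for simplicity. Spectrum
alignment is omitted.

```python
def smooth(intensities, window=3):
    """Noise removal (assumed technique): centered moving average.

    The window shrinks at the edges so the output has the same length.
    """
    half = window // 2
    n = len(intensities)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(intensities[lo:hi]) / (hi - lo))
    return out


def normalize(intensities):
    """Normalization (assumed technique): divide by the total intensity."""
    total = sum(intensities)
    if total == 0:
        return list(intensities)
    return [x / total for x in intensities]


def find_peaks(intensities, threshold):
    """Peak identification (assumed rule): local maxima above a threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        y = intensities[i]
        if y > threshold and y > intensities[i - 1] and y >= intensities[i + 1]:
            peaks.append(i)
    return peaks


def preprocess(intensities, window=3, threshold_factor=1.5):
    """Smooth, normalize, then keep peaks above threshold_factor * mean."""
    normalized = normalize(smooth(intensities, window))
    mean = sum(normalized) / len(normalized)
    return normalized, find_peaks(normalized, threshold_factor * mean)


# Toy spectrum (made-up intensities, not real MS data) with two bumps.
toy_spectrum = [0, 1, 2, 5, 9, 5, 2, 1, 0, 1, 3, 8, 14, 8, 3, 1, 0]
normalized, peaks = preprocess(toy_spectrum)
print(peaks)  # indices of the detected peaks: [4, 12]
```

In practice each step has many alternatives (for example, wavelet denoising or
baseline subtraction), and the choices can affect the downstream classifiers.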


Back to Workshop I: High Throughput Technologies and Methods of Analysis