Statistics 216: High Dimensional Data Analysis
Fall 2000
Room: 6229 Math Sciences
MWF 12 - 12:50
Instructor: Ker-Chau Li
Course Description:
Dimensionality
is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a
high dimensional function or data set. People often ask:
"How do they look?",
"What structures are there?",
"What model should be used?"
Aside from the differences that underlie the various scientific
contexts, such questions do have a common root in Statistics. This
is the driving force for the study of high dimensional data analysis. This
course will discuss several statistical methodologies useful for exploring
voluminous data. They
include Principal Component Analysis, Clustering and Classification,
Tree-structured Analysis, Neural Networks, Hidden Markov
Models, Sliced Inverse Regression (SIR), and Principal Hessian
Directions (PHD). SIR
and PHD are two novel dimension reduction methods, useful for the extraction
of geometric information underlying noisy data of several dimensions. The
theory of SIR/PHD will
be discussed in depth. It will be used as the backbone for the entire
course. Examples from various application areas will be given. They
include social/economic problems like unemployment rates, biostatistics
problems like clinical trials with censoring, machine learning problems like
handwritten digit recognition, quality control problems like
performance measurement of digital-to-analog converters, biomedical
problems like functional Magnetic Resonance Imaging, and bioinformatics
problems like micro-array gene expression.
Tentative Course Outline:
Week 1: Dimension reduction; principal component analysis; analysis of variance; clustering.
Week 2: Effective dimension reduction in regression; sliced inverse regression; application.
Week 3: Transformation; projection pursuit; classification.
Week 4: Multivariate SIR; aggregated time series and
Week 5: Principal Hessian directions; regression trees.
Week 6: Linear design condition; quasi-helices.
Week 7: Discrete regressors; error-in-regressor; censored regression.
Week 8: Support vector machine; clustering; neural network.
Week 9: Aggregated imaging data; independent component
Week 10: Data visualization for simulation data.
Textbooks: None.
Instructor's lecture notes are available in ps and pdf files.
Other related papers are to be distributed in class.