Statistics 216: High-Dimensional Data Analysis, Fall 2000

Room 6229 Math Sciences

MWF 12:00-12:50

Instructor: Ker-Chau Li 

Course Description:

Dimensionality is an issue that can arise in every scientific field. Generally speaking, the difficulty lies in how to visualize a high-dimensional function or data set. People often ask: "What does it look like?", "What structures are there?", "What model should be used?" Despite the differences among the underlying scientific contexts, such questions share a common root in statistics. This is the driving force behind the study of high-dimensional data analysis.

This course will discuss several statistical methodologies useful for exploring large data sets. They include principal component analysis, clustering and classification, tree-structured analysis, neural networks, hidden Markov models, sliced inverse regression (SIR), and principal Hessian directions (PHD).

SIR and PHD are two novel dimension reduction methods, useful for extracting the geometric information that underlies noisy data of several dimensions. The theory of SIR/PHD will be discussed in depth and will serve as the backbone of the entire course. Examples from various application areas will be given. They include social and economic problems such as unemployment rates; biostatistics problems such as clinical trials with censoring; machine learning problems such as handwritten digit recognition; quality control problems such as performance measurement of digital-to-analog converters; biomedical problems such as functional magnetic resonance imaging (fMRI); and bioinformatics problems such as microarray gene expression.

Tentative Course Outline:

Week 1: Dimension reduction; principal component analysis; analysis of variance; clustering.

Week 2: Effective dimension reduction in regression; sliced inverse regression; applications.

Week 3: Transformations; projection pursuit; classification.

Week 4: Multivariate SIR; aggregated time series and curves.

Week 5: Principal Hessian directions; regression trees.

Week 6: Linear design condition; quasi-helices.

Week 7: Discrete regressors; errors in regressors; censored regression.

Week 8: Support vector machines; clustering; neural networks.

Week 9: Aggregated imaging data; independent component analysis; functional data analysis.

Week 10: Data visualization for simulation data.

Textbooks:

None. The instructor's lecture notes are available as PostScript (PS) and PDF files. Other related papers will be distributed in class.

Visit http://www.stat.ucla.edu/~kcli.