Rodeo: Sparse nonparametric regression in high dimensions

John Lafferty
Carnegie Mellon University
Computer Science

Modern data sets requiring statistical analysis are often very high dimensional. However, estimating a high-dimensional regression function is notoriously difficult, due to the "curse of dimensionality," which can be precisely characterized using minimax theory. In this talk we present a new method for simultaneously performing bandwidth selection and variable selection in nonparametric regression that can beat the curse of dimensionality when the underlying function is sparse. The method starts with a local linear estimator with large bandwidths, and incrementally decreases the bandwidth in directions where the gradient of the estimator with respect to bandwidth is large. The method, called "rodeo" (regularization of derivative expectation operator), conducts a sequence of hypothesis tests, and is easy to implement. A modified version that replaces testing with soft thresholding can be viewed as solving a sequence of lasso problems. When applied in one dimension, the rodeo yields a simple adaptive estimator that chooses the locally optimal bandwidth. More generally, the method achieves the optimal minimax rate of convergence, up to logarithmic factors, as if the true relevant variables were known in advance. Joint work with Larry Wasserman.
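The greedy bandwidth-shrinking idea in the abstract can be sketched in a few lines. This is a simplified illustration, not the method from the paper: it uses a Nadaraya-Watson (local constant) estimator rather than local linear, approximates the derivative of the estimator with respect to each bandwidth by a finite difference, and replaces the paper's hypothesis-test threshold with a fixed constant `tau`. The function names, the shrinkage factor `beta`, and the toy data are all illustrative choices.

```python
import numpy as np

def nw_estimate(x0, X, y, h):
    """Nadaraya-Watson (local constant) estimate at x0 with
    per-dimension Gaussian bandwidths h."""
    w = np.exp(-0.5 * np.sum(((X - x0) / h) ** 2, axis=1))
    return np.sum(w * y) / np.sum(w)

def rodeo_sketch(x0, X, y, h0=1.0, beta=0.9, tau=0.1, max_iter=50):
    """Start with large bandwidths; shrink h_j while the estimator's
    derivative with respect to h_j stays large, and freeze it otherwise."""
    d = X.shape[1]
    h = np.full(d, h0)
    active = np.ones(d, dtype=bool)
    for _ in range(max_iter):
        if not active.any():
            break
        for j in np.where(active)[0]:
            # Finite-difference stand-in for the derivative Z_j = d m_hat / d h_j
            eps = 1e-4 * h[j]
            h_plus = h.copy()
            h_plus[j] += eps
            Z = (nw_estimate(x0, X, y, h_plus) - nw_estimate(x0, X, y, h)) / eps
            if abs(Z) > tau:
                h[j] *= beta       # signal in direction j: keep shrinking
            else:
                active[j] = False  # freeze h_j; x_j looks locally irrelevant
    return h

# Toy check: y depends only on the first coordinate, so the procedure
# should shrink h[0] while leaving the irrelevant h[1] large.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 2))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(300)
h = rodeo_sketch(np.array([0.5, 0.5]), X, y)
```

The per-dimension bandwidths returned for the relevant variable end up smaller than for the irrelevant one, which is the sparsity mechanism the abstract describes: irrelevant directions keep a large bandwidth and are effectively averaged out.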


Video of Talk

Part of the Graduate Summer School: Intelligent Extraction of Information from Graphs and High Dimensional Data