Random Sampling for Matrix Projection and Approximation

Santosh Vempala (Massachusetts Institute of Technology, Mathematics)

We consider the problem of approximating a given real m × n matrix by a rank-k matrix, where k < min{m, n}. It is known that a sample of O(k/\eps) rows of the matrix, chosen according to the squared-length distribution of the rows, contains a low-rank approximation with additive error at most \eps [Frieze, Kannan and Vempala]. Here we show that with adaptive sampling, in t rounds using O(k/\eps) samples in each round, the additive error drops exponentially as \eps^t; the computation time remains nearly linear in the number of nonzero entries. Viewing the best rank-k subspace as a projection, this means that we can adaptively refine a distribution on subspaces that zooms in on the best one while retaining a simple description. This is also a natural problem for which multiple passes over the data provide an exponential improvement.

In a related vein, we give a nonconstructive proof that there always exists a subset of O(k^2/\eps) rows whose span contains a rank-k approximation with multiplicative (1+\eps) error. This leads to an approximation scheme for the following projective clustering problem: given a set of points P in \R^d and integers k and j, find j subspaces F_1, ..., F_j, each of dimension at most k, that minimize the sum, over the points, of the squared distance of each point to its nearest subspace.

The talk will introduce the relevant concepts. This is joint work with Luis Rademacher and Grant Wang.
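The adaptive sampling idea above can be sketched in a few lines of NumPy (a minimal illustration only, not the speakers' exact algorithm or its guarantees; the function names and parameters here are my own). Each round samples rows with probability proportional to their squared lengths in the current residual, grows the candidate row span, and finally the best rank-k approximation within that span is extracted:

```python
import numpy as np

rng = np.random.default_rng(0)

def length_squared_sample(M, s):
    """Sample s row indices of M with probability proportional to squared row norms."""
    p = (M * M).sum(axis=1)
    p = p / p.sum()
    return rng.choice(M.shape[0], size=s, replace=True, p=p)

def adaptive_rank_k(A, k, s, t):
    """t rounds of adaptive sampling: each round draws s rows of A according to
    the squared-length distribution of the *residual* rows; the union of sampled
    rows spans the candidate subspace."""
    E = A.copy()                          # residual, initially A itself
    S = np.empty((0, A.shape[1]))         # sampled rows accumulated so far
    for _ in range(t):
        idx = length_squared_sample(E, s)
        S = np.vstack([S, A[idx]])
        Q, _ = np.linalg.qr(S.T)          # orthonormal basis of span(S)
        E = A - A @ Q @ Q.T               # residual after projecting onto span(S)
    # best rank-k approximation of A within the sampled span
    proj = A @ Q @ Q.T
    U, sig, Vt = np.linalg.svd(proj, full_matrices=False)
    return (U[:, :k] * sig[:k]) @ Vt[:k]
```

With t = 1 this reduces to plain length-squared sampling; each additional round resamples from what the current subspace fails to capture, which is the mechanism behind the exponentially shrinking additive error.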