Random Sampling for Matrix Projection and Approximation

Santosh Vempala
Massachusetts Institute of Technology

We consider the problem of approximating a given real m × n matrix by a rank-k matrix, where k < min{m, n}. It is known that a sample of O(k/\eps) rows of the matrix, chosen according to the squared-length distribution of the rows, contains a low-rank approximation with additive error at most \eps [Frieze, Kannan and Vempala]. Here we show that with adaptive sampling, in t rounds using O(k/\eps) samples in each round, the additive error drops exponentially as \eps^t; the computation time remains nearly linear in the number of nonzero entries. Viewing the best rank-k subspace as a projection, this means that we can adaptively refine a distribution on subspaces that zooms in on the best one while retaining a simple description. This is also a natural problem for which multiple passes over the data provide an exponential improvement. In a related vein, we will give a nonconstructive proof that there always exists a subset of O(k^2/\eps) rows whose span contains a rank-k approximation with multiplicative (1+\eps) error. This leads to an approximation scheme for the following projective clustering problem: given a set of points P in \R^d and integers j and k, find j subspaces F_1, ..., F_j, each of dimension at most k, that minimize the sum, over all points of P, of the squared distance from each point to its nearest subspace. The talk will introduce the relevant concepts. This is joint work with Luis Rademacher and Grant Wang.
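The adaptive sampling scheme described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name and the parameters s (samples per round) and t (rounds) are my own choices, and in each round rows are resampled with probability proportional to the squared norms of the current residual rows.

```python
import numpy as np

def adaptive_rank_k_approx(A, k, s, t, rng=None):
    """Sketch of adaptive length-squared sampling: in each of t rounds,
    sample s rows with probability proportional to the squared norms of
    the residual rows, project A onto the span of all sampled rows, and
    finally return the best rank-k approximation within that span."""
    rng = np.random.default_rng(0) if rng is None else rng
    E = A.copy()                       # residual, initially A itself
    S = np.empty((0, A.shape[1]))      # sampled rows accumulated so far
    proj = np.zeros_like(A)
    for _ in range(t):
        p = np.sum(E * E, axis=1)      # squared row lengths of the residual
        total = p.sum()
        if total <= 0:                 # residual already zero: nothing to refine
            break
        idx = rng.choice(A.shape[0], size=s, replace=True, p=p / total)
        S = np.vstack([S, A[idx]])
        Q = np.linalg.qr(S.T)[0]       # orthonormal basis for span of sampled rows
        proj = (A @ Q) @ Q.T           # rows of A projected onto that span
        E = A - proj                   # residual drives the next round's sampling
    # best rank-k approximation of the projected matrix, via its SVD
    U, sv, Vt = np.linalg.svd(proj, full_matrices=False)
    return (U[:, :k] * sv[:k]) @ Vt[:k]
```

On a matrix of exact rank k, one round typically already captures the row space, so later rounds leave the residual near zero; on general matrices, each round shrinks the additive error term.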
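The projective clustering objective can be evaluated directly once candidate subspaces are fixed. The helper below is hypothetical (its name and interface are not from the abstract) and assumes each subspace F_i is given by a d × k_i matrix whose columns form an orthonormal basis:

```python
import numpy as np

def projective_clustering_cost(P, bases):
    """Sum over points of the squared distance to the nearest subspace.
    P is an (n, d) array of points; bases is a list of (d, k_i) matrices
    with orthonormal columns, one per subspace."""
    costs = []
    for B in bases:
        resid = P - (P @ B) @ B.T              # component orthogonal to span(B)
        costs.append(np.sum(resid * resid, axis=1))
    # for each point take the closest subspace, then sum over points
    return float(np.sum(np.min(np.stack(costs), axis=0)))
```

For example, with points on the two coordinate axes in \R^2 and the axes themselves as the candidate one-dimensional subspaces, only off-axis points contribute to the cost.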


Back to MGA Workshop II: Multiscale Geometry in Scientific Computing