Deterministic Clustering with Data Nets

Leonard Schulman
California Institute of Technology

We consider the K-clustering problem with the \ell_2^2 distortion measure, also known as the problem of optimal fixed-rate vector quantizer design. We provide a deterministic linear-time
(1+\epsilon)-approximation algorithm for this problem. Specifically, on a data set of size n, in d dimensions, the run-time is nM + M^k for M=poly(K)(d/\epsilon)^{O(d)}.

The key tool is construction of a new kind of data representation called a "data net". The construction of data nets leads to solutions
to a wide variety of other problems in lossy data compression and data analysis, a few of which will be discussed.

Joint work with Michelle Effros

Back to MGA Workshop II: Multiscale Geometry in Scientific Computing