Geometry-based sampling in learning and classification

Leonard Schulman California Institute of Technology

Let X be a space and F a family of {0,1}-valued functions on X. Vapnik and Chervonenkis showed that if F is "simple" (finite VC dimension), then for every probability measure \mu on X and epsilon>0 there is a finite set S such that for all f in F, \sum_{x \in S} f(x)/|S| = [\int f(x) d\mu(x)] \pm epsilon. Think of S as a "universal epsilon-approximator" for integration in F. S can actually be obtained w.h.p. just by sampling a few points from \mu. This is a mainstay of computational learning theory. It was later extended by other authors to families of bounded (e.g., [0,1]-valued) real functions.

We wish to establish similar "universal epsilon-approximators" for families of unbounded nonnegative real functions---in particular, for the families over which one optimizes when performing data classification. (In this case the epsilon-approximation should be multiplicative.) I will describe the following result (among others), obtained by geometry-sensitive sampling. Let F be the family of "k-median functions" on X=R^d: every u_1,...,u_k \in R^d determine an f by f(x)=Euclidean distance from x to {u_1,...,u_k}. Then for every probability measure \mu on R^d there exists a set S of cardinality poly(k,d,1/epsilon) and a probability measure \mu_S supported on S such that for every f in F, \sum_{x \in S} f(x) \mu_S(x) = [\int f(x) d\mu(x)] * [1 \pm epsilon].

Joint work with Michael Langberg, Open U. of Israel.