Proximity Graphs for Instance-Based Learning

Godfried Toussaint
School of Computer Science
McGill University

In the typical nonparametric approach to learning and pattern recognition, random data (the training set of patterns) are collected and used to design a decision rule. One of the best-known such rules is the K-nearest-neighbor rule, in which an unknown pattern is assigned to the majority class among its K nearest neighbors in the training set. Several questions related to this rule have received considerable attention over the past fifty years, including the following. How large should K be, and how should its value be chosen? Should all K neighbors be weighted equally when deciding the class of an unknown pattern? Should the features (measurements) be weighted equally, and if not, how should their weights be chosen? What metric should be used? How can the rule be made robust to outliers present in the training data? How can the storage of the training set be reduced without degrading the performance of the decision rule? How quickly can the nearest neighbor of a query point be found? What is the smallest neural network that can implement the nearest-neighbor rule? Proximity graphs offer elegant solutions to all of these problems. In this lecture we review and discuss recent work on the application of proximity graphs in this area and show how to obtain the best results in practice.
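As a concrete illustration of two of the ideas above, the following is a minimal Python sketch, not taken from the talk itself: a brute-force K-nearest-neighbor classifier, followed by a Gabriel-graph condensing step that keeps only training points having a Gabriel neighbor of a different class, one proximity-graph route to reducing storage. All function names and the toy data are invented for this example; note also that exactly preserving the 1-NN decision boundary requires Voronoi (Delaunay) neighbors, so the Gabriel version is only an approximation.

    import numpy as np
    from collections import Counter

    def knn_classify(query, X, y, k=3):
        """Classify query by majority vote among its k nearest training points."""
        dists = np.linalg.norm(X - query, axis=1)   # Euclidean metric
        nearest = np.argsort(dists)[:k]             # indices of the k closest points
        return Counter(y[nearest].tolist()).most_common(1)[0][0]

    def gabriel_edges(X):
        """Yield pairs (i, j) forming a Gabriel-graph edge: the ball whose
        diameter is the segment X[i]X[j] contains no other training point."""
        n = len(X)
        for i in range(n):
            for j in range(i + 1, n):
                center = (X[i] + X[j]) / 2.0
                r2 = np.sum((X[i] - X[j]) ** 2) / 4.0   # squared radius of the ball
                others = (m for m in range(n) if m != i and m != j)
                if all(np.sum((X[m] - center) ** 2) >= r2 for m in others):
                    yield i, j

    def gabriel_condense(X, y):
        """Keep only points with at least one Gabriel neighbor of a different
        class; the discarded points lie deep inside class regions, so removing
        them changes the 1-NN decision boundary only slightly."""
        keep = set()
        for i, j in gabriel_edges(X):
            if y[i] != y[j]:
                keep.update((i, j))
        idx = sorted(keep)
        return X[idx], y[idx]

    # Toy usage: two overlapping Gaussian classes in the plane.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(3.0, 1.0, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    Xc, yc = gabriel_condense(X, y)
    print(len(Xc), "of", len(X), "training points retained after condensing")
    print("query [1.5, 1.5] ->", knn_classify(np.array([1.5, 1.5]), Xc, yc, k=3))

On data like this the condensing step typically retains only the points near the class boundary, so the reduced set answers nearest-neighbor queries with a fraction of the original storage.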

