The Power of Comparative Reasoning

Jay Yagnik
Google Inc.

Rank correlation measures are known for their resilience to perturbations in numeric values and are widely used in many evaluation metrics. Such ordinal measures have rarely been applied in the treatment of numeric features as a representational transformation. We emphasize the benefits of ordinal representations of input features both theoretically and empirically. We present a family of algorithms for computing ordinal embeddings based on partial order statistics. Apart from having the stability benefits of ordinal measures, these embeddings are highly nonlinear, giving rise to sparse feature spaces highly favored by several machine learning methods. These embeddings are deterministic, data independent, and, by virtue of being based on partial order statistics, add another degree of resilience to noise. When applied to the task of fast similarity search, these machine-learning-free methods outperform state-of-the-art machine learning methods with complex optimization setups. For solving classification problems, the embeddings provide a nonlinear transformation resulting in sparse binary codes that are well suited for a large class of machine learning algorithms. These methods show significant improvement on VOC 2010 using simple linear classifiers, which can be trained quickly. Our method can be extended to the case of polynomial kernels while permitting very efficient computation. Further, since the popular MinHash algorithm is a special case of our method, we demonstrate an efficient scheme for computing MinHash on conjunctions of binary features. The actual method can be implemented in about 10 lines of code in most languages (2 lines in MATLAB) and does not require any data-driven optimization.
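The abstract does not spell out the embedding procedure itself, so the following is only a rough Python sketch of a Winner-Take-All style partial-order code of the kind this family suggests; the function name ordinal_codes and the parameters n_codes, k, and seed are illustrative choices made here, not values taken from the paper. Each output symbol records which of a few randomly selected features is largest, so it depends only on the relative ordering of feature values, not their magnitudes.

    import numpy as np

    def ordinal_codes(x, n_codes=64, k=4, seed=0):
        """Winner-Take-All style ordinal embedding (illustrative sketch).

        For each output code, a fixed, data-independent random permutation
        of the feature indices is drawn; the first k permuted values are
        inspected and the index of the largest one is emitted.  The result
        is a partial order statistic: it is invariant to any monotonic
        rescaling of the input values.
        """
        rng = np.random.default_rng(seed)          # deterministic given the seed
        x = np.asarray(x)
        codes = np.empty(n_codes, dtype=np.int64)
        for i in range(n_codes):
            perm = rng.permutation(x.shape[0])     # data-independent permutation
            codes[i] = np.argmax(x[perm[:k]])      # winner among the first k entries
        return codes

    # Example: codes for a small feature vector; each entry lies in [0, k)
    x = np.array([0.3, 1.2, 0.7, 2.5, 0.1, 0.9])
    print(ordinal_codes(x, n_codes=8, k=3))

Under this reading, each symbol can be expanded into a k-dimensional indicator vector to obtain a sparse binary code of the kind the abstract mentions, and since the permutations are fixed up front, the transform is deterministic and requires no data-driven optimization.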
