Aggregating local image descriptors into compact codes

Cordelia Schmid

This presentation addresses the problem of large-scale image search. Three constraints must be taken into account: search accuracy, efficiency and memory usage. We first present and evaluate different ways of aggregating local image descriptors into a vector and show that the Fisher kernel achieves better performance than the reference bag-of-visual-words approach for any given vector dimension. We then jointly optimize dimensionality reduction and indexing in order to obtain a precise vector comparison as well as a compact representation. The evaluation shows that the image representation can be reduced to a few dozen bytes while preserving high accuracy. Searching a 100-million-image dataset takes about 250 ms on one processor core.
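
To make the aggregation step concrete, here is a minimal sketch in Python with NumPy. It is not the method evaluated in the talk: it contrasts a bag-of-visual-words histogram with a simple residual aggregation (in the style of VLAD, a non-probabilistic simplification related to the Fisher kernel) rather than implementing the Fisher kernel itself. The codebook, the descriptor dimension, the function names and the random data are all assumptions chosen for illustration.

```python
import numpy as np


def assign_to_centroids(descriptors, centroids):
    """Return, for each descriptor, the index of its nearest centroid."""
    # Squared Euclidean distance from every descriptor to every centroid.
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def l2_normalize(v):
    """Scale v to unit L2 norm (left unchanged if it is all zeros)."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v


def bag_of_visual_words(descriptors, centroids):
    """Count how many descriptors fall on each visual word: a k-dim vector."""
    k = centroids.shape[0]
    nearest = assign_to_centroids(descriptors, centroids)
    hist = np.bincount(nearest, minlength=k).astype(np.float64)
    return l2_normalize(hist)


def residual_aggregation(descriptors, centroids):
    """Sum (descriptor - centroid) residuals per visual word: a k*d-dim vector."""
    k, d = centroids.shape
    nearest = assign_to_centroids(descriptors, centroids)
    out = np.zeros((k, d))
    # Unbuffered add, so repeated centroid indices accumulate correctly.
    np.add.at(out, nearest, descriptors - centroids[nearest])
    return l2_normalize(out.ravel())


# Hypothetical data: a 16-word codebook and 500 local descriptors of dimension 64.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(16, 64))
descriptors = rng.normal(size=(500, 64))

bow = bag_of_visual_words(descriptors, centroids)
agg = residual_aggregation(descriptors, centroids)
print(bow.shape, agg.shape)
```

Both functions map a variable number of local descriptors to one fixed-length vector per image, which is what makes the subsequent dimensionality reduction and indexing possible. In practice the codebook would be learned from training descriptors (for example with k-means) instead of drawn at random.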

This is joint work with H. Jegou, F. Perronnin, M. Douze, J. Sanchez
and P. Perez.
