In many applications involving multimedia data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, or visualization. Data in such regimes typically exhibit multiple modalities, such as the acoustic and visual content of a video, or the audio clips, web documents, and artwork describing musical artists. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge in many real-world applications. We present a novel multiple kernel learning (MKL) technique for integrating heterogeneous data into a single, unified similarity space. Instead of finding a weighted linear combination of base kernels, as in the original MKL formulation, we learn a concatenation of linear projections, where each projection extracts the relevant information from a base kernel's feature space. This new paradigm yields a more flexible model than previous methods, one that can adapt to the case where the discriminative power of a kernel varies over the data set or feature space. It can be applied to learn an optimal multi-modal metric for various learning algorithms. Applications to similarity learning (e.g., for kNN classification), metric learning to rank, and cross-modal multimedia retrieval will be discussed.
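The contrast between the two ways of combining kernels can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the RBF base kernels, the toy "audio" and "text" features, and the random projection matrices are all assumptions made for illustration. In the actual method the projections would be learned from similarity constraints, which is not shown here.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix for the rows of X (an assumed choice of base kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def weighted_kernel_sum(kernels, weights):
    """Classical MKL-style combination: one scalar weight per base kernel."""
    return sum(w * K for w, K in zip(weights, kernels))

def concatenated_embedding(kernels, projections):
    """Project each kernel's columns with its own matrix W_p and stack the results.

    Column i of K_p is item i's representation in modality p, so column i of
    the output is the concatenation [W_1 K_1[:, i]; ...; W_m K_m[:, i]].
    """
    return np.vstack([W @ K for W, K in zip(projections, kernels)])

def pairwise_sq_distances(Z):
    """Squared Euclidean distances between the columns of Z."""
    sq = np.sum(Z ** 2, axis=0)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z.T @ Z
    return np.maximum(d2, 0.0)

# Toy data: 6 items described by two hypothetical modalities.
rng = np.random.default_rng(0)
n = 6
X_audio = rng.normal(size=(n, 4))
X_text = rng.normal(size=(n, 3))
kernels = [rbf_kernel(X_audio, 0.5), rbf_kernel(X_text, 0.5)]

# Placeholder projections (random here; learned in the real method).
projections = [rng.normal(size=(2, n)), rng.normal(size=(3, n))]

Z = concatenated_embedding(kernels, projections)  # shape (2 + 3, n)
D = pairwise_sq_distances(Z)                      # unified distance matrix
K_sum = weighted_kernel_sum(kernels, [0.5, 0.5])  # baseline for comparison
```

Because the embedding is a concatenation, the unified squared distance is the sum of the per-modality projected distances. Each `W_p` can emphasize different items or directions within its kernel's feature space, whereas a single scalar weight treats a kernel uniformly over the whole data set.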