Sparse decompositions have several properties that make them attractive for structuring audio content; in particular, they provide a signal representation that is both compact and hierarchically organized. In this talk I will present recent work on structuring mixed audio content, such as radio archives, with several goals and criteria: finding exact or approximate repeats, jointly coding them, identifying speech and music content, etc. One challenge is to limit the computational complexity, which depends on the size of the dictionary, so that the approach can be used on large datasets. This requires optimized algorithms, and to this end we introduce a new technique called Random Matching Pursuit. Another issue is whether the first atoms of the decomposition can be used as a watermark for fast identification: our preliminary results show that, with a slight modification of the selection rule supported by psychoacoustic experiments, a good tradeoff can be reached between the size of the watermark and the identification rates.