Expressing language modeling approaches as region algebra queries

Djoerd Hiemstra
Universiteit Twente
Computer Science

In this talk, I propose a unified theory describing "document space". We have recently made some interesting discoveries while combining two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured
document retrieval. They provide well-defined behaviour as well as a simple but powerful structured query language. Language models are
particularly useful for reasoning about the ranking of search results and for developing new ranking approaches. The unified model allows
application developers to define complex language modeling approaches as simple structured queries on a textual database. We show a remarkable one-to-one relationship between region queries and the language models they represent for a wide variety of applications: simple ad-hoc search,
cross-language retrieval, video retrieval, and web search. The talk will conclude with ongoing research on relating our approach to other
approaches to structured information retrieval, such as the MultiText approach developed at the University of Waterloo and the Inquery/Indri
approach developed originally at the University of Massachusetts.

Audio (MP3 File, Podcast Ready) Presentation (PDF File)
