Beyond "Bag of Words": Towards a Framework for Conceptual Retrieval

Jimmy Lin
University of Maryland

Although the field of information retrieval has made enormous progress over the last half century, virtually all systems are still built on the remarkably simple idea of "counting words" under an assumption of term independence. These methods have been validated empirically (e.g., in TREC evaluations), yet words alone cannot capture the semantic content of documents and information needs.
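To make the "counting words" limitation concrete, here is a minimal illustrative sketch. It is not the scoring function of any particular system. It assumes whitespace tokenization and a toy score that sums log-scaled term frequencies, with each query term scored independently. The helper names `bag_of_words` and `score` are hypothetical. Because word order is discarded, two sentences with opposite meanings receive identical representations and scores.

```python
from collections import Counter
import math

def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count terms; word order is discarded."""
    return Counter(text.lower().split())

def score(query: str, doc: str) -> float:
    """Toy relevance score (hypothetical, not BM25 or any real system):
    sum of log(1 + tf) over the distinct query terms. Each term contributes
    independently of the others, i.e. the term-independence assumption."""
    doc_counts = bag_of_words(doc)
    # Counter returns 0 for missing terms, so absent terms add log(1) = 0.
    return sum(math.log(1 + doc_counts[t]) for t in bag_of_words(query))

d1 = "the dog bit the man"
d2 = "the man bit the dog"
print(bag_of_words(d1) == bag_of_words(d2))
print(score("dog bit man", d1) == score("dog bit man", d2))
```

Both comparisons print `True`: the bag-of-words view cannot tell who bit whom, which is exactly the semantic content that term statistics alone miss.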

In this talk, I will discuss a framework for "conceptual retrieval" that articulates the types of knowledge that are important for information seeking. This general framework is instantiated in a clinical question answering system that operationalizes the principles of evidence-based medicine (EBM). Experiments show that an EBM-based scoring algorithm outperforms a state-of-the-art baseline that employs only term statistics. Ablation studies further clarify how much each component contributes to overall performance.

I will conclude by discussing how other domains can benefit from knowledge-based approaches and how broadly the proposed framework applies.


