Scalable inference in probabilistic topic models

David Blei
Princeton University
Computer Science

Probabilistic topic modeling provides an important suite of tools for the unsupervised analysis of large collections of documents. Topic models uncover the underlying themes of the documents and then use those themes to aid in exploration, search, and prediction. However, traditional topic modeling algorithms require multiple passes through the collection. They carry a significant computational burden, and much research on scaling up topic models has gone into developing distributed variants. In this talk, I will describe a different strategy: a topic modeling algorithm that can analyze documents arriving in a stream and that does not require repeated views of the same document. An analysis of 3.3M articles from Wikipedia shows that the online approach fits topic models that are as good as or better than those found with the traditional batch approach, and fits them in a fraction of the time.

(This is joint work with Matthew Hoffman and Francis Bach.)
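
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a single-pass, streaming topic model update could look. It assumes the general shape of online variational Bayes for latent Dirichlet allocation: fit per-document variational parameters on each minibatch, then blend the resulting topic estimate into the running one with a decaying step size. The class name OnlineLDA, the helper digamma, all hyperparameter values (alpha, eta, tau0, kappa), and the bag-of-words input format are illustrative assumptions, not details taken from the talk.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then use an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


class OnlineLDA:
    """Hypothetical streaming LDA sketch; each minibatch is processed exactly once."""

    def __init__(self, num_topics, vocab_size, total_docs,
                 alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, e_step_iters=20, seed=0):
        rng = random.Random(seed)
        self.K, self.V, self.D = num_topics, vocab_size, total_docs
        self.alpha, self.eta = alpha, eta      # assumed Dirichlet hyperparameters
        self.tau0, self.kappa = tau0, kappa    # assumed step-size schedule settings
        self.e_step_iters = e_step_iters
        self.t = 0                             # number of minibatches seen so far
        # Variational topic-word parameters, randomly initialised near 1.
        self.lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
                    for _ in range(num_topics)]

    def step_size(self):
        """Decaying weight given to the newest minibatch."""
        return (self.tau0 + self.t) ** (-self.kappa)

    def _responsibilities(self, word, elog_theta, elog_beta):
        """Normalised topic responsibilities for one word in one document."""
        phi = [math.exp(elog_theta[k] + elog_beta[k][word]) + 1e-100
               for k in range(self.K)]
        norm = sum(phi)
        return [p / norm for p in phi]

    def update(self, docs):
        """docs: list of {word_id: count} dicts forming one minibatch."""
        elog_beta = []
        for row in self.lam:
            psi_sum = digamma(sum(row))
            elog_beta.append([digamma(v) - psi_sum for v in row])
        sstats = [[0.0] * self.V for _ in range(self.K)]
        for doc in docs:
            gamma = [1.0] * self.K
            # Local step: iterate the per-document topic proportions.
            for _ in range(self.e_step_iters):
                psi_sum = digamma(sum(gamma))
                elog_theta = [digamma(g) - psi_sum for g in gamma]
                new_gamma = [self.alpha] * self.K
                for w, n in doc.items():
                    phi = self._responsibilities(w, elog_theta, elog_beta)
                    for k in range(self.K):
                        new_gamma[k] += n * phi[k]
                gamma = new_gamma
            # Accumulate expected word-topic counts from the fitted document.
            psi_sum = digamma(sum(gamma))
            elog_theta = [digamma(g) - psi_sum for g in gamma]
            for w, n in doc.items():
                phi = self._responsibilities(w, elog_theta, elog_beta)
                for k in range(self.K):
                    sstats[k][w] += n * phi[k]
        # Global step: blend the minibatch estimate into the running topics.
        rho = self.step_size()
        scale = self.D / len(docs)
        for k in range(self.K):
            for w in range(self.V):
                target = self.eta + scale * sstats[k][w]
                self.lam[k][w] = (1.0 - rho) * self.lam[k][w] + rho * target
        self.t += 1
```

In use, one would construct the model once and call update on each minibatch as it arrives, discarding the documents afterwards. Because the step size shrinks as more minibatches are seen, later batches refine the topics rather than overwrite them, which is what lets a sketch like this avoid revisiting earlier documents.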

Machine Reasoning Workshops III & IV: Mission-Focused Actions/Reactions Based on & System Integration of Information Derived from Complex Real-World Data (by invitation only)