From Text to Concepts at Google

Brian Milch
Google Inc.

Many Google services need to tell when two pieces of text -- such as search queries, web pages, or ads -- are about the same thing. Keyword matching gets us part of the way there, but it does not account for synonyms or for words with more than one meaning. This talk will describe Rephil, a system developed at Google to identify the concepts that underlie the words in a text. Rephil determines, for example, that "apple pie" falls under some of the same topics as "chocolate cake" but has little in common with "apple ipod". The concepts used by Rephil are not pre-specified; instead, they are derived by a statistical machine learning algorithm running on massive amounts of text. I will discuss the structure of Rephil models, the distributed learning algorithm that we use to build these models from terabytes of data, and the inference algorithm that we use to identify concepts in new texts under tight time constraints.
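
The contrast between keyword matching and concept matching can be sketched with a toy example. This is not Rephil's model or algorithm: the table TERM_CONCEPTS and the function names below are hypothetical and hand-written for illustration, whereas Rephil learns its concepts from data. The sketch only shows why word overlap and concept overlap can disagree on the three phrases mentioned above.

```python
# Toy illustration only. TERM_CONCEPTS is a hand-written, hypothetical table;
# Rephil derives its concepts statistically and is not implemented here.
TERM_CONCEPTS = {
    "apple pie": {"baking", "desserts"},
    "chocolate cake": {"baking", "desserts"},
    "apple ipod": {"consumer electronics", "music players"},
}


def jaccard(a, b):
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keyword_similarity(text1, text2):
    """Overlap of the literal words in the two texts."""
    return jaccard(set(text1.lower().split()), set(text2.lower().split()))


def concept_similarity(text1, text2):
    """Overlap of the (hypothetical) concepts attached to the two texts."""
    concepts1 = TERM_CONCEPTS.get(text1.lower(), set())
    concepts2 = TERM_CONCEPTS.get(text2.lower(), set())
    return jaccard(concepts1, concepts2)


if __name__ == "__main__":
    pairs = [("apple pie", "chocolate cake"), ("apple pie", "apple ipod")]
    for left, right in pairs:
        print(left, "/", right,
              "keyword:", round(keyword_similarity(left, right), 2),
              "concept:", round(concept_similarity(left, right), 2))
```

In this toy setup, keyword overlap ranks "apple pie" closer to "apple ipod" than to "chocolate cake" because of the shared word "apple", while the concept table reverses that ordering.
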
Brian Milch is a software engineer at Google LA. He received a bachelor's degree in Symbolic Systems from Stanford in 2000 and a Ph.D. in computer science from UC Berkeley in 2006. He then completed a two-year postdoctoral position in the Computer Science and Artificial Intelligence Laboratory at MIT. He does research in artificial intelligence, machine learning, and computational game theory.


Back to Graduate Summer School: Deep Learning, Feature Learning