Language Modeling with the Maximum Likelihood Set: Complexity Issues and the Back-off Formula

Damianos Karakos
Johns Hopkins University
Center for Language and Speech Processing

The Maximum Likelihood Set (MLS) was recently introduced in [Jedynak-Khudanpur05] as an effective, parameter-free technique for estimating a probability mass function (pmf) from sparse data. The MLS contains all pmfs that assign a likelihood to the observed counts no smaller than the likelihood they assign to any other set of counts of the same sample size. In this talk, we show how the MLS can be extended to the estimation of conditional pmfs. First, it is shown that, when the criterion for selecting a pmf from the MLS is the KL-divergence, the selected conditional pmf naturally has a back-off form, except for a ceiling on the probability of high-frequency symbols that are not seen in particular contexts. Second, the pmf has a sparse parameterization, leading to efficient algorithms for KL-divergence minimization. Experimental results from bigram and trigram language modeling indicate that pmfs selected from the MLS are competitive with state-of-the-art estimates.
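
As a concrete illustration of the MLS membership condition (not code from the talk), the following Python sketch checks whether a candidate pmf lies in the MLS of a vector of observed counts. Requiring the observed counts to be a mode of the multinomial likelihood under p means that moving one observation from any seen symbol j to any symbol i must not raise the likelihood, which yields the linear constraints n_j * p_i <= (n_i + 1) * p_j. The function name in_mls and the example counts are illustrative assumptions.

    import numpy as np

    def in_mls(p, counts, tol=1e-12):
        """Return True if pmf p lies in the Maximum Likelihood Set of counts.

        p is in the MLS iff the observed count vector is a mode of the
        multinomial with parameters (N, p), i.e. shifting one observation
        from a seen symbol j to any symbol i never raises the likelihood:
            n_j * p_i <= (n_i + 1) * p_j   for all i, j with n_j > 0.
        """
        p = np.asarray(p, dtype=float)
        n = np.asarray(counts, dtype=float)
        for j in np.nonzero(n > 0)[0]:
            for i in range(len(n)):
                if n[j] * p[i] > (n[i] + 1.0) * p[j] + tol:
                    return False
        return True

    counts = np.array([5, 3, 0, 0])          # sparse observed counts
    N, k = counts.sum(), len(counts)

    mle = counts / N                          # maximum-likelihood estimate
    laplace = (counts + 1) / (N + k)          # add-one smoothing
    skewed = np.array([0.1, 0.1, 0.4, 0.4])   # most mass on unseen symbols

    for name, p in [("MLE", mle), ("Laplace", laplace), ("skewed", skewed)]:
        print(name, in_mls(p, counts))

Running this shows that both the maximum-likelihood estimate and the add-one estimate satisfy all the constraints, while the skewed pmf, which assigns most of its mass to unseen symbols, is rejected: the MLS retains familiar estimates but excludes pmfs under which the observed counts would no longer be the most likely outcome.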

