Language Modeling Experiments with Random Forests

Frederick Jelinek
Johns Hopkins University
The Center for Language and Speech Processing

L. Breiman recently introduced the concept of random forests (randomly constructed collections of decision trees) for classification. We have modified the method for regression and applied it to language modeling for speech recognition. Random forests achieve excellent results in both perplexity and error rate. They can be regarded as a language model in HMM form, and they have interesting properties that yield very robust smoothing.


Bio: Dr. Jelinek obtained an S.B. degree in 1956 and an S.M. degree in 1958 from the Massachusetts Institute of Technology, both in Electrical Engineering. He went on to receive his PhD from MIT in 1962. In 1968 Dr. Jelinek joined the IBM T. J. Watson Research Center as a research staff member. By 1972 Dr. Jelinek had become the manager of the large Continuous Speech Recognition group. There he pioneered, with his colleagues, the statistical methods that are the basis of current state-of-the-art speech recognizers. He stayed at IBM until 1993, when he moved to Johns Hopkins University in Baltimore. Dr. Jelinek was an Instructor at MIT (1956-1962) and a Visiting Lecturer at Harvard (1962). After receiving his PhD degree in 1962 with a thesis on two-way channel communication, Dr. Jelinek continued his teaching career at Cornell (1962-1974) as a Professor of Electrical Engineering. He joined the faculty at Johns Hopkins University in 1993, and at the same time he established the Center for Language & Speech Processing. Dr. Jelinek is currently a Julian S. Smith Professor of Electrical Engineering at Hopkins, as well as the Director of the Center for Language & Speech Processing.

