Computer Speech Recognition: Mimicking the Human System

Li Deng
Microsoft Research

The main goal of computer speech recognition/understanding is to automatically convert natural human speech into its corresponding text (and then into its meaning). While straightforward statistical methods (e.g., hidden Markov modeling) have achieved remarkable technological and commercial success, solving the remaining problems on the way to ultimate success appears to require a deep understanding of human speech recognition mechanisms. This talk will analyze the various human sub-systems, including motor control, the vocal tract (mouth), the ears, the auditory pathways, and the auditory cortex (brain), that work in synergy to accomplish highly robust, low-error speech recognition/perception and understanding. How can we abstract the essence of this human information-processing power to build a computer system with similar (or better) performance? How can we build mathematical models that enable the development of algorithms and advanced machine-learning techniques that run efficiently on a computer? How can we exploit special capabilities of computing machines that humans lack in order to achieve super-human speech recognition? These are some of the issues to be addressed in this talk.


Back to Mathematics of the Ear and Sound Signal Processing