Toward quantitative models of human speech perception and their application to noise-robust automatic speech recognition (ASR) systems

Abeer Alwan
UCLA

We are developing models of the unique human capacity for perceiving speech in noise. An application of these models is in the design of noise-robust recognition systems and of speech processors for cochlear implants that can help restore the ability to hear speech in noise that is lost with hearing impairment.


We will discuss several methods, inspired by auditory processing and acoustic-phonetic knowledge, that we have developed over the past 10 years, including noise-robust front-end and back-end techniques: variable frame-rate analysis, harmonic demodulation, peak isolation, and weighted Viterbi recognition.
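To illustrate one of the back-end techniques, weighted Viterbi recognition scales each frame's acoustic log-likelihood by a per-frame reliability weight, so that frames judged unreliable (e.g., heavily corrupted by noise) contribute less to the decoded state path. The sketch below is a minimal, illustrative implementation of that idea for a generic HMM, assuming a simple multiplicative weighting of the observation term; the function name, argument layout, and weighting scheme are assumptions, not the authors' actual code.

```python
def weighted_viterbi(obs_loglik, log_trans, log_init, weights):
    """Viterbi decoding with per-frame confidence weights.

    obs_loglik[t][j] : log-likelihood of frame t under state j
    log_trans[i][j]  : log transition probability from state i to j
    log_init[j]      : log initial probability of state j
    weights[t]       : reliability weight in [0, 1]; smaller weights
                       shrink a noisy frame's influence on the path
    Returns the most likely state sequence as a list of state indices.
    """
    T, N = len(obs_loglik), len(log_init)
    # Initialization: weighted observation term added to the priors.
    delta = [log_init[j] + weights[0] * obs_loglik[0][j] for j in range(N)]
    backptrs = []
    # Recursion: standard Viterbi, except the emission log-likelihood
    # is scaled by the frame's reliability weight.
    for t in range(1, T):
        new_delta, ptr = [], []
        for j in range(N):
            best_i = max(range(N), key=lambda i: delta[i] + log_trans[i][j])
            new_delta.append(delta[best_i] + log_trans[best_i][j]
                             + weights[t] * obs_loglik[t][j])
            ptr.append(best_i)
        delta = new_delta
        backptrs.append(ptr)
    # Backtrace from the best final state.
    state = max(range(N), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Setting all weights to 1 recovers ordinary Viterbi decoding; setting a frame's weight to 0 makes the decoder rely entirely on the transition model for that frame.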


When used in automatic word-recognition systems, our models significantly reduce the recognition error rate in the presence of background noise, in comparison with commonly used spectral processing of speech.


References can be found at:


http://www.icsl.ucla.edu/~spapl/papers.html


Workshop: Mathematics of the Ear and Sound Signal Processing