Speech Technology and Research (STAR) Laboratory Seminar Series
Past talks: 2006
Speaker: Fei Sha,
Computer Science Division, UC Berkeley
Time: Wednesday, Oct. 25, 2006, 10:30 am
Venue: STAR Lab, EJ 124
Title: Large margin approaches for automatic speech recognition
Abstract:
Most modern speech recognizers are based on continuous-density hidden
Markov models (CD-HMMs). The hidden states in these CD-HMMs model
different phonemes or sub-phonetic elements, while the observations
model cepstral feature vectors. Distributions of cepstral feature
vectors are most often represented by Gaussian mixture models (GMMs).
The accuracy of the recognizer depends critically on the careful
estimation of GMM parameters.
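As a rough illustration of the observation model described above, the sketch below evaluates the log-likelihood of one feature vector under a GMM. It assumes diagonal covariance matrices, which the abstract does not specify; the function name and argument layout are hypothetical and chosen only for this example.

```python
import math


def diag_gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM.

    Assumption for illustration: each component has a diagonal covariance,
    given here as a list of per-dimension variances.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of the Gaussian normalization constant for this component.
        log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
        # Log of the exponential term (scaled squared distance to the mean).
        log_exp = -0.5 * sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, var))
        log_terms.append(math.log(w) + log_norm + log_exp)
    # Log-sum-exp over components, shifted by the maximum for stability.
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))
```

In a CD-HMM, a score of this kind would be computed for each hidden state's GMM at every frame of cepstral features.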
The most basic approach involves maximum likelihood (ML) estimation,
typically carried out with the expectation-maximization (EM) algorithm.
The main attraction of the EM algorithm is that no free parameters need
to be tuned for its convergence. However, in general, maximum
likelihood training criteria do not optimize classification error rates
directly. In many cases, alternative training criteria that track
error rates more explicitly tend to perform better. Two well-known
examples are discriminative methods like conditional maximum likelihood
(CML)/maximum mutual information (MMI) and minimum classification
error (MCE).
In this talk, I will present a new framework of discriminative training
called large margin hidden Markov models. Inspired by the principles of
large margin classification, a well-studied statistical learning
framework, large margin HMMs learn their parameters by separating the
correct labeling sequence from incorrect labeling sequences by a large
margin. The required margin is directly proportional to the number of
labeling mistakes in the incorrect sequence. The training is cast as a
convex optimization which maximizes the margins.
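The margin condition described above can be sketched in a few lines. This is a minimal illustration and not the training algorithm presented in the talk: the sequence scores are hypothetical inputs, the proportionality constant is assumed to be 1, and the function names are invented for the example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length label sequences differ."""
    if len(a) != len(b):
        raise ValueError("label sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def margin_violation(score_correct, score_incorrect, correct, incorrect):
    """Hinge-style penalty for one (correct, incorrect) pair of labelings.

    The penalty is zero once the correct sequence outscores the incorrect
    one by at least the number of labeling mistakes (assumed constant: 1).
    """
    required_margin = hamming_distance(correct, incorrect)
    return max(0.0, required_margin - (score_correct - score_incorrect))
```

For example, an incorrect labeling that differs from the correct one in two positions must be outscored by at least 2 under this assumption; a score gap of only 1 leaves a violation of 1.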
I will describe the framework and the training algorithm of large
margin HMMs. I will also present experimental results of applying this
training criterion to building phoneme recognizers. We found
significantly improved phoneme recognition accuracy on the TIMIT speech
corpus. We also systematically compared the approach with other leading
discriminative training methods. We found greater error reduction from
baseline systems than with either CML or MCE.
Joint work with Dr. Lawrence K. Saul (U. of California, San Diego).
References:
Fei Sha and Lawrence K. Saul (2006).
Large margin Gaussian mixture models for automatic speech recognition.
To appear in Neural Information Processing Systems Conference 2006
(Vancouver, Canada).
Fei Sha and Lawrence K. Saul (2006).
Large margin Gaussian mixture modeling for phonetic classification and
recognition. Proc. of ICASSP 2006, Toulouse, France.
Fei Sha and Lawrence K. Saul (2007).
Comparison of large margin training to other discriminative methods for
phonetic recognition by hidden Markov models. Submitted to ICASSP 2007.