TANDEM MODELING INVESTIGATIONS
Dan Ellis
ICSI Berkeley / Columbia University
We are continuing our investigation of Tandem acoustic modeling,
in which a neural network, trained to context-independent phone
targets, is used as a feature preprocessor for conventional
Gaussian mixture-based speech recognition. I will describe
three recent areas of investigation:
- Searching for the source of Tandem's benefits.
Tandem systems reduce word error rates to as little as half those
seen in conventional HTK systems. We investigate the suggestion
that the diversity of subword units - phones for the network and
word-specific states for the GMM-HMM system - is beneficial. In
fact, we find that training a network to 181 state labels taken from
an HTK alignment (i.e. the same units as used in the GMM model)
results in slightly improved performance.
- Experiments with tandem feature-domain processing.
We have tried various combinations of deltas, per-utterance
normalization, and KLT rank reduction between the net
output and the GMM input (a sketch of this processing chain
appears after this list). We find that, for PLP features,
taking deltas before the KLT and normalizing afterwards is markedly
helpful, whereas the other variants give only slight improvements.
However, this benefit disappears for MSG and PLP+MSG combination systems.
- Tandem for larger-vocabulary tasks. Further experiments with our
tandem system on the NRL "SPINE" medium-vocabulary spontaneous-speech
task show that the tandem system vastly improves on a
context-independent baseline, but that using the (monophone-based) tandem
features with context-dependent modeling and MLLR almost completely
eliminates this advantage.
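
As a rough illustration of the feature-domain processing mentioned in the
second item, the sketch below (Python/numpy) shows one tandem feature chain:
log network posteriors, deltas appended before a KLT (PCA) rank reduction,
and per-utterance normalization afterwards, feeding the GMM front end. The
function names, the delta window, and the choice of 24 retained dimensions
are illustrative assumptions, not the exact configuration used in these
experiments.

# Illustrative tandem feature pipeline; parameter choices are assumptions.
import numpy as np

def deltas(feats, window=2):
    # Regression-based delta features over time; feats is (n_frames, n_dims).
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    num = np.zeros_like(feats, dtype=float)
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    for k in range(1, window + 1):
        num += k * (padded[window + k:window + k + n_frames]
                    - padded[window - k:window - k + n_frames])
    return num / denom

def klt(feats, n_keep):
    # Karhunen-Loeve transform (PCA): project onto the top n_keep
    # eigenvectors of the feature covariance, decorrelating and reducing rank.
    centered = feats - feats.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = np.argsort(eigvals)[::-1][:n_keep]
    return centered @ eigvecs[:, top]

def tandem_features(net_posteriors, n_keep=24):
    # Net posteriors -> log domain -> append deltas -> KLT -> per-utterance
    # mean/variance normalization (the "deltas before the KLT, normalization
    # afterwards" ordering); other variants permute these steps.
    logp = np.log(net_posteriors + 1e-10)
    feats = np.hstack([logp, deltas(logp)])
    feats = klt(feats, n_keep)
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-10)

For one utterance, an (n_frames x n_phones) matrix of network posteriors
would thus yield an (n_frames x 24) feature matrix in place of the usual
cepstral input to the GMM-HMM system.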