| RESPITE:Events : Meeting, Sep 2000:Presentations: Barker et al. |
In this paper we replace the discrete decision that a time-frequency pixel is reliable or unreliable with an estimate of the probability that the data is reliable. We adapt the probability calculation to use this estimate as a weighting factor for term (1) and term (2) for each vector component. The weighting factor is expressed using a sigmoidal transformation of the reliability estimate.
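The weighting described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: terms (1) and (2) are not defined in this abstract, so the sketch assumes they are the usual reliable-data likelihood and bounded-marginal likelihood from the missing data literature [2]. It also assumes a single diagonal Gaussian per component. The function names and the `slope` and `centre` parameters of the sigmoid are hypothetical.

```python
import math


def sigmoid_weight(reliability, slope=1.0, centre=0.0):
    """Map a raw reliability estimate (e.g. a local SNR) to a weight in (0, 1).

    slope and centre are hypothetical tuning parameters.
    """
    return 1.0 / (1.0 + math.exp(-slope * (reliability - centre)))


def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def soft_component_likelihood(x, weight, mean, std):
    """Blend the two likelihood terms for one vector component."""
    # Assumed term (1): the observation is reliable, so evaluate the density at x.
    present = gaussian_pdf(x, mean, std)
    # Assumed term (2): the observation is masked, so the clean value lies
    # somewhere in [0, x]; use the average density over that interval.
    if x > 0.0:
        bounded = (gaussian_cdf(x, mean, std) - gaussian_cdf(0.0, mean, std)) / x
    else:
        # Degenerate interval: fall back to the density itself,
        # which is the limit of the average as x approaches 0.
        bounded = present
    return weight * present + (1.0 - weight) * bounded


def soft_frame_log_likelihood(frame, reliabilities, means, stds,
                              slope=1.0, centre=0.0):
    """Sum the per-component log-likelihoods for one frame."""
    total = 0.0
    for x, r, m, s in zip(frame, reliabilities, means, stds):
        w = sigmoid_weight(r, slope, centre)
        # A tiny floor keeps the logarithm defined if the likelihood underflows.
        total += math.log(max(soft_component_likelihood(x, w, m, s), 1e-300))
    return total
```

With a weight of 1 the blend reduces to the reliable-data term alone, and with a weight of 0 to the bounded term alone, so the discrete reliable/unreliable decision is recovered as a special case.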
This soft decision approach integrates smoothly with algorithms that derive a continuous-valued estimate of the degree to which a local spectro-temporal region belongs to a group. For example, grouping of frequency channels based on common periodicity no longer requires hard decisions such as the presence/absence of a peak in the channel autocorrelation function at a given lag [3], but can use the value of the channel autocorrelation function itself as a reliability estimate.
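As a rough illustration of the periodicity example, the sketch below computes a normalised autocorrelation of one channel at a candidate lag and uses it directly as a soft reliability value. It makes several assumptions that are not in the text: the channel is a plain list of samples rather than the output of an auditory model as in [3], the normalisation is by the energy of the two overlapping segments, and negative correlations are clipped to zero. The function names are hypothetical.

```python
import math


def normalised_autocorrelation(samples, lag):
    """Autocorrelation at the given lag, normalised to lie in [-1, 1]."""
    if lag <= 0 or lag >= len(samples):
        raise ValueError("lag must be between 1 and len(samples) - 1")
    head = samples[:-lag]   # samples at time t
    tail = samples[lag:]    # samples at time t + lag
    product = sum(a * b for a, b in zip(head, tail))
    energy = math.sqrt(sum(a * a for a in head) * sum(b * b for b in tail))
    return product / energy if energy > 0.0 else 0.0


def periodicity_reliability(samples, lag):
    """Soft reliability in [0, 1] that the channel is periodic at this lag.

    Clipping negative values to zero is an assumption of this sketch.
    """
    return min(1.0, max(0.0, normalised_autocorrelation(samples, lag)))


# Usage: a channel dominated by a source with a period of 20 samples.
channel = [math.sin(2.0 * math.pi * t / 20.0) for t in range(200)]
matching = periodicity_reliability(channel, 20)     # lag equals the period
mismatched = periodicity_reliability(channel, 10)   # lag is half the period
```

A value of this kind could be passed through the sigmoidal transformation in place of a hard peak/no-peak decision, so that channels with weak periodicity at the target lag contribute less rather than being discarded outright.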
We also outline three additional improvements to previously published
missing data work: the use of temporal constraints, word-boundary penalties
and improved silence modelling. These are shown to improve on the results
reported in [2], and the use of soft decisions provides
a further significant gain at low SNRs. For example, on speech from the
TIDIGITS database contaminated by factory noise, we obtain the following
word accuracies over a range of SNRs, using models trained on clean speech:
[Table: word accuracy by SNR for speech in factory noise. The cell values are missing; the only surviving row label is "Improvements as above".]
[1] A. Vizinho, P.D. Green, M.P. Cooke and L. Josifovski (1999),
'Missing data theory, spectral subtraction and signal-to-noise estimation
for robust ASR: An integrated study', Proc. Eurospeech 99, pp 2407-2410.
[2] M.P. Cooke, P.D. Green, L. Josifovski and A. Vizinho,
'Robust automatic speech recognition with missing and unreliable acoustic
data', to appear in Speech Communication.
[3] R. Meddis and M. Hewitt (1992), 'Modelling the identification
of concurrent vowels with different fundamental frequencies', JASA, 91, pp 233-245.