RESPITE: Events: Meeting, Sep 2000: Presentations: Stéphane Dupont
To remove this drawback, we propose to use the multi-band architecture. This is based on the observation that, within narrow frequency bands, noises differ practically only in their energy level, not in their spectral shape. Therefore, if the models associated with each frequency band are trained on data corrupted by any one kind of noise at different SNRs, we can expect them, provided the frequency bands are narrow enough, to be insensitive to other kinds of noise.
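Training "at different SNRs" implies mixing clean speech with noise scaled to a chosen signal-to-noise ratio. The following is a minimal sketch of that step, not the procedure actually used here: the function name `mix_at_snr`, the sine wave standing in for speech, and the SNR values are all illustrative assumptions.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` after scaling it to the requested SNR (in dB).

    Assumes `noise` is at least as long as `speech` and is not all zeros.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Noise power required so that 10*log10(speech_power / noise_power) == snr_db.
    target_noise_power = speech_power / (10.0 ** (snr_db / 10.0))
    gain = np.sqrt(target_noise_power / noise_power)
    return speech + gain * noise

rng = np.random.default_rng(0)
# Stand-in for a clean speech waveform (1 s at 8 kHz); purely illustrative.
speech = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000.0)
white = rng.standard_normal(8000)  # Gaussian white noise

# One corrupted copy of the utterance per training SNR (values are assumptions).
noisy_versions = {snr: mix_at_snr(speech, white, snr) for snr in (0, 10, 20)}
```

The SNR here is computed over the whole signal; a band-wise or speech-active-only definition would change the gain but not the principle.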
For each frequency band, we develop a system to estimate noise-robust acoustic features. These features are computed from parameters specific to the frequency band (such as the critical-band energies inside the frequency band). For each band, we train an MLP on data corrupted by white noise at different SNRs. These MLPs produce acoustic features according to the non-linear discriminant analysis (NLDA) technique [1], that is, the features are the outputs of the last hidden layer. These robust acoustic features are then concatenated and passed to the recognition system (a hybrid HMM/MLP in our case).
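The data flow described above (split the spectrum into bands, run each band through its own MLP, keep the last hidden layer, concatenate) can be sketched as follows. This is only an illustration under stated assumptions: the 15 critical-band channels, the 10 hidden units per band, the sigmoid non-linearity and the random weights (standing in for MLPs trained on noisy data) are not taken from the presentation, and `BandMLP` and `multiband_features` are hypothetical names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BandMLP:
    """Hidden layer of a per-band MLP.

    The output (classification) layer is only needed during training and is
    omitted here. Weights are random placeholders for trained parameters.
    """
    def __init__(self, n_in, n_hidden, rng):
        self.w1 = 0.1 * rng.standard_normal((n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)

    def hidden(self, x):
        # NLDA-style features: activations of the last hidden layer.
        return sigmoid(x @ self.w1 + self.b1)

def multiband_features(critical_band_energies, mlps):
    """Split (frames, channels) energies into bands and concatenate MLP features."""
    bands = np.array_split(critical_band_energies, len(mlps), axis=1)
    return np.concatenate([m.hidden(b) for m, b in zip(mlps, bands)], axis=1)

rng = np.random.default_rng(0)
n_frames, n_channels, n_bands, n_hidden = 100, 15, 7, 10  # assumed sizes
energies = rng.standard_normal((n_frames, n_channels))    # stand-in for log energies

# One MLP per band, sized to the number of channels that band receives.
band_widths = [b.shape[1] for b in np.array_split(energies, n_bands, axis=1)]
mlps = [BandMLP(w, n_hidden, rng) for w in band_widths]

features = multiband_features(energies, mlps)  # shape: (n_frames, n_bands * n_hidden)
```

The concatenated `features` array is what would be handed to the recognizer in place of the original spectral features.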
This approach ('new - 7 bands' in the tables) has been compared with the
baseline system (log-RASTA), J-RASTA, spectral subtraction, multi-band J-RASTA
(4 bands) and multi-band spectral subtraction (4 bands) on Numbers'95. Results
are averaged over six kinds of noise (Gaussian white noise, Noisex helicopter
noise, Madras car noise, Daimler inside-car noise, shopping mall, exhibition
hall). For each method we used the configuration that led to the best performance
(J value, noise level estimation method, number of sub-bands, ...).
Word error rate on Numbers'95. Averaged over 6 kinds of noise. Different methods.
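For reference, word error rate is conventionally the word-level edit distance (substitutions, deletions and insertions) between the recognized and the reference word sequences, divided by the number of reference words. The sketch below shows that standard definition; it is not the scoring tool used for these experiments, and the example word strings are made up.

```python
def word_error_rate(reference, hypothesis):
    """Return the WER in percent. Assumes a non-empty reference string."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# Made-up example: one deletion ("two") and one insertion ("five").
wer = word_error_rate("one two three four", "one three four five")
```

Because insertions count as errors, the WER can exceed 100% in very noisy conditions.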
The next table shows the influence of the number of sub-bands on the performance
of the new approach. Note that the '1-band' system differs slightly
from the baseline system, in that robust features were extracted
from the spectral features using NLDA (system trained on white noise)
and then passed to the recognizer.
Word error rate on Numbers'95. Averaged over 6 kinds of noise. Different numbers of sub-bands.
This table shows results on Resource Management using different methods.
Speech was corrupted by the Noisex helicopter noise. The last row corresponds
to a hybrid system trained on this particular noise at different SNRs.
Word error rate on Resource Management. Noisex helicopter noise. Different methods.