| RESPITE:Events : Meeting, Sep 2000:Presentations: Glotin & Berthommier |
Information about speech reliability can be extracted and then integrated into a recogniser by various means. The full combination (FC) approach allows the posterior values estimated locally in the time-frequency representation to be weighted according to a speech reliability measure. Since most speech segments are voiced, we use a method that exploits the harmonicity of speech to derive these weights. We test this method alongside direct integration of the a priori SNR. We then run speech recognition with different kinds of weighting functions. The weights are either continuous or binary values, corresponding to a soft or a hard decision function about speech reliability, which is derived from an observable harmonicity index. With a binary decision process, the effect is, for each time frame, to collapse the set of sub-band combinations into a single combination. Alternatively, we substitute empirical values for these weighting terms, including functions of the a priori SNR, which are continuous or discrete but not based on a probabilistic estimation. We report average scores in % WER for a panel of noises at different levels, stationary or non-stationary, narrow-band or wide-band. All these functions are found to be sub-optimal compared with constant weighting, but the FC approach is observed to be robust to narrow-band noise.
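
As an illustration of the weighting described above, the following is a minimal sketch and not the authors' implementation. It assumes that each sub-band has a reliability value in [0, 1] for a given frame, that the weight of a sub-band combination is the product of the per-band reliabilities (an independence assumption made here for simplicity), and that the empty combination falls back on class priors. All function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def subband_combinations(n_bands):
    """Return all 2**n_bands subsets of sub-band indices, empty set included."""
    bands = range(n_bands)
    return [frozenset(c)
            for size in range(n_bands + 1)
            for c in combinations(bands, size)]


def combination_weights(reliability):
    """Weight each combination for one frame.

    Assumption (not from the source): bands are independently reliable, so
    w_c = prod_{i in c} r_i * prod_{i not in c} (1 - r_i).
    """
    weights = {}
    for combo in subband_combinations(len(reliability)):
        w = 1.0
        for i, r in enumerate(reliability):
            w *= r if i in combo else (1.0 - r)
        weights[combo] = w
    return weights


def harden(reliability, threshold=0.5):
    """Hard decision: map each soft reliability to 0 or 1 (threshold is hypothetical)."""
    return [1.0 if r >= threshold else 0.0 for r in reliability]


def full_combination_posterior(posteriors, weights):
    """Weighted sum of the per-combination class posteriors for one frame."""
    n_classes = len(next(iter(posteriors.values())))
    combined = [0.0] * n_classes
    for combo, w in weights.items():
        for k, p in enumerate(posteriors[combo]):
            combined[k] += w * p
    return combined


# Toy frame with two sub-bands and two classes; all numbers are made up.
toy_posteriors = {
    frozenset(): [0.5, 0.5],        # no reliable band: class priors (assumed)
    frozenset({0}): [0.8, 0.2],
    frozenset({1}): [0.3, 0.7],
    frozenset({0, 1}): [0.6, 0.4],
}
soft_weights = combination_weights([0.9, 0.2])
hard_weights = combination_weights(harden([0.9, 0.2]))
soft_posterior = full_combination_posterior(toy_posteriors, soft_weights)
hard_posterior = full_combination_posterior(toy_posteriors, hard_weights)
```

Under these assumptions the soft weights spread probability mass over all four combinations, whereas the hard weights leave a single combination (here, band 0 alone) with weight 1, which is the collapse to one combination per frame mentioned in the text.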