RESPITE: Annual Report 2000: Scientific Highlights: Identifying Reliable Information

Identifying Reliable Information

Dynamic Combination Weights

A new Maximum Likelihood approach to estimating dynamic expert weights has recently been tested. Tests with narrowband noise show that the estimated weights look reasonable. In these experiments the fullband data likelihood was modelled as a simple linear weighting of subband data likelihoods.

In the first experiment the (maximised) fullband data likelihood for each phoneme was expressed as a linear combination of the 4 single-subband likelihoods and the fullband data likelihood. Separate combination weights were estimated for each of 27 phonemes.
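The linear combination model can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the report: the weights are taken to be non-negative and to sum to one, they are estimated with the standard EM update for mixture weights with fixed component likelihoods, and the five experts (read here as four single-subband experts plus one fullband expert) are fed synthetic likelihood values. The function name `estimate_weights` and all data are hypothetical, and the actual estimation procedure used in the experiments may differ.

```python
import numpy as np

def estimate_weights(expert_lik, n_iter=200):
    """Maximum Likelihood weights for a linear combination of expert likelihoods.

    expert_lik: array of shape (n_frames, n_experts) holding the likelihood
    each expert assigns to each frame (assumed strictly positive).
    Returns weights w (non-negative, summing to one) that increase
    sum_t log(sum_k w[k] * expert_lik[t, k]) via EM. The choice of EM is an
    assumption of this sketch, not a detail taken from the report.
    """
    n_frames, n_experts = expert_lik.shape
    w = np.full(n_experts, 1.0 / n_experts)        # start from uniform weights
    for _ in range(n_iter):
        weighted = expert_lik * w                   # w[k] * p_k(x_t)
        post = weighted / weighted.sum(axis=1, keepdims=True)  # expert posteriors per frame
        w = post.mean(axis=0)                       # re-estimated weights
    return w

# Synthetic illustration only: 5 hypothetical experts, 500 frames.
rng = np.random.default_rng(0)
clean = rng.gamma(shape=2.0, scale=1.0, size=(500, 5))

# Mimic noise in one band by making expert 1 assign much lower likelihoods.
noisy = clean.copy()
noisy[:, 1] *= 0.05

w_clean = estimate_weights(clean)
w_noisy = estimate_weights(noisy)
print("clean weights:", np.round(w_clean, 3))
print("noisy weights:", np.round(w_noisy, 3))
```

In this toy setting the weight of the degraded expert shrinks towards zero, which is the qualitative behaviour reported for the noisy band. In the experiments one such weight vector would be estimated per phoneme.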

In recognition these weights are estimated dynamically, but in the figure below they are estimated over the full Numbers95 connected-digits training data set.

The figure shows the estimated weights for each phoneme in clean speech. When the weights are re-estimated with noise added to each frequency band in turn, the weights for the noisy band are reduced.

The same approach will be extended in future to use more accurate subband likelihood combination rules.

Estimating SNR

From noisy training data we can estimate joint probability distributions of the form P(SNR, Observable). We can then employ these distributions in conjunction with a measurement of the Observable to calculate a reliability estimate of the form P(SNR>T|Observable). A reliability estimate of this form has many potential uses, for example Fuzzy Missing Data.
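As a rough sketch of this calculation, the joint distribution can be approximated by a 2-dimensional histogram and the reliability read off by summing the SNR bins above the threshold within each Observable bin. Everything specific here is an assumption made for illustration: the bin edges, the 10 dB threshold, the function names, and the synthetic observable (a noisy, monotonic function of SNR standing in for a real cue).

```python
import numpy as np

def joint_histogram(snr_db, observable, snr_edges, obs_edges):
    """Histogram estimate of P(SNR, Observable), normalised to sum to one."""
    counts, _, _ = np.histogram2d(snr_db, observable, bins=[snr_edges, obs_edges])
    return counts / counts.sum()

def reliability(joint, snr_edges, threshold_db):
    """P(SNR > T | Observable bin) for every Observable bin.

    The threshold is assumed to coincide with one of the SNR bin edges, so
    that each SNR bin lies entirely above or entirely below it.
    """
    above = snr_edges[:-1] >= threshold_db       # SNR bins above the threshold
    p_obs = joint.sum(axis=0)                    # marginal P(Observable bin)
    p_above = joint[above].sum(axis=0)           # P(SNR > T, Observable bin)
    # Empty Observable bins get NaN rather than a division by zero.
    return np.divide(p_above, p_obs, out=np.full_like(p_obs, np.nan), where=p_obs > 0)

def lookup(p_reliable, obs_edges, value):
    """Reliability estimate for a newly measured Observable value."""
    idx = np.clip(np.digitize(value, obs_edges) - 1, 0, len(p_reliable) - 1)
    return p_reliable[idx]

# Synthetic "noisy training data" (illustrative only).
rng = np.random.default_rng(0)
snr_db = rng.uniform(-10.0, 30.0, size=20000)
observable = 1.0 / (1.0 + np.exp(-(snr_db - 10.0) / 5.0))
observable = np.clip(observable + rng.normal(0.0, 0.1, size=snr_db.size), 0.0, 1.0)

snr_edges = np.arange(-10.0, 31.0, 2.0)          # 2 dB wide SNR bins
obs_edges = np.linspace(0.0, 1.0, 11)            # 10 Observable bins

joint = joint_histogram(snr_db, observable, snr_edges, obs_edges)
p_reliable = reliability(joint, snr_edges, threshold_db=10.0)
print(np.round(p_reliable, 2))
print(lookup(p_reliable, obs_edges, 0.95))
```

With real data the histogram resolution trades off smoothness against detail; sparsely populated bins give noisy estimates and might need smoothing or wider bins.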

The 2-dimensional histogram below shows the distribution obtained when the observable is the harmonicity index (see Berthommier and Glotin, 1999).

The technique is quite general, and can, for example, also be successfully applied using a localisation cue (see Glotin et al., 1999).

A Psychoacoustic Study: Investigating Cues For Consonant Identification

The goal of this experiment was to extend the experiment of Shannon et al. (R.V. Shannon et al., Speech Recognition with Primarily Temporal Cues, Science, 1995). By varying spectral and temporal resolution, that study showed that two consonant features, voicing and manner, were preserved at very low spectral resolution, whereas information transmission for consonantal place of articulation increased with spectral resolution. By adding a temporal masker to Shannon's residual signal, we aimed to understand the transmission of residual temporal and spectral information.

Results

We confirm the main findings of Shannon et al.:

consonant perception does not seem to be affected by the reduction of fine temporal information when minimal spectral information is present in the signal,
consonant place of articulation seems to be a primarily spectral cue.

Our experiment further suggests that:

voicing is a robust consonant feature which depends on both categories of information: slow temporal envelope signal and spectral information.
manner is mainly related to temporal envelope information.

This experiment supports the hypothesis that consonant identification is a complex process which can compensate for the reduction of temporal or spectral information by the use of residual information: consonant perception is a robust process which can make use of both spectral and temporal cues.

For further details of this work see Grosgeorges et al., 2000.