Using the CHiME corpus for speech recognition evaluation

The CHiME background recordings and impulse response measurements make it possible to construct natural yet controllable speech recognition evaluation data. In the following examples, the target speech is taken from the Grid corpus of command phrases, for example "place green on B 6 please". The Grid corpus was recorded in an acoustic booth, but by using the carefully measured binaural room impulse responses (BRIRs) supplied with the CHiME corpus, we can add the reverberation of the CHiME room to the original utterances.

The following audio examples illustrate this process:

i/ we start with a Grid utterance, e.g. "bin red at Q 2 again";

ii/ we filter and reverberate it to simulate the effect of its having been spoken in a CHiME room -- in this case, the lounge, at a distance of 200 cm and an azimuth of 0 degrees;

iii/ finally we mix with recordings that have been made in the room. In this final example two utterances were spoken in the room and one was artificially added. Can you tell which?
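
The three steps above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the processing actually used to build the examples: the signals are synthetic stand-ins, the BRIR is a toy two-channel impulse response, the function names are hypothetical, and the unspecified "filter" stage of step ii is left out, so only the convolution with the BRIR is shown.

```python
import numpy as np

def reverberate(utterance, brir):
    """Convolve a mono utterance with a two-channel BRIR of shape (taps, 2)."""
    left = np.convolve(utterance, brir[:, 0])
    right = np.convolve(utterance, brir[:, 1])
    return np.stack([left, right], axis=1)

def mix_into_background(reverberant, background, offset):
    """Add the reverberant utterance to a copy of the background at a sample offset."""
    end = offset + len(reverberant)
    if offset < 0 or end > len(background):
        raise ValueError("utterance does not fit inside the background")
    mixture = background.copy()
    mixture[offset:end] += reverberant
    return mixture

# Synthetic stand-ins (assumptions): real use would load a Grid utterance,
# a measured CHiME BRIR and a CHiME background recording instead.
rng = np.random.default_rng(0)
utterance = rng.standard_normal(1600)            # step i: dry utterance
brir = np.zeros((400, 2))
brir[0] = 1.0                                    # direct path, both ears
brir[200] = [0.3, 0.2]                           # one toy reflection
background = 0.1 * rng.standard_normal((4000, 2))

reverberant = reverberate(utterance, brir)       # step ii
mixture = mix_into_background(reverberant, background, offset=500)  # step iii
print(reverberant.shape, mixture.shape)
```

Convolving each ear's impulse response separately keeps the interaural differences encoded in the BRIR, which is what makes the added utterance appear to come from a position in the room.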

Differing degrees of noise

Target utterances can be mixed with different portions of the CHiME background to obtain a precisely controlled range of signal-to-noise ratios (SNRs):

-6 dB

0 dB

6 dB

12 dB
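
One way to read "mixing with different portions of the background" is to scan the background for the segment whose energy gives the SNR closest to the target, rather than rescaling either signal. The sketch below follows that reading; the selection procedure, the function names and the `hop` parameter are assumptions for illustration, and the signals are toy constants chosen so the result is easy to check by hand. The SNR is taken as 10 log10 of the ratio of speech energy to noise energy over the span of the utterance.

```python
import numpy as np

def snr_db(speech, noise):
    """SNR in dB from the energies of two equally long signals."""
    return 10.0 * np.log10(np.sum(speech ** 2) / np.sum(noise ** 2))

def best_offset_for_snr(speech, background, target_db, hop=100):
    """Scan candidate offsets and return (offset, snr) closest to target_db."""
    best_offset, best_snr = None, None
    for offset in range(0, len(background) - len(speech) + 1, hop):
        segment = background[offset:offset + len(speech)]
        current = snr_db(speech, segment)
        if best_snr is None or abs(current - target_db) < abs(best_snr - target_db):
            best_offset, best_snr = offset, current
    return best_offset, best_snr

# Toy signals (assumptions): a background that is loud in its first half
# and quieter in its second half, and a constant-amplitude "utterance".
speech = np.ones(100)
background = np.concatenate([np.ones(1000), 0.5 * np.ones(1000)])

offset, achieved = best_offset_for_snr(speech, background, target_db=6.0)
print(offset, round(achieved, 2))  # 1000 6.02
```

Because the background level is left untouched, the selected segment keeps its natural loudness; the cost is that the achieved SNR only approximates the target to within what the recording happens to offer.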

Unsegmented audio

In the following examples, Grid target utterances have been mixed at unpredictable positions into continuous five-minute segments of CHiME background recording.

Fairly quiet morning

Mainly speech-on-speech, two people

Mainly speech-on-speech, four people

Speech, music etc
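
Embedding utterances at unpredictable positions can be sketched as below. How the positions were actually chosen is not stated here, so this sketch assumes uniformly random start times with no overlap between utterances; the 16 kHz sample rate, the function name and the synthetic signals are likewise assumptions for illustration.

```python
import numpy as np

def place_utterances(utterances, background, rng, max_tries=1000):
    """Add each utterance at a random, non-overlapping position in the background."""
    mixture = background.copy()
    placed = []  # (start, end) sample indices of utterances already added
    for utt in utterances:
        for _ in range(max_tries):
            start = int(rng.integers(0, len(background) - len(utt) + 1))
            end = start + len(utt)
            # Accept the position only if it overlaps no earlier utterance.
            if all(end <= s or start >= e for s, e in placed):
                mixture[start:end] += utt
                placed.append((start, end))
                break
        else:
            raise RuntimeError("could not place utterance without overlap")
    return mixture, placed

fs = 16000                                          # assumed sample rate
rng = np.random.default_rng(1)
background = 0.01 * rng.standard_normal(5 * 60 * fs)  # five-minute stand-in
utterances = [rng.standard_normal(2 * fs) for _ in range(3)]

mixture, placed = place_utterances(utterances, background, rng)
for start, end in sorted(placed):
    print(f"utterance from {start / fs:.2f} s to {end / fs:.2f} s")
```

Keeping the list of `(start, end)` positions gives the reference segmentation against which a recogniser working on the unsegmented audio could later be scored.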

The ultimate aim of the CHiME project is to reliably recover commands from unsegmented audio similar to the examples above.