
The Computational Auditory Scene Analysis Toolkit
The figure below shows a simple example of the CASA Toolkit being used to run a speech recognition experiment on the TIDigit corpus:
- On the right, an editor window shows the script that describes the processing applied to each utterance in the corpus. A system, here called 'main', is constructed from a set of simpler blocks. In this case four blocks are used. These perform the following functions:
- AlienSampleInputFile - For reading the waveform file.
- Ratemap - For constructing an auditory 'ratemap' representation (an illustrative sketch of this kind of representation follows the list).
- Display - For displaying the ratemap in an interactive 3D viewer.
- HMMDecoderStandard - For performing the Viterbi decoding, i.e. the speech recognition (a minimal decoding sketch also follows the list).
- On the left you can see the result of the 'Display' block. This is a contoured 3D view of a ratemap representation. The 64 filter channels are along one base axis and time is along the other.
- The console window at the bottom of the figure shows the output as the script is applied to the TIDigit corpus. The first column is the output of the decoder and the second column is the correct transcription. Recognition statistics are shown for each utterance and as a running total (a scoring sketch is given below).
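
For readers unfamiliar with ratemaps, the sketch below shows one common way such a representation is computed: a bank of bandpass filters spaced on an auditory (ERB-rate) scale, half-wave rectification, leaky integration to smooth each channel's envelope, downsampling to a frame rate, and log compression. This is an illustrative approximation, not the CTK Ratemap block itself: the Butterworth filters, channel count, frame rate and time constant are all assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, sosfilt, lfilter

def erb_space(low_hz, high_hz, n_channels):
    """Centre frequencies equally spaced on the ERB-rate scale (Glasberg & Moore)."""
    ear_q, min_bw = 9.26449, 24.7
    def hz_to_erb(f):
        return ear_q * np.log(1.0 + f / (ear_q * min_bw))
    def erb_to_hz(e):
        return (np.exp(e / ear_q) - 1.0) * ear_q * min_bw
    return erb_to_hz(np.linspace(hz_to_erb(low_hz), hz_to_erb(high_hz), n_channels))

def ratemap(x, fs, n_channels=64, frame_hz=100, low_hz=50, high_hz=8000):
    """Crude ratemap: filterbank -> rectify -> smooth -> downsample -> log."""
    hop = int(fs / frame_hz)
    centres = erb_space(low_hz, min(high_hz, 0.45 * fs), n_channels)
    channels = []
    for fc in centres:
        # A simple Butterworth band-pass stands in for a gammatone filter.
        bw = 0.3 * fc
        lo = max(fc - bw, 1.0) / (fs / 2)
        hi = min(fc + bw, 0.49 * fs) / (fs / 2)
        sos = butter(2, [lo, hi], btype="band", output="sos")
        band = sosfilt(sos, x)
        env = np.maximum(band, 0.0)                       # half-wave rectification
        alpha = np.exp(-1.0 / (0.008 * fs))               # ~8 ms leaky integrator
        smooth = lfilter([1.0 - alpha], [1.0, -alpha], env)
        channels.append(smooth[::hop])                    # downsample to frame rate
    rm = np.array(channels)                               # shape: (channels, frames)
    return np.log(rm + 1e-8)                              # log compression

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)                       # stand-in for an utterance
    print(ratemap(x, fs).shape)                           # prints (64, 100)
```

On a one-second signal at 16 kHz this yields a 64 x 100 array, i.e. 64 filter channels by 100 time frames, matching the two base axes of the 3D view described above.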
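
The HMMDecoderStandard block performs Viterbi decoding over hidden Markov models. The following is a minimal, self-contained Viterbi implementation for a single discrete HMM, included only as background on the algorithm; the two-state model, emission table and observation sequence are toy assumptions and nothing here is specific to the toolkit's own decoder.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most likely state sequence for a discrete HMM, computed in log space.

    log_pi : (S,)    log initial-state probabilities
    log_A  : (S, S)  log transition probabilities, A[i, j] = P(j | i)
    log_B  : (S, V)  log emission probabilities over V observation symbols
    obs    : (T,)    sequence of observation symbol indices
    """
    S, T = log_A.shape[0], len(obs)
    delta = np.full((T, S), -np.inf)       # best log score ending in each state
    back = np.zeros((T, S), dtype=int)     # backpointers for path recovery
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A          # scores[i, j]: from i to j
        back[t] = np.argmax(scores, axis=0)
        delta[t] = scores[back[t], np.arange(S)] + log_B[:, obs[t]]
    # Trace back from the best final state.
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(delta[-1]))

if __name__ == "__main__":
    # Toy 2-state, 3-symbol HMM; all values are illustrative assumptions.
    pi = np.log([0.6, 0.4])
    A = np.log([[0.7, 0.3], [0.4, 0.6]])
    B = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    states, score = viterbi(pi, A, B, [0, 1, 2, 2])
    print(states, round(score, 3))
```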
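
The per-utterance and running recognition statistics reported in the console are the usual hypothesis-versus-reference comparison. One simple way to compute such statistics, sketched here as an illustration rather than as CTK's own scoring code, is a word-level edit distance (substitutions, insertions and deletions) accumulated across utterances; the digit strings in the example are made up.

```python
def word_errors(ref, hyp):
    """Minimum word-level edit distance between reference and hypothesis strings."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)], len(r)

# Made-up decoder output paired with reference transcriptions.
pairs = [("one two three", "one two three"),
         ("four oh five", "four four five"),
         ("nine nine", "nine")]
total_err, total_words = 0, 0
for hyp, ref in pairs:
    err, n = word_errors(ref, hyp)
    total_err, total_words = total_err + err, total_words + n
    print(f"{hyp!r:22} ref={ref!r:22} errors={err}/{n}")
print(f"running word accuracy: {100.0 * (1 - total_err / total_words):.1f}%")
```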