HMMDecoderStandard

The HMMDecoderStandard block performs HMM Viterbi decoding on a stream of input feature vectors given a set of HMM models.

As a by-product, the decoder outputs a stream of state-likelihood frames (out1). Each frame consists of the likelihood of each model state having generated the corresponding input feature frame. Within these frames, the state likelihoods occur in the same order in which the states are defined in the HMM definition file.

The operation of the decoder is specified by the following set of parameters:

HMM_FILE
The HMM_FILE parameter specifies the name of a file that associates HMM definitions with HMM NAMEs. This file can have one of two possible formats, depending on whether the HMMs are stored in a single file or are stored separately:
- Single file: HMM_FILE refers to an HTK MMF (Multiple Model File). This is a single file containing the definitions of all the HMMs. For each HMM there is a quoted NAME.
- Separate files: HMM_FILE refers to a file list. This is a file containing the names of the individual files that define individual HMMs in HTK format. Each line of the list consists of the HMM file name, optionally followed by the NAME to be assigned to the HMM. If no NAME is supplied, the HMM NAME is taken to be the same as the HMM's HTK file name without the path. For example, the HMM_FILE may contain lines like the following:
  /home/jon/hmms/one.3mix one
  /home/jon/hmms/two.3mix two
  /home/jon/hmms/three.3mix three
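The naming rule for the file-list format can be sketched as follows. This is an illustrative reading of the rule above, not CTK code: the function name `parse_hmm_file_list` is hypothetical, and the sketch assumes that file paths contain no whitespace.

```python
import os

def parse_hmm_file_list(lines):
    """Map each HMM NAME to the path of its HTK definition file."""
    hmms = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        path = fields[0]
        # Use the explicit NAME if one is given; otherwise fall back to
        # the file name with its directory path removed.
        name = fields[1] if len(fields) > 1 else os.path.basename(path)
        hmms[name] = path
    return hmms

example = [
    "/home/jon/hmms/one.3mix one",
    "/home/jon/hmms/two.3mix",
]
hmm_table = parse_hmm_file_list(example)
```

In this example the second line carries no NAME, so that HMM would be registered under the name `two.3mix`.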
TRANSCRIPTION
This is a string parameter that gives the correct transcription for the utterance to be recognised. The transcription must be encoded as a sequence of single-character labels, each being the LABEL of the corresponding model in the correct model sequence.
LABEL_FILE
This parameter specifies the name of a file which associates HMM NAMEs with HMM LABELs. Whereas each HMM must have a unique NAME, several HMMs can share the same LABEL. For example, there may be both a male and a female version of the digit one, with NAMEs "one_m" and "one_f", both having the LABEL "1".
Each line of the file defines a separate LABEL. The LABEL occurs as the first character on the line and is followed by the NAME of each HMM that shares this LABEL. For example:
1 one_m one_f
2 two_m two_f
S sil sp
etc.
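As a rough illustration of this format, the sketch below builds a NAME-to-LABEL lookup from the lines of a label file. The function name `parse_label_file` is an assumption for the example, and the sketch takes the LABEL to be exactly the first character of each line, as described above.

```python
def parse_label_file(lines):
    """Return a dict mapping each HMM NAME to its single-character LABEL."""
    name_to_label = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        label = line[0]  # the LABEL is the first character on the line
        for name in line[1:].split():
            name_to_label[name] = label
    return name_to_label

label_lines = ["1 one_m one_f", "2 two_m two_f", "S sil sp"]
name_to_label = parse_label_file(label_lines)
```

With this table, a recognised model sequence such as `sil one_f two_m sp` would be scored as the label string `S12S`.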
GRAMMAR_FILE
This parameter specifies the name of a file containing the grammar to be applied to the set of models.
The GRAMMAR_FILE specifies the grammar in terms of the NAMEs of the individual HMMs. The format is the same as that used in version 1.x of HTK. For more details see here.
If no GRAMMAR_FILE is specified, then all the models are placed in a simple loop grammar, i.e. any model can follow any other model.
SILENCE
The SILENCE parameter is a string composed of the labels for all the models that are to be regarded as silence. These labels will be removed from the transcription and the recognition hypothesis before the recognition statistics are calculated, e.g. if the SILENCE parameter is set to "s" and the recognition output is "s1s2s" then this will be treated as "12" when scoring the correctness and accuracy.
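The silence-stripping step can be sketched in a few lines. The helper name `strip_silence` is hypothetical; the sketch only assumes that labels are single characters, as stated for the TRANSCRIPTION parameter.

```python
def strip_silence(labels, silence):
    """Remove every label that appears in the SILENCE string."""
    return "".join(ch for ch in labels if ch not in silence)

hypothesis = strip_silence("s1s2s", "s")
```

The same function would be applied to both the transcription and the hypothesis before they are compared.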
HAS_DELTAS
The HAS_DELTAS parameter is a boolean switch that, when turned on, informs the decoder that the input data contains 'delta' parameters, i.e. if the input data is a vector of size 64, then the first 32 elements are treated as static features and the second 32 elements are the corresponding delta features. This switch only makes a difference when the decoder is using a probability calculation in which delta features are handled differently from static features. For example, when using bounded marginalisation the bounds constraint is only applied to missing static features, not missing delta features.
USE_DELTAS
The decoder will normally use the deltas if they are supplied. However, if the USE_DELTAS switch is set to FALSE then deltas will be ignored. If left unset then USE_DELTAS will take the value of HAS_DELTAS i.e. they are used if present. (Note, it is an error to have HAS_DELTAS as FALSE and USE_DELTAS as TRUE.)
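The interaction between the two switches, and the static/delta split of a frame, can be sketched as below. Both function names are hypothetical, and using `None` to represent an unset USE_DELTAS is an assumption of the sketch.

```python
def resolve_use_deltas(has_deltas, use_deltas=None):
    """Return the effective USE_DELTAS value.

    use_deltas=None stands for 'left unset' (an assumption of this sketch).
    """
    if use_deltas is None:
        return has_deltas  # default: use deltas if they are present
    if use_deltas and not has_deltas:
        raise ValueError("USE_DELTAS is TRUE but HAS_DELTAS is FALSE")
    return use_deltas

def split_features(frame, has_deltas):
    """Split a frame into (static, delta) halves when deltas are present."""
    if not has_deltas:
        return list(frame), []
    half = len(frame) // 2
    return list(frame[:half]), list(frame[half:])

static, delta = split_features(list(range(64)), has_deltas=True)
```

For a 64-element frame this yields 32 static features followed by 32 delta features, matching the layout described for HAS_DELTAS.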
LOG_FILE
The LOG_FILE parameter specifies the name of an optional recognition log file to which the recognition statistics will be sent. If the file does not already exist it will be created. If it does exist then the statistics will be appended to it. If no LOG_FILE parameter is specified, or if LOG_FILE is set to the empty string (i.e. LOG_FILE=""), then the recognition statistics will be sent to stdout.
LOG_FILE_2
The LOG_FILE_2 parameter specifies the name of an additional recognition log file to which detailed per utterance information about the results of the decoding will be sent. This file is in XML format and a corresponding DTD file can be found in $CTKROOT/src. If LOG_FILE_2 is not set then the additional log file will not be generated.
OUTPUT_CONFUSIONS
If OUTPUT_CONFUSIONS is set to true then the recognition statistics will include a confusion matrix.
NBEST
The decoder can produce approximate N-best lists. The NBEST parameter determines the size of the N-best list to produce. By default NBEST is set to 1 and only the highest scoring hypothesis is considered.
The N-best lists are computed using the approximate lattice N-best algorithm (see Schwartz and Austin, ICASSP '91 for details).
WORD_PENALTY
The WORD_PENALTY is added to the score of a token as it passes out of the final state of a model. By default this penalty is set to 0.0, but if the recogniser is making excessive insertion errors then the recognition accuracy can sometimes be improved by setting the penalty to a positive value. It has been found that this penalty can greatly improve results when performing missing data recognition (see next section). The appropriate value to use is best determined empirically.
MAX_APPROX
MAX_APPROX is a boolean parameter that can be set to true to give a small increase in the speed of the probability calculation when using multiple-mixture models. If it is set to true then, rather than summing the likelihood contributions of every mixture component, the overall likelihood is approximated by the likelihood of the single most likely component (i.e. the maximum). This approximation is normally very close, as there is typically a difference of several orders of magnitude between the likelihoods of the most likely mixture component and even the second most likely.
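The difference between the exact sum and the max approximation can be illustrated in the log domain. This is a sketch under stated assumptions rather than the decoder's actual code: the function name is hypothetical, and each input value is assumed to be the log of a component's weighted likelihood.

```python
import math

def mixture_log_likelihood(component_log_likes, max_approx=False):
    """Combine per-component log likelihoods into a state log likelihood.

    component_log_likes: log of each component's (weighted) likelihood;
    the weighting is an assumption of this sketch.
    """
    peak = max(component_log_likes)
    if max_approx:
        return peak  # keep only the most likely component
    # Exact sum, computed stably by factoring out the largest term.
    return peak + math.log(sum(math.exp(v - peak) for v in component_log_likes))

logs = [-10.0, -25.0, -40.0]
exact = mixture_log_likelihood(logs)
approx = mixture_log_likelihood(logs, max_approx=True)
```

When the components differ by many orders of magnitude, as in this example, the two results are almost identical, which is why the approximation costs little accuracy.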
HYPOTHESIS_FILTER
This string is a regular expression that is used as a 'hypothesis filter'. When the filter is used, the decoder will reject any hypotheses which match the regular expression and will scan down a 50-best list to find the highest ranking hypothesis that does not match the filter. If the list contains no compliant hypotheses, the decoder reverts to accepting the originally selected best hypothesis.
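The selection rule can be sketched as follows. The function name is hypothetical, and the sketch assumes that a hypothesis "matches" when the pattern is found anywhere in its label string (Python's `re.search`); the matching semantics actually used by the decoder are not stated here.

```python
import re

def apply_hypothesis_filter(nbest, pattern):
    """Pick the best hypothesis that does not match the filter.

    nbest: non-empty list of hypothesis strings, ordered best first.
    """
    regex = re.compile(pattern)
    for hyp in nbest:
        if not regex.search(hyp):
            return hyp  # highest ranking compliant hypothesis
    return nbest[0]  # nothing compliant: fall back to the original best

# Reject any hypothesis containing the same label twice in a row.
chosen = apply_hypothesis_filter(["11", "12", "21"], r"(.)\1")
```

Here the top hypothesis `11` is rejected by the filter, so the second-ranked `12` is selected instead.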
FIRST_TOKEN
The FIRST_TOKEN is a string parameter that supplies the label name of a forced first token, i.e. when this parameter is set the decoding is forced to start with the given model. This is typically used to force a decoding to start with the silence model, e.g. FIRST_TOKEN="S".
If the FIRST_TOKEN parameter is not set then the decoding can start with any token.
FINAL_TOKEN
The FINAL_TOKEN is a string parameter that supplies the label name of a forced final token, i.e. when this parameter is set the decoding is forced to end with the given model. This is typically used to force a decoding to end with the silence model, e.g. FINAL_TOKEN="S".
If the FINAL_TOKEN parameter is not set then the decoding can end with any token.
STATE_PATH
If the STATE_PATH switch is set to true then the decoder will record the state path that the winning hypothesis has taken through each model. The frame by frame state occupancy will be output to LOG_FILE_2. Note, recording this information requires some computational overhead, so if it is not required the STATE_PATH switch should be turned off.
DUMP_PARAMETERS
This is a boolean parameter that if set to TRUE causes a record of the settings of the decoder parameters to be written at the end of the log file. By default DUMP_PARAMETERS is FALSE.

Inputs	Meaning	Sample	1-D frame	$\ge$ 2-D frame
`in1`	feature vectors	No	Yes	No

Outputs	Meaning
out1	state likelihoods
out2	state max mixture label

Parameters	Type	Default	Meaning
`LOG_FILE`	String	-	Name of an optional log file
`LOG_FILE_2`	String	-	Name of additional detailed log file
`WORD_PENALTY`	Float	0.0	The creation penalty
`HMM_FILE`	String	-	Name of the HMM file list
`GRAMMAR_FILE`	String	-	File storing the grammar
`LABEL_FILE`	String	-	File storing HMM NAME-> HMM LABEL mapping
`FIRST_TOKEN`	String	-	Label of a fixed first token
`FINAL_TOKEN`	String	-	Label of a fixed final token
`TRANSCRIPTION`	String	-	The correct transcription
`SILENCE`	String	""	The silence label(s)
`MAX_APPROX`	Boolean	False	Use max mixture approximation
`NBEST`	Int	1	Return best N hypotheses
`STATE_PATH`	Boolean	False	Record HMM state path
`HAS_DELTAS`	Boolean	False	Models have delta parameters
`USE_DELTAS`	Boolean	-	Use the delta parameters (defaults to the value of HAS_DELTAS)
`HYPOTHESIS_FILTER`	String	""	Regular expression for filtering hypotheses
`OUTPUT_CONFUSIONS`	Boolean	False	Output confusion matrix
`DUMP_PARAMETERS`	Boolean	False	Write parameters to log file

Documentation for CTKv1.1.4 - Last modified: Mon Jul 2 18:15:59 BST 2001