| RESPITE: The CASA Toolkit Page: Documentation: Block Library Index:HMMDecoderMDSoft |
The HMMDecoderMDSoft is a version of HMMDecoderMD that has been generalised to accept a 'soft mask'. Whereas the mask input for HMMDecoderMD should contain only 0s and 1s, the mask input for HMMDecoderMDSoft may contain values in the inclusive range 0.0 to 1.0. The values in the mask are interpreted as the probability that the data is reliable.
For full details of how the soft masks are employed in missing data speech recognition:
As part of its probability calculation the HMMDecoderMDSoft needs to perform a bounded-marginalisation (see HMMDecoderMD). The block therefore requires lower and upper bounds for the missing data values. These bounds are supplied via the block inputs in3 and in4. The lower bound input (in3) is optional, and if not supplied the lower bound will default to 0.
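As a rough illustration of how a soft mask value can blend the "present" and "missing" interpretations of a single feature, the sketch below assumes one diagonal-covariance Gaussian component. It weights the Gaussian density at the observed value against the Gaussian mass lying between the lower and upper bounds. All function names are hypothetical, and dividing the integral by the width of the bounds is an assumption borrowed from one common formulation of soft missing data decoding; it is not taken from the toolkit source.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gaussian_cdf(x, mean, var):
    """Cumulative distribution of a 1-D Gaussian at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def soft_feature_likelihood(x, mask, lower, upper, mean, var):
    """Blend the 'present' and 'missing' likelihoods of one feature.

    mask  : probability that the observed value x is reliable (0.0 to 1.0)
    lower : lower bound on the true value (0 mirrors the default for in3)
    upper : upper bound on the true value
    """
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    present = gaussian_pdf(x, mean, var)
    # Bounded marginalisation: Gaussian mass between the bounds.
    # ASSUMPTION: normalised by the width of the bounds.
    mass = gaussian_cdf(upper, mean, var) - gaussian_cdf(lower, mean, var)
    missing = mass / (upper - lower)
    return mask * present + (1.0 - mask) * missing


def soft_frame_log_likelihood(features, mask, lower, upper, means, variances):
    """Sum of per-feature log likelihoods for one frame (diagonal covariance)."""
    total = 0.0
    for x, m, lo, hi, mu, var in zip(features, mask, lower, upper, means, variances):
        total += math.log(soft_feature_likelihood(x, m, lo, hi, mu, var))
    return total
```

Under these assumptions a mask value of 1.0 reduces to the ordinary Gaussian density and a mask value of 0.0 reduces to the bounded marginal alone, which is the behaviour expected of a discrete mask.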
It is difficult to approximate good missing data bounds for delta features, so, by default, the HMMDecoderMDSoft does not apply the soft mask technique to the delta features. Instead, delta features are treated in the same manner as they are by the discrete mask HMMDecoderMD decoder, i.e. they are either taken to be fully present or fully missing, and missing values are considered unbounded. Consequently, the elements of the missing data mask corresponding to the delta features should be 0 or 1. However, the decoder can be forced to treat delta features using the same soft mask technique as applied to the static features by setting USE_DELTA_BOUNDS to true. This may give better results if good lower and upper delta feature bounds are available.
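The constraint on the delta part of the mask can be checked before a mask is passed to the decoder. The helper below is a hypothetical sketch, not part of the toolkit: it assumes each mask frame holds the static features first, followed by the delta features, and that `num_static` gives the number of static features. That layout is an assumption made for illustration.

```python
def check_mask_frame(mask, num_static, use_delta_bounds=False):
    """Raise ValueError if a mask frame breaks the rules described above.

    ASSUMPTION: elements 0..num_static-1 are static features and the
    remaining elements are delta features.
    """
    for i, m in enumerate(mask):
        # Every mask value is a probability.
        if not 0.0 <= m <= 1.0:
            raise ValueError(f"mask[{i}] = {m} is outside the range 0.0 to 1.0")
        # Delta elements must be discrete unless USE_DELTA_BOUNDS is true.
        is_delta = i >= num_static
        if is_delta and not use_delta_bounds and m not in (0.0, 1.0):
            raise ValueError(f"delta mask[{i}] = {m} must be 0 or 1")


# Two static features with soft values, two delta features with discrete values.
check_mask_frame([0.3, 0.85, 0.0, 1.0], num_static=2)

# Soft delta values are accepted only when delta bounds are in use.
check_mask_frame([0.3, 0.85, 0.4, 0.6], num_static=2, use_delta_bounds=True)
```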
The HMMDecoderMDSoft has the same parameters as the HMMDecoderMD, except: i) there is no USE_BOUNDS parameter (bounded-marginalisation is an integral part of the soft mask technique and bounds for the static features are always used), and ii) there is an additional parameter, ONE_ZERO_ROUNDING, which is described further below.
The ONE_ZERO_ROUNDING parameter specifies a tolerance (that should be in the range 0.0 to 0.5) within which values in the soft mask will be rounded to the discrete values 0.0 and 1.0. For example, if the parameter is set to 0.1 then values in the range [0.0, 0.1] will be rounded down to 0.0 and values in the range [0.9, 1.0] will be rounded up to 1.0.
Typically only a small proportion of the points in a soft mask are effectively soft; the others are very close to the discrete missing (0.0) and present (1.0) values. By approximating the points close to 0 and 1 as actually being 0 and 1, the probability calculation may be sped up. Setting the ONE_ZERO_ROUNDING parameter to a small number, e.g. 0.01, may result in faster decoding without any loss in performance. [However, in practice I've found performance tends to deteriorate even with very small tolerances, and the increased decoding speed is marginal! I've left the feature in for future experimentation.]
By default ONE_ZERO_ROUNDING is set to 0.0 - this effectively turns the feature off.
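The rounding rule can be written out in a few lines. The function below is an illustrative sketch with a hypothetical name; it follows the description above rather than the decoder's actual code.

```python
def round_mask(mask, tolerance=0.0):
    """Round soft mask values within `tolerance` of 0.0 or 1.0 to those values."""
    if not 0.0 <= tolerance <= 0.5:
        raise ValueError("tolerance should be in the range 0.0 to 0.5")
    rounded = []
    for m in mask:
        if m <= tolerance:
            rounded.append(0.0)  # close enough to 'missing'
        elif m >= 1.0 - tolerance:
            rounded.append(1.0)  # close enough to 'present'
        else:
            rounded.append(m)    # genuinely soft value, left untouched
    return rounded


print(round_mask([0.004, 0.5, 0.997], tolerance=0.01))  # [0.0, 0.5, 1.0]
print(round_mask([0.004, 0.5, 0.997]))                  # unchanged with the default of 0.0
```

With the default tolerance of 0.0 only values that are already exactly 0.0 or 1.0 match the rounding conditions, so the mask passes through unchanged.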
Note that if a discrete mask consisting of only 0s and 1s is passed to HMMDecoderMDSoft, it should produce exactly the same results as HMMDecoderMD. However, due to the way in which the probability calculation has been generalised, it is more efficient to use HMMDecoderMD for discrete masks.
As with the other decoders, HMMDecoderMDSoft outputs a stream of state-likelihood frames. Each frame consists of the likelihood of each model state having generated the corresponding input feature frame. Within these frames the state likelihoods occur in the same order in which the states are defined in the HMM definition file. The mixture label frames (out2) indicate the integer label of the winning mixture for each state.
| Inputs | Meaning | Sample | 1-D frame | |
|---|---|---|---|---|
| in1 | feature vectors | No | Yes | No |
| in2 | fuzzy data mask | No | Yes | No |
| (in3) | lower bound | No | Yes | No |
| in4 | upper bound | No | Yes | No |

| Outputs | Meaning |
|---|---|
| out1 | state likelihoods |
| out2 | state max mixture label |

| Parameters | Type | Default | Meaning |
|---|---|---|---|
| LOG_FILE | String | - | Name of an optional log file |
| LOG_FILE_2 | String | - | Name of additional detailed log file |
| WORD_PENALTY | Float | 0.0 | The creation penalty |
| HMM_FILE | String | - | Name of the HMM file list |
| GRAMMAR_FILE | String | - | File storing the grammar |
| LABEL_FILE | String | - | File storing HMM NAME -> HMM LABEL mapping |
| FIRST_TOKEN | String | - | Label of a fixed first token |
| FINAL_TOKEN | String | - | Label of a fixed final token |
| TRANSCRIPTION | String | - | The correct transcription |
| SILENCE | String | "" | The silence label(s) |
| MAX_APPROX | Boolean | False | Use max mixture approximation |
| NBEST | Int | 1 | Return best N hypotheses |
| STATE_PATH | Boolean | False | Record HMM state path |
| HAS_DELTAS | Boolean | 0 | Models have delta parameters |
| USE_DELTAS | Boolean | - | Models have delta parameters |
| HYPOTHESIS_FILTER | String | "" | Regular expression for filtering hypotheses |
| OUTPUT_CONFUSIONS | Boolean | 0 | Output confusion matrix |
| DUMP_PARAMETERS | Boolean | 0 | Write parameters to log file |
| USE_DELTA_BOUNDS | Boolean | False | Apply the soft mask technique (with bounds) to the delta features |
| ONE_ZERO_ROUNDING | Float | 0.0 | Tolerance within which to round mask to 0 or 1 |