ColourMask

The ColourMask block performs a `pseudo-grouping' on a discrete 1/0 missing data mask to form a set of numbered groups. The output is a time-frequency map in which each point carries an integer label indicating the group to which it has been assigned. This map may be used as input to the multisource decoder.

The `pseudo-grouping' is based on locating and labelling contiguous regions of 1's in the missing data mask. Time-frequency points are considered contiguous if they are adjacent in either time or frequency (i.e. joined either `horizontally' or `vertically' but not `diagonally' in the 2-D time-frequency map). Each separated region is assigned a unique group number. As an option the missing data mask may be split into sub-bands which are treated independently when searching for contiguous regions.
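The contiguity rule described above amounts to a 4-connected region labelling. The following is a minimal offline sketch in Python, not the block's actual implementation: the function name `label_regions`, the use of 0 for unlabelled points, the order in which group numbers are handed out, and the assumption that the channels divide evenly into sub-bands are all illustrative choices. It also ignores the windowed online operation described under WINDOW_SIZE.

```python
from collections import deque


def label_regions(mask, num_subbands=1):
    """Label 4-connected regions of 1s in mask[time][frequency].

    Returns a map of the same shape: 0 where the mask is 0, otherwise a
    group number starting at 1 (numbering scheme is an assumption).
    """
    n_frames = len(mask)
    n_chans = len(mask[0]) if n_frames else 0
    # Assumption: channels divide evenly into equal-width sub-bands.
    band_width = n_chans // num_subbands
    labels = [[0] * n_chans for _ in range(n_frames)]
    next_label = 1
    for t0 in range(n_frames):
        for f0 in range(n_chans):
            if mask[t0][f0] != 1 or labels[t0][f0] != 0:
                continue
            band = f0 // band_width
            labels[t0][f0] = next_label
            queue = deque([(t0, f0)])
            while queue:
                t, f = queue.popleft()
                # Horizontal and vertical neighbours only, never diagonal.
                for dt, df in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, g = t + dt, f + df
                    if not (0 <= u < n_frames and 0 <= g < n_chans):
                        continue
                    if g // band_width != band:
                        continue  # regions do not cross sub-band edges
                    if mask[u][g] == 1 and labels[u][g] == 0:
                        labels[u][g] = next_label
                        queue.append((u, g))
            next_label += 1
    return labels
```

For example, `label_regions([[1, 0, 1], [1, 0, 0], [0, 1, 1]])` keeps the point at frame 1, channel 0 and the point at frame 2, channel 1 in different groups, because they touch only diagonally.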

The block may also perform additional group splitting based on optional pitch and voicing estimate inputs. If the voicing input is connected then groups are split at frames where the voicing parameter crosses the VOICING_THRESHOLD (i.e. at such frames all existing groups end and new groups begin). If the pitch input is supplied then groups are also split at voiced frames in which the pitch changes by more than the DELTA_PITCH_THRESHOLD. If the voicing input is not connected then all frames are treated as though they are voiced.
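The splitting rule can be illustrated by computing the frames at which groups would be cut. This is a hedged sketch only: the function name `split_frames` is hypothetical, "voiced" is assumed to mean strictly above the threshold, and the pitch test is assumed to apply only when the voicing state has not changed and the current frame is voiced.

```python
def split_frames(voicing=None, pitch=None,
                 voicing_threshold=0.5, delta_pitch_threshold=10.0):
    """Return the frame indices at which existing groups end and new ones begin."""
    if voicing is not None:
        n = len(voicing)
    elif pitch is not None:
        n = len(pitch)
    else:
        n = 0
    splits = []
    for t in range(1, n):
        # With no voicing input, every frame is treated as voiced.
        voiced_prev = True if voicing is None else voicing[t - 1] > voicing_threshold
        voiced_now = True if voicing is None else voicing[t] > voicing_threshold
        if voiced_now != voiced_prev:
            splits.append(t)  # voicing state crossed the threshold
        elif (pitch is not None and voiced_now
              and abs(pitch[t] - pitch[t - 1]) > delta_pitch_threshold):
            splits.append(t)  # pitch jumped within a voiced stretch
    return splits
```

With the default thresholds listed in the parameter table, a voicing track of `[0.1, 0.8, 0.9, 0.9, 0.2]` and a pitch track of `[0, 100, 105, 130, 0]` would give splits at frames 1 (voicing onset), 3 (pitch jump of 25) and 4 (voicing offset).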

After the contiguous region grouping, a higher level of common onset/common offset grouping may be applied to the groups that have been located. In this case disconnected groups that start (or end) at the same time frame are merged into a common group.
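Common onset grouping can be pictured as a relabelling pass over a finished label map. The sketch below is an offline illustration under stated assumptions: the name `merge_common_onsets` is hypothetical, 0 is taken to mean "no group", and merged groups are given the smallest label among those sharing an onset frame. The source does not say which label a merged group receives.

```python
def merge_common_onsets(labels):
    """Merge groups whose first frame coincides, in labels[time][frequency]."""
    # Record the first frame at which each group appears.
    onset = {}
    for t, frame in enumerate(labels):
        for lab in frame:
            if lab != 0 and lab not in onset:
                onset[lab] = t
    # For each onset frame, pick the smallest label as the merged label.
    merged_label = {}
    for lab in sorted(onset):
        merged_label.setdefault(onset[lab], lab)
    remap = {lab: merged_label[t] for lab, t in onset.items()}
    return [[remap.get(lab, 0) for lab in frame] for frame in labels]
```

Common offset grouping would be the mirror image, keyed on each group's last frame, subject to the windowing restriction described under OFFSET_GROUPING.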

The grouping algorithm is controlled by the following 9 parameters:

WINDOW_SIZE
An exact labelling of each separate contiguous region of 1's in the 1/0 mask is only possible if the entire mask is visible. However, CTK blocks are designed to run in an `online' mode where data output `keeps up with' data input, i.e. the colouring block does not wait until the end of the utterance to produce its result. The algorithm therefore operates on a window of the data (by default 5 frames), and the grouping decisions and output lag behind the input by the size of this window. As the algorithm cannot look ahead further than the size of this window, errors can occur if two groups that appear separate in the window merge at a later time. Increasing the size of the window makes these errors less likely, at the expense of a greater lag in the output.
ONSET_GROUPING
If set to TRUE then common onset grouping is applied to merge separate groups that happen to start at the same time frame.
OFFSET_GROUPING
If set to TRUE then common offset grouping is applied to merge separate groups that happen to end at the same time frame.
Offset grouping is a little more complicated than onset grouping as it is by nature retroactive - the decision to merge two groups cannot be made until both groups have ended. This means that by the time the decision to merge is made, data frames containing the beginning of the groups may already have passed through the analysis window and out of the block. Once part of a group has been passed out of the block its label cannot be changed. So, in the current implementation, offset grouping is only applied if, at the point of offset, at least one of the two groups has an onset within the analysis window (i.e. the group is fully contained within the window).
MIN_GROUP_SIZE
Missing data masks typically contain `speckled' regions which will be interpreted by the grouping algorithm as large numbers of groups, each containing very few points. This proliferation of groups can cause performance problems during multisource decoding. To avoid this, groups containing fewer than MIN_GROUP_SIZE time-frequency points are automatically merged into a common group. This group is always given the label `1'.
Note that if MIN_GROUP_SIZE is 1 or less then the feature has no effect.
MAX_GROUP_SIZE
If this feature is used then groups are allowed to grow from frame to frame only up to a maximum size of MAX_GROUP_SIZE points. At the first frame where MAX_GROUP_SIZE is exceeded the group is terminated and a new group is started (i.e. large groups are sliced up into short segments).
The feature is disabled when MAX_GROUP_SIZE is set to 0.
NUM_SUBBANDS
If NUM_SUBBANDS is set greater than 1 then the input data frames are divided up into subbands of equal width and the grouping algorithm is applied independently within each band. The full band labelling data is reconstituted from the parallel subband labellings on output.
HAS_DELTAS
The HAS_DELTAS switch tells the block how to interpret the input data.
If the input missing data mask has deltas then HAS_DELTAS should be set to TRUE. In this case the grouping algorithm is applied only to the non-delta features (i.e. the features in the lower half of the frame). Points in the delta mask (i.e. the upper half of the frame) are then labelled with the group number of the corresponding non-delta spectro-temporal points.
VOICING_THRESHOLD
This is used in conjunction with the optional voicing input. If the voicing input crosses the voicing threshold then the voicing state changes and groups will be split.
DELTA_PITCH_THRESHOLD
This is used in conjunction with the optional pitch input. If the pitch input changes from one frame to the next by more than the delta pitch threshold, then groups will be split.
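Two of the parameters above, MIN_GROUP_SIZE and HAS_DELTAS, are easy to picture as post-processing of a label map. The sketch below is illustrative only: the function names are hypothetical, 0 is assumed to mean "no group", and the renumbering of the surviving groups from 2 upward is an assumption made here so that label 1 stays free for the merged small groups. The source states only that the merged group is labelled 1.

```python
from collections import Counter


def merge_small_groups(labels, min_group_size):
    """Merge groups with fewer than min_group_size points into label 1."""
    if min_group_size <= 1:
        # Feature has no effect; return an unchanged copy.
        return [list(frame) for frame in labels]
    sizes = Counter(lab for frame in labels for lab in frame if lab != 0)
    remap = {}
    next_label = 2  # assumption: label 1 is reserved for the merged group
    for lab in sorted(sizes):
        if sizes[lab] < min_group_size:
            remap[lab] = 1
        else:
            remap[lab] = next_label
            next_label += 1
    return [[remap.get(lab, 0) for lab in frame] for frame in labels]


def copy_labels_to_deltas(static_labels):
    """Build full frames whose delta (upper) half repeats the static labels."""
    return [list(frame) + list(frame) for frame in static_labels]
```

For instance, with `min_group_size=2` a single isolated point is relabelled 1, while groups of two or more points keep distinct labels. `copy_labels_to_deltas` then doubles each frame so that every delta point carries the label of its non-delta counterpart.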

Inputs	Meaning	Sample	1-D frame	$\ge$ 2-D frame
`in1`	1/0 missing data mask frames	No	Yes	No
`(in2)`	Degree of voicing	Yes	No	No
`(in3)`	Pitch estimate	Yes	No	No

Outputs	Meaning
`out1`	labelled group frames

Parameters	Type	Default	Meaning
`WINDOW_SIZE`	Integer	5	Number of frames in running buffer
`ONSET_GROUPING`	Boolean	False	Perform common onset grouping?
`OFFSET_GROUPING`	Boolean	False	Perform common offset grouping?
`MIN_GROUP_SIZE`	Integer	0	(see above)
`MAX_GROUP_SIZE`	Integer	0	0 = no max size (see above)
`NUM_SUBBANDS`	Integer	1	Number of subbands
`HAS_DELTAS`	Boolean	False	Should be set to TRUE if input data includes deltas
`VOICING_THRESHOLD`	Float	0.5	Threshold for discriminating voiced/unvoiced frames
`DELTA_PITCH_THRESHOLD`	Float	10	Max pitch change allowed before groups will be split

Documentation for CTKv1.1.4 - Last modified: Thu Jun 28 12:12:04 BST 2001