Speech Recognition by Dynamic Time Warping

Program Testing and Results

I apologise for the sheer size of this page - it has been hastily produced in Word 97 from the original document.

The Testing Strategy

Approximately 400 template files were made available for the recogniser. These were logically separated according to speaker. This allowed two types of tests to be carried out: speaker dependent and speaker independent. The terms speaker dependent and speaker independent need to be clarified:

speaker dependent: each word to be recognised is compared only with words spoken by the same speaker.
speaker independent: for the purposes of this assignment, this term has been given two meanings:
1. each word to be recognised is compared with all the words available, i.e. words spoken both by the speaker of the test word and by the other speakers.
2. each word to be recognised is compared only with words spoken by different speakers, i.e. words spoken by the speaker of the test word are not used in the recognition process. This means that the speaker is totally unknown to the system, which has to estimate the word as best it can using data from other speakers.
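
The three comparison sets defined above can be sketched in a few lines of Python. This is only an illustration: the Template record, the mode names and the toy data are assumptions made for the example, not part of the original program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    # Hypothetical record; the real template files are not described here.
    speaker: int
    word: str
    take: int  # which recording of the word this is


def comparison_set(test, templates, mode):
    """Return the templates that the test template may be matched against."""
    if mode == "dependent":
        keep = lambda t: t.speaker == test.speaker
    elif mode == "independent_1":
        keep = lambda t: True
    elif mode == "independent_2":
        keep = lambda t: t.speaker != test.speaker
    else:
        raise ValueError("unknown mode: " + mode)
    # The test template is never matched against itself.
    return [t for t in templates if t != test and keep(t)]


# Toy data: 2 speakers, 2 words, 2 recordings of each word.
templates = [Template(s, w, k)
             for s in (1, 2) for w in ("yes", "no") for k in range(2)]
test = templates[0]
sizes = {mode: len(comparison_set(test, templates, mode))
         for mode in ("dependent", "independent_1", "independent_2")}
print(sizes)
```

With this toy data the speaker dependent set holds 3 templates, definition 1 holds 7 and definition 2 holds 4.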

In order to exercise the recogniser, the following method was used for each word tested:

1. Choose one of the templates of that word.
2. Obtain a global cost measure for that template matched against every other template in the set (i.e. not matched against itself).
3. Take the word of the template that produced the lowest global cost as the estimated word.
4. Repeat for the remaining untested templates of that word.

The set mentioned in step 2 is the set of templates to be used in the comparison process as outlined above in the definitions of speaker dependence and speaker independence.
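
The test method above amounts to a leave-one-out nearest-template search. The sketch below shows the idea on toy data. The global_cost function is a plain textbook dynamic time warping over sequences of numbers; its local distance, step pattern and normalisation are assumptions and may differ from those used in the actual recogniser.

```python
def global_cost(a, b):
    """Textbook DTW cost between two number sequences (illustrative only)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed local distance
            cost[i][j] = d + min(cost[i - 1][j],      # vertical step
                                 cost[i][j - 1],      # horizontal step
                                 cost[i - 1][j - 1])  # diagonal step
    return cost[n][m] / (n + m)  # assumed length normalisation


def recognise(test_features, candidates):
    """Return the word of the candidate with the lowest global cost."""
    best_word, best_cost = None, float("inf")
    for word, features in candidates:
        c = global_cost(test_features, features)
        if c < best_cost:
            best_word, best_cost = word, c
    return best_word


# Toy templates as (word, feature sequence); real templates would hold
# frames of acoustic features rather than single numbers.
templates = [("yes", [1, 2, 3, 2, 1]), ("yes", [1, 2, 2, 3, 2, 1]),
             ("no", [5, 4, 4, 5]), ("no", [5, 4, 5])]

results = []
for i, (word, features) in enumerate(templates):
    candidates = templates[:i] + templates[i + 1:]  # never match against itself
    results.append((word, recognise(features, candidates)))
print(results)
```

Each entry of results pairs the actual word with the estimated word, which is the information needed to fill in a confusion matrix.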

The results for each test are presented in a confusion matrix showing the breakdown of how each word was recognised and the overall success in percent.
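
As a rough illustration of how such a matrix and its percentage column can be built from a list of (actual word, estimated word) pairs, consider the sketch below. The function names are assumptions. The example data reuses the counts reported further down for words 1 and 4 of speaker 3.

```python
from collections import Counter


def confusion_matrix(results, words):
    """Rows are the actual words, columns the estimated words."""
    counts = Counter(results)
    return [[counts[(actual, estimated)] for estimated in words]
            for actual in words]


def row_percentages(matrix):
    """Percentage of each actual word that was recognised correctly."""
    return [round(100 * row[i] / sum(row), 1) if sum(row) else None
            for i, row in enumerate(matrix)]


# Speaker 3, words 1 and 4 only: all eight 1s were correct, and one of the
# eight 4s was recognised as 1.
results = [("1", "1")] * 8 + [("4", "4")] * 7 + [("4", "1")]
words = ["1", "4"]
matrix = confusion_matrix(results, words)
percentages = row_percentages(matrix)
print(matrix)
print(percentages)
```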

Comments on the results

As can be seen from the results below, the symmetrical recogniser fared almost perfectly. The only error in the speaker dependent tests occurred for speaker 3, where one of the eight templates of the word four was recognised as one, giving a rate of 87.5% for that word.

In the speaker independent tests, by contrast, a recognition rate of 100% was achieved for definition 1. The more difficult task of recognition according to definition 2 fared reasonably well, with an average recognition rate of approximately 93%.

The improved (asymmetric) algorithm performed equally well on the speaker dependent tasks. Although its recognition rate dropped marginally for definition 1, it improved on definition 2 of the speaker independent trials, where the average recognition rate rose to over 96%. The algorithm was also marginally faster due to the reduction in computation required.

A further possible extension would be to rank the results, giving (for example) the top 3 estimates for each test word. If yes were the first choice but the second and third were both no, it might be more likely that the word is in fact no. However, this brings its own difficulties and could not be investigated fully in the time available.
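
One way the ranking idea might look is sketched below. Since this extension was not investigated, everything here is an assumption: the majority vote over the top 3, the tie-break in favour of the best-ranked word, and the cost values, which are made up for the example.

```python
from collections import Counter


def top_k(costs, k=3):
    """Return the k (word, global cost) pairs with the lowest cost."""
    return sorted(costs, key=lambda pair: pair[1])[:k]


def vote(ranked):
    """Majority vote over ranked estimates; ties go to the best-ranked word."""
    counts = Counter(word for word, _ in ranked)
    most = max(counts.values())
    for word, _ in ranked:
        if counts[word] == most:
            return word


# Made-up global costs for one test word against four templates.
costs = [("across", 0.90), ("no", 0.23), ("yes", 0.21), ("no", 0.24)]
ranked = top_k(costs)
print(ranked)        # yes first, then the two no templates
print(vote(ranked))  # no
```

Here the single best match is yes, but the vote returns no because it accounts for two of the top three estimates. One of the difficulties hinted at above is visible in the sketch: a vote like this can also overturn a first choice that was correct.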

Recognition Results
Symmetric Dynamic Programming
Speaker Dependent

Speaker 1

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	8													100
no		7												100
across			8											100
down				8										100
1					7									100
2						6								100
3							5							100
4								6						100
5									8					100
6										7				100
7											8			100
8												8		100
9													8	100

Speaker 2

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	8													100
no		8												100
across			8											100
down				8										100
1					8									100
2						8								100
3							8							100
4								8						100
5									8					100
6										8				100
7											8			100
8												8		100
9													8	100

Speaker 3

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	8													100
no		8												100
across			8											100
down				8										100
1					8									100
2						8								100
3							8							100
4					1			7						87.5
5									8					100
6										8				100
7											8			100
8												8		100
9													8	100

Speaker 5

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	8													100
no		8												100
across			8											100
down				8										100
1					7									100
2						7								100
3							7							100
4								7						100
5									7					100
6										7				100
7											7			100
8												7		100
9													7	100

Speaker Independent (Definition 1)

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	32													100
no		31												100
across			32											100
down				32										100
1					30									100
2						29								100
3							28							100
4								29						100
5									31					100
6										30				100
7											31			100
8												31		100
9													31	100

Speaker Independent (Definition 2)

	yes	no	across	down	1	2	3	4	5	6	7	8	9	%
yes	29											3		90.6
no		30			1									96.8
across	1		31											96.9
down				25	4	3								78.1
1					30									100
2						29								100
3						5	23							82.1
4								29						100
5									31					100
6										30				100
7						2					29			93.5
8							1					30		96.8
9		4		1	1								25	80.6
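
As a quick check on the average quoted for definition 2, the rate can be recomputed from the table above in two ways: as the mean of the per-word percentages, and as the pooled rate over all tests. The counts below are the diagonal entries and row totals of the table. Which of the two averages the original figure refers to is not stated, so both are shown.

```python
# (word, correctly recognised, total tests), taken from the table above.
rows = [("yes", 29, 32), ("no", 30, 31), ("across", 31, 32), ("down", 25, 32),
        ("1", 30, 30), ("2", 29, 29), ("3", 23, 28), ("4", 29, 29),
        ("5", 31, 31), ("6", 30, 30), ("7", 29, 31), ("8", 30, 31),
        ("9", 25, 31)]

# Mean of the per-word recognition rates.
mean_of_rates = sum(100 * correct / total for _, correct, total in rows) / len(rows)

# Pooled rate: all correct recognitions over all tests.
total_correct = sum(correct for _, correct, _ in rows)
total_tests = sum(total for _, _, total in rows)
pooled_rate = 100 * total_correct / total_tests

print(round(mean_of_rates, 1), round(pooled_rate, 1))
```

Both figures come out at roughly 93.5%, which is consistent with the rate of approximately 93% quoted in the comments.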

Asymmetric Dynamic Programming

Speaker Dependent