Stuart N Wrigley BSc(Hons) PhD MIET SMIEEE MAHEP
Operations and Business Development Manager, UKRI CDT in Speech and Language Technologies and their Applications

Speech Recognition by Dynamic Time Warping

Program Testing and Results

I apologise for the sheer size of this page - it has been hastily produced in Word 97 from the original document.

The Testing Strategy

Approximately 400 template files were made available for the recogniser. These were logically separated according to speaker. This allowed two types of tests to be carried out: speaker dependent and speaker independent. The terms speaker dependent and speaker independent need to be clarified:

To exercise the recogniser, the following method was used for each word test:

  1. Choose one of the templates of that word.
  2. Obtain a global cost measure for that template matched against every other template in the set (i.e. it is never matched against itself).
  3. Take as the estimated word the word of the template that produced the lowest global cost.
  4. Repeat for the remaining untested templates of that word.

The set mentioned in step 2 is the set of templates to be used in the comparison process as outlined above in the definitions of speaker dependence and speaker independence.
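The four steps above can be sketched as a leave-one-out loop. This is only an illustration, not the original program: the `(word, features)` template layout and the names `estimate_word`, `run_word_test` and `toy_cost` are assumptions, and `global_cost` stands in for whichever dynamic time warping cost the recogniser computes.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical layout: each template is (word label, feature sequence).
Template = Tuple[str, Sequence[float]]
CostFn = Callable[[Sequence[float], Sequence[float]], float]


def estimate_word(index: int, templates: List[Template], global_cost: CostFn) -> str:
    """Return the word of the template (other than `index`) with the lowest global cost."""
    _, test_features = templates[index]
    best_word, best_cost = "", float("inf")
    for j, (word, features) in enumerate(templates):
        if j == index:  # step 2: never match a template against itself
            continue
        cost = global_cost(test_features, features)
        if cost < best_cost:  # step 3: keep the lowest global cost seen so far
            best_word, best_cost = word, cost
    return best_word


def run_word_test(word: str, templates: List[Template], global_cost: CostFn) -> List[Tuple[str, str]]:
    """Steps 1 and 4: test every template of `word`; return (test word, estimated word) pairs."""
    return [
        (word, estimate_word(i, templates, global_cost))
        for i, (label, _) in enumerate(templates)
        if label == word
    ]


def toy_cost(a: Sequence[float], b: Sequence[float]) -> float:
    """Placeholder cost for the demonstration only; NOT dynamic time warping."""
    return sum(abs(x - y) for x, y in zip(a, b))
```

The `templates` list passed in plays the role of the set from step 2, so restricting it to one speaker's templates or widening it to several speakers changes the kind of test without changing the loop.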

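The results for each test are presented in a confusion matrix showing the breakdown of how each word was recognised. Each row corresponds to the word under test and each column to the word estimated by the recogniser; the final column gives the percentage of that word's templates that were recognised correctly.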
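As a rough illustration of how such a matrix can be tallied, the sketch below counts (test word, estimated word) pairs and derives the per-word success rate. The function names are hypothetical; the vocabulary is the thirteen words that appear in the results tables.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

WORDS = ["yes", "no", "across", "down", "1", "2", "3", "4", "5", "6", "7", "8", "9"]


def confusion_matrix(pairs: Sequence[Tuple[str, str]], words: Sequence[str] = WORDS) -> List[List[int]]:
    """Rows are test words, columns are estimated words, cells are counts."""
    counts = Counter(pairs)
    return [[counts[(spoken, heard)] for heard in words] for spoken in words]


def row_success(matrix: List[List[int]]) -> List[Optional[float]]:
    """Percentage of each row that lies on the diagonal (None if the row has no tests)."""
    rates: List[Optional[float]] = []
    for i, row in enumerate(matrix):
        total = sum(row)
        rates.append(round(100.0 * row[i] / total, 1) if total else None)
    return rates


# Speaker 3, word "4": seven templates recognised as "4", one as "1".
pairs = [("4", "4")] * 7 + [("4", "1")]
matrix = confusion_matrix(pairs)
print(row_success(matrix)[WORDS.index("4")])  # 87.5
```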

 

Comments on the results

As can be seen from the results below, the symmetric recogniser fared almost perfectly. The only error in the speaker dependent tests occurred during the testing of the word four for speaker 3, where the recogniser achieved only 87.5% (seven of the eight templates were recognised correctly).

During the speaker independent testing, a recognition rate of 100% was achieved for definition 1. The more difficult task of recognition according to definition 2 fared reasonably well, with an average recognition rate of approximately 93%.

The improved (asymmetric) algorithm performed equally well on the speaker dependent tasks. Although the recognition rate dropped marginally for definition 1 of the speaker independent trials, an improvement is evident for definition 2, where the average recognition rate rose to over 96%. The algorithm was also marginally faster due to the reduction in computation required.
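The two definition 2 averages quoted above can be checked from the per-word percentages in the results tables. The sketch below assumes that "average" means the unweighted mean of the thirteen per-word rates; the variable names are illustrative.

```python
# Per-word success rates (%) for speaker independent definition 2,
# in table order: yes, no, across, down, 1 to 9.
symmetric_def2 = [90.6, 96.8, 96.9, 78.1, 100.0, 100.0, 82.1,
                  100.0, 100.0, 100.0, 93.5, 96.8, 80.6]
asymmetric_def2 = [87.5, 100.0, 100.0, 93.8, 100.0, 100.0, 92.9,
                   100.0, 100.0, 100.0, 100.0, 90.3, 83.9]

# Unweighted mean over the thirteen words (an assumption about how
# the quoted averages were obtained).
symmetric_average = sum(symmetric_def2) / len(symmetric_def2)
asymmetric_average = sum(asymmetric_def2) / len(asymmetric_def2)

print(f"symmetric:  {symmetric_average:.1f}%")   # symmetric:  93.5%
print(f"asymmetric: {asymmetric_average:.1f}%")  # asymmetric: 96.0%
```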

A further possible extension could be the ranking of results. In other words, the top three estimates (for example) could be given. If yes was the first choice but the second and third were both no, it could be more likely that the word should be no. However, this brings its own difficulties and could not be investigated fully in the time available.
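One way such a ranking might look is sketched below. This extension was not implemented in the project; the function names, the majority-vote rule and the cost values are all hypothetical, chosen only to reproduce the yes/no example from the paragraph above.

```python
import heapq
from collections import Counter
from typing import List, Sequence, Tuple

Scored = Tuple[str, float]  # (word of a template, global cost against the test template)


def top_estimates(costs: Sequence[Scored], n: int = 3) -> List[Scored]:
    """Return the n lowest-cost (word, cost) pairs, best match first."""
    return heapq.nsmallest(n, costs, key=lambda pair: pair[1])


def majority_vote(ranked: Sequence[Scored]) -> str:
    """Pick the most frequent word among the ranked estimates.

    On a tie, the word that appears earliest in the ranking wins.
    """
    words = [word for word, _ in ranked]
    return Counter(words).most_common(1)[0][0]


# Made-up costs: "yes" is the single best match, but "no" takes second and third place.
costs = [("yes", 10.0), ("no", 10.5), ("across", 30.0), ("no", 10.7)]
ranked = top_estimates(costs)
print(ranked)                 # [('yes', 10.0), ('no', 10.5), ('no', 10.7)]
print(majority_vote(ranked))  # no
```

A plain vote ignores how far apart the costs are, which is one of the difficulties hinted at above: a clear first choice can be outvoted by two much poorer matches.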

 

 

 

Recognition Results

Symmetric Dynamic Programming

Speaker Dependent

Speaker 1

 

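| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 7 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 7 | | | | | | | | | 100 |
| 2 | | | | | | 6 | | | | | | | | 100 |
| 3 | | | | | | | 5 | | | | | | | 100 |
| 4 | | | | | | | | 6 | | | | | | 100 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 7 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |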

 

Speaker 2

 

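| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 8 | | | | | | | | | 100 |
| 2 | | | | | | 8 | | | | | | | | 100 |
| 3 | | | | | | | 8 | | | | | | | 100 |
| 4 | | | | | | | | 8 | | | | | | 100 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 8 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |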

 

 

Speaker 3

 

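| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 8 | | | | | | | | | 100 |
| 2 | | | | | | 8 | | | | | | | | 100 |
| 3 | | | | | | | 8 | | | | | | | 100 |
| 4 | | | | | 1 | | | 7 | | | | | | 87.5 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 8 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |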

 

 

Speaker 5

 

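| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 7 | | | | | | | | | 100 |
| 2 | | | | | | 7 | | | | | | | | 100 |
| 3 | | | | | | | 7 | | | | | | | 100 |
| 4 | | | | | | | | 7 | | | | | | 100 |
| 5 | | | | | | | | | 7 | | | | | 100 |
| 6 | | | | | | | | | | 7 | | | | 100 |
| 7 | | | | | | | | | | | 7 | | | 100 |
| 8 | | | | | | | | | | | | 7 | | 100 |
| 9 | | | | | | | | | | | | | 7 | 100 |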

 

 

Speaker Independent (Definition 1)

 

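| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 32 | | | | | | | | | | | | | 100 |
| no | | 31 | | | | | | | | | | | | 100 |
| across | | | 32 | | | | | | | | | | | 100 |
| down | | | | 32 | | | | | | | | | | 100 |
| 1 | | | | | 30 | | | | | | | | | 100 |
| 2 | | | | | | 29 | | | | | | | | 100 |
| 3 | | | | | | | 28 | | | | | | | 100 |
| 4 | | | | | | | | 29 | | | | | | 100 |
| 5 | | | | | | | | | 31 | | | | | 100 |
| 6 | | | | | | | | | | 30 | | | | 100 |
| 7 | | | | | | | | | | | 31 | | | 100 |
| 8 | | | | | | | | | | | | 31 | | 100 |
| 9 | | | | | | | | | | | | | 31 | 100 |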

 

Speaker Independent (Definition 2)

 

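| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 29 | | | | | | | | | | | 3 | | 90.6 |
| no | | 30 | | | 1 | | | | | | | | | 96.8 |
| across | 1 | | 31 | | | | | | | | | | | 96.9 |
| down | | | | 25 | 4 | 3 | | | | | | | | 78.1 |
| 1 | | | | | 30 | | | | | | | | | 100 |
| 2 | | | | | | 29 | | | | | | | | 100 |
| 3 | | | | | | 5 | 23 | | | | | | | 82.1 |
| 4 | | | | | | | | 29 | | | | | | 100 |
| 5 | | | | | | | | | 31 | | | | | 100 |
| 6 | | | | | | | | | | 30 | | | | 100 |
| 7 | | | | | | 2 | | | | | 29 | | | 93.5 |
| 8 | | | | | | | 1 | | | | | 30 | | 96.8 |
| 9 | | 4 | | 1 | 1 | | | | | | | | 25 | 80.6 |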

 

 

 

Asymmetric Dynamic Programming

Speaker Dependent

Speaker 1

 

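| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 7 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 7 | | | | | | | | | 100 |
| 2 | | | | | | 6 | | | | | | | | 100 |
| 3 | | | | | | | 5 | | | | | | | 100 |
| 4 | | | | | | | | 6 | | | | | | 100 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 7 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |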

 

Speaker 2

 

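| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 8 | | | | | | | | | 100 |
| 2 | | | | | | 8 | | | | | | | | 100 |
| 3 | | | | | | | 8 | | | | | | | 100 |
| 4 | | | | | | | | 8 | | | | | | 100 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 8 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |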

 

 

 

Speaker 3

 

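| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 8 | | | | | | | | | 100 |
| 2 | | | | | | 8 | | | | | | | | 100 |
| 3 | | | | | | | 8 | | | | | | | 100 |
| 4 | | | | | 1 | | | 7 | | | | | | 87.5 |
| 5 | | | | | | | | | 8 | | | | | 100 |
| 6 | | | | | | | | | | 8 | | | | 100 |
| 7 | | | | | | | | | | | 8 | | | 100 |
| 8 | | | | | | | | | | | | 8 | | 100 |
| 9 | | | | | | | | | | | | | 8 | 100 |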

 

 

Speaker 5

 

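| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 8 | | | | | | | | | | | | | 100 |
| no | | 8 | | | | | | | | | | | | 100 |
| across | | | 8 | | | | | | | | | | | 100 |
| down | | | | 8 | | | | | | | | | | 100 |
| 1 | | | | | 7 | | | | | | | | | 100 |
| 2 | | | | | | 7 | | | | | | | | 100 |
| 3 | | | | | | | 7 | | | | | | | 100 |
| 4 | | | | | | | | 7 | | | | | | 100 |
| 5 | | | | | | | | | 7 | | | | | 100 |
| 6 | | | | | | | | | | 7 | | | | 100 |
| 7 | | | | | | | | | | | 7 | | | 100 |
| 8 | | | | | | | | | | | | 7 | | 100 |
| 9 | | | | | | | | | | | | | 7 | 100 |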

 

 

Speaker Independent (Definition 1)

 

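| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 32 | | | | | | | | | | | | | 100 |
| no | | 31 | | | | | | | | | | | | 100 |
| across | | | 32 | | | | | | | | | | | 100 |
| down | | | | 32 | | | | | | | | | | 100 |
| 1 | | | | | 30 | | | | | | | | | 100 |
| 2 | | | | | | 29 | | | | | | | | 100 |
| 3 | | | | | | | 28 | | | | | | | 100 |
| 4 | | | | | 1 | | | 28 | | | | | | 96.6 |
| 5 | | | | | | | | | 31 | | | | | 100 |
| 6 | | | | | | | | | | 30 | | | | 100 |
| 7 | | | | | | | | | | | 31 | | | 100 |
| 8 | | | | | | | | | | | | 31 | | 100 |
| 9 | | 1 | | | | | | | | | | | 30 | 96.8 |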

 

Speaker Independent (Definition 2)

 

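| | yes | no | across | down | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| yes | 28 | | | | | | | | | | | 4 | | 87.5 |
| no | | 31 | | | | | | | | | | | | 100 |
| across | | | 32 | | | | | | | | | | | 100 |
| down | | | | 30 | | | | | | | | | 2 | 93.8 |
| 1 | | | | | 30 | | | | | | | | | 100 |
| 2 | | | | | | 29 | | | | | | | | 100 |
| 3 | | | | | | 2 | 26 | | | | | | | 92.9 |
| 4 | | | | | | | | 29 | | | | | | 100 |
| 5 | | | | | | | | | 31 | | | | | 100 |
| 6 | | | | | | | | | | 30 | | | | 100 |
| 7 | | | | | | | | | | | 31 | | | 100 |
| 8 | | | | | | | 3 | | | | | 28 | | 90.3 |
| 9 | | 5 | | | | | | | | | | | 26 | 83.9 |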