STARDUST
Speech Recognition for People with Severe Dysarthria
Background
Dysarthria is the most common acquired speech disorder, affecting
170 per 100,000 of the population. In its severest form, dysarthric
speech is unintelligible to others and may consist of vocal utterances
rather than words recognisable to unfamiliar communication partners.
Some people with dysarthric speech are also severely motor-impaired,
with limited or no control of their local environment. This combination
of speech and general physical disability can make it particularly
difficult for them to interact with their environment and limits their
independence.
With the development of speech recognition systems
and an increased awareness of the needs of people with dysarthric speech,
there is a clear opportunity to develop systems that enable such users
to gain greater independence and control of their lives. Modern automatic
speech recognition is based on training statistical models of the acoustic
manifestation of speech units (words, phones or context-dependent phones)
using pre-recorded speech databases [Gold and Morgan 1999]. Good performance
can be achieved when the speech to be recognised is sufficiently similar
to that used in training. However, automatic speech recognition performance
remains brittle compared with the human ability to deal with abnormal
or distorted speech: word error rates in such conditions are typically
an order of magnitude worse for the machine [Lippmann 1996]. For this reason,
we propose to develop tools which, for dysarthric speech, will narrow the
gap between the training data and the speech to be recognised.
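The dependence of recognition accuracy on the match between stored models and
incoming speech can be illustrated with a deliberately simple sketch. This is
not the statistical modelling described above, and the feature values, word
labels and helper names are all hypothetical: each word is represented by a
single template sequence, and an utterance is assigned to the template with
the lowest dynamic-time-warping alignment cost. An utterance close to a
template is matched correctly; one far from all templates matches poorly.

```python
# Illustrative template-matching recogniser (hypothetical sketch, not the
# proposal's system). Each "word" is a stored 1-D feature sequence standing
# in for acoustic frames; input is matched by dynamic time warping (DTW).

def dtw_cost(a, b):
    """Classic DTW alignment cost between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a
                                 cost[i][j - 1],      # step in b
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def recognise(utterance, templates):
    """Return the word label whose template has the lowest DTW cost."""
    return min(templates, key=lambda w: dtw_cost(utterance, templates[w]))

# Hypothetical templates for two environmental-control words.
templates = {"lamp": [0.1, 0.5, 0.9, 0.5, 0.1],
             "door": [0.9, 0.7, 0.2, 0.7, 0.9]}

close = [0.1, 0.45, 0.85, 0.5, 0.15]   # near the "lamp" template
print(recognise(close, templates))      # -> lamp
```

In this toy setting, as in real systems, accuracy degrades as the input
drifts away from the material the templates (or statistical models) were
built from, which is exactly the gap severely dysarthric speech exposes.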
Large-vocabulary consumer automatic speech recognition
systems have been used by people with mild and moderate dysarthria as
a means of inputting text, but there is no consensus on whether
these systems are appropriate for people with severe dysarthria [e.g.
Bowes 1999, Rosengren et al. 1995]. Research evidence is scarce: most
reports are single case studies or anecdotal, and comparing results
between reports is difficult because of uncertain definitions of the
severity of dysarthria and a lack of detail about the methods employed.
There is consensus, however, that intensive training is the key to
improving recognition performance [e.g. Bowes 1999, Arnold 1999,
Hawley and Zahid 1999].
A number of speaker-independent speech recognition
algorithms developed with the aim of improving the recognition of
dysarthric speech patterns have been reported [Deller 1991, Jayaram
1995], but these have not appeared in a form that can be used by disabled
people. The applicants' research shows that, although large-vocabulary
recognition systems can be trained to recognise speech which is reasonably
close to normal models, severely impaired speech achieves relatively low
recognition rates and is therefore unsuitable for control applications
[Hawley and Zahid 1999].
We are planning a two-pronged approach to solving
the problem of reliable use of speech as a control method for people with
severe dysarthria:
- To develop a computerised training package which will assist dysarthric
speakers to improve the consistency of their vocalisations.
- To develop a speech recognition system which has greater tolerance to
variability of speech utterances.
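The first prong depends on being able to quantify how consistent a speaker's
repeated vocalisations are, so that the training package can give feedback.
The sketch below is one possible metric of our own devising, not the
project's actual measure: it summarises each repetition by its mean frame
value and reports the spread of those summaries, with lower spread meaning
more consistent production.

```python
# Hypothetical consistency score for repeated vocalisations (an assumed
# metric for illustration, not the STARDUST training package). Each
# repetition is a list of frame values; the score is the standard
# deviation of the per-repetition means.

def consistency(repetitions):
    """Return spread of per-utterance mean values; lower = more consistent."""
    means = [sum(u) / len(u) for u in repetitions]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return var ** 0.5

# Three steady repetitions vs three erratic ones (made-up frame values).
steady = [[0.50, 0.60, 0.50], [0.50, 0.58, 0.52], [0.51, 0.60, 0.49]]
erratic = [[0.20, 0.90, 0.10], [0.80, 0.20, 0.90], [0.10, 0.50, 0.95]]

print(consistency(steady) < consistency(erratic))  # -> True
```

A feedback loop built on such a score could reward the speaker as the spread
across repetitions falls, which is the behaviour the training package aims
to encourage; the second prong then relaxes the recogniser's tolerance to
whatever residual variability remains.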