STARDUST
Speech Recognition for People with Severe Dysarthria
Background
Dysarthria is the most common acquired speech disorder, affecting
170 per 100,000 of the population. In its severest form, dysarthric
speech is unintelligible to others and may consist of vocal utterances
rather than words recognisable to unfamiliar communication partners.
Some people with dysarthric speech are also severely motor-impaired,
with limited or no control of their local environment. This combination
of speech and general physical disability can make it particularly
difficult for them to interact with their environment and limits their
independence.
With the development of speech recognition systems
and an increased awareness of the needs of people with dysarthric speech,
there is a clear opportunity to develop systems that enable such users
to gain greater independence and control of their lives. Modern automatic
speech recognition is based on training statistical models of the acoustic
manifestation of speech units (words, phones or context-dependent phones)
using pre-recorded speech databases [Gold and Morgan 1999]. Good performance
can be achieved when the speech to be recognised is sufficiently similar
to that used in training. However, automatic speech recognition performance
remains brittle compared with the human ability to deal with abnormal
or distorted speech: word error rates in such conditions are typically
an order of magnitude worse for the machine [Lippmann 1996]. For this reason,
we propose to develop tools which, for dysarthric speech, will narrow the
gap between the training data and the speech to be recognised.
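The dependence of recognition accuracy on the match between stored models and
incoming speech can be illustrated with a deliberately simple sketch. This is
not the statistical modelling described above, and the feature values, word
labels and helper names are all hypothetical: each word is represented by a
single template sequence, and an utterance is assigned to the template with
the lowest dynamic-time-warping alignment cost. An utterance close to a
template is matched correctly; one far from all templates matches poorly.

```python
# Illustrative template-matching recogniser (hypothetical sketch, not the
# proposal's system). Each "word" is a stored 1-D feature sequence standing
# in for acoustic frames; input is matched by dynamic time warping (DTW).

def dtw_cost(a, b):
    """Classic DTW alignment cost between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a
                                 cost[i][j - 1],      # step in b
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]

def recognise(utterance, templates):
    """Return the word label whose template has the lowest DTW cost."""
    return min(templates, key=lambda w: dtw_cost(utterance, templates[w]))

# Hypothetical templates for two environmental-control words.
templates = {"lamp": [0.1, 0.5, 0.9, 0.5, 0.1],
             "door": [0.9, 0.7, 0.2, 0.7, 0.9]}

close = [0.1, 0.45, 0.85, 0.5, 0.15]   # near the "lamp" template
print(recognise(close, templates))      # -> lamp
```

In this toy setting, as in real systems, accuracy degrades as the input
drifts away from the material the templates (or statistical models) were
built from, which is exactly the gap severely dysarthric speech exposes.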
Large-vocabulary consumer automatic speech recognition
systems have been used by people with mild and moderate dysarthria as
a means of inputting text, but there is no consensus on whether
these systems are appropriate for people with severe dysarthria [e.g.
Bowes 1999, Rosengren et al. 1995]. Research evidence is scarce: most
reports are single case studies or anecdotal, and comparing results
between reports is difficult because of uncertain definitions of the
severity of dysarthria and a lack of detail about the methods employed.
There is consensus, however, that intensive training is the key to
improving recognition performance [e.g. Bowes 1999, Arnold 1999,
Hawley and Zahid 1999].
A number of speaker-independent speech recognition
algorithms developed with the aim of improving the recognition of
dysarthric speech patterns have been reported [Deller 1991, Jayaram
1995], but these have not appeared in a form that can be used by disabled
people. The applicants' research shows that, although large-vocabulary
recognition systems can be trained to recognise speech which is reasonably
close to normal models, severely impaired speech achieves relatively low
recognition rates and is therefore unsuitable for control applications
[Hawley and Zahid 1999].
We are planning a two-pronged approach to solving
the problem of reliable use of speech as a control method for people with
severe dysarthria:
- To develop a computerised training package which will assist dysarthric
speakers to improve the consistency of their vocalisations.
- To develop a speech recognition system which has greater tolerance to
variability of speech utterances.
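The first prong depends on being able to quantify how consistent a speaker's
repeated vocalisations are, so that the training package can give feedback.
The sketch below is one possible metric of our own devising, not the
project's actual measure: it summarises each repetition by its mean frame
value and reports the spread of those summaries, with lower spread meaning
more consistent production.

```python
# Hypothetical consistency score for repeated vocalisations (an assumed
# metric for illustration, not the STARDUST training package). Each
# repetition is a list of frame values; the score is the standard
# deviation of the per-repetition means.

def consistency(repetitions):
    """Return spread of per-utterance mean values; lower = more consistent."""
    means = [sum(u) / len(u) for u in repetitions]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return var ** 0.5

# Three steady repetitions vs three erratic ones (made-up frame values).
steady = [[0.50, 0.60, 0.50], [0.50, 0.58, 0.52], [0.51, 0.60, 0.49]]
erratic = [[0.20, 0.90, 0.10], [0.80, 0.20, 0.90], [0.10, 0.50, 0.95]]

print(consistency(steady) < consistency(erratic))  # -> True
```

A feedback loop built on such a score could reward the speaker as the spread
across repetitions falls, which is the behaviour the training package aims
to encourage; the second prong then relaxes the recogniser's tolerance to
whatever residual variability remains.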