STARDUST
Speech Recognition for People with Severe Dysarthria
Research Plan
This will be a three-year project requiring a multi-disciplinary
approach. A researcher with experience in speech and language therapy and rehabilitation
will work part-time throughout the project and will be responsible for
contact with subjects, speech data collection and evaluation. A computer
scientist working with Professor Green's group will develop the software
for the training system and speech recogniser. A clinical engineer will
develop the demonstration applications.
A literature review will identify the latest
publications on speech recognition and control, the use of
technology in speech therapy, and the training of consistency in dysarthric
speakers.
This study aims to demonstrate the feasibility
of speech-controlled technology for people with severe dysarthria. Subjects
will be involved throughout all stages of the project.
Development of a consistent speech training system.
The aim will be to build a computer-based training
package that trains the individual to produce more consistent utterances for
the defined vocabulary. We expect, with the methods described below, to
improve consistency in voice onset, articulation, placement and amplitude.
A speech training aid called OLT (Optical Logo
Therapy) which has been built at Sheffield [Hatzis, Green and Howard 1997,
1999] will be adapted and developed. OLT provides real-time visual feedback
as the client speaks, in the form of a two-dimensional 'phonetic map'.
A neural net is trained to map the acoustics for each speech unit onto
a chosen area on the map. The map can be customised for an individual client
and re-trained as speech consistency improves. It portrays not only the
current sound, but also short-term speech dynamics, as a trajectory. OLT
has been successfully used in therapy for clients with problems in fricative
articulation. Applied to people with dysarthria, it offers a means of providing
feedback to help stabilise production, at the same time collecting data
for recognition. Crucially, the individual can practise with OLT in the
absence of a therapist; it requires only a mid-range PC and sound card.
The training system will be developed to train consistency in word (or
single utterance) production.
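The mapping at the heart of such a system can be sketched in a few lines. The network below is a toy stand-in, not the actual OLT implementation: it uses a fixed random tanh hidden layer with a least-squares readout, synthetic "speech units", and assumed feature dimensions and target regions.

```python
import numpy as np

# Toy sketch of an OLT-style 'phonetic map': a small neural network maps
# each acoustic feature frame to a point on a 2-D display, so the client
# sees an utterance as a trajectory across the map. Feature dimensions,
# speech units and target regions are illustrative assumptions, not the
# actual OLT design.

rng = np.random.default_rng(0)
N_FEATS = 12  # e.g. one short spectral feature vector per analysis frame

# Each trained speech unit is assigned its own region of the 2-D map.
targets = {"s": np.array([0.2, 0.8]), "sh": np.array([0.8, 0.2])}

def make_frames(unit, n=200):
    """Synthetic feature frames clustered per unit (stand-in for real audio)."""
    centre = np.full(N_FEATS, 1.0 if unit == "s" else -1.0)
    return centre + 0.3 * rng.standard_normal((n, N_FEATS))

# Network: fixed random tanh hidden layer plus a bias column, with a
# linear readout to 2-D map coordinates fitted by least squares (a simple
# stand-in for the gradient-trained net described in the text).
W1 = 0.5 * rng.standard_normal((N_FEATS, 16))

def feats(frames):
    h = np.tanh(frames @ W1)
    return np.hstack([h, np.ones((len(h), 1))])  # bias column

X = np.vstack([make_frames(u) for u in targets])
Y = np.vstack([np.tile(t, (200, 1)) for t in targets.values()])
W2, *_ = np.linalg.lstsq(feats(X), Y, rcond=None)

def to_map(frames):
    """Project acoustic frames onto the 2-D phonetic map."""
    return feats(frames) @ W2

# Consecutive frames of one utterance trace a trajectory; a consistent
# production lands repeatedly in the unit's target region of the map.
trajectory = to_map(make_frames("s", n=10))
```

In the real system the map would be customised for each client and retrained as consistency improves; in this sketch, retraining amounts to refitting the readout on new recordings.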
It is expected that dysarthric speakers will
have more difficulty in achieving consistent speech than the speakers with
whom we have previously trialled this training method, because of their
impaired control of articulation. There is evidence, however, that speech therapy
can improve the consistency of dysarthric speech [Netsell 1991]. Netsell
has also shown the importance of consistent feedback in successful outcomes.
Development of a speech recognition facility.
Initially, we will develop a speaker-dependent
recogniser: one that recognises the speech of a single individual
and works with only a small vocabulary.
The use of two techniques for this task will
be examined: the hidden Markov model and the neural network. Members of
the team have expertise in both. The choice between them will
be based on the particular characteristics of the dysarthric speech.
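As an illustration of the first option, a speaker-dependent isolated-word recogniser can be built from one small hidden Markov model per vocabulary word, scored with the forward algorithm. The sketch below uses hand-set toy parameters and an assumed two-word vocabulary; a real recogniser would train the models on the speaker's own recordings.

```python
import numpy as np

# Sketch of a speaker-dependent, small-vocabulary HMM recogniser: one
# left-to-right HMM per word, Gaussian emissions, and the forward
# algorithm to score an utterance against each model. All parameters
# here are hand-set toys for illustration only.

def log_forward(obs, trans, means, sd=0.5):
    """Forward-algorithm log-likelihood of a 1-D feature sequence."""
    log_b = (-0.5 * ((obs[:, None] - means[None, :]) / sd) ** 2
             - np.log(sd * np.sqrt(2 * np.pi)))   # log N(obs | state)
    log_a = np.log(trans + 1e-12)                 # smoothed transitions
    alpha = np.full(len(means), -np.inf)
    alpha[0] = log_b[0, 0]                        # must start in state 0
    for t in range(1, len(obs)):
        step = alpha[:, None] + log_a             # all paths into each state
        m = step.max(axis=0)
        alpha = log_b[t] + m + np.log(np.exp(step - m).sum(axis=0))
    m = alpha.max()
    return m + np.log(np.exp(alpha - m).sum())

# Two toy word models: three states each, differing in emission means.
lr3 = np.array([[0.6, 0.4, 0.0],                  # left-to-right topology
                [0.0, 0.6, 0.4],
                [0.0, 0.0, 1.0]])
models = {"lamp": np.array([0.0, 2.0, 4.0]),      # state emission means
          "door": np.array([4.0, 2.0, 0.0])}

def recognise(obs):
    """Return the vocabulary word whose HMM best explains the utterance."""
    scores = {w: log_forward(obs, lr3, mu) for w, mu in models.items()}
    return max(scores, key=scores.get)

# A noisy feature track rising 0 -> 2 -> 4 should match the "lamp" model.
utterance = np.array([0.1, 0.2, 1.8, 2.1, 3.9, 4.0])
print(recognise(utterance))                        # -> lamp
```

A neural-network recogniser would replace the per-word likelihoods with a classifier over the same feature frames; either way, the decision reduces to picking the best-scoring vocabulary item.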
An advantage of these approaches is that they
are scalable. We will develop the model initially as a speaker-dependent
one for a small vocabulary. Either method, however, can be
expanded both to larger vocabularies and to speaker independence by
training on a larger database of speech from more speakers.
Thus, as the number of speakers using the training package grows,
and if further resources become available, the same model could be used
in a speaker-independent mode.
We will aim to produce a combined training function
and recogniser, which can be used by dysarthric speakers and carers with
as little input as possible from professionals.
Deployment of speech recogniser to control electronic
assistive technology.
Demonstration systems will be created for speech
control of two applications: a voice-output communication aid and an environmental
control system. The demonstration applications will be developed using
techniques built up over a number of years in the assistive technology
field [e.g. Hawley et al 1994].
The speech recognition system will aim for a
recognition rate as close to 100% as possible, but no system can reach this target
consistently in all situations. Safeguards will therefore be built in to
ensure that safety is not compromised. Safety is not a critical consideration
for the communication aid, but the issue of less-than-100% accuracy will
be addressed in the evaluation stage.
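One form such a safeguard might take is a confidence threshold: recognised commands that could affect safety are executed only when the recogniser is sufficiently confident, and otherwise the user is asked to confirm. The command names and threshold values below are illustrative assumptions, not a specified design.

```python
# Illustrative safeguard for speech-controlled environmental control:
# act on a recognised command only when recognition confidence warrants
# it. The command set and thresholds are assumed examples.

SAFETY_CRITICAL = {"unlock door", "turn on heater"}

def decide(command, confidence, threshold=0.9):
    """Map a recognition result to an action under a confidence safeguard."""
    if command in SAFETY_CRITICAL and confidence < threshold:
        return "confirm"   # re-prompt the user before acting
    if confidence < 0.5:
        return "reject"    # too uncertain even for benign commands
    return "execute"
```

For example, a low-confidence "unlock door" would trigger a spoken or visual confirmation prompt rather than an action, while a low-confidence benign command is simply ignored and the user asked to repeat it.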