STARDUST

Speech Recognition for People with Severe Dysarthria

Research Plan

This will be a three-year project requiring a multi-disciplinary approach. A researcher with experience in speech and language therapy and rehabilitation will work part-time throughout the project and will be responsible for contact with subjects, speech data collection and evaluation. A computer scientist working with Professor Green's group will develop the software for the training system and speech recogniser. A clinical engineer will develop the demonstration applications.
A literature review will identify the very latest publications in the area of speech recognition and control, the use of technology in speech therapy, and training of consistency in dysarthric speakers.
This study aims to demonstrate the feasibility of speech-controlled technology for people with severe dysarthria. Subjects will be involved throughout all stages of the project.

Development of a consistent speech training system

The aim will be to build a computer-based training package that trains the individual to produce the defined vocabulary more consistently. We expect the methods described below to improve consistency in voice onset, articulation, placement and amplitude.
A speech training aid called OLT (Optical Logo Therapy), built at Sheffield [Hatzis, Green and Howard 1997, 1999], will be adapted and developed. OLT provides real-time visual feedback as the client speaks, in the form of a two-dimensional 'phonetic map'. A neural net is trained to map the acoustics for each speech unit onto a chosen area of the map. The map can be customised for an individual client and re-trained as speech consistency improves. It portrays not only the current sound but also short-term speech dynamics, as a trajectory. OLT has been used successfully in therapy for clients with problems in fricative articulation. Applied to people with dysarthria, it offers a means of providing feedback to help stabilise production while at the same time collecting data for recognition. Crucially, the individual can practise with OLT in the absence of a therapist: it requires only a mid-range PC and sound card. The training system will be developed to train consistency in word (or single-utterance) production.
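To make the mapping idea concrete, the sketch below trains a small neural net to place acoustic feature vectors onto positions on a 2-D map, so that frames from each speech unit land in their own region. Everything here is illustrative: the feature dimensionality, network sizes, synthetic data and target map positions are assumptions for the sketch, not OLT's actual design.

```python
import numpy as np

# Illustrative sketch: map acoustic feature vectors for two "speech units"
# onto opposite corners of a 2-D phonetic map, using a one-hidden-layer
# network trained by batch gradient descent. All sizes and data are assumed.

rng = np.random.default_rng(0)

# Synthetic acoustic frames: 20 frames per speech unit, 12 features each
# (MFCC-like coefficients are an assumption, not OLT's actual front end).
unit_a = rng.normal(loc=-1.0, scale=0.3, size=(20, 12))
unit_b = rng.normal(loc=+1.0, scale=0.3, size=(20, 12))
X = np.vstack([unit_a, unit_b])
# Target map positions: unit A -> lower-left, unit B -> upper-right.
Y = np.vstack([np.tile([0.1, 0.1], (20, 1)), np.tile([0.9, 0.9], (20, 1))])

# One tanh hidden layer; layer sizes are arbitrary for illustration.
W1 = rng.normal(scale=0.1, size=(12, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.1, size=(8, 2));  b2 = np.zeros(2)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2, h

lr = 0.05
for _ in range(2000):                          # plain batch gradient descent
    out, h = forward(X)
    err = out - Y                              # squared-error gradient
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)             # backprop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

# After training, frames from each unit land near their target corner, so a
# client's utterance traces a visible trajectory through "their" map region.
pos_a, _ = forward(unit_a.mean(axis=0))
pos_b, _ = forward(unit_b.mean(axis=0))
print(pos_a, pos_b)
```

In the real system the map would be re-trained as the client's consistency improves; here re-training would simply mean repeating the gradient-descent loop on newly collected frames.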
It is expected that dysarthric speakers will have more difficulty in achieving consistent speech than the speakers with whom we have previously trialled this training method, owing to their reduced ability to control articulation. There is evidence, however, that speech therapy can improve the consistency of dysarthric speech [Netsell 1991]. Netsell has also shown the importance of consistent feedback in achieving successful outcomes.

Development of a speech recognition facility

Initially, we will develop a speaker-dependent recogniser: one which recognises the speech of a single individual and works with only a small vocabulary.
Two techniques will be examined for this task: the hidden Markov model and the neural network. Members of the team have expertise in both. The choice between them will be based on the particular characteristics of the dysarthric speech.
An advantage of these approaches is that they are scalable. We will develop the model initially as a speaker-dependent one for a small vocabulary. Either method lends itself, however, to being expanded both to larger vocabularies, and to speaker independence, through training with a larger database of speech data from a larger number of speakers. Thus, as the number of speakers using the training package grows, and if further resources become available, this same model could be used in a speaker-independent mode.
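As a concrete illustration of whole-word, speaker-dependent recognition over a small vocabulary, here is a minimal sketch using dynamic time warping (DTW) template matching. DTW is a classical stand-in chosen for brevity, not one of the two techniques (HMMs, neural networks) the project proposes; the vocabulary, feature dimensionality and data are all synthetic assumptions.

```python
import numpy as np

# Sketch of small-vocabulary, speaker-dependent recognition via DTW template
# matching (a classical stand-in for the proposed HMM/neural-net recogniser).

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame cost
            D[i, j] = cost + min(D[i - 1, j],            # insertion
                                 D[i, j - 1],            # deletion
                                 D[i - 1, j - 1])        # match
    return D[n, m]

def recognise(utterance, templates):
    """Return the vocabulary word whose stored template is nearest."""
    return min(templates, key=lambda w: dtw_distance(utterance, templates[w]))

# Synthetic 3-dimensional "acoustic" frames for a hypothetical two-word
# vocabulary, one stored template per word.
rng = np.random.default_rng(1)
def synth(offset, n_frames=15):
    return offset + 0.1 * rng.normal(size=(n_frames, 3))

templates = {"lamp": synth(0.0), "door": synth(2.0)}
test_utterance = synth(2.0, n_frames=12)     # a new, slightly shorter "door"
print(recognise(test_utterance, templates))  # → door
```

Scaling this idea up is exactly where the proposed HMM or neural-network models earn their place: unlike fixed templates, their parameters can be re-estimated from a growing multi-speaker database, which is what makes the eventual move to speaker independence possible.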
We will aim to produce a combined training function and recogniser, which can be used by dysarthric speakers and carers with as little input as possible from professionals.

Deployment of speech recogniser to control electronic assistive technology

Demonstration systems will be created for speech control of two applications: a voice-output communication aid and an environmental control system. They will be developed using techniques built up over a number of years in the assistive technology field [e.g. Hawley et al 1994].
The speech recognition system will aim for a recognition rate as close to 100% as possible, but no system can reach this target consistently in all situations. Safeguards will therefore be built in to ensure that safety is not compromised. Safety is not a critical consideration for the communication aid, but the issue of less-than-100% accuracy will be addressed in the evaluation stage.
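One form such a safeguard might take is sketched below: an environmental-control command is executed only when the recogniser's top hypothesis is both confident in absolute terms and clearly separated from the runner-up; otherwise the system falls back to asking the user to confirm. The function name, score format and threshold values are all hypothetical illustrations, not the project's specified design.

```python
# Hypothetical safeguard sketch: act on a recognised command only when the
# best hypothesis is confident AND well separated from the runner-up.
# The score format (word -> confidence in [0, 1]) and the threshold values
# are assumptions for illustration.

def decide(scores, accept_threshold=0.8, margin=0.2):
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best_word, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    if best_score >= accept_threshold and best_score - runner_up >= margin:
        return ("execute", best_word)   # confident: safe to act
    return ("confirm", best_word)       # ambiguous: ask the user first

print(decide({"lamp on": 0.95, "lamp off": 0.40}))  # → ('execute', 'lamp on')
print(decide({"lamp on": 0.60, "lamp off": 0.55}))  # → ('confirm', 'lamp on')
```

A confirm-before-acting fallback of this kind degrades gracefully: misrecognition costs the user an extra confirmation step rather than triggering an unintended action.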