I have previously worked on the SERA (Social Engagement with Robots and Agents) project within the Rehabilitation and Assistive Technology group and the Department of Computer Science. The project collected and analysed audio-visual data of older people interacting with a robot in their own homes to address questions of building sociability into verbally interactive robots for future applications of assistive technology.
A video of the current set-up of the SERA project is available below:
I completed a PhD in the Clinical Applications of Speech Technology and Speech and Hearing groups, as part of both the Computer Science and Human Communication Sciences departments at the University of Sheffield. The thesis, titled "Personalising Synthetic Voices for Individuals with Severe Speech Impairment", was supervised by Professor Phil Green and Dr Stuart Cunningham.
Speech technology can help individuals with speech disorders to interact more easily. Many individuals with severe speech impairment, due to conditions such as Parkinson's disease or motor neurone disease, use voice output communication aids (VOCAs), which have synthesised or pre-recorded voice output. This voice output effectively becomes the voice of the individual and should therefore represent the user accurately. Currently available speech synthesis personalisation techniques require a large amount of input data, which is difficult for individuals with severe speech impairment to produce. These techniques also offer no solution for individuals whose voices have already begun to show the effects of dysarthria.

The thesis shows that Hidden Markov Model (HMM)-based speech synthesis is a promising approach to 'voice banking', both before an individual's condition causes their speech to deteriorate and once deterioration has begun. It investigates the amount of input data required to build personalised voices with this technique, evaluated by human listener judgements, and shows that 100 sentences is the minimum required to build a voice that is significantly different from an average voice model and shows some resemblance to the target speaker; this amount depends on the speaker and the average model used. A neural network trained on extracted acoustic features revealed that spectral features had the most influence in predicting human listener judgements of the similarity of synthesised speech to a target speaker, and that prediction accuracy improves significantly when other acoustic features are introduced and combined non-linearly. These results informed the reconstruction of personalised synthetic voices for speakers whose voices had begun to show the effects of their conditions.
Using HMM-based synthesis, personalised synthetic voices were built from dysarthric speech that showed similarity to the target speakers without recreating the impairment in the synthesised speech output.
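The neural-network analysis mentioned above can be sketched in the following way, purely for illustration: a small regressor maps acoustic features of synthesised speech to listener similarity ratings, and we compare spectral features alone against a non-linear combination of all features. Everything here is invented for the example (the data is random, and the feature layout and model are assumptions, not the experiment reported in the thesis):

```python
# Illustrative sketch only: NOT the thesis's actual analysis or data.
# A small neural network predicts (simulated) human listener similarity
# ratings from acoustic features of synthesised speech, comparing
# spectral features alone against all features combined non-linearly.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples = 400

# Assumed feature layout (hypothetical): 12 spectral features plus
# 4 other acoustic features (e.g. F0 and duration statistics).
spectral = rng.normal(size=(n_samples, 12))
other = rng.normal(size=(n_samples, 4))

# Simulated listener scores: dominated by a spectral feature, with a
# non-linear contribution from the other acoustic features plus noise.
scores = (spectral[:, 0]
          + 0.5 * np.tanh(other[:, 0] * other[:, 1])
          + 0.1 * rng.normal(size=n_samples))

def fit_and_score(features):
    """Train a small MLP and return held-out R^2."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        features, scores, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                         random_state=0)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

r2_spectral = fit_and_score(spectral)
r2_combined = fit_and_score(np.hstack([spectral, other]))
print(f"spectral only: R^2 = {r2_spectral:.2f}")
print(f"all features:  R^2 = {r2_combined:.2f}")
```

On data generated this way, the combined feature set gives the network access to the non-linear interaction term, mirroring the qualitative finding that combining acoustic features non-linearly improves prediction of listener judgements.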
Creer, S. M., Green, P. D., Cunningham, S. P. and Yamagishi, J. (2010) Building personalised synthetic voices for individuals with dysarthria using the HTS toolkit. In J. W. Mullennix and S. E. Stern (eds.) Computer Synthesised Speech Technologies: tools for aiding impairment, Hershey, PA, USA: IGI Global: chapter 6: 92-115.
Creer, S., Cunningham, S., Hawley, M. and Wallis, P. (2011) Describing the interactive domestic robot set-up for the SERA project. Applied Artificial Intelligence, 25:1-29 (in press).
Green, P. D., Khan, Z., Creer, S. M. and Cunningham, S. P. (2011) Reconstructing the voice of an individual following Laryngectomy. Augmentative and Alternative Communication 27(1):61-66.
Creer, S. M., Green, P. D. and Cunningham, S. P. (2009) Voice banking, Advances in clinical neuroscience and rehabilitation, 9(2) May/June: 16-17.
Creer, S. M. and Thompson, P. A. (2005) TEI mark-up of spoken language data: the BASE experience. In M. Georgiafentis and G. Kotzoglou (eds.) Reading Working Papers in Linguistics, 8: 149-174.
Wallis, P., Maier, V., Creer, S. and Cunningham, S. (2010) Conversation in Context: what should a robot companion say? In Proceedings of EMCSR 2010, edited by R. Trappl. Vienna: 547-552.
Creer, S. M., Green, P. D., Cunningham, S. P. and Fatema, K. (2009) Personalising synthetic voices for individuals with progressive speech loss: judging speaker similarity. In Proceedings of Interspeech 2009: 1427-1430.
Creer, S. M. and Wolters, M. (2003) Stress patterns of German cardinal numbers. In Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona.
Creer, S. (2010) Development of Robot-based Health Interactions for the SERA project: Increasing User Engagement based on the Transtheoretical Model. International Conference on Aging, Disability and Independence (ICADI), Newcastle Upon Tyne, 8-10th September.
Creer, S. (2010) Building personalised synthetic voices for individuals with severe speech impairment. Speech and Hearing group seminar, 21st April, 2010.
email: S.Creer "at" sheffield.ac.uk
Sarah Creer's ScHaRR web page
Last updated March 10, 2011