Research

I have over forty years' experience in speech technology R&D, and much of my research has been based on insights derived from human speech perception and production.  I built my first automatic speech recogniser - EARS (Electronic Apparatus for Recognising Speech) - in 1972, and in the mid-1970s I introduced the Human Equivalent Noise Ratio (HENR) - a vocabulary-independent measure of the goodness of an automatic speech recogniser based on a computational model of human word recognition.  In the 1980s, I published HMM Decomposition - a powerful method for recognising multiple simultaneous signals (such as speech in noise) based on observed properties of the human auditory system.  During the 1990s and more recently, I've continued to champion the need to understand the similarities and differences between human and machine spoken language behaviour.
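
To make the idea concrete for readers unfamiliar with it, here is a minimal Python sketch - an illustration of the principle rather than the original implementation - of the core of HMM decomposition: a Viterbi search over the Cartesian product of the state spaces of two independent HMMs (one for the target signal, one for the interference), with the observation likelihood of each composite state supplied by a model-dependent function that combines the two sources.

import numpy as np

def decompose_viterbi(obs, hmm_a, hmm_b, combine):
    # Joint Viterbi decoding of two simultaneous signals (e.g. speech + noise).
    #   obs     : sequence of T observation frames
    #   hmm_a/b : dicts with 'log_init' (N,), 'log_trans' (N, N) and
    #             'log_obs', a function (state, frame) -> log-likelihood
    #   combine : combines the two per-source log-likelihoods into the joint
    #             observation log-likelihood for a composite state
    T = len(obs)
    Na, Nb = len(hmm_a['log_init']), len(hmm_b['log_init'])

    delta = np.array([[hmm_a['log_init'][i] + hmm_b['log_init'][j] +
                       combine(hmm_a['log_obs'](i, obs[0]),
                               hmm_b['log_obs'](j, obs[0]))
                       for j in range(Nb)] for i in range(Na)])
    psi = np.zeros((T, Na, Nb, 2), dtype=int)   # back-pointers

    for t in range(1, T):
        new_delta = np.empty((Na, Nb))
        for i in range(Na):
            for j in range(Nb):
                # the two Markov chains evolve independently, so the composite
                # transition score factorises into the two source transitions
                scores = (delta +
                          hmm_a['log_trans'][:, i][:, None] +
                          hmm_b['log_trans'][:, j][None, :])
                pi, pj = np.unravel_index(np.argmax(scores), scores.shape)
                new_delta[i, j] = (scores[pi, pj] +
                                   combine(hmm_a['log_obs'](i, obs[t]),
                                           hmm_b['log_obs'](j, obs[t])))
                psi[t, i, j] = (pi, pj)
        delta = new_delta

    # trace back the most likely composite (signal, interference) state path
    i, j = np.unravel_index(np.argmax(delta), delta.shape)
    path = [(i, j)]
    for t in range(T - 1, 0, -1):
        i, j = psi[t, i, j]
        path.append((i, j))
    return path[::-1]

The interesting modelling choice sits in the combine function, which encodes how the two sources interact acoustically; a common simplification for additive signals is a max-based approximation in the log power domain.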

Since joining Sheffield I've embarked on research that is aimed at developing computational models of Spoken Language Processing by Mind and Machine, and I'm currently working on a unified theory of spoken language processing called PRESENCE (PREdictive SENsorimotor Control and Emulation).  PRESENCE weaves together accounts from a wide variety of different disciplines concerned with the behaviour of living systems - many of them outside the normal realms of spoken language - and compiles them into a new framework that is intended to breathe life into a new generation of research into spoken language processing, especially for Autonomous Social Agents and Human-Robot Interaction.  I am also interested in the relationship between human and animal communication, and have instigated several research activities under the banner of Vocal Interactivity in-and-between Humans, Animals and Robots (VIHAR).

At the more practical end of the scale, I’m involved in collaborations aimed at Clinical Applications of Speech Technology (particularly for individuals with speaking difficulties) and Creative Applications of Speech Technology through interactions with colleagues from the performing arts.


Spoken Language Processing by Mind and Machine

PRESENCE is a new architecture for speech-based interaction that is founded on the premise that future progress depends not on how to "bridge the gap" between speech science and speech technology, but on both communities seeking to assimilate wider research findings on the behaviour of living systems in general and the cognitive abilities of human beings in particular.  PRESENCE intentionally blurs the distinction between the core components of a traditional spoken language dialogue system.  As a result, cooperative and communicative behaviour emerges as a by-product of an architecture founded on a model of co-action in which the system has in mind the needs and intentions of a user, and a user has in mind the needs and intentions of the system.  MORE >>





Vocal Interactivity

In pursuing PRESENCE-based approaches to modelling spoken language, I've become increasingly drawn into studying vocalisation in general, whether it is performed by human beings, animals or robots.  I'm currently developing synthesisers for mammalian, insect and dolphin vocalisations, and embedding them in behavioural simulations implemented in Pure Data and in real-time embodiments using e-puck, Create and RoboKind robots.  My aim is to demonstrate that many of the little-understood paralinguistic features exhibited in human speech (including emotion) are derived from characteristics that are shared by living systems in general.  Modelling such behaviours in this wider context should eventually enable us to implement usable and effective interaction with artificial intentional agents such as robots.  MORE >>
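
To give a flavour of what these synthesisers involve, here is a deliberately simplified Python sketch (not one of the synthesisers mentioned above) of a mammal-like call generated with a classical source-filter scheme, in which a single notional body-size parameter scales both the fundamental frequency of the source and the resonance frequencies of the 'vocal tract' filter.

import numpy as np
from scipy.signal import lfilter

def mammal_call(duration=0.5, size=1.0, f0=220.0, fs=16000):
    # Toy source-filter vocalisation: a pulse-train source passed through a
    # cascade of two-pole resonators.  Increasing 'size' lowers both pitch and
    # resonances, mimicking a larger larynx and a longer vocal tract.
    t = np.arange(int(duration * fs)) / fs

    # source: impulse train at the size-scaled fundamental frequency
    phase = np.cumsum(np.full_like(t, (f0 / size) / fs))
    source = (np.diff(np.floor(phase), prepend=0.0) > 0).astype(float)

    # filter: resonators at nominal formant frequencies, also scaled by size
    signal = source
    for centre, bandwidth in [(800, 120), (1600, 180), (2700, 250)]:
        r = np.exp(-np.pi * bandwidth / fs)
        theta = 2 * np.pi * (centre / size) / fs
        # two-pole resonator: H(z) = 1 / (1 - 2 r cos(theta) z^-1 + r^2 z^-2)
        signal = lfilter([1.0], [1.0, -2 * r * np.cos(theta), r * r], signal)

    # 50 ms onset/offset ramps to avoid clicks, then normalise
    envelope = np.minimum(1.0, np.minimum(t, t[-1] - t) / 0.05)
    return signal * envelope / np.max(np.abs(signal))

Sweeping the size and pitch parameters moves the output along a continuum from small, squeaky calls to larger, growl-like ones - the kind of shared, body-grounded variation that motivates treating human, animal and synthetic vocalisation within a single framework.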






Autonomous Social Agents

Whilst there is considerable interest in the possibility of people interacting with ‘intelligent’ agents for a wide range of applications, as yet there is no underpinning science as to how to create such artefacts to ensure that they are capable of providing an effective and sustainable interaction.  This is especially true in the case of human-robot interaction, where the look, sound and behaviour of a robot may not be consistent - a situation that can trigger feelings of eeriness and repulsion in users.  I have developed the first mathematical model of this so-called ‘uncanny valley’ effect (published in Nature), and I am currently investigating the implications for designing autonomous social agents (such as robots) whose visual, vocal, cognitive and behavioural ‘affordances’ are appropriate to the role for which they are designed.  MORE >>
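
By way of illustration only - this is a toy construction in Python, not the published model - the sketch below shows how Bayesian categorical perception can give rise to a valley-shaped response: along a notional 0-to-1 'humanness' dimension, the posterior over two competing percepts ('robot-like' versus 'human-like') changes most rapidly near the category boundary, and penalising that perceptual instability turns an otherwise steadily rising affinity curve into one with a pronounced dip.

import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def affinity_curve(humanness, mu_robot=0.2, mu_human=0.8, sigma=0.15, penalty=0.6):
    # Two perceptual categories modelled as Gaussians along the humanness axis.
    like_robot = gaussian(humanness, mu_robot, sigma)
    like_human = gaussian(humanness, mu_human, sigma)
    posterior_human = like_human / (like_robot + like_human)

    # rate of change of the posterior: largest at the category boundary
    instability = np.abs(np.gradient(posterior_human, humanness))
    instability = instability / instability.max()

    # affinity rises with humanness but is penalised where perception is unstable
    return humanness - penalty * instability

x = np.linspace(0.0, 1.0, 200)
y = affinity_curve(x)   # rises, dips sharply around x ~ 0.5, then recovers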






Clinical Applications

One of the areas in which advanced speech technology algorithms can really make a difference to people is in situations where individuals have serious difficulties in hearing or speaking.  The Sheffield Speech and Hearing group has a long history of research in this area, and has devoted significant effort to developing assistive technology that is tailored to the needs of specific individuals, such as dysarthric speakers with cerebral palsy or laryngectomy patients.  My research is focused on developing low-dimensionality speech synthesis and on optimising its real-time control for users with severe motor impairment.  I’m currently working on an articulatory synthesiser (based on an acoustic waveguide model of the human vocal tract) which incorporates a novel, yet anatomically parsimonious, tongue model.
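
For readers unfamiliar with waveguide synthesis, the sketch below illustrates the underlying idea using a textbook Kelly-Lochbaum ladder in Python (a simplification, not the synthesiser itself): the vocal tract is approximated by a chain of short cylindrical sections, and at each junction the forward- and backward-travelling pressure waves are partially reflected according to the change in cross-sectional area.

import numpy as np

def kelly_lochbaum(excitation, areas, glottal_reflection=0.95, lip_reflection=-0.85):
    # One-sample-per-section digital waveguide of a tube with the given
    # cross-sectional areas; returns the pressure radiated at the lips.
    n = len(areas)
    # reflection coefficient at each junction from the change in area
    k = np.array([(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
                  for i in range(n - 1)])

    forward = np.zeros(n)    # right-travelling wave in each section
    backward = np.zeros(n)   # left-travelling wave in each section
    output = np.zeros(len(excitation))

    for t, sample in enumerate(excitation):
        forward_new = np.empty(n)
        backward_new = np.empty(n)

        # glottis end: inject the excitation plus the partially reflected wave
        forward_new[0] = sample + glottal_reflection * backward[0]

        # scattering at each junction between section i and section i + 1
        for i in range(n - 1):
            forward_new[i + 1] = (1 + k[i]) * forward[i] - k[i] * backward[i + 1]
            backward_new[i] = k[i] * forward[i] + (1 - k[i]) * backward[i + 1]

        # lip end: partial reflection back into the tract, the rest radiates
        backward_new[n - 1] = lip_reflection * forward[n - 1]
        output[t] = (1 + lip_reflection) * forward[n - 1]

        forward, backward = forward_new, backward_new

    return output

# example: a ~100 Hz pulse train through an 8-section uniform tube at 16 kHz
pulses = np.zeros(16000)
pulses[::160] = 1.0
schwa = kelly_lochbaum(pulses, np.ones(8))

A uniform tube yields a schwa-like output; other vowels require different area functions, which is where articulatory control of the tract shape - the tongue in particular - comes in.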






Creative Applications

The use of technology to manipulate and alter human or machine voices interests me greatly, and I’m particularly keen to use such techniques to create voices for animated agents and robots that are ‘appropriate’ to their visual and behavioural affordances.  I'm also becoming increasingly involved in using such manipulations to explore more creative possibilities, and I’ve recently enjoyed very productive interactions with colleagues from the performing arts.  In particular, I work closely with Dr. Chris Newell, based at the University of Hull.  As well as pursuing various collaborative projects, Chris and I are slowly compiling a book entitled The Art of Artificial Voices, in which we explore the creative uses of spoken language from both the technological and the performance perspectives.  MORE >>






Progress and Prospects

Having been actively involved in speech technology R&D for over four decades, I’m often called upon to deliver my personal perspective on the progress that’s been made in the past and the prospects we’re likely to witness in the years to come.  In order to inform these views, I’ve not only conducted a number of surveys of the speech technology R&D community, but I’ve also exploited the ability of my Human Equivalent Noise Ratio (HENR) model to extrapolate automatic speech recogniser performance into the future.  In addition, I maintain a timeline of significant historical events in our field (including some infamous quotations and notable predictions) which I hope will provide a useful resource for students and researchers interested in learning how the speech technology field has developed over the years.  MORE >>













pd-extended



Much of my personal research (and teaching) is performed using the Pure Data (Pd) dataflow programming language. Pd, authored by Miller Puckette (“The diagram is the program”), provides a real-time graphical programming environment designed to operate on audio, graphical and video signals.  Pd is a free alternative to Max/MSP: not only does it make real-time audio input/output easy to implement, but it is also especially well suited to rapid prototyping and creative research.  You can download some examples here:  Pd downloads >>


© Roger K. Moore 2021