PRESENCE

PRESENCE (PREdictive SENsorimotor Control and Emulation)

PRESENCE is a new architecture for speech-based interaction that is founded on the premise that future progress depends, not on how to "bridge the gap" between speech science and speech technology, but on both communities seeking to assimilate wider research findings on the behaviour of living systems in general and the cognitive abilities of human beings in particular.

The PRESENCE architecture is inspired by relatively old ideas such as perceptual control theory [Powers, W. T. (1973). Behavior: The Control of Perception. Hawthorne, NY: Aldine], together with relatively new discoveries such as mirror neurons [Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192], coupled with contemporary theories of cortical functionality such as hierarchical temporal memory [Hawkins, J. (2004). On Intelligence. Times Books] and emulation mechanisms [Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473].

PRESENCE intentionally blurs the distinction between the core components of a traditional spoken language dialogue system and, as a result, cooperative and communicative behaviour emerges as a by-product of an architecture that is founded on a model of co-action in which the system has in mind the needs and intentions of a user, and a user has in mind the needs and intentions of the system.

Architecture

The PRESENCE architecture is organized into four layers. The top layer is the main path for motor behaviour such as speaking. The system's needs S:n, modulated by motivation, cause the selection of a communicative intention S:i that would satisfy those needs. The selection mechanism can be implemented as a search process, and this is indicated by the diagonal arrow running through the S:i module. The selected intention drives both actual motor behaviour S:m and, on the second layer, an emulation of possible motor behaviour S:E(S:m). Sensory input feeds back into this second layer, providing a check on whether the desired intention has been met. If there is a mismatch between the intended behaviour and the perceived outcome, the resulting error signal causes the system to alter its behaviour appropriately.
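The loop described above (need, search over intentions, motor output, sensory feedback, error correction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than part of PRESENCE itself: the `Intention` class, the scalar motor command, the identity emulator `emulate`, the toy `world` that halves the system's output, and the `gain` and `steps` parameters.

```python
from dataclasses import dataclass


@dataclass
class Intention:
    """Hypothetical candidate for S:i, paired with the motor command S:m it would issue."""
    name: str
    motor_command: float


def emulate(motor_command):
    """S:E(S:m): predicted outcome of a command (assumed here to equal the command)."""
    return motor_command


def world(motor_command):
    """Toy environment (an assumption): attenuates the system's output by half."""
    return 0.5 * motor_command


def select_intention(need, motivation, candidates):
    """Search the candidates for the one whose emulated outcome is
    closest to the motivation-weighted need."""
    target = need * motivation
    return min(candidates, key=lambda c: abs(emulate(c.motor_command) - target))


def run_loop(need, motivation, candidates, gain=0.5, steps=50):
    """Select an intention, then repeatedly correct the motor command
    using the mismatch between the desired and the perceived outcome."""
    intention = select_intention(need, motivation, candidates)
    desired = emulate(intention.motor_command)
    command = intention.motor_command
    for _ in range(steps):
        perceived = world(command)   # sensory feedback
        error = desired - perceived  # mismatch between intention and outcome
        command += gain * error      # alter behaviour to reduce the mismatch
    return intention, command, world(command)
```

In this toy setting the environment halves the output, so the loop ends up roughly doubling the command until the perceived outcome matches what was intended. The system compensates without ever being told what the attenuation is.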

The third layer of the model captures the empathetic relationship between the system as a speaker and the user as a listener, a relationship that conditions the speaking behaviour of the system. U:E(S:i) represents the emulation by the user of the intentions of the system, and S:E(U:E(S:i)) represents the emulation of that function by the system. A similar arrangement applies to S:E(U:E(S:m)), the system's emulation of the user's emulation of the system's motor output. The fourth layer represents the system's means for interpreting the needs, intentions and behaviour of a user through a process of emulating the user's needs S:E(U:n), intentions S:E(U:i) and behaviour S:E(U:m).
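The nested notation can be easier to read when built up mechanically. The following sketch represents each term as a small data structure and prints it in the notation used above. The class names `State` and `Emulation`, the `nesting_depth` helper and the `LAYERS` table are illustrative assumptions, not part of the published architecture.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class State:
    """A need (n), intention (i) or motor behaviour (m) of the system (S) or user (U)."""
    agent: str
    kind: str

    def __str__(self):
        return f"{self.agent}:{self.kind}"


@dataclass(frozen=True)
class Emulation:
    """One agent's emulation of a state, or of another emulation."""
    agent: str
    target: Union["State", "Emulation"]

    def __str__(self):
        return f"{self.agent}:E({self.target})"


def nesting_depth(term):
    """Count how many emulations wrap the innermost state."""
    depth = 0
    while isinstance(term, Emulation):
        depth += 1
        term = term.target
    return depth


S, U = "S", "U"

# The terms named in the text, grouped by layer.
LAYERS = {
    1: [State(S, "n"), State(S, "i"), State(S, "m")],
    2: [Emulation(S, State(S, "m"))],
    3: [Emulation(S, Emulation(U, State(S, "i"))),
        Emulation(S, Emulation(U, State(S, "m")))],
    4: [Emulation(S, State(U, k)) for k in ("n", "i", "m")],
}
```

Calling `str()` on the first entry of `LAYERS[3]` gives `S:E(U:E(S:i))`, and `nesting_depth` returns 2 for it, which makes explicit that the third layer is the only one involving an emulation of an emulation.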

The second, third and fourth layers are able to exploit the information embedded in the previous layers, and this is indicated by the large block arrows. This process is equivalent to parameter sharing between the different models and thus represents not only an efficient use of information but also offers a mechanism for learning. In fact such a process may be bi-directional, and the potential flow of information in the opposite direction is indicated by the small block arrows.
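One simple way to picture parameter sharing between layers is a model of the user that stores only what differs from the system's model of itself and falls back to the latter for everything else. The sketch below uses Python's standard `collections.ChainMap` for this. The parameter names and values are invented for illustration, and the sketch covers only the forward direction of sharing (the large block arrows).

```python
from collections import ChainMap

# Hypothetical parameters of the system's own production model.
self_model = {"pitch_hz": 120.0, "rate_syll_per_s": 4.0}

# Only what the system has learned to be different about the user.
user_specific = {"pitch_hz": 210.0}

# Lookups try user_specific first and fall back to self_model.
model_of_user = ChainMap(user_specific, self_model)
```

Here `model_of_user["pitch_hz"]` is the user-specific value, while `model_of_user["rate_syll_per_s"]` is inherited from `self_model` and tracks any later change to it. Writes through a `ChainMap` go to the first mapping only, so information flowing in the opposite direction would need an explicit extra step.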

The basic communicative loop in the PRESENCE architecture contains system components that are themselves realized using similarly structured building blocks. The architecture is thus inherently recursive and hence hierarchical in structure. As a result, further refinements in behaviour arise from the operation of the nested components.

Figure: Overview of the PRESENCE architecture

Motivation

PRESENCE is based on the premise that there are three fundamental factors that ultimately determine an organism's fitness to survive in an evolutionary framework:

a need to manage energy (facilitating efficient behaviour)
a need to manage entropy (facilitating efficient communications)
a need to manage time (facilitating efficient planning)

These constraints, coupled with an integrated and recursive processing architecture, pave the way to a new approach to spoken language technology in which high-level interactive behaviours such as prosody and emotion emerge as fundamental aspects of a communicative system rather than as processing afterthoughts.

Practical Implications

A new model of speech generation that …

selects characteristics appropriate to the needs of the listener
monitors the effect of its own output
modifies its behaviour according to its internal model of the listener

A new model of speech recognition that …

uses a forward/generative model based on an internal emulation of the communicative intentions of the speaker
adapts its forward/generative model to the voice of the speaker based on knowledge of its own voice
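The recognition side can be sketched as a simple analysis-by-synthesis procedure: the system emulates each candidate intention with its own voice, adapts that emulation to the speaker, and picks the intention whose adapted emulation best explains the input. This is a toy illustration under strong assumptions, not the method of the PRESENCE publications. The `OWN_VOICE` templates are invented, and speaker adaptation is reduced to a single least-squares scale factor.

```python
# Hypothetical templates for how the system itself would realise each intention.
OWN_VOICE = {
    "yes": [1.0, 2.0, 3.0],
    "no": [3.0, 2.0, 1.0],
}


def fit_scale(observed, template):
    """Least-squares scale factor s minimising ||observed - s * template||^2."""
    num = sum(o * t for o, t in zip(observed, template))
    den = sum(t * t for t in template)
    return num / den


def recognise(observed):
    """Return (intention, scale, error) for the emulated intention that,
    once scaled to the speaker, best matches the observation."""
    best = None
    for intention, template in OWN_VOICE.items():
        scale = fit_scale(observed, template)  # adapt own voice to the speaker
        error = sum((o - scale * t) ** 2 for o, t in zip(observed, template))
        if best is None or error < best[2]:
            best = (intention, scale, error)
    return best
```

For example, `recognise([1.5, 3.0, 4.5])` selects `"yes"` with a scale of 1.5, since the observation is the system's own "yes" template produced by a speaker whose voice differs by that factor.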

Where to find out more …

Original PRESENCE publications:

Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435.
Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Computers, 56(9), 1176-1188.

Media coverage:

Machines might talk with humans by putting themselves in our shoes, PhysOrg.com.

Latest results …

Nicolao, M., Tesser, F., & Moore, R. K. (2013). A phonetic-contrast motivated adaptation to control the degree-of-articulation on Italian HMM-based synthetic voices. 8th ISCA Speech Synthesis Workshop (SSW8). Barcelona, Spain.
Crook, N. T., Field, D., Smith, C., Harding, S., Pulman, S., Cavazza, M., Charlton, D., Moore, R. K., & Boye, J. (2012). Generating context-sensitive ECA responses to user barge-in interruptions. Journal on Multimodal User Interfaces, 6(1-2), 13-25.
Nicolao, M., Latorre, J., & Moore, R. K. (2012). C2H: A computational model of H&H-based phonetic contrast in synthetic speech, INTERSPEECH. Portland, USA.
Moore, R. K., & Nicolao, M. (2011). Reactive speech synthesis: actively managing phonetic contrast along an H&H continuum, 17th International Congress of Phonetic Sciences (ICPhS). Hong Kong.
Worgan, S., & Moore, R. K. (2011). Towards the detection of social dominance in dialogue. Speech Communication, 53(9-10), 1104-1114.
Moore, R. K. (2010). Cognitive approaches to spoken language technology. In F. Chen & K. Jokinen (Eds.), Speech Technology: Theory and Applications (pp. 89-103). New York Dordrecht Heidelberg London: Springer.
Crook, N., Smith, C., Cavazza, M., Pulman, S., Moore, R. K., & Boye, J. (2010). Handling user interruptions in an embodied conversational agent, AAMAS 2010: 9th International Conference on Autonomous Agents and Multiagent Systems. Toronto.
Hofe, R., & Moore, R. K. (2008). Towards an investigation of speech energetics using 'AnTon': an animatronic model of a human tongue and vocal tract. Connection Science, 20(4), 319–336.
Worgan, S., & Moore, R. K. (2008). Enabling reinforcement learning for open dialogue systems through speech stress detection, Fourth International Workshop on Human-Computer Conversation. Bellagio, Italy.
Hofe, R., & Moore, R. K. (2008). AnTon: Using an animatronic tongue and vocal tract model to investigate human language learning from an energetics point of view, Epigenetic Robotics. Brighton.
Moore, R. K. (2007). Towards speech-based human-robot interaction, Symposium on Language and Robotics. Aveiro, Portugal, 10-12 Dec.
Moore, R. K. (2007). PRESENCE: A human-inspired architecture for speech-based human-machine interaction. IEEE Trans. Computers, 56(9), 1176-1188.
Moore, R. K. (2007). Spoken language processing: piecing together the puzzle. Speech Communication, 49, 418-435.
