Copyright © Roger K. Moore
PRESENCE is a new architecture for speech-based interaction. It is founded on the premise that future progress depends not on how to "bridge the gap" between speech science and speech technology, but on both communities seeking to assimilate wider research findings on the behaviour of living systems in general and the cognitive abilities of human beings in particular.
The PRESENCE architecture is inspired by relatively old ideas such as perceptual control theory [Powers, W. T. (1973). Behavior: The Control of Perception. Hawthorne, NY: Aldine], relatively new discoveries such as mirror neurons [Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192], and contemporary theories of cortical functionality such as hierarchical temporal memory [Hawkins, J. (2004). On Intelligence. Times Books] and emulation mechanisms [Wilson, M., & Knoblich, G. (2005). The case for motor involvement in perceiving conspecifics. Psychological Bulletin, 131(3), 460-473].
PRESENCE intentionally blurs the distinction between the core components of a traditional spoken language dialogue system. As a result, cooperative and communicative behaviour emerges as a by-product of an architecture founded on a model of co-action, in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.
The PRESENCE architecture is organized into four layers. The top layer is the main path for motor behaviour such as speaking. The system's needs S:n, modulated by motivation, cause the selection of a communicative intention S:i that would satisfy those needs. The selection mechanism can be implemented as a search process, as indicated by the diagonal arrow running through the S:i module. The selected intention drives both actual motor behaviour S:m and an emulation of possible motor behaviour S:E(S:m) on the second layer. Sensory input feeds back into this second layer, providing a check on whether the desired intention has been met. If there is a mismatch between the intended behaviour and the perceived outcome, the resulting error signal causes the system to alter its behaviour accordingly.
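The first two layers behave like a feedback controller in the sense of perceptual control theory. The Python sketch below is a minimal numerical illustration under stated assumptions, not the PRESENCE implementation: it assumes a single perceived quantity (think of perceived vocal loudness), a linear channel whose attenuation the system does not know, and a simple integrating correction rule. The names `Environment`, `select_intention`, `emulate_motor` and `control_loop`, and every numeric value, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical channel: perceived level = gain * motor command + noise_floor."""
    gain: float = 0.5          # unknown to the system (e.g. a room that attenuates speech)
    noise_floor: float = 10.0

    def perceive(self, motor_command: float) -> float:
        # Sensory input returned to the system after it acts.
        return self.gain * motor_command + self.noise_floor


def select_intention(need: float, motivation: float, candidates: list) -> float:
    """S:n -> S:i as a search: pick the candidate closest to the motivated need."""
    target = need * motivation
    return min(candidates, key=lambda c: abs(c - target))


def emulate_motor(intention: float, assumed_gain: float, assumed_floor: float) -> float:
    """S:E(S:m): invert an internal forward model to propose a motor command."""
    return (intention - assumed_floor) / assumed_gain


def control_loop(need, motivation, candidates, env, steps=100, rate=1.0, tol=1e-6):
    intention = select_intention(need, motivation, candidates)
    # Assumption: the internal model wrongly believes the channel is clean (gain 1.0).
    command = emulate_motor(intention, assumed_gain=1.0, assumed_floor=env.noise_floor)
    for _ in range(steps):
        error = intention - env.perceive(command)   # mismatch: intended vs perceived
        if abs(error) < tol:
            break
        command += rate * error                      # alter behaviour to reduce the error
    return intention, command, env.perceive(command)


intention, command, perceived = control_loop(
    need=30.0, motivation=2.0, candidates=[20.0, 40.0, 60.0, 80.0], env=Environment()
)
print(intention, round(command, 3), round(perceived, 3))   # prints: 60.0 100.0 60.0
```

The emulated command (50) undershoots because the internal model assumes a clean channel; the error between intended and perceived level then halves on each pass until the perception matches the intention. With this channel gain, a correction rate above 4 would make the loop overshoot and diverge, so the rate is itself a design choice.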
The third layer of the model captures the empathetic relationship between the system as speaker and the user as listener, which conditions the speaking behaviour of the system. U:E(S:i) represents the user's emulation of the system's intentions, and S:E(U:E(S:i)) represents the system's emulation of that function. A similar arrangement applies to S:E(U:E(S:m)), the system's emulation of the user's emulation of the system's motor output. The fourth layer represents the system's means of interpreting the needs, intentions and behaviour of a user through a process of emulating the user's needs S:E(U:n), intentions S:E(U:i) and behaviour S:E(U:m).
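The third and fourth layers can be illustrated with a toy discrete model. The sketch assumes, purely for illustration, that both agents share one probabilistic production model P(utterance | intention) (the table `PRODUCTION` and its values are invented) and that an emulated listener recovers intentions by inverting it with Bayes' rule. The system then speaks by choosing the utterance that its emulation of the listener is most likely to decode correctly (a stand-in for S:E(U:E(S:i))), and it interprets the user by running the same inversion (a stand-in for S:E(U:i)). S:E(U:E(S:m)) and S:E(U:n) are not modelled.

```python
# Hypothetical production model P(utterance | intention), assumed shared by both agents.
PRODUCTION = {
    "greet":    {"hello": 0.6, "hi": 0.3, "see you": 0.1},
    "farewell": {"bye": 0.5, "see you": 0.4, "hi": 0.1},
    "ask_time": {"what time is it": 0.7, "time?": 0.3},
}


def infer_intention(utterance, prior=None):
    """Emulated listener: invert the production model with Bayes' rule.

    The system uses it both to emulate how the user will read the system
    (standing in for S:E(U:E(S:i))) and to read the user (standing in for S:E(U:i)).
    """
    if prior is None:
        prior = {i: 1.0 / len(PRODUCTION) for i in PRODUCTION}
    joint = {i: prior[i] * PRODUCTION[i].get(utterance, 0.0) for i in PRODUCTION}
    total = sum(joint.values())
    if total == 0.0:
        return dict(prior)   # unknown utterance: the listener learns nothing
    return {i: p / total for i, p in joint.items()}


def choose_utterance(intention):
    """Layer 3: among ways of expressing the intention, pick the one the
    emulated listener is most likely to decode as that intention."""
    candidates = PRODUCTION[intention]
    return max(candidates, key=lambda u: infer_intention(u)[intention])


print(choose_utterance("farewell"))   # prints: bye
print({i: round(p, 3) for i, p in infer_intention("see you").items()})
# prints: {'greet': 0.2, 'farewell': 0.8, 'ask_time': 0.0}
```

"See you" is a plausible farewell in this toy table, but the emulated listener would sometimes take it as a greeting, so the system prefers the unambiguous "bye". The same inversion, applied to what the user says, yields the system's estimate of the user's intention.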
The second, third and fourth layers are able to exploit the information embedded in the preceding layers, as indicated by the large block arrows. This is equivalent to parameter sharing between the different models, and so represents not only an efficient use of information but also a mechanism for learning. In fact, the process may be bi-directional; the potential flow of information in the opposite direction is indicated by the small block arrows.
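The parameter sharing described above can be sketched by letting several views of the system hold a reference to one and the same model object rather than separate copies. In this hypothetical sketch (the count-based model and the class names `SharedModel`, `SelfProduction` and `UserEmulation` are assumptions, not part of PRESENCE), the system's own speaking path and its emulation of the user read from the same parameters, so evidence gathered while interpreting the user flows back and changes how the system itself speaks, loosely mirroring the two directions of the block arrows.

```python
from collections import Counter, defaultdict


class SharedModel:
    """Hypothetical count-based P(utterance | intention), shared by several views."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, intention, utterance, weight=1):
        self.counts[intention][utterance] += weight

    def prob(self, utterance, intention):
        c = self.counts.get(intention)   # .get avoids creating empty entries
        return c[utterance] / sum(c.values()) if c else 0.0


class SelfProduction:
    """The system's own speaking path (S:m)."""

    def __init__(self, model):
        self.model = model   # a reference, not a copy: the parameters are shared

    def speak(self, intention):
        c = self.model.counts.get(intention)
        if not c:
            raise KeyError(intention)
        return c.most_common(1)[0][0]


class UserEmulation:
    """The system's emulation of the user (S:E(U:m), S:E(U:i)) built on the same model."""

    def __init__(self, model):
        self.model = model

    def interpret(self, utterance):
        return max(self.model.counts, key=lambda i: self.model.prob(utterance, i))

    def observe(self, utterance, intention):
        # Evidence about the user updates the shared parameters, so it also
        # changes how the system itself speaks (information flowing "backwards").
        self.model.update(intention, utterance)


model = SharedModel()
model.update("farewell", "goodbye", 3)
model.update("farewell", "bye", 1)
model.update("greet", "hello", 2)
me, you = SelfProduction(model), UserEmulation(model)

print(me.speak("farewell"))    # prints: goodbye
for _ in range(5):
    you.observe("bye", "farewell")
print(me.speak("farewell"))    # prints: bye
print(you.interpret("hello"))  # prints: greet
```

Sharing by reference is the whole point of the design: had `UserEmulation` been given a copy of the counts, observing the user would have left the system's own speaking unchanged.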
The basic communicative loop in the PRESENCE architecture contains system components that are themselves realized using similarly structured building blocks. The architecture is thus recursively nested, and hence hierarchical in structure. As a result, further refinements in behaviour arise from the operation of the nested components.
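The recursive nesting can be sketched with a single building-block type whose action is carried out either directly on the world or by setting the goals of child blocks of the same type, as in the hierarchical arrangement of perceptual control theory. This is a hypothetical illustration: the `Block` class, the use of the children's mean as the higher-level perception, the gain, and the articulator names are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical building block: acts to bring its perception to its reference.

    A leaf acts directly on the world; a non-leaf acts by setting the references
    of child blocks that have exactly the same structure.
    """
    name: str
    children: list[Block] = field(default_factory=list)
    gain: float = 0.5
    reference: float = 0.0
    output: float = 0.0

    def perceive(self, world: dict) -> float:
        if not self.children:
            return world[self.name]
        # Assumption: a higher-level perception is the mean of the lower-level ones.
        return sum(c.perceive(world) for c in self.children) / len(self.children)

    def step(self, world: dict) -> None:
        error = self.reference - self.perceive(world)
        self.output += self.gain * error
        if self.children:
            for child in self.children:
                child.reference = self.output   # higher-level output = lower-level goal
                child.step(world)               # same structure, one level down
        else:
            world[self.name] = self.output      # leaf: act on the world


world = {"jaw": 0.0, "tongue": 0.0}
syllable = Block("syllable", children=[Block("jaw"), Block("tongue")])
syllable.reference = 1.0
for _ in range(100):
    syllable.step(world)
print(round(syllable.perceive(world), 6))   # prints: 1.0
```

Because each child is itself a `Block`, a leaf could be replaced by a sub-tree without changing the parent's code. Whether a deeper tree still settles depends on the gains at each level, which this sketch does not tune.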
PRESENCE is based on the premise that there are three fundamental factors that ultimately determine an organism's fitness to survive in an evolutionary framework:
These constraints, coupled with an integrated and recursive processing architecture, pave the way to a new approach to spoken language technology in which high-level interactive behaviours such as prosody and emotion emerge as fundamental aspects of a communicative system rather than as processing afterthoughts.
A new model of speech generation that …
A new model of speech recognition that …
Original PRESENCE publications:
Media coverage:
Machines might talk with humans by putting themselves in our shoes, PhysOrg.com.