CONVERSE was based on a quite simple intuition: that conversational skill is a compromise between two tendencies. First, there is the active, top-down, intentional driver with something to say (the feature that Colby's PARRY (Colby 1971) had and Weizenbaum's ELIZA (Weizenbaum 1967) so clearly lacked).
Secondly, there is the passive, bottom-up, listener aspect, which means understanding what is said to the system and reacting appropriately, by answering questions or even changing the topic. This, as all researchers know, is much harder, because it requires genuine understanding of the input. Humans who lack it are conspicuously bad conversationalists, but this is normally attributed to their not listening rather than to their not understanding what is said. We could call the simple CONVERSE architecture (Figure 1) Pushmepullyou (after the two-headed animal in Dr Dolittle) to convey the tension between the two elements.
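To make the push/pull tension concrete, here is a minimal Python sketch of a single turn policy in which a bottom-up responder (pull) gets first claim on each user utterance and the top-down agenda (push) fills the turn otherwise. It is not the CONVERSE implementation: the class `PushPullAgent`, its keyword-matching `pull` method and the priority rule are all hypothetical simplifications, and real understanding on the pull side is exactly the hard part described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PushPullAgent:
    """Hypothetical toy 'Pushmepullyou' agent: a push agenda plus a pull responder."""

    # Push side: things the system itself wants to say, in order.
    agenda: List[str]
    # Pull side: topic keyword -> reply. Keyword matching stands in for real understanding.
    knowledge: Dict[str, str] = field(default_factory=dict)

    def pull(self, utterance: str) -> Optional[str]:
        """Bottom-up: react to what was said if it mentions a known topic."""
        text = utterance.lower()
        for topic, reply in self.knowledge.items():
            if topic in text:
                return reply
        return None

    def push(self) -> Optional[str]:
        """Top-down: advance the system's own agenda, if anything is left on it."""
        return self.agenda.pop(0) if self.agenda else None

    def respond(self, utterance: str) -> str:
        """One possible compromise: listening wins when it has something to say."""
        reply = self.pull(utterance)
        if reply is not None:
            return reply
        pushed = self.push()
        return pushed if pushed is not None else "Tell me more."
```

Giving the pull side priority is only one possible compromise; a system could equally weigh, for example, how many turns have passed since it last advanced its own agenda.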
What we are now engaged in is an attempt to move both the push and pull sides to a higher level. On the agent-modelling side, we hope to use ViewGen [26], a model of individual agents' beliefs and intentions on which we have worked (in Sheffield and New Mexico) for some years. ViewGen is a well-developed method, implemented in several iterations of Prolog programs, for creating and manipulating the beliefs, goals and other attitudes of individual agents within separate spaces for inference that we call environments. The whole is controlled by an overall default process called ascription, which propagates beliefs and other attitudes from one environment to another with minimum effort, so that the system can model the states of the other agents with which it is communicating.
ViewGen is based on the general assumption that a communicating agent must model the internal states of its interlocutors as best it can, not just by storing their features, such as age and size, but also by representing their own beliefs, goals and intentions. ViewGen is much stronger than the rather elementary Person Data Base in CONVERSE, and we intend to strengthen the latter with aspects of ViewGen so as to increase its functionality substantially.
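The default-ascription idea can be sketched as nested environments with a lazy lookup: a belief is ascribed to another agent unless the environment for that agent holds explicit contrary evidence. This is a simplified Python sketch, assuming beliefs are plain strings with a truth value and that ascription falls back from an inner environment to the one enclosing it. The names `BeliefSpace`, `tell` and `believes` are hypothetical, and the actual ViewGen Prolog programs are considerably richer than this.

```python
from typing import Dict, Optional, Tuple

# A path names a nested environment: () is the system's own beliefs,
# ("user",) is what the system believes the user believes, and
# ("user", "friend") is what the system believes the user believes the friend believes.
Path = Tuple[str, ...]


class BeliefSpace:
    """Hypothetical nested belief environments with lazy default ascription."""

    def __init__(self) -> None:
        self.envs: Dict[Path, Dict[str, bool]] = {(): {}}

    def tell(self, path: Path, prop: str, value: bool = True) -> None:
        """Record explicit evidence about a belief in the environment at `path`."""
        self.envs.setdefault(path, {})[prop] = value

    def believes(self, path: Path, prop: str) -> Optional[bool]:
        """Return the belief held at `path`, ascribing it by default if absent.

        Explicit evidence in the environment itself wins; otherwise the belief
        is taken from the enclosing environment, recursively. None means that
        not even the system itself has a view on `prop`.
        """
        env = self.envs.get(path, {})
        if prop in env:
            return env[prop]
        if not path:
            return None
        return self.believes(path[:-1], prop)
```

Doing the ascription lazily, at query time, is one reading of "minimum effort": nothing is copied between environments until a question actually needs it.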
The more immediately challenging move is to replace the parsing side of CONVERSE. This was based on a statistical parser of general English prose, wrapped in a shell of what we called microqueries, which adapted the parser from normal prose to dialogue.
We are currently working on a robust parser of English conversation, that is, a transducer from sentences to a set of dialogue acts that is plausible for any domain. Now is an ideal moment to do this, since corpora of English dialogue (such as the British National Corpus, or BNC) have become available, so that the task can be seen as an extension of contemporary empirical computational linguistics into the field of pragmatics itself, the last bastion to resist such methods. The chief difficulty, shared by all researchers in this tradition, is that although the BNC is available for unsupervised training, very little dialogue data (apart from Edinburgh's MAPTASK [27], Rochester's TRAINS corpus and a corpus at VerbMobil) is annotated with dialogue acts for supervised training and evaluation. There is currently an international initiative in this area (DAMSL) [28], but as yet there are few signs of progress. The next key step in empirical linguistics, one not yet achieved anywhere, will be a robust model of English dialogue structure of this kind, probably built with machine learning methods; so far the task has been attempted only for speech interfaces in very narrow domains.
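As a hedged illustration of what supervised training on act-annotated dialogue involves, the sketch below trains a multinomial naive Bayes classifier (a deliberately simple stand-in, not the method of this project) that maps an utterance to the set of dialogue acts whose posterior probability passes a threshold. The act labels, the toy training utterances in `TOY_DATA`, the tokenizer and the 0.2 threshold are all invented for illustration; a real system would train on an annotated corpus of the kind discussed above.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical toy training data: (utterance, dialogue act) pairs.
TOY_DATA: List[Tuple[str, str]] = [
    ("what time is it?", "info_request"),
    ("where do you live?", "info_request"),
    ("do you like films?", "info_request"),
    ("i live in sheffield", "statement"),
    ("it is raining today", "statement"),
    ("i like old films", "statement"),
    ("hello there", "greeting"),
    ("hi how are you", "greeting"),
]


def tokenize(utterance: str) -> List[str]:
    # Keep "?" as its own token: it is a strong cue for question-like acts.
    return utterance.lower().replace("?", " ? ").split()


class DialogueActTagger:
    """Multinomial naive Bayes: utterance -> set of plausible dialogue acts."""

    def __init__(self) -> None:
        self.act_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab: Set[str] = set()

    def train(self, data: List[Tuple[str, str]]) -> None:
        for utterance, act in data:
            self.act_counts[act] += 1
            for word in tokenize(utterance):
                self.word_counts[act][word] += 1
                self.vocab.add(word)

    def posteriors(self, utterance: str) -> Dict[str, float]:
        if not self.act_counts:
            raise ValueError("tagger has not been trained")
        total = sum(self.act_counts.values())
        vocab_size = len(self.vocab)
        log_scores: Dict[str, float] = {}
        for act, n in self.act_counts.items():
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(self.word_counts[act].values()) + vocab_size
            score = math.log(n / total)
            for word in tokenize(utterance):
                score += math.log((self.word_counts[act][word] + 1) / denom)
            log_scores[act] = score
        # Normalise in log space to avoid underflow.
        best = max(log_scores.values())
        weights = {a: math.exp(s - best) for a, s in log_scores.items()}
        z = sum(weights.values())
        return {a: w / z for a, w in weights.items()}

    def tag(self, utterance: str, threshold: float = 0.2) -> Set[str]:
        """Return every act whose posterior is at least `threshold`."""
        return {a for a, p in self.posteriors(utterance).items() if p >= threshold}
```

Returning a set rather than a single label reflects the idea of a transducer to a set of plausible acts: an ambiguous utterance can legitimately carry more than one.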