Yorick Wilks, Robert Gaizauskas,
Kevin Humphreys & Hamish Cunningham
Department of Computer Science
University of Sheffield
The creation of effective Information Extraction (IE) technology over the last ten years, driven by the ARPA TIPSTER programme and the associated MUC evaluations, has brought enormous benefits, but it is now time to ask what research issues face the systems we have built and what we should do next. We suggest that there are two classes of important research issues: those requiring detailed comparative evaluation of alternative approaches to IE subtasks, and those concerning the flexible adaptation of IE systems to new users and domains.
Both classes of issues, we argue, can be profitably addressed within an architecture for language engineering called GATE, the General Architecture for Text Engineering. We describe GATE, which owes a great deal to the TIPSTER architecture, and the LaSIE IE system, which is set within GATE and with which we have competed in MUC. We also bring out the distinctive features of LaSIE that have led to its good performance in certain areas.
Within GATE, we can now reconfigure various Language Engineering (LE) modules to assemble alternative IE systems and then compare their performance with that of LaSIE. In this way the environment provided by GATE will allow us to make significant strides in assessing alternative LE technologies and in rapidly adapting LE prototype systems for new users and domains.