Workshop on
Machine Learning for Information Extraction
Monday 21 August 2000
to be held in conjunction with the 14th European Conference on Artificial Intelligence (ECAI),
BERLIN, HUMBOLDT UNIVERSITY
Fabio Ciravegna (contact), ITC-irst Centro per la Ricerca Scientifica e Tecnologica
Roberto Basili, University of Roma Tor Vergata
Robert Gaizauskas, University of Sheffield

Workshop Description
Aim of the Workshop
List of Accepted Papers
Program Committee
Workshop Schedule
Technical Equipment available for presentations
About Workshop Registration

Workshop Description

The exponential increase in the quantity of textual information held in digital archives has fuelled growing interest in computer-assisted techniques for information extraction (IE) from text. IE systems, as understood by the applied natural language processing community, identify predetermined relevant information in text documents from some specific domain. Once extracted, the information can be used for a number of purposes: database population, text indexing, information highlighting, and so on. While significant progress in constructing such systems has been made, stimulated in particular by the DARPA Message Understanding Conferences, by general agreement the main barriers to wider use and commercialisation of IE are the difficulties in adapting systems to new applications and domains. With current technology, porting IE systems is both difficult and expensive, since changes generally need to be carried out manually by highly skilled experts. Moreover, some sources (e.g. Web pages) may change very rapidly in both format and content. Tracking all the changes and continuously re-adapting IE systems is very expensive or even unfeasible if done manually.
To address these difficulties there has been increasing interest in applying machine learning (ML) techniques to Information Extraction from text. Tasks to which ML has been applied include template design, template filling, named entity recognition and resource compilation (e.g. lexicons, knowledge structures, grammars). The kinds of sources analysed range from structured texts (e.g. Web pages) to semi-structured texts (e.g. rental ads) to free texts (e.g. newspaper articles). The ML techniques which have been used range from symbolic methods (e.g. inductive logic programming, transformation-based learning) to numerical methods (e.g. naive Bayes, maximum entropy). However, the current situation is characterized by isolated experiments in which individual ML techniques are applied to specific IE tasks. What is lacking is a unifying view of the issue of adopting ML techniques for IE.

Aim of the Workshop

The proposed workshop aims to establish a forum for discussing current and future trends of the application of ML to IE, with a specific focus on the identification of a unifying view of the issue. The workshop has the following goals: Particularly welcomed are contributions concerning: In the interest of promoting as much discussion as possible, the number of paper presentations will be limited in favour of panels and posters. A final panel will discuss the research agenda for the coming years.


List of Accepted Papers

_____________________ LONG PAPERS _____________________
  1. Machine Learning of Extraction Patterns from Unannotated Corpora: Position statement, Roman Yangarber and Ralph Grishman, (New York University, USA).
  2. Corpus-driven learning of Event Recognition Rules, Roberto Basili, Maria Teresa Pazienza, and Michele Vindigni, (Universita` di Roma Tor Vergata, Italy).
  3. Boosted wrapper induction, Dayne Freitag (Just Research, USA) and Nicholas Kushmerick (University College Dublin, Ireland)
  4. Learning to tag for Information Extraction from Text, Fabio Ciravegna (ITC-Irst, Italy)
  5. Selective Sampling With Naive Cotesting: Preliminary results, Ion Muslea, Steven Minton, Craig Knoblock (Information Sciences Institute, USA)
  6. Wrapper Generation by k-Reversible Grammar Induction, Boris Chidlovskii (Xerox Research Centre Europe, France)
  7. Corpus-based Learning for Information Extraction, Thierry Poibeau (Thomson and Universite' Paris 13, France).
_____________________ SHORT PAPERS _____________________
 
  1. Learning Decision Trees for Named-Entity Recognition and Classification, Georgios Paliouras, Vangelis Karkaletsis and Constantine D. Spyropoulos (Institute for Informatics and Telecommunications, NCSR, Greece)
  2. Computational Learnability of Word Sense Disambiguation Cues, Paola Velardi (Universita` di Roma "La Sapienza", Italy) and Alessandro Cucchiarelli (Universita` di Ancona, Italy).


Program Committee

Roberto Basili, Universita` di Tor Vergata, Italy
Nicola Cancedda, Xerox Research Centre Europe, France
Fabio Ciravegna, ITC-Irst, Italy
Robert Gaizauskas, University of Sheffield, UK
Ralph Grishman, New York University, USA
Nicholas Kushmerick, University College Dublin, Ireland
Ion Alexandru Muslea, ISI, USA
Thierry Poibeau, Thomson, France
Giorgio Satta, Universita` di Padova, Italy
Paola Velardi, Universita` di Roma “La Sapienza”, Italy

Workshop Schedule

Monday 21 August 2000, starting time 9.00
30 minutes per presentation, inclusive of discussion (15 for each short presentation)
 

  9:00 - 10:40  Session 1:

  • Machine Learning of Extraction Patterns from Unannotated Corpora: Position statement, Roman Yangarber and Ralph Grishman, (New York University, USA).
  • Corpus-driven learning of Event Recognition Rules, Roberto Basili, Maria Teresa Pazienza, and Michele Vindigni, (Universita` di Roma Tor Vergata, Italy).
  • Learning Decision Trees for Named-Entity Recognition and Classification, Georgios Paliouras, Vangelis Karkaletsis and Constantine D. Spyropoulos (Institute for Informatics and Telecommunications, NCSR, Greece)

  11:00 - 12:15  Session 2:

  • Boosted wrapper induction, Dayne Freitag (Just Research, USA) and Nicholas Kushmerick (University College Dublin, Ireland)
  • Learning to tag for Information Extraction from Text, Fabio Ciravegna (ITC-Irst, Italy)

  13:50 - 15:10  Session 3:

  • Wrapper Generation by k-Reversible Grammar Induction, Boris Chidlovskii (Xerox Research Centre Europe, France)
  • An Integrated Framework for a Multi-domain event-based extraction, Thierry Poibeau (Thomson and Universite' Paris 13, France).
  • Computational Learnability of Word Sense Disambiguation Cues, Paola Velardi (Universita` di Roma "La Sapienza", Italy) and Alessandro Cucchiarelli (Universita` di Ancona, Italy).

  15:30 - 16:30  Discussion, end of workshop
     


Technical Equipment

An overhead projector and an LCD/video projector with SVGA resolution (800x600) will be provided.
The video projector will be connected to your own (!) notebook/PC via a standard 15-pin VGA connector.
A web page on technical equipment for the different ECAI events is available at:
http://www.ecai2000.hu-berlin.de/technical.html
     

About Workshop Registration

Workshop delegates MUST register for the main conference. Hence, registration for workshops is through the main registration for ECAI 2000. To ensure availability of places at a workshop, delegates should register as soon as possible.