Research Interests

My research interests include the following topics:

  • Lexical semantics and disambiguation
  • Information Extraction
  • Information Retrieval
  • Semantic Web

My current research focuses on methods of mining and exploiting background knowledge from various data resources (e.g., Wikipedia, Wiktionary, DBpedia and Linked Data in general) to support NLP tasks such as semantic relatedness, disambiguation, and information extraction. I have developed methods of exploiting Wikipedia, WordNet and Wiktionary to support information extraction and sense disambiguation. I have given tutorials on information extraction at Web scale, and I am currently exploring methods of using Linked Data for such tasks.

I am currently working on the FootballWhispers project, which uses Information Extraction technology to mine football-related information from social streams and the Web in general, in order to forecast potential changes in team assets (footballers, managers, ownership, etc.).

PhD Research

I finished my PhD in 2012. My PhD research focused on exploiting background knowledge from various resources to support supervised Named Entity Recognition (NER), a fundamental Information Extraction (IE) task that extracts named entities from unstructured text.

The research addressed three sub-topics concerning NER:

  • Document annotation methods for creating training data for building supervised learning models for NER;
  • Unsupervised approaches to constructing gazetteers using external resources (e.g., Wikipedia and unstructured corpora) to support NER;
  • Unsupervised approaches to resolving ambiguities in named entities, based on measures of semantic relatedness.

A copy of my thesis can be found here.

Research Projects

I have worked on the following research projects in the past:

  • Lodie - A 3-year EPSRC-funded project that aimed to develop Information Extraction techniques able to (i) scale to the Web and (ii) adapt to users' information needs, by exploiting linked data on the Web. The project produced over 20 publications, and its technology has been adapted to two real-world business use cases in the form of KTP projects (JustGiving and FootballWhispers).
  • SmartProducts - An EC-funded FP7 project aimed at developing the scientific and technological basis for building "smart products" with embedded "proactive knowledge". Smart products leverage this knowledge to communicate and co-operate with humans, other products and the environment.
  • ArchaeoTools - Jointly funded by EPSRC, JISC and AHRC, this project aimed to develop data mining and information extraction technologies that allow archaeologists to discover, share and analyse datasets and legacy publications that had hitherto been very difficult to integrate into existing digital frameworks.
  • Abraxas - An EPSRC-funded project aimed at developing new unsupervised methods of ontology learning from text.