Project Titles:
RJG-UG-1: | Automatic Face Detection in Images |
RJG-UG-2: | Analysing/Summarising Reader Comments in On-Line News |
RJG-UG-3: | Generating Image Descriptions from Labelled Bounding Boxes |
RJG-UG-4: | Size Matters: Acquiring Vague Spatial Size Information from Textual Sources |
RJG-UG-5: | Machine Learning Approaches to Coreference Resolution in Conversational Data |
RJG-UG-6: | Building Named Entity Recognizers from Wikipedia |
RJG-UG-7: | Mobile Robot Navigation |
Student numbers: this project is for 1 student only
While the algorithms that have been developed take advantage of specific aspects of the task (e.g. the nature of human faces), the task, and many aspects of the algorithms developed to address it, are examples of the wider class of computer vision problems and approaches referred to as object detection. Thus, face detection is a good problem with which to start learning basic computer vision techniques for object detection. It is also a good choice because various face detection challenge datasets are available, allowing one to compare one's own approach with others on standard datasets for which the performance of a wide range of approaches is known. For example, the NIST Face Recognition Challenges and Evaluation datasets, the "300 Faces in-the-Wild Challenge" dataset and the Facial Keypoints Detection dataset are candidates for consideration in the project.
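Performance on such challenge datasets is typically scored by comparing predicted face bounding boxes against ground truth using intersection-over-union (IoU); a minimal sketch of that metric, assuming boxes are given as (x1, y1, x2, y2):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A predicted box is then usually counted as a correct detection when its IoU with a ground-truth box exceeds a threshold such as 0.5.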
Student numbers: this project is for up to 2 students.
Increasingly, in a period of disruptive change for the traditional media, newspapers see their future as lying in such conversations with and between readers, and new technologies to support these conversations are becoming essential. In this scenario there are a number of potential users: news readers and the originating journalist want to gain a structured overview of the mass of comments, in terms of the sub-topics they address and the opinions (polarity and strength) the commenters hold about these topics; editors or media analysts may need a more widely scoped analysis.
At present none of these users can effectively exploit the mass of comment data -- frequently hundreds of comments per article -- as there are no tools to support them in doing so. What they need are new tools to help them make sense of the large volumes of comment data. Of particular potential here are NLP technologies such as automatic summarisation, topic detection and clustering, sentiment detection and conversation/dialogue analysis.
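As an illustration of the simplest end of this technology, sentiment detection over comments can be approximated by counting hits against small polarity word lists; a toy sketch (the word lists here are invented for illustration only -- real systems use large lexicons or trained classifiers):

```python
# Toy polarity lexicons -- invented for illustration only.
POSITIVE = {"good", "great", "agree", "excellent", "right"}
NEGATIVE = {"bad", "wrong", "terrible", "disagree", "awful"}

def polarity(comment):
    """Classify a comment as positive/negative/neutral by counting lexicon hits."""
    tokens = [t.strip(".,!?;:").lower() for t in comment.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Even such a crude scorer makes clear what a structured overview needs: per-comment polarity that can then be aggregated over sub-topics.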
The SENSEI Project is an EU-funded research project in which Sheffield is a partner. SENSEI's goal is to develop new technologies to assist in making sense of large volumes of conversational data, such as reader comments in on-line news, as described above. Researchers in Computer Science and Journalism here at Sheffield are currently addressing this challenge, using data from The Independent and The Guardian, who are supporting the project. This student project will be carried out in conjunction with SENSEI, sharing data, tools and expertise.
The second part of the project will involve the detailed design, implementation and evaluation of a prototype for the tool specified in the first part of the project. Of particular interest will be the evaluation of the tool: this may require the manual creation of some annotated data for use as a "gold standard" against which to evaluate it, or the involvement of human subjects to make judgements about the utility of the tool.
Student numbers: this project is for 1 student only
Advances in computer vision techniques mean that a wide range of visual features can now be extracted for use in image content analysis. These features can be used for object detection -- the task of identifying instances of objects in an image, labelling them with their class and localising them in the image by drawing a bounding box around them. The accuracy of object detection by modern vision processing systems has significantly advanced in the past few years. As a consequence, it is now worth thinking of how to select which objects to include in an image description and what to say about them in terms of their spatial relations and the activities or processes we can infer they may be involved in. This is the “content selection” part of the task of image description generation. The other part of the description task is choosing the linguistic form, i.e. the words and their order, to use to express the selected content.
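One piece of the content selection step -- deciding what to say about the spatial relations between detected objects -- can be sketched directly from their bounding boxes; a minimal example, assuming boxes are given as (x1, y1, x2, y2) with the origin at the top-left of the image:

```python
def spatial_relation(a, b):
    """Coarse relation of box a to box b; boxes are (x1, y1, x2, y2), origin top-left."""
    if a[2] <= b[0]:
        return "left-of"
    if b[2] <= a[0]:
        return "right-of"
    if a[3] <= b[1]:
        return "above"
    if b[3] <= a[1]:
        return "below"
    return "overlapping"
```

A real system would refine this with relations such as "on", "next to" or "behind", which require size and depth cues beyond the raw box coordinates.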
The project will work in the framework of ImageCLEF, an on-going series of evaluation challenges run by the international information retrieval community. Starting in 2003, ImageCLEF has run an annual challenge consisting of one or more "tracks" – defined tasks with supplied development and test data which participants must address. Participants design and train their systems on the development data and then submit results of running their system on the test data. The results are evaluated by the organizers and then published at a workshop which participants attend to present their approaches to the task(s).
Student numbers: this project is for 1 student only
We use this sort of "commonsense" knowledge all the time in reasoning about the world: we can bring a toaster home in our car, but not an elephant. An autonomous robot moving about in the world and interacting with objects would need lots of this kind of knowledge. Moreover, we appear to use knowledge about typical object sizes, along with other knowledge, in interpreting visual scenes, especially in 2D images, where depth must be inferred by the observer. For example, if, when viewing an image, our knowledge about the relative sizes of cars and elephants is violated under the assumption that they are in the same depth plane, then we re-interpret the image so that the car or elephant moves forward or backward relative to the other, so that the assumption is no longer violated. Thus, information about relative or absolute object size is useful in computer image interpretation. It is also useful in image generation: if I want to generate a realistic scene containing cars and elephants then I must respect their relative size constraints. Various computer graphics applications could exploit such information.
Manually coding this kind of information is a tedious and potentially never-ending task, as new types of objects are constantly appearing. Is there a way of automating the acquisition of such information? The hypothesis of this project is that there is: we can mine this information from textual sources on the web that make claims about the absolute or relative sizes of objects.
The project will begin by reviewing the general area of information extraction with NLP, with specific attention to tasks like named entity extraction (which has been used, e.g., to identify entities like persons, locations and organisations as well as times, dates and monetary amounts, and could be adapted to identify object types and lengths) and relation extraction (which has been used to recognise relations between entities, such as works-for, and attributes of entities, such as has-location, and could be adapted to recognise the size-of attribute).
Information extraction systems in general are either rule-based, where the system relies on manually-authored patterns to recognise entities and relations, or supervised learning-based where the system relies on learning patterns from manually annotated examples. Following initial reading and analysis of the data, a decision will be made about which approach to take.
In addition to identifying absolute size information (e.g. the Wikipedia article on "apple" says: "Commercial growers aim to produce an apple that is 7.0 to 8.3 cm (2.75 to 3.25 in) in diameter, due to market preference.") the project will also investigate how to acquire relative size information. For example, sentences indicating the topological relation inside ("You can help apples keep fresh longer by storing them in your pantry or refrigerator drawer") allow us to extract the information that apples are smaller than refrigerators. By building up a set of facts expressing relative sizes, together with absolute size information, we can infer upper and lower bounds on the size of various objects. Of course, differing, even conflicting, information may be acquired from different sources. Some means of merging/reconciling this information will need to be devised.
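A rule-based extractor for absolute size expressions like the apple example above could start from a simple pattern; a sketch (the pattern and unit list are illustrative, not exhaustive):

```python
import re

# Illustrative pattern for expressions like "15 cm" or "7.0 to 8.3 cm";
# the unit list is a small sample, not a complete inventory.
SIZE = re.compile(r"(\d+(?:\.\d+)?)(?:\s*to\s*(\d+(?:\.\d+)?))?\s*(cm|mm|m|in|feet)\b")

def extract_sizes(text):
    """Return (lower, upper-or-None, unit) tuples for size expressions in text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in SIZE.finditer(text)]
```

Run on the apple sentence, this yields both the metric and imperial ranges; linking each extracted size to the object it describes is the harder, relation-extraction half of the task.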
The final element of the project will be to couple the acquired facts to a simple inference engine that will allow us to infer "best estimate" bounds on the size of objects from whatever facts we have acquired. For example, if all we know is that apples fit inside refrigerators and that refrigerators are no more than 3m tall, then we need to be able to infer that apples are less than 3m tall.
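A minimal version of such an inference step might chain "inside" facts upward until an absolute bound is found; a sketch, assuming an illustrative representation of inside facts as an object-to-container mapping and known upper bounds in metres:

```python
def infer_upper_bound(obj, inside, bounds):
    """Best upper bound on obj's size, chaining inside(x, y) facts.

    inside: dict mapping object -> enclosing container (illustrative schema);
    bounds: dict of known absolute upper bounds, e.g. in metres.
    """
    best = bounds.get(obj)
    seen = {obj}
    current = obj
    while current in inside and inside[current] not in seen:
        current = inside[current]
        seen.add(current)  # guard against cyclic (conflicting) facts
        b = bounds.get(current)
        if b is not None and (best is None or b < best):
            best = b
    return best
```

So with inside(apple, refrigerator) and an upper bound of 3m on refrigerators, the engine returns 3m for apples, exactly the inference described above.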
Of course, in addition to implementing the above capabilities the project will need to investigate ways of evaluating the various outputs. For example, the accuracy of identifying size information ("15 cm", "5-6 feet"), size-of information (size-of(apple, 7.0-8.3cm)) and relative size information ("apple <= refrigerator") needs to be assessed, as does the correctness of the inference procedure.
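Against a manually created gold standard, extraction accuracy would typically be measured by precision and recall over the sets of extracted facts; a sketch:

```python
def precision_recall(predicted, gold):
    """Precision and recall of extracted facts against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # true positives: facts found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

The same scoring applies to each output type -- size expressions, size-of attributes and relative-size facts -- by treating each as a set of tuples.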
Student numbers: this project is for 1 student only
The task of building coreference resolution algorithms has been studied in Natural Language Processing (NLP) for some time. To develop systems and assess their performance the coreference community has built annotated training and test datasets that contain manually annotated coreference links between coreferring phrases. The best known recent datasets are those produced for the CoNLL 2011 and CoNLL 2012 Shared Task Challenges, which were devoted to coreference resolution. Not only do such datasets allow competing algorithms to be compared by comparing their scores on the same data, they also permit the development of algorithms based on supervised machine learning techniques that learn from annotated examples.
Different text genres (e.g. news articles, fiction books, academic journal papers, transcribed TV programs) exhibit different linguistic features, and in particular different patterns of coreference. While the CoNLL 2011 dataset contains a mix of genres, including some conversational data (transcribed telephone conversations), it does not contain any data drawn from the sort of social media conversations one finds in, e.g., reader comments on on-line news (see the BBC or The Guardian or The Independent websites for examples). This genre of text is being studied in the SENSEI Project, an EU-funded research project in which Sheffield is a partner. One of SENSEI's aims is to produce summaries of reader comment sets consisting of potentially hundreds of comments. Detecting coreferences within this data is likely to be critical to generating good summaries.
The project will begin by reading background and related work. Several freely available coreference systems, specifically the Stanford and BART coreference systems, will be investigated and run on the CoNLL data. A system design, based on a supervised learning approach and possibly an extension of one of the freely available systems, will be selected and a system developed and evaluated on the CoNLL data. The system will then be tested on the SENSEI reader comment data, the results analysed and improvements proposed and tested.
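In a typical supervised mention-pair approach, the system classifies pairs of mentions as coreferring or not using features computed over each pair; a toy feature extractor (the mention schema here is invented for illustration -- real systems use much richer features such as gender, number and syntactic role):

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def mention_pair_features(m1, m2):
    """Features for a candidate coreferring pair.

    Each mention is a dict with 'text' and 'sent' (sentence index) --
    an illustrative schema, not any particular system's representation.
    """
    return {
        "exact_match": m1["text"].lower() == m2["text"].lower(),
        "sent_dist": m2["sent"] - m1["sent"],
        "m2_is_pronoun": m2["text"].lower() in PRONOUNS,
    }
```

Feature dictionaries like these would then be fed to a standard classifier trained on the annotated CoNLL links.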
Student numbers: this project is for 1 student only
Named entity recognizers are software applications that identify and classify mentions of named entities in running text. That is, these programs determine the extent or boundaries of named entities in text, typically by marking them up using XML, and then classify the marked named entities into classes such as "person", "organization", "location", etc.
Named entity recognizers are important software components used in many language processing systems such as search engines, text mining systems, machine translation systems and so on. They have been studied for many years and relatively high accuracy systems exist for English and for a few other widely spoken languages. However, for many languages such systems do not exist and there is a need to develop them if these languages are to participate fully in the newest advances in language technology applications.
Currently the most popular approach to building named entity recognizers is to use supervised learning: a selection of training texts is annotated by hand to mark up all occurrences of named entities in them and to assign a class to each annotated mention. Standard machine learning algorithms are then used to train classifiers on this manually prepared data to annotate new texts automatically.
This approach works quite well, but has the drawback that a considerable amount of text needs to be annotated. This poses a limitation for minority languages, which may not be able to muster the resources to carry out the annotation. Thus the question arises: can we use existing structured or semi-structured data to create training data automatically? One way this might be done is by using Wikipedia. Wikipedia contains lots of text in many languages with many mentions of named entities. Its ancillary resource, DBPedia, contains typing information about many of these entities (e.g. person, location, etc.). Can we exploit these resources to create training data to train named entity recognizers for a wide range of languages? If so, how well will they work? These are the questions this project will address.
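Training data derived this way would typically be represented as token-level BIO tags, with entity spans located via Wikipedia anchor links and typed via DBPedia; a sketch of the final tagging step, assuming the spans have already been resolved to token offsets (a hypothetical intermediate representation):

```python
def bio_tags(tokens, entities):
    """BIO tags for a token list.

    entities: (start, end, type) token spans, end exclusive -- a hypothetical
    schema for spans derived from Wikipedia anchor links typed via DBPedia.
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in entities:
        tags[start] = "B-" + etype          # beginning of entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # inside entity
    return tags
```

Sequences of (token, tag) pairs in this form are the standard input to off-the-shelf supervised NER learners.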
Student numbers: this project is for 1 student only
The algorithms will be implemented in Python and deployed on the NAO robots, recently acquired by the University. The goal is to test the system's ability to navigate amongst obstacles (e.g. cardboard boxes) to carry out some instruction (e.g. to retrieve an object at a given location).
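As a simple stand-in for the path-planning component, breadth-first search over an occupancy grid finds a shortest obstacle-free route; a sketch in Python (the grid-of-strings representation is illustrative -- on the NAO the map would come from real sensor input and self-localisation):

```python
from collections import deque

def plan_path(grid, start, goal):
    """Shortest path on a grid of strings ('#' = obstacle) via breadth-first search.

    start and goal are (row, col) cells; returns the list of cells visited,
    or None if no obstacle-free route exists.
    """
    rows, cols = len(grid), len(grid[0])
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (r, c), path = frontier.popleft()
        if (r, c) == goal:
            return path
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected moves
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in seen):
                seen.add((nr, nc))
                frontier.append(((nr, nc), path + [(nr, nc)]))
    return None
```

Because BFS expands cells in order of distance from the start, the first path reaching the goal is guaranteed to be a shortest one; A* with a distance heuristic is the usual refinement for larger maps.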
At the end of the project the student should have a solid understanding of the basics of robot self-localisation and path planning.