Rob Gaizauskas : UG Level 3 Projects 2018-19

Email: Rob Gaizauskas

Project Titles:

RJG-UG-1:Aspect-based Sentiment Analysis
RJG-UG-2:Clustering and Cluster Labelling of Reader Comments in On-Line News
RJG-UG-3:Generating Image Descriptions from Labelled Bounding Boxes
RJG-UG-4:Size Matters: Acquiring Vague Spatial Size Information from Textual Sources
RJG-UG-5:Machine Learning Approaches to Coreference Resolution in Conversational Data
RJG-UG-6:Communicating with a Mobile Virtual Robot via Spoken Natural Language Dialogue
RJG-UG-7:Temporal Information Extraction from Clinical Text
RJG-UG-8:Building an Information Seeking Dialogue Agent
RJG-UG-9:Building a Commonsense Reasoner
RJG-UG-10:Natural Language Analysis in Scenario Based Training Systems
RJG-UG-11:Automated Evaluation of Pathology Lab Reports in External Quality Assessment
RJG-UG-12:Building an Alexa Skill for DCS


[next][prev][Back to top]

RJG-UG-1:   Aspect-based Sentiment Analysis

Suitable for: ITMB/CS/SE/AI/CS+Maths

Background

As commerce has moved to the Web there has been an explosion of on-line customer-provided content reviewing the products (e.g. cameras, phones, televisions) and services (e.g. restaurants, hotels) that are on offer. There are far too many of these reviews for anyone to read, and hence there is tremendous potential for software that can automatically identify and summarise customers' opinions on various aspects of these products and services. The output of such automatic analysis would be of tremendous benefit both to consumers trying to decide which product or service to purchase and to product and service providers trying to improve their offerings or understand the strengths and weaknesses of their competitors.

By aspects of products and services we mean the typical features or characteristics of a product or service that matter to a customer or are likely to be commented on by them. For example, for restaurants we typically see diners commenting on the quality of the food, the friendliness or speed of service, the price, the ambience or atmosphere of the restaurant, and so on. The automatic identification of aspects of products or services in customer reviews and the determination of the customer sentiment with respect to them are tasks that natural language processing researchers have been studying for several years now. As is common in the field, shared task challenges -- community-wide efforts to define a task, develop data resources and metrics for training and testing, and run a friendly competition to help identify the most promising approaches -- have appeared in recent years addressing the tasks of aspect identification and sentiment determination. Specifically, SemEval, an annual forum for the evaluation of language processing technologies across a range of tasks involving understanding some aspect of natural language, has run challenges on Aspect-Based Sentiment Analysis (ABSA) in 2014, 2015 and 2016.

The SemEval ABSA challenges provide a good place to start for any project on sentiment analysis as they supply clear task definitions, data for training and testing, conference proceedings with extensive discussion of different approaches, and results of state-of-the-art systems to compare against.

Project Description

This project will begin by reviewing the SemEval ABSA task definitions, datasets and the approaches taken by participating systems. Following this review, one or more approaches to ABSA will be chosen and implemented, building on existing NLP resources where it is sensible to do so. The resulting algorithm(s) will be evaluated using SemEval data and systems as benchmarks, and refinements made to the algorithms as far as time permits. Another possible line of work is to consider how best to summarise and present the results of an ABSA system to end users.
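
To make the task concrete, the sketch below shows one possible baseline for the polarity-classification step, assuming SemEval-style (sentence, aspect category, polarity) triples have already been parsed out of the distributed XML; the training examples and category names shown are invented for illustration, not taken from the SemEval data.

    # Minimal baseline sketch: polarity classification for (sentence, aspect) pairs.
    # The examples below are invented; a real system would parse the SemEval XML.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    train = [
        ("The food was superb but we waited an hour.", "FOOD#QUALITY", "positive"),
        ("The food was superb but we waited an hour.", "SERVICE#GENERAL", "negative"),
        ("Great value for money.", "RESTAURANT#PRICES", "positive"),
    ]

    # Prefix each sentence with its aspect category so the classifier can
    # condition on which aspect is being judged.
    X = [f"{aspect} {sentence}" for sentence, aspect, _ in train]
    y = [polarity for _, _, polarity in train]

    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
    model.fit(X, y)
    print(model.predict(["SERVICE#GENERAL The waiter was rude and slow."]))

A real project would replace the hand-built examples with the SemEval training data and report results using the official evaluation measures.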

Prerequisites

An interest in natural language processing and machine learning and Python programming skills are the only prerequisites for the project.

Initial Reading and Useful Links

Contact supervisor for further references.
[next][prev][Back to top]

RJG-UG-2:   Clustering and Cluster Labelling of Reader Comments in On-Line News

Suitable for: CS/SE/AI/CS+Maths

Background

In a news publisher website such as The Guardian, journalists publish articles on different topics from politics and civil rights to health, sports and celebrity news. The website design supports the publication and reading of original news articles and at the same time facilitates user-involvement via reader comments.

Increasingly, in a period of disruptive change for the traditional media, newspapers see their future as lying in such conversations with and between readers, and new technologies to support these conversations are becoming essential. In this scenario there are a number of potential users: news readers and the originating journalist want to gain a structured overview of the mass of comments, in terms of the sub-topics they address and the opinions (polarity and strength) the commenters hold about these topics; editors or media analysts may need a more broadly scoped analysis.

At present none of these users can effectively exploit the mass of comment data -- frequently hundreds of comments per article -- as there are no tools to support them in doing so. What they need is new tools to help them make sense of the large volumes of comment data. Of particular potential here are NLP technologies such as: automatic summarisation, topic detection and clustering, sentiment detection and conversation/dialogue analysis.

The SENSEI Project is a recently completed EU-funded research project in which Sheffield was a partner. SENSEI's goal was to develop new technologies to assist in making sense of large volumes of conversational data, such as reader comments in on-line news, as described above. To do this we developed tools to group reader comments into topically related clusters and to generate relevant labels for these clusters so that a graphical summary in the form of a pie chart could be created. Using reference data created by the SENSEI project, this student project will explore techniques to improve the clustering and cluster labelling approaches developed in SENSEI.

Project Description

This project will begin by reviewing the language processing literature in the areas of text clustering and cluster labelling. One or more techniques for clustering and cluster labelling will be selected for exploration.

The second part of the project will involve the detailed design, implementation and evaluation of the methods selected in the first part of the project.
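
As a starting point, a simple clustering and labelling baseline might look like the sketch below, which clusters TF-IDF vectors with k-means and labels each cluster with its highest-weighted centroid terms; the comments are invented and nothing here assumes the SENSEI data format.

    # Sketch: cluster comments with k-means over TF-IDF vectors and label each
    # cluster with its top centroid terms. The comments below are invented.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    comments = [
        "Ticket prices are far too high for ordinary fans.",
        "Season ticket prices go up every single year.",
        "The referee made some terrible decisions in the second half.",
        "That penalty decision was a disgrace.",
    ]

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(comments)

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    terms = np.array(vectorizer.get_feature_names_out())

    for cluster_id in range(km.n_clusters):
        # Label a cluster with the terms closest to its centroid.
        top_terms = terms[km.cluster_centers_[cluster_id].argsort()[::-1][:3]]
        members = [c for c, label in zip(comments, km.labels_) if label == cluster_id]
        print(cluster_id, list(top_terms), members)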

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-3:   Generating Image Descriptions from Labelled Bounding Boxes

Suitable for: CS/SE/AI/CS+Maths

Student numbers: this project is for 1 student only

Background

The ever-growing volume of photographic images and videos on the web and in personal photographic collections motivates research into techniques to automatically recognize and annotate the content, i.e., subject matter of images, using either keywords or, ideally, fuller natural language descriptions. Such annotations will support better retrieval of images given a user query.

Advances in computer vision techniques mean that a wide range of visual features can now be extracted for use in image content analysis. These features can be used for object detection -- the task of identifying instances of objects in an image, labelling them with their class and localising them in the image by drawing a bounding box around them. The accuracy of object detection by modern vision processing systems has significantly advanced in the past few years. As a consequence, it is now worth thinking of how to select which objects to include in an image description and what to say about them in terms of their spatial relations and the activities or processes we can infer they may be involved in. This is the content selection part of the task of image description generation. The other part of the description task is choosing the linguistic form, i.e. the words and their order, to use to express the selected content.

The project will work in the framework of ImageCLEF, an on-going series of evaluation challenges run by the international information retrieval community. Starting in 2003, ImageCLEF has run an annual challenge consisting of one or more "tracks" -- defined tasks with supplied development and test data -- which participants must address. Participants design and train their systems on the development data and then submit results of running their system on the test data. The results are evaluated by the organizers and then published at a workshop which participants attend to present their approaches to the task(s).

Project Description

The aim of this project is to build a prototype system for the generation of image descriptions. ImageCLEF 2015 and 2016 have included, as part of the Image Annotation task, a subtask on the "Generation of Textual Descriptions of Images". There were two conditions in this subtask: (a) participants are given "clean" object detection results for images (i.e. images with correctly labelled bounding boxes around objects of specific classes) and from this must generate an image description, and (b) participants use the output of noisy visual recognizers in generating an image description. The quality of the resulting image descriptions is then assessed against image descriptions generated by humans for the same set of images. This project will address the "clean" condition in the image description generation subtask and use ImageCLEF evaluation measures and data to assess results.
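
As an illustration of the "clean" condition, the sketch below generates a one-sentence description from labelled bounding boxes using simple content selection (keep the largest boxes) and template-based realisation; the box format is hypothetical and is not the ImageCLEF data format.

    # Sketch: template-based description from "clean" labelled bounding boxes.
    # The box format below is hypothetical, not the ImageCLEF format.
    boxes = [
        {"label": "person", "x": 40,  "y": 60,  "w": 120, "h": 300},
        {"label": "dog",    "x": 200, "y": 250, "w": 90,  "h": 70},
        {"label": "tree",   "x": 5,   "y": 10,  "w": 60,  "h": 80},
    ]

    def describe(boxes, max_objects=2):
        # Content selection: keep the largest (assumed most salient) objects.
        salient = sorted(boxes, key=lambda b: b["w"] * b["h"], reverse=True)[:max_objects]
        if len(salient) < 2:
            return f"There is a {salient[0]['label']} in the image."
        a, b = salient
        # Crude spatial relation from the horizontal order of the box centres.
        relation = "to the left of" if a["x"] + a["w"] / 2 < b["x"] + b["w"] / 2 else "to the right of"
        return f"There is a {a['label']} {relation} a {b['label']}."

    print(describe(boxes))

A real system would also need to choose verbs and activities and to handle many objects gracefully; the ImageCLEF evaluation measures would then be used to compare such outputs against the human-written descriptions.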

Prerequisites

Interest in language and/or text processing and in image analysis. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-4:   Size Matters: Acquiring Vague Spatial Size Information from Textual Sources

Suitable for: CS/SE/AI/CS+Maths

Background

How big is an adult elephant? or a toaster? We may not know exactly, but we have some ideas: an adult elephant is more than a meter tall and less than 10 meters; a toaster is more than 10cm tall and less than a meter. In particular we have lots of knowledge about the relative sizes of objects: a toothbrush is smaller than a toaster; a toaster is smaller than a refrigerator.

We use this sort of "commonsense" knowledge all the time in reasoning about the world: we can bring a toaster home in our car, but not an elephant. An autonomous robot moving about in the world and interacting with objects would need lots of this kind of knowledge. Moreover, we appear to use knowledge about typical object sizes, along with other knowledge, in interpreting visual scenes, especially in 2D images, where depth must be inferred by the observer. For example, if, when viewing an image, our knowledge about the relative sizes of cars and elephants is violated under the assumption that they are in the same depth plane, then we re-interpret the image so that the car or elephant moves forward or backward relative to the other, so that the assumption is no longer violated. Thus, information about relative or absolute object size is useful in computer image interpretation. It is also useful in image generation: if I want to generate a realistic scene containing cars and elephants then I must respect their relative size constraints. Various computer graphics applications could exploit such information.

Manually coding this kind of information is a tedious and potentially never-ending task, as new types of objects are constantly appearing. Is there a way of automating the acquisition of such information? The hypothesis of this project is that there is: we can mine this information from textual sources on the web that make claims about the absolute or relative sizes of objects.

Project Description

The project will explore ways of mining information about the absolute and relative size of objects from web sources, such as Wikipedia. The general task of acquiring structured information from free or unstructured text is called text mining or information extraction and is a well-studied application in the broader area of Natural Language Processing (NLP).

The project will begin by reviewing the general area of information extraction with NLP, with specific attention to tasks like named entity extraction (which has been used, e.g., to identify entities like persons, locations and organisations as well as times, dates and monetary amounts, and could be adapted to identify object types and lengths) and relation extraction (which has been used to recognise relations between entities, such as works-for, and attributes of entities, such as has-location, and could be adapted to recognise the size-of attribute).

Information extraction systems in general are either rule-based, where the system relies on manually-authored patterns to recognise entities and relations, or supervised learning-based where the system relies on learning patterns from manually annotated examples. Following initial reading and analysis of the data, a decision will be made about which approach to take.

In addition to identifying absolute size information (e.g. the Wikipedia article on "apple" says: "Commercial growers aim to produce an apple that is 7.0 to 8.3 cm (2.75 to 3.25 in) in diameter, due to market preference.") the project will also investigate how to acquire relative size information. For example, sentences indicating the topological relation inside ("You can help apples keep fresh longer by storing them in your pantry or refrigerator drawer") allow us to extract the information that apples are smaller than refrigerators. By building up a set of facts expressing relative sizes, together with absolute size information, we can infer upper and lower bounds on the size of various objects. Of course, differing, even conflicting, information may be acquired from different sources. Some means of merging/reconciling this information will need to be devised.
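
A rule-based starting point for the absolute-size part of the task might look like the sketch below, which uses a single regular expression to pull (object, lower bound, upper bound, unit) facts out of sentences; the pattern and the sentences are illustrative and real coverage would need many more rules (or a learned extractor).

    # Rule-based sketch: extract absolute size statements such as "7.0 to 8.3 cm"
    # from sentences about a known object. Pattern and sentences are illustrative.
    import re

    SIZE = re.compile(
        r"(\d+(?:\.\d+)?)(?:\s*(?:to|-)\s*(\d+(?:\.\d+)?))?\s*(cm|mm|m|in|inches|feet|ft)\b")

    sentences = [
        ("apple", "Commercial growers aim to produce an apple that is 7.0 to 8.3 cm in diameter."),
        ("toaster", "A typical two-slice toaster is about 20 cm tall."),
    ]

    facts = []
    for obj, sentence in sentences:
        for low, high, unit in SIZE.findall(sentence):
            facts.append((obj, float(low), float(high) if high else float(low), unit))

    print(facts)
    # [('apple', 7.0, 8.3, 'cm'), ('toaster', 20.0, 20.0, 'cm')]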

The final element of the project will be to couple the acquired facts to a simple inference engine that will allow us to infer "best estimate" bounds on the size of objects from whatever facts we have acquired. E.g. if all we know is that apples fit inside refrigerators and that refrigerators are no more than 3m tall, then we need to be able to infer that apples are less than 3m tall.
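
The sketch below illustrates this inference step under very simple assumptions: upper bounds propagate from a container to anything reported to fit inside it, and the tightest available bound wins. All of the facts shown are invented.

    # Sketch: infer "best estimate" upper bounds from absolute and relative facts.
    # All facts below are invented.
    absolute_upper_cm = {"refrigerator": 300.0}   # refrigerators are no more than 3 m tall
    smaller_than = {                              # A fits inside / is smaller than B
        "apple": "refrigerator",
        "toaster": "refrigerator",
        "toothbrush": "toaster",
    }

    def upper_bound(obj, seen=()):
        """Tightest known upper bound (in cm) for obj, or None if unknown."""
        if obj in seen:                           # guard against cyclic facts
            return None
        bounds = []
        if obj in absolute_upper_cm:
            bounds.append(absolute_upper_cm[obj])
        if obj in smaller_than:
            parent = upper_bound(smaller_than[obj], seen + (obj,))
            if parent is not None:
                bounds.append(parent)             # no bigger than its container
        return min(bounds) if bounds else None

    print(upper_bound("apple"))        # 300.0: all we can say is "less than 3 m"
    print(upper_bound("toothbrush"))   # 300.0, via toaster -> refrigerator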

Of course, in addition to implementing the above capabilities the project will need to investigate ways of evaluating the various outputs. For example, the accuracy of identifying size information ("15 cm", "5-6 feet"), size-of information ( size-of(apple,7.0-8.3cm)) and relative size information ("apple <= refrigerator") needs to be assessed, as does the correctness of the inference procedure.

Prerequisites

Interest in language and/or text processing and in image analysis. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-5:   Machine Learning Approaches to Coreference Resolution in Conversational Data

Suitable for: CS/SE/AI/CS+Maths

Background

Anaphora or anaphors are natural language expressions which "carry one back" to something introduced earlier in a discourse or conversation. For example, a news article might begin by introducing David Cameron and later refer to him as "the Prime Minister" or "he". In this example, "the Prime Minister" and "he" are anaphors and "David Cameron" is called the antecedent. Anaphors are said to corefer with their antecedents, since they refer to the same things in the "real" (i.e. non-textual) world. Determining which expressions corefer, the problem of coreference resolution, is an important part of understanding the meaning of a text. Easy for humans, it is a significant challenge for computers.

The task of building coreference resolution algorithms has been studied in Natural Language Processing (NLP) for some time. To develop systems and assess their performance the coreference community has built annotated training and test datasets that contain manually annotated coreference links between coreferring phrases. The best known recent datasets are those produced for the CONLL 2011 and CONLL 2012 Shared Task Challenges, which were devoted to coreference resolution. Not only do such datasets allow competing algorithms to be compared by comparing their scores on the same data, they also permit the development of algorithms based on supervised machine learning techniques that learn from annotated examples.

Different text genres (e.g. news articles, fiction books, academic journal papers, transcribed TV programs) exhibit different linguistic features, and in particular different patterns of coreference. While the CONLL 2011 dataset contains a mix of genres, including some conversational data (transcribed telephone conversations), it does not contain any data drawn from the sort of social media conversations one finds in, e.g., reader comments on on-line news (see the BBC or The Guardian or The Independent websites for examples). This genre of text is being studied in the SENSEI Project, an EU-funded research project in which Sheffield is a partner. One of SENSEI's aims is to produce summaries of reader comment sets consisting of potentially hundreds of comments. Detecting coreferences within this data is likely to be critical to generating good summaries.

Project Description

This project has two clear objectives. The first is to develop and assess a general supervised learning-based coreference resolution algorithm using the CONLL 2011 and 2012 Shared Task Challenges and datasets as a basis. The second is to explore how well the developed algorithm performs on reader comment data. Some data for evaluating coreference in reader comments is being developed by one of the SENSEI partners and should be available for use by the project. Some additional data may need to be annotated for evaluation.

The project will begin by reading background and related work. Several freely available coreference systems, specifically the Stanford and BART coreference systems, will be investigated and run on the CONLL data. A system design, based on a supervised learning approach and possibly an extension of one of the freely available systems, will be selected and a system developed and evaluated on the CONLL data. The system will then be tested on the SENSEI reader comment data, the results analysed and improvements proposed and tested.
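
As a concrete illustration of the supervised approach, the sketch below uses the classic mention-pair formulation: each (candidate antecedent, anaphor) pair is classified as coreferent or not from a handful of features. The features and training pairs are invented; a real system would extract mentions and gold links from the CONLL data and add a clustering step on top of the pairwise decisions.

    # Sketch of the mention-pair approach: classify each
    # (candidate antecedent, anaphor) pair as coreferent (1) or not (0).
    # Features and training pairs below are invented.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def pair_features(antecedent, anaphor, sentence_distance):
        return {
            "same_head": antecedent.split()[-1].lower() == anaphor.split()[-1].lower(),
            "anaphor_is_pronoun": anaphor.lower() in {"he", "she", "it", "they", "him", "her"},
            "sentence_distance": sentence_distance,
        }

    train_pairs = [
        (("David Cameron", "he", 1), 1),
        (("David Cameron", "the Prime Minister", 2), 1),
        (("the election", "it", 3), 0),
        (("a spokesman", "the Prime Minister", 4), 0),
    ]

    X = [pair_features(a, b, d) for (a, b, d), _ in train_pairs]
    y = [label for _, label in train_pairs]

    model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X, y)
    print(model.predict([pair_features("the Prime Minister", "he", 1)]))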

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-6:   Communicating with a Mobile Virtual Robot via Spoken Natural Language Dialogue

Suitable for: CS/SE/AI/CS+Maths

Background

Robotics is a field of huge intellectual challenge with immense practical potential, increasingly attracting the attention of industry and government as well as academia. As robots begin to appear more and more in daily life, a central issue will be communication between humans and robots. One approach to communicating with robots that has huge potential is spoken natural language dialogue, especially in human-robot co-working environments.

Researching spoken dialogue systems (SDS) for human-robot co-working is a complex endeavour. One important prerequisite is that we have available a lab-based setup where humans can interact with virtual robots, which are much easier and cheaper to work with than real robots, so that dialogue researchers can develop and test computational models of dialogue. One widely-used platform for developing robot software is ROS -- the Robot Operating System. ROS, together with Gazebo (a simulation package for robots that allows visualization of a robot interacting with a virtual environment), forms an ideal environment for developing and demonstrating robot behaviours that can later be straightforwardly moved onto real physical robots. In particular it could form a very powerful setting for human-robot spoken dialogue research. However, at present there are no spoken language recognition or generation tools integrated into the ROS platform, nor has there been much experimentation with spoken language dialogue in virtual robot environments.

A previous Y3 project has developed software within the ROS framework that allows a virtual robot to localise itself and navigate in a virtual world. Thus tools to build a virtual world, create a map of it and allow a robot to navigate within it are available. Efforts to secure funding to integrate speech recognition and generation capabilities, using existing open source libraries, within ROS are underway and progress to achieve this may have been made by the time this project starts.

Project Description

The aims of this project are two-fold:
  1. To complete the integration of speech recognition and generation components within ROS, depending on their state of integration at the time the project commences, and to create a ROS package for speech recognition and generation that others could use in the future;
  2. To develop a dialogue manager that will allow a user to use spoken language to direct a mobile robot around a simple virtual world where the robot may interact with the user to answer questions about whether it can sense something or to ask questions to clarify ambiguous commands. This will provide an experimental setting to explore the research challenges in getting humans and robots to establish, via spoken dialogue, common points of reference in the shared virtual world, getting the robot to interpret spatial instructions, etc.
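
By way of illustration, the sketch below shows the kind of glue code the dialogue manager would sit on top of: a recognised utterance is mapped to a motion command and published to the robot. It assumes a ROS (rospy) environment and a robot listening on the conventional /cmd_vel topic; the topic name, and the keyword-spotting "understanding", are placeholders for what the project would actually build.

    #!/usr/bin/env python
    # Minimal sketch: map a recognised utterance to a motion command for the
    # virtual robot. Assumes ROS (rospy) and a robot subscribed to /cmd_vel;
    # both the topic name and the naive keyword "NLU" are placeholders.
    import rospy
    from geometry_msgs.msg import Twist

    def interpret(utterance):
        """Very naive command understanding: keyword spotting only."""
        cmd = Twist()
        if "forward" in utterance:
            cmd.linear.x = 0.2
        elif "left" in utterance:
            cmd.angular.z = 0.5
        elif "right" in utterance:
            cmd.angular.z = -0.5
        return cmd  # "stop" and unrecognised input both yield a zero Twist

    if __name__ == "__main__":
        rospy.init_node("toy_dialogue_manager")
        pub = rospy.Publisher("/cmd_vel", Twist, queue_size=1)
        rospy.sleep(1.0)                               # let the publisher connect
        pub.publish(interpret("move forward please"))  # stands in for ASR output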

At the end of the project the student should have a solid understanding of the basics of developing software on the ROS platform, of using speech recognition and generation libraries and of the challenges involved in developing dialogue systems.

Prerequisites

Interest in robotics and in spoken natural language dialogue. The client libraries to be used with ROS are Python-based and geared towards Unix/Linux, so the student should be comfortable with both.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-7:   Temporal Information Extraction from Clinical Text

Suitable for: CS/SE/AI/CS+Maths

Background

Automatically extracting information from clinical notes, such as patient discharge summaries, is a challenging and important task. Much information about patient conditions and treatment is available only in textual form in clinical notes. The ability to extract this automatically would enable a wide range of applications in areas such as tracking disease status, monitoring treatment outcomes and complications and discovering medication side effects (Sun et al., 2013). Clinical events (diagnosis events, scan/test events, treatment events) and their temporal ordering and duration are essential information that any information extraction system working in this domain must be able to identify. The importance of this task was recognised in the 2012 I2B2 Shared Task Challenge (Sun et al., 2013), which defined a set of tasks relevant to temporal information extraction in clinical domains and supplied a corpus of annotated discharge summaries to facilitate research in the area. Eighteen teams from around the world took part to see how well their systems could perform. More recently there have been annual challenges on extracting temporal information from clinical notes and pathology reports for cancer patients at the Mayo Clinic -- the Clinical TempEval challenges in 2015, 2016 and 2017 (see link below).

Project Description

The aim of this project is to design, implement and evaluate a temporal relation extraction system to carry out the I2B2 Temporal relation extraction tasks or the Clinical TempEval Challenge. This will involve reviewing and assessing existing approaches, choosing and implementing an approach (likely a supervised machine learning based approach that will be trained on the training data supplied by the I2B2 organisers), and evaluating the results using the I2B2 test data. Results can be compared against the best performances obtained in the original 2012 or Clinical TempEval challenges.
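
One common formulation, sketched below, casts the problem as pairwise classification: each (event, anchor) pair is assigned a temporal relation label such as BEFORE, AFTER or OVERLAP. The features, labels and examples are invented; a real system would derive them from the I2B2 or Clinical TempEval annotations.

    # Sketch: temporal relation extraction cast as pairwise classification.
    # Features, labels and examples below are invented.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    def features(event, anchor, verb_tense, token_distance):
        return {
            "event": event.lower(),
            "anchor_type": anchor,       # e.g. ADMISSION, DISCHARGE, DATE
            "tense": verb_tense,
            "token_distance": token_distance,
        }

    train = [
        (features("admitted", "ADMISSION", "past", 0), "OVERLAP"),
        (features("chest pain", "ADMISSION", "past", 5), "BEFORE"),
        (features("discharged", "ADMISSION", "past", 12), "AFTER"),
        (features("follow-up", "DISCHARGE", "future", 7), "AFTER"),
    ]

    X = [f for f, _ in train]
    y = [label for _, label in train]

    model = make_pipeline(DictVectorizer(), LinearSVC())
    model.fit(X, y)
    print(model.predict([features("fever", "ADMISSION", "past", 3)]))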

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful. Python is likely to be the language used to implement the system.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-8:   Building an Information Seeking Dialogue Agent

Suitable for: CS/SE/AI/CS+Maths

Background

Given recent advances in speech recognition, there is increasing interest in voice-driven information provision systems. Siri, Cortana, Amazon's Alexa and Google's Assistant are all examples of this. The architecture of such systems typically has speech recognition and speech generation components at the start and end of the pipeline, with components in between to analyse the user input (the natural language understanding or NLU component), manage the dialogue between the user and the system (the dialogue manager), carry out any information seeking task required to answer a user query (the task manager) and formulate a natural language response to the user (the natural language generation or NLG component).

Any information provision system must draw on some source of information to supply answers to user queries. This could be the Web, but the Web is huge and unstructured, and any attempt to answer questions using the Web must either content itself with a search engine-like approach in which whole documents are retrieved (not suitable for speaking an answer to a user) or address the problems of question answering or information extraction from arbitrary natural language. Another possibility is to use a large, multi-domain structured or semi-structured knowledge base. One such knowledge base is Google's Knowledge Graph; another is Wikidata.

Project Description

The aim of this project is to design, implement and evaluate an information seeking dialogue agent that aims to carry out a fluent dialogue with a user to answer user questions using Google's Knowledge Graph or Wikidata as an information source. The agent will be text-based, i.e. it will not include speech recognition and generation components, though these could in principle be straightforwardly added, to constrain the scope of the project. The system should be able to carry out naturalistic dialogues on topics covered by the Knowledge Graph (e.g. Who played XX in the recent TV series YY? ... What other TV series or movies has he starred in?). The project will begin by reviewing relevant literature on natural language front ends to databases and dialogue management and by exploring the Knowledge Graph API. Then an appropriate design will be chosen and implemented and the resulting system evaluated.
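
To indicate what the task-manager end of such an agent involves, the sketch below retrieves an answer from Wikidata's public SPARQL endpoint (P161 is Wikidata's "cast member" property). Mapping the user's words and the dialogue context onto a structured query like this one is the real substance of the project and is not shown here.

    # Sketch: answer a structured query against Wikidata's public SPARQL endpoint.
    # The question "Who played in Sherlock?" is assumed to have already been
    # mapped onto this query by the NLU / dialogue components.
    import requests

    SPARQL = """
    SELECT DISTINCT ?actorLabel WHERE {
      ?series rdfs:label "Sherlock"@en ;
              wdt:P161 ?actor .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }
    LIMIT 10
    """

    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": SPARQL, "format": "json"},
        headers={"User-Agent": "dialogue-agent-sketch/0.1"},
        timeout=30,
    )
    names = [b["actorLabel"]["value"] for b in resp.json()["results"]["bindings"]]
    print(names)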

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful. Python is likely to be the language used to implement the system.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-9:   Building a Commonsense Reasoner

Suitable for: CS/SE/AI/CS+Maths

Background

Commonsense reasoning is that area of Artificial Intelligence (AI) concerned with automating the knowledge and the inference abilities that enable humans to operate in the everyday world: participating in events (e.g. posting a parcel) that involve objects (e.g. boxes, tape, scissors, stamps), preconditions (e.g. the box must be large enough to hold the parcel), effects (e.g. the parcel gets transported to the target destination) and other people (e.g. the postal clerk) with their own mental states and beliefs (e.g. the parcel requires adequate postage to be accepted), all of which take place in a temporal context (e.g. the post office is open 09:00-17:30). Commonsense reasoning is important for a whole range of AI applications, including robotics (a robot must reason about the world in order to accomplish tasks) and natural language understanding (shared commonsense background knowledge and reasoning abilities are needed to interpret texts and are assumed by participants in any dialogue). The study of commonsense reasoning and attempts to build commonsense reasoners have been activities at the core of AI research since the inception of the field in the 1950s.
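
As a very small illustration of the kind of representation involved, the sketch below encodes the parcel example in a STRIPS-like style: an event has preconditions that must hold in the current state (a set of facts) and effects that add and delete facts. The encoding is purely illustrative; the project would choose and extend its own representation (time, space and mental states are not modelled here).

    # Minimal STRIPS-style sketch of the parcel example: an event applies to a
    # state (a set of facts) only if its preconditions hold; it then adds and
    # deletes facts. The encoding is illustrative only.
    post_parcel = {
        "preconditions": {"at(parcel, post_office)", "has_postage(parcel)", "open(post_office)"},
        "add": {"in_transit(parcel)"},
        "delete": {"at(parcel, post_office)"},
    }

    def apply_event(state, event):
        if not event["preconditions"] <= state:        # all preconditions must hold
            missing = event["preconditions"] - state
            raise ValueError(f"cannot apply event, missing: {missing}")
        return (state - event["delete"]) | event["add"]

    state = {"at(parcel, post_office)", "has_postage(parcel)", "open(post_office)"}
    print(apply_event(state, post_parcel))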

Project Description

The objective of this project is to develop a commonsense reasoner (CR) that will handle a subset of the core phenomena that a CR needs to address, such as: events (their preconditions and effects), space, time and the mental states of agents. The project will involve:

Prerequisites

Interest in knowledge representation, logic and automated reasoning. No mandatory module prerequisites.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-10:   Natural Language Analysis in Scenario Based Training Systems

Suitable for: CS/SE/AI/CS+Maths

Note: This project offers the chance to work with Certus Technology Associates, a company that "designs, builds, delivers and supports adaptable software and services, to assist higher education establishments, life-science, biotech, healthcare and private sector organisations in the management of complex data and processes." Certus is interested in exploring the potential natural language processing (NLP) has to improve their software products and services. Any student taking this project will have the chance to meet with Certus personnel to refine requirements, acquire sample data and to get feedback on their work.

Background

Certus has developed a training and competence system where a user interacts with a virtual environment to complete a task. On request, the system generates a complex, randomised training scenario based upon a constrained domain model and a set of probability functions. The user interacts with this new training scenario and, on completion, their interaction is immediately evaluated and their performance scored.

For example, in one application the system is used to create a scenario of a biomedical scientist working in a blood transfusion laboratory. In the virtual laboratory, a sample reception in-tray contains a patient blood sample and a referral card requesting the blood be tested and 2 units of matched plasma be made available for surgery planned for later in the week.

The user of the training system first reviews the referral and then moves between scenes in the virtual lab in the normal way to complete the request. This includes putting the blood through an analyser and interpreting results, evaluating blood type and antibody content and selecting appropriate blood products from a stock fridge.

In the blood transfusion training system, there are a number of scenes to which the user can navigate. These mimic the real working environment. They include a telephone, an email program and a laboratory log book. For such scenes, free text is entered by the user. For example, if the user finds that the patient name printed on the referral card does not match that written on the blood sample tube, they should send a message to the ward requesting a new patient sample. They should provide a reason.

Project Description

The objective of this project is to investigate potential NLP approaches to extracting enough semantics from the free text entries made during scenario participation to enable evaluation of the appropriateness of an action. Such evaluation is to be based upon the text entered by the user, information about the scenario, and a set of evaluation rules.
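
A minimal illustration of what such an evaluation might look like is sketched below: the trainee's free-text message is checked, with simple patterns, for whether it requests a new sample and whether it gives a reason consistent with the scenario. The keyword patterns and scenario fields are invented and do not reflect Certus's actual data or evaluation rules.

    # Sketch: rule-based check of a trainee's free-text message against the
    # scenario. Patterns and scenario fields are invented examples only.
    import re

    REQUEST_PATTERNS = [r"\bnew sample\b", r"\brepeat sample\b", r"\bresend\b"]
    REASON_PATTERNS = [r"\bnames? .*not match", r"\bmismatch", r"\bdiscrepanc"]

    scenario = {"discrepancy": "patient name mismatch",
                "expected_action": "request new sample"}

    def evaluate(message, scenario):
        msg = message.lower()
        requested = any(re.search(p, msg) for p in REQUEST_PATTERNS)
        reason_given = any(re.search(p, msg) for p in REASON_PATTERNS)
        return {"expected_action": scenario["expected_action"],
                "requested_new_sample": requested,
                "gave_reason": reason_given,
                "score": int(requested) + int(reason_given)}

    print(evaluate("Please send a new sample; the name on the tube does not match the referral card.",
                   scenario))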

Scenarios involving free text analysis include:

The project will involve:

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful. Python is likely to be the language used to implement the system.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-11:   Automated Evaluation of Pathology Lab Reports in External Quality Assessment

Suitable for: CS/SE/AI/CS+Maths

Note: This project offers the chance to work with Certus Technology Associates, a company that "designs, builds, delivers and supports adaptable software and services, to assist higher education establishments, life-science, biotech, healthcare and private sector organisations in the management of complex data and processes." Certus is interested in exploring the potential natural language processing (NLP) has to improve their software products and services. Any student taking this project will have the chance to meet with Certus personnel to refine requirements, acquire sample data and to get feedback on their work.

Background

Pathology laboratories are generally subjected to external quality assessment (EQA). This assessment is undertaken by an independent organisation (an EQA provider). In general, a laboratory is sent a sample by the EQA provider. The laboratory undertakes testing in the normal way and prepares results which are returned to the provider.

Once all the results are in, the EQA provider evaluates all the results and often prepares statistical conclusions about laboratory performance. In some situations, laboratories must provide a written report regarding their findings. In such cases, a team of assessors evaluates the written reports.

Project Description

The objective of this project is to develop an NLP-based system to evaluate the content of the written reports provided by a laboratory as part of EQA. A set of example reports is available. An evaluation schema exists that describes the necessary content of the report. The expected details of the report content are known to the EQA provider as the sample provided was previously analysed. The project will involve:

Prerequisites

Interest in language and/or text processing. No mandatory module prerequisites, but any or all of Machine Learning, Text Processing and Natural Language Processing are useful. Python is likely to be the language used to implement the system.

Initial Reading and Useful Links


[next][prev][Back to top]

RJG-UG-12:   Building an Alexa Skill for DCS

Suitable for: CS/SE/AI/CS+Maths

Background

In a move to open the Amazon Alexa platform to third party developers -- as Apple and Google have opened their mobile device platforms to App developers -- Amazon has created the concept of a "Skill" (like an App) and a development kit to support independent developers in building skills. Amazon says:
"Alexa provides a set of built-in capabilities, referred to as skills. For example, Alexa's abilities include playing music from multiple providers, answering questions, providing weather forecasts, and querying Wikipedia.

The Alexa Skills Kit lets you teach Alexa new skills. Customers can access these new abilities by asking Alexa questions or making requests. You can build skills that provide users with many different types of abilities. For example, a skill might do any one of the following:

(from Build Skills with the Alexa Skills Kit)
Just how popular and widespread skills will become is not yet clear, but spoken language conversational interfaces are clearly only going to become more widely used in the coming years. For any organisation wanting to get information about itself across to its "customers", a dedicated skill provides a new channel by which to do so. As a Computer Science department with leading Speech and Language Processing research groups, it seems obvious that DCS at Sheffield should promote itself in this way.

Project Description

The aim of this project is to build an Alexa Skill that will let interested persons converse with Amazon's Alexa to ask questions and get answers about the Department of Computer Science, University of Sheffield. For example, prospective students might want to know about courses offered; current students might want to know about lecture locations, and so on.
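
As an indication of the mechanics involved, the sketch below shows a minimal AWS Lambda handler for a custom skill, responding in the Alexa Skills Kit request/response JSON format. The intent name (CourseInfoIntent) and the canned answer are hypothetical placeholders for the interaction model and department information the project would actually define.

    # Sketch of an AWS Lambda handler for a custom Alexa skill, using the Alexa
    # Skills Kit request/response JSON format directly. The intent name and the
    # canned answer are hypothetical placeholders.
    def speak(text, end_session=True):
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": end_session,
            },
        }

    def lambda_handler(event, context):
        request = event["request"]
        if request["type"] == "LaunchRequest":
            return speak("Welcome to the Sheffield Computer Science skill. "
                         "Ask me about the department.", end_session=False)
        if request["type"] == "IntentRequest" and request["intent"]["name"] == "CourseInfoIntent":
            return speak("The department offers undergraduate degrees in Computer Science, "
                         "Software Engineering and Artificial Intelligence.")
        return speak("Sorry, I didn't understand that.")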

The project will involve:

Prerequisites

Interest in speech and language processing. No mandatory module prerequisites, but any or all of Machine Learning, Speech Processing, Text Processing and Natural Language Processing are useful.

Initial Reading and Useful Links


Last modified May 14 2018 by Rob Gaizauskas