MSc projects 2015-16

  • JPB-MSc-1: Web-based collaborative audio annotation tool (SSIT or ASE)
  • JPB-MSc-2: Lip-reading for audio speech enhancement (ACS or CS+SLP)
  • JPB-MSc-3: Classification Challenge Web-host (SSIT)
  • JPB-MSc-4: Acoustic event detection challenge (ACS or CS+SLP)
  • JPB-MSc-5: Multichannel speech activity detection (ACS or CS+SLP)


The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-MSc-1: Web-based collaborative audio annotation tool (SSIT or ASE)

Description

There is a large research field concerned with the analysis of everyday audio recordings. Within this field a major sub-task is that of automatically detecting everyday sounds — Audio Event Detection and Classification (AEDC). To train AEDC systems it is necessary to have audio recordings that have been labelled by human listeners, i.e. humans have taken the recording and marked the start time and end time of each sound; this process is known as 'annotation'. Audio annotation is a time-consuming and potentially expensive process, and good tools are essential. This project aims to explore the possibility of building a web-based annotation tool that would enable many people to collaborate on the annotation of an audio file that has been uploaded to the Internet. The project will need to use a suitable client-server framework to allow annotators to collaborate. It will also need to build on existing web-based UI tools to develop an interface that allows clients to easily preview the audio signals and precisely label the start and end points of key audio events.
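
As a rough illustration of the client-server interaction involved, the sketch below shows a minimal annotation back end in Python using Flask. The endpoint paths and the JSON annotation format are assumptions made purely for illustration; they are not part of the project specification, and a real service would need persistent storage and user management.

    # Minimal sketch of a collaborative annotation back end (illustrative only).
    # The endpoint paths and the JSON schema are assumptions, not a specification.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    # In-memory store: recording id -> list of annotations.
    # A real service would use a database so many annotators can collaborate.
    annotations = {}

    @app.route("/recordings/<rec_id>/annotations", methods=["POST"])
    def add_annotation(rec_id):
        # One labelled event, e.g. {"start": 1.25, "end": 2.00,
        # "label": "door_slam", "annotator": "abc"}.
        event = request.get_json()
        annotations.setdefault(rec_id, []).append(event)
        return jsonify({"count": len(annotations[rec_id])}), 201

    @app.route("/recordings/<rec_id>/annotations", methods=["GET"])
    def list_annotations(rec_id):
        # Return every annotation made so far, so clients can see each other's labels.
        return jsonify(annotations.get(rec_id, []))

    if __name__ == "__main__":
        app.run(debug=True)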



JPB-MSc-2: Lip-reading for audio speech enhancement (ACS or CS+SLP)

Description

Speech can be hard to understand when there is a lot of background noise present. There are many well-established signal processing techniques for removing noise from speech signals; however, most of these techniques fail to make the speech any more intelligible - they just make it sound less noisy. This project will investigate an exciting new audio-visual strategy that starts with a noisy video recording of the speaker. The system will then use computer vision techniques to extract speech information from the pattern of the speaker's lip movements. This information will then be used to improve the speech audio signal. (This isn't a far-fetched idea: it is something that you and I do naturally when listening to speech!)

The project breaks into several components, any one of which could be the main focus: i/ image processing for visual feature extraction from video; ii/ machine learning for visual-to-acoustic feature mapping; iii/ testing new algorithms for speech signal processing. It will also require software skills for building tools and demonstration systems, and it will provide experience of evaluating speech signals by running controlled listening experiments.
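
To give a feel for component ii/, the sketch below fits a simple least-squares mapping from visual feature vectors to acoustic feature vectors in Python. The feature dimensions are assumed and random numbers stand in for real lip and spectral features, so it is a toy example rather than a proposed design; a real system might learn the mapping with a neural network instead.

    # Toy sketch of component ii/: mapping visual (lip) features to acoustic features.
    # Random data stands in for real lip/spectral features; dimensions are assumed.
    import numpy as np

    rng = np.random.default_rng(0)

    n_frames, vis_dim, aud_dim = 1000, 20, 40    # assumed dimensions
    V = rng.normal(size=(n_frames, vis_dim))     # visual features per video frame
    A = rng.normal(size=(n_frames, aud_dim))     # target acoustic features (e.g. log spectra)

    # Fit A ~ V @ W by least squares (a learned model would replace this step).
    W, *_ = np.linalg.lstsq(V, A, rcond=None)

    # Predict acoustic features for new lip-movement frames.
    V_test = rng.normal(size=(10, vis_dim))
    A_pred = V_test @ W
    print(A_pred.shape)    # (10, 40)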

The project will run in parallel with a UK Research Council funded collaboration between the University of Sheffield, the University of Stirling, the Institute of Hearing Research and Phonak that is developing camera-equipped hearing aids.




JPB-MSc-3: Classification Challenge Web-host (SSIT)

Description

In computer science research, public competitions are often used as a way of comparing algorithms that are being developed by competing groups. Such competitions have been used in many machine learning fields including speech recognition (e.g. the CHiME challenge), computer vision and text processing. In these competitions, participants either evaluate their systems themselves and then submit results, which leaves the evaluation open to cheating, or they submit their system outputs to the organisers for remote evaluation, which can create a lot of work for the organisers.

This project would aim to build a competition hosting service designed to reduce the work involved in running a machine learning competition. The service would provide organisers with an easy mechanism for setting up a competition. Once set up, the service would allow competitors to register; it would provide them with access to the data that they need to design and train their systems; and it would provide some form of submission and evaluation service that allows teams' systems to be scored. It could also provide extra features such as online results tables showing which teams are doing best.
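
As a very rough sketch of the automatic evaluation step, the Python snippet below scores a submitted label file against a hidden ground-truth file and keeps a simple leaderboard. The file format, the accuracy metric and all function names are assumptions made for illustration; the real service would need to support whatever metric each competition defines.

    # Illustrative sketch of the evaluation/leaderboard step.
    # Assumes submissions are text files with one predicted label per line (an assumption).
    def load_labels(path):
        with open(path) as f:
            return [line.strip() for line in f if line.strip()]

    def score_submission(submission_path, ground_truth_path):
        # Simple accuracy: fraction of lines whose label matches the ground truth.
        predicted = load_labels(submission_path)
        truth = load_labels(ground_truth_path)
        correct = sum(p == t for p, t in zip(predicted, truth))
        return correct / len(truth)

    leaderboard = {}    # team name -> best score so far

    def submit(team, submission_path, ground_truth_path):
        score = score_submission(submission_path, ground_truth_path)
        leaderboard[team] = max(score, leaderboard.get(team, 0.0))
        # Online results table: teams ranked by best score so far.
        return sorted(leaderboard.items(), key=lambda kv: kv[1], reverse=True)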



JPB-MSc-4: Acoustic event detection challenge (ACS or CS+SLP)

Description

Consider the problem of listening to an audio file and trying to detect the occurrence of a specific sound event within it. For example, the task might be to detect all occurrences of doors slamming, of people laughing, or of telephones ringing. This task is known as acoustic event detection. Humans are incredibly good at this, but it is extremely hard to produce automatic systems that come close to human levels of performance. However, solutions to this problem would be incredibly useful in a huge range of applications.

This project will attempt to build an acoustic event detector for a range of commonly occurring sounds following the specification of the recent D-CASE acoustic event detection challenge.
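
The snippet below sketches one possible frame-based detector in Python, loosely in the spirit of this kind of challenge baseline. The file names, the choice of MFCC features with one Gaussian mixture model per event class, and the detection threshold are all illustrative assumptions, not the challenge specification.

    # Sketch of a simple frame-based event detector (illustrative only).
    # File names, features, models and the threshold are assumptions.
    import numpy as np
    import librosa
    from sklearn.mixture import GaussianMixture

    def mfcc_frames(path):
        y, sr = librosa.load(path, sr=None)
        # 13 MFCCs per frame; frames are the classification units.
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T

    # Train one GMM per event class on labelled frames (hypothetical training files).
    models = {}
    for label, train_file in [("door_slam", "door_slam.wav"), ("laughter", "laughter.wav")]:
        models[label] = GaussianMixture(n_components=8).fit(mfcc_frames(train_file))

    # Detection: label each frame of a test recording with the best-scoring class,
    # keeping only frames whose log-likelihood exceeds a threshold.
    test = mfcc_frames("test_recording.wav")
    labels = list(models)
    scores = np.stack([models[l].score_samples(test) for l in labels])  # (classes, frames)
    best = scores.argmax(axis=0)
    threshold = -100.0    # assumed value; would be tuned on development data
    detected = [(i, labels[best[i]]) for i in range(len(test)) if scores[best[i], i] > threshold]
    print(detected[:10])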



JPB-MSc-5: Multichannel speech activity detection (ACS or CS+SLP)

Description

Automatic speech recognisers, which convert audio speech recordings into text, are now used in many applications (e.g. Siri, Google voice search). There are two types of system: those that require the user to push a button before talking, and those that leave the microphones turned on all the time and then try to detect automatically when the user has started speaking, i.e. using Voice Activity Detection (VAD). There are many approaches to VAD, but most of them perform quite poorly when there is a lot of background noise.

This project will implement existing VAD algorithms and evaluate them on a new set of speech recordings that have recently been collected at Sheffield, CHiME-3. The CHiME-3 data has been captured using a recording device with multiple microphones, which provides an opportunity to improve on conventional single-microphone VAD algorithms by using multi-microphone signal processing techniques (e.g. beamforming). It is therefore expected that the project will not only test existing VAD algorithms but will also develop novel multi-microphone extensions. If successful, the project could be written up as a research paper for submission to ICASSP 2017.
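
As a starting-point illustration, the Python sketch below applies a simple energy-based VAD after a crude delay-and-sum beamformer. The zero-delay assumption, the synthetic data and the threshold are simplifications for illustration; processing real multi-channel recordings would require estimating inter-channel delays and a far more robust detector.

    # Minimal sketch: energy-based VAD on a multi-channel signal after a crude
    # delay-and-sum beamformer (illustrative; real recordings would need
    # estimated inter-channel delays rather than assuming they are zero).
    import numpy as np

    def delay_and_sum(channels):
        # channels: array of shape (n_mics, n_samples). Zero delays assumed.
        return channels.mean(axis=0)

    def energy_vad(signal, sr, frame_ms=25, threshold_db=-35.0):
        # Return a boolean speech/non-speech decision per frame.
        frame_len = int(sr * frame_ms / 1000)
        n_frames = len(signal) // frame_len
        frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
        energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
        return energy_db > threshold_db

    # Example with synthetic data standing in for a real multi-microphone recording.
    sr = 16000
    rng = np.random.default_rng(0)
    noise = 0.01 * rng.normal(size=(6, sr * 2))              # 6 mics, 2 s of noise
    speech = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
    noise[:, sr // 2:sr // 2 + sr] += speech                 # add a "speech" burst to all mics
    enhanced = delay_and_sum(noise)
    print(energy_vad(enhanced, sr).astype(int))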
