MSc projects 2016 - 17

The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-MSc-1: Lip reading for song transcription (ACS or CS+SLP)

Description

This project will extend an MSc project that ran last year called “Automatic Speech Recognition in Music”. In last year’s project a student built a speech recognition system for singing. The student did this by collecting and transcribing a large database of guitar-accompanied songs that musicians had uploaded to YouTube. That project used the audio part of this database but did not make use of the video component. This year’s project will use face and lip tracking tools to extract information from the video that can be used to improve the speech recognition system. Two tasks will be considered: i/ using the lip movements to determine the points at which the singer starts and stops singing, i.e. the start and end of each sung phrase of the lyrics; ii/ using the lip movements to improve the performance of the speech recognition system that was built last year, i.e. lip reading for singing.
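
As a rough illustration of task i/ only (this is an assumed baseline, not the method the project must use), the sketch below uses the OpenCV Python library listed under background reading to track the lower part of a detected face and produce a crude per-frame "lip activity" score; smoothing and thresholding such a score is one simple way to propose where a sung phrase starts and stops. The function name lip_activity and all parameter values are hypothetical.

    # A minimal sketch, assuming a recent opencv-python that ships the Haar
    # cascades via cv2.data.haarcascades. It measures frame-to-frame change in
    # the lower third of the detected face as a rough proxy for lip movement.
    import cv2
    import numpy as np

    def lip_activity(video_path):
        face_cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        cap = cv2.VideoCapture(video_path)
        prev_mouth, activity = None, []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
            if len(faces) == 0:
                activity.append(0.0)
                prev_mouth = None
                continue
            x, y, w, h = faces[0]
            # Lower third of the face box as a crude mouth region, resized so
            # that consecutive frames can be compared directly.
            mouth = cv2.resize(gray[y + 2 * h // 3:y + h, x:x + w], (64, 32))
            activity.append(0.0 if prev_mouth is None
                            else float(np.mean(cv2.absdiff(mouth, prev_mouth))))
            prev_mouth = mouth
        cap.release()
        return activity

A serious solution would more likely use a dedicated facial landmark tracker, but even a signal this crude gives a feel for how video can mark the start and end of singing.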

If you are interested in this project please email me and I will send a copy of the dissertation from last year’s project which will provide useful background.

Background reading

The following references provide some background to the field of automatic lip reading:

  • LipNet - AI takes lip reading into the future http://www.cs.ox.ac.uk/news/1217-full.html
  • A review of automatic lip reading http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=1230212
  • Python OpenCV computer vision library, https://opencv-python-tutroals.readthedocs.org/en/latest/

For more information mail me.


JPB-MSc-2: Lip reading for audio speech enhancement (ACS or CS+SLP)

Description

Speech can be hard to understand when there is a lot of background noise present. There are many well-established signal processing techniques for removing noise from speech signals; however, most of these techniques fail to make the speech any more intelligible - they just make it sound less noisy. This project will investigate an exciting new audio-visual strategy that starts with a noisy video recording of the speaker. The system will then use computer vision techniques to extract speech information from the pattern of the speaker’s lip movements. This information will then be used to improve the speech audio signal. (This isn’t a far-fetched idea; it is something that you and I do naturally when listening to speech!)

The project breaks into several components, any one of which could be the main focus: i/ image processing for visual feature extraction from video; ii/ machine learning for visual-to-acoustic feature mapping; iii/ testing new algorithms for speech signal processing. It will also require software skills for building tools and demonstration systems, and it will provide experience of evaluating speech signals by running controlled listening experiments.
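
To make the link between components ii/ and iii/ concrete, one common way of injecting visually derived information into the audio processing is to estimate a time-frequency mask and apply it to the noisy signal's short-time Fourier transform. The sketch below shows only the mask-application step; the mask itself is assumed to come from some visual-to-acoustic model that is not implemented here, and the function enhance and its defaults are illustrative assumptions.

    # A minimal sketch, assuming SciPy >= 0.19 for scipy.signal.stft/istft and a
    # mask in [0, 1] with the same shape as the STFT (frequency bins x frames).
    import numpy as np
    from scipy.signal import stft, istft

    def enhance(noisy, mask, fs=16000, nperseg=512):
        """Apply a time-frequency mask to a noisy waveform and resynthesise."""
        _, _, spec = stft(noisy, fs=fs, nperseg=nperseg)
        _, enhanced = istft(spec * mask, fs=fs, nperseg=nperseg)
        return enhanced

The machine learning part of the project would then be concerned with predicting such a mask (or equivalent acoustic information) from the lip-movement features.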

The project will run in parallel with a UK Research Council-funded collaboration between Sheffield, the University of Stirling, the Institute of Hearing Research and Phonak that is developing camera-equipped hearing aids.

Background reading

The following references provide some background to the field.

For more information mail me.


JPB-MSc-3: Acoustic event detection challenge (ACS or CS+SLP)

Description

Consider the problem of listening to an audio file and trying to detect the occurrence of a specific sound event within it. For example, the task might be to detect all occurrences of doors slamming, or of people laughing, or of telephones ringing. This task is known as acoustic event detection. Humans are incredibly good at this, but it is extremely hard to produce automatic systems that come close to human levels of performance. However, solutions to this problem would be incredibly useful in a huge range of applications.

This project will attempt to build an acoustic event detector for a range of commonly occurring sounds following the specification of the recent D-CASE acoustic event detection challenge.
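
To give a feel for the task, a typical starting point (sketched below under the assumption that librosa and scikit-learn are available; the file and variable names are hypothetical) is to extract log-mel spectrogram frames and train a per-frame classifier for one event class, then post-process the frame posteriors into detected events with start and end times.

    # A minimal sketch of a frame-wise detector for a single event class,
    # assuming librosa (for features) and scikit-learn (for the classifier).
    import numpy as np
    import librosa
    from sklearn.ensemble import RandomForestClassifier

    def logmel_frames(path, sr=16000, n_mels=40):
        y, _ = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
        return librosa.power_to_db(mel).T        # shape: (frames, mel bands)

    # train_files and frame_labels are hypothetical: frame_labels[i] marks, per
    # frame of train_files[i], whether the target event (e.g. a door slam) is active.
    # X = np.vstack([logmel_frames(f) for f in train_files])
    # y = np.concatenate(frame_labels)
    # clf = RandomForestClassifier(n_estimators=200).fit(X, y)
    # posteriors = clf.predict_proba(logmel_frames("test.wav"))[:, 1]

Smoothing and thresholding the posteriors then yields event start and end times that can be scored against the challenge's reference annotations.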

For more information mail me.


JPB-MSc-4: Voice activity detection for noisy multi-microphone recordings (SSIT or ASE)

Description

Automatic speech recognisers, which convert audio speech recordings into text, are now used in many applications (e.g. Siri, Google voice search, etc.). There are two types of system: those that require the user to push a button before talking, and those which leave the microphones turned on all the time and then try to automatically detect when the user has started speaking, i.e. using Voice Activity Detection (VAD). There are many approaches to VAD, but most of them perform quite poorly when there is a lot of background noise. This project will implement existing VAD algorithms and evaluate them on a new set of speech recordings that has recently been collected at Sheffield, CHiME-3. The CHiME-3 data has been captured using a recording device that has multiple microphones, which provides an opportunity to improve on conventional single-microphone VAD algorithms by using multi-microphone signal processing techniques (e.g. beamforming). It is therefore expected that the project will not only test existing VAD algorithms but will also develop novel multi-microphone extensions. If successful, the project could be written up as a research paper for submission to ICASSP 2017.
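
By way of illustration only (none of this is the CHiME-3 baseline), the sketch below averages the microphone channels (a degenerate delay-and-sum beamformer with all delays assumed to be zero) and then applies a very simple energy-threshold VAD to the result; the function name, frame sizes and threshold are hypothetical.

    # A minimal sketch, assuming the multi-channel audio is already loaded into a
    # NumPy array of shape (n_mics, n_samples). Averaging the channels is a naive
    # stand-in for beamforming; the VAD is a simple per-frame energy threshold.
    import numpy as np

    def energy_vad(multichannel, frame_len=400, hop=160, margin_db=6.0):
        mono = multichannel.mean(axis=0)                     # naive "beamformer"
        n_frames = 1 + (len(mono) - frame_len) // hop
        energy_db = np.array([
            10 * np.log10(np.sum(mono[i * hop:i * hop + frame_len] ** 2) + 1e-10)
            for i in range(n_frames)])
        noise_floor = np.percentile(energy_db, 10)           # quietest 10% assumed to be noise
        return energy_db > noise_floor + margin_db           # True = speech frame

A proper multi-microphone extension would estimate the inter-channel delays (or use a full beamformer) before the VAD stage, and that is where the novelty of the project would lie.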

For more information mail me.


JPB-MSc-5: Web-based collaborative audio annotation tool (SSIT or ASE)

Description

There is a large research field concerned with the analysis of everyday audio recordings. Within this field a major sub-task is that of automatically detecting everyday sounds — Audio Event Detection and Classification (AEDC). To train AEDC systems it is necessary to have audio recordings that have been labelled by human listeners, i.e. humans have taken the recording and marked the start time and end time of each sound — this process is known as ‘annotation’. Audio annotation is a time-consuming and potentially expensive process, and good tools are essential. This project aims to explore the possibility of building a web-based annotation tool that would enable many people to collaborate on the annotation of an audio file that has been uploaded to the Internet. The project will need to use a suitable client-server framework to allow annotators to collaborate. It will also need to build on existing web-based UI tools to develop an interface that allows clients to easily preview the audio signals and precisely label the start and end points of key audio events.
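
As a sketch of the client-server side only (Flask is just one assumed choice of framework; the routes and JSON fields are hypothetical), a minimal server that collects start/end labels from collaborating annotators might look like this:

    # A minimal sketch of a collaborative annotation back end, assuming Flask.
    # A real tool would add user accounts, a database and conflict handling.
    from flask import Flask, request, jsonify

    app = Flask(__name__)
    annotations = []          # in-memory store, one dict per labelled event

    @app.route("/annotations", methods=["POST"])
    def add_annotation():
        # Expected JSON, e.g. {"file": "clip1.wav", "label": "door_slam",
        #                      "start": 1.20, "end": 1.95, "annotator": "user1"}
        annotations.append(request.get_json())
        return jsonify(status="ok", count=len(annotations))

    @app.route("/annotations", methods=["GET"])
    def list_annotations():
        return jsonify(annotations)

    if __name__ == "__main__":
        app.run(debug=True)

The browser front end, which draws the waveform and lets annotators drag out start and end points, would sit on top of this and simply POST its labels to the server.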

For more information mail me.
