MSc projects 2017-18

  • JPB-MSc-1: Distant microphone speech processing for CHiME-5 (CS+SLP or ACS) (Xin Zhang)
  • JPB-MSc-2: Distant microphone speech processing for CHiME-5 (CS+SLP or ACS)
  • JPB-MSc-3: Android/iOS app for audio-visual speech collection (CS+SLP or ASE) (Xie Wang)
  • JPB-MSc-4: Lip reading for audio speech enhancement (CS+SLP or ACS) (Sifan Wu)
  • JPB-MSc-5: Developing speech recognition for the MIRo robot (CS+SLP or ACS) (Xin Sun)
  • JPB-MSc-6: Eye tracking software for audio-visual speech perception research (CS+SLP or ACS)


The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-MSc-1: Distant microphone speech processing for CHiME-5 (CS+SLP or ACS)


JPB-MSc-2: Distant microphone speech processing for CHiME-5 (CS+SLP or ACS)

Description

The student on this project will contribute to the development of a distant microphone speech recognition system (e.g. similar to Amazon Alexa or Google Home). The work will form part of a larger project being conducted in collaboration with Toshiba Research Labs. The system will be designed for recognising conversational speech using audio captured in people’s homes. The project will use CHiME-5, a brand new conversational speech dataset made up of recordings of real dinner parties held in people’s homes (http://spandh.dcs.shef.ac.uk/chime_challenge/). The data is currently being recorded by Sheffield in collaboration with Google and others and will be released in January.

The CHiME-5 dataset will be released with a baseline speech recognition system. The MSc project will aim to improve this baseline system by working on one component. There are several sub-tasks that could form the focus of the project: deep learning for acoustic or language modelling; multiple-microphone speech enhancement; speech source separation for overlapping speech; and training data simulation and augmentation. The project is ideally suited to students on the Computer Science with Speech and Language Processing MSc, but a student on the ACS MSc with an interest in machine learning could also be suitable.
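To give a flavour of one possible component, the sketch below shows a very simple multiple-microphone enhancement approach (delay-and-sum beamforming with GCC-PHAT delay estimation) in Python/NumPy. This is only an illustration under simplifying assumptions (integer-sample delays, channel 0 as the reference); it is not the CHiME-5 baseline, and the function names and array shapes are hypothetical.

```python
# Minimal delay-and-sum beamformer sketch (illustration only, not the
# CHiME-5 baseline). Assumes a 2-D array `channels` of shape
# (n_microphones, n_samples) sampled synchronously.
import numpy as np


def gcc_phat_delay(sig, ref, max_delay):
    """Estimate the delay (in samples) of `sig` relative to `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)
    spec = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    spec /= np.abs(spec) + 1e-12                    # PHAT weighting
    cc = np.fft.irfft(spec, n=n)
    cc = np.concatenate((cc[-max_delay:], cc[:max_delay + 1]))
    return int(np.argmax(np.abs(cc))) - max_delay


def delay_and_sum(channels, max_delay=80):
    """Align every channel to channel 0 and average (crude integer-sample alignment)."""
    ref = channels[0]
    out = np.zeros(len(ref))
    for ch in channels:
        d = gcc_phat_delay(ch, ref, max_delay)
        out += np.roll(ch, -d)
    return out / len(channels)


# Usage (hypothetical 4-channel recording):
# enhanced = delay_and_sum(multichannel_audio)   # multichannel_audio: (4, n_samples)
```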

The project is suitable for up to two students. Each student would work with the same data and baseline software framework but would focus on a different aspect of the system.

If you are interested in this project please make an appointment to see me.

Background reading

  • Distant microphone speech recognition reference: http://spandh.dcs.shef.ac.uk/projects/chime/
  • The CHiME-5 dataset
  • Further information may appear on my website project pages. http://staffwww.dcs.shef.ac.uk/people/J.Barker//project-year/pgt-2017.html


JPB-MSc-3: Android/iOS app for audio-visual speech collection (CS+SLP or ASE)

Description

When talking in noisy environments (e.g., cafes, factories) people naturally modify the way that they speak to help them to be better understood. This alteration to the normal speaking style is known as the Lombard effect, and it can be observed both in changes to the sound of the speech and in video of the speaker’s lip movements. Most studies of Lombard speech have induced it in laboratory conditions by asking speakers to talk while wearing headphones playing noise. There is very little Lombard speech recorded in real, everyday noisy conditions.

We would like to build a tool that makes it easy to collect speech recordings in everyday settings. The tool would run on an Android tablet or on an iPad and would be designed to capture both speech audio and video while people are prompted with text to read or with questions to answer. The project will build this app and then demonstrate its effectiveness by using it to make a collection of audio-visual speech recordings.
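To make the idea concrete, the sketch below is a very rough desktop prototype of the prompted recording loop, written in Python with opencv-python, sounddevice and soundfile (my own choice of packages, purely for illustration). The real app would use the native Android or iOS camera and microphone APIs, and would need proper audio/video synchronisation rather than the approximate timing used here.

```python
# Rough desktop prototype of a prompted audio-visual recording loop
# (illustration only; the project itself targets Android/iOS).
import cv2
import sounddevice as sd
import soundfile as sf

PROMPTS = ["Please read: 'The north wind and the sun were disputing...'",
           "Question: what did you have for breakfast today?"]
FS = 16000          # audio sample rate (Hz)
DURATION = 5        # seconds recorded per prompt (approximate for the video)

camera = cv2.VideoCapture(0)

for i, prompt in enumerate(PROMPTS):
    writer = cv2.VideoWriter(f"video_{i}.avi",
                             cv2.VideoWriter_fourcc(*"MJPG"), 25, (640, 480))
    audio = sd.rec(int(DURATION * FS), samplerate=FS, channels=1)  # non-blocking
    for _ in range(25 * DURATION):          # grab roughly DURATION seconds of frames
        ok, frame = camera.read()
        if not ok:
            break
        frame = cv2.resize(frame, (640, 480))
        cv2.putText(frame, prompt, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
        cv2.imshow("prompt", frame)
        writer.write(frame)
        cv2.waitKey(1)
    sd.wait()                               # block until the audio recording finishes
    sf.write(f"audio_{i}.wav", audio, FS)
    writer.release()

camera.release()
cv2.destroyAllWindows()
```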

If successful, the app will be used to collect data for an ongoing EPSRC-funded research project being conducted in collaboration with the University of Stirling and the hearing aid manufacturer Sonova Ltd.

If you are interested in this project please make an appointment to see me.

Prerequisites

  • The project will require some experience with mobile app development, either on Android or on iOS.

Background reading

  • The Lombard effect
  • Further information may appear on my website project pages. http://staffwww.dcs.shef.ac.uk/people/J.Barker//project-year/pgt-2017.html


JPB-MSc-4: Lip reading for audio speech enhancement (CS+SLP or ACS)

Description

Speech can be hard to understand when there is a lot of background noise present. There are many well-established signal processing techniques for removing noise from speech signals; however, most of these techniques fail to make the speech any more intelligible - they just make it sound less noisy. This project will investigate an exciting new audio-visual strategy that starts with a video recording of the speaker talking in noise. The system will use computer vision techniques to extract speech information from the pattern of the speaker’s lip movements. This information will then be used to improve the noisy speech audio signal. (This isn’t a far-fetched idea; it is something that you and I do naturally when listening to speech!)

The project breaks into several components, any one of which could be the main focus: i) image processing for visual feature extraction from video; ii) machine learning for visual-to-acoustic feature mapping; iii) testing new algorithms for speech signal processing. It will also require software skills for building tools and demonstration systems, and it will provide experience of evaluating speech signals by running controlled listening experiments.
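To give a flavour of component i), the sketch below uses Python/OpenCV to crop a crude mouth region from each video frame with the frontal-face Haar cascade that ships with OpenCV. The cascade choice and the lower-half-of-face crop are simplifications of my own, not necessarily the features the project would end up using.

```python
# Minimal visual feature extraction sketch: a crude mouth-region patch per frame.
import cv2
import numpy as np

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")


def mouth_features(frame, size=(32, 16)):
    """Return a flattened grey-scale mouth-region patch, or None if no face is found."""
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(grey, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    mouth = grey[y + h // 2: y + h, x: x + w]       # lower half of the face box
    return cv2.resize(mouth, size).flatten() / 255.0


# Usage: one feature vector per frame of a (hypothetical) video file.
# capture = cv2.VideoCapture("talker.mp4")
# features = []
# while True:
#     ok, frame = capture.read()
#     if not ok:
#         break
#     f = mouth_features(frame)
#     if f is not None:
#         features.append(f)
```

A learned mapping (component ii) would then turn these per-frame vectors into a time-frequency gain or mask that is applied to the noisy audio (component iii).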

The project will run in parallel with a UK Research Council funded collaboration between Sheffield, the University of Stirling, the Institute of Hearing Research and Phonak that is developing camera-equipped hearing aids.

Background reading

The following references provide some background to the field.



JPB-MSc-5: Developing speech recognition for the MIRo robot (CS+SLP or ACS)

Description

The MIRo robot is a low-cost biomimetic robot designed as a ‘companion robot’. The robot is modelled on a pet dog and has a pair of microphones mounted inside moveable ears. We have been working with a specially adapted 8-microphone version of the robot for which we will be developing machine listening algorithms.

The aim of this project is to build an automatic speech recognition system for the MIRo robot. This will involve three main steps: i) processing existing speech training data to simulate the effect of it being recorded by the MIRo robot; ii) using the Kaldi speech recognition toolkit to train a speech recognition system adapted to the MIRo robot; iii) evaluating the system using speech data captured by the robot. The project may also investigate the use of microphone ‘beamforming’ techniques to build a recogniser that can work well in noisy environments (e.g., when the robot is moving and generating motor noise).

The project will make use of measurements that have already been made of the robot’s motor noise and microphone behaviour.
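To illustrate step i), the sketch below simulates the MIRo recording channel by convolving clean speech with a measured impulse response and mixing in motor noise at a chosen signal-to-noise ratio. The file names and the target SNR are hypothetical placeholders; the real project would plug in the measurements mentioned above.

```python
# Minimal training-data simulation sketch: clean speech -> "as recorded by MIRo".
import numpy as np
import soundfile as sf
from scipy.signal import fftconvolve


def simulate_miro_channel(clean_wav, ir_wav, noise_wav, out_wav, snr_db=5.0):
    speech, fs = sf.read(clean_wav)     # mono clean speech
    ir, _ = sf.read(ir_wav)             # measured robot/room impulse response
    noise, _ = sf.read(noise_wav)       # recorded motor noise

    # Pass the clean speech through the measured impulse response.
    reverberant = fftconvolve(speech, ir)[: len(speech)]

    # Loop the motor noise to the right length and scale it to the target SNR.
    noise = np.resize(noise, len(reverberant))
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    mixture = reverberant + gain * noise

    sf.write(out_wav, mixture / np.max(np.abs(mixture)), fs)


# simulate_miro_channel("clean.wav", "miro_ir.wav", "motor_noise.wav",
#                       "simulated.wav", snr_db=5.0)
```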

Prerequisites

  • An interest in automatic speech recognition
  • Python programming skills

Background reading

For further details see here.



JPB-MSc-6: Eye tracking software for audio-visual speech perception research (CS+SLP or ACS)

Description

Eye tracking is used in psychology experiments as a way of monitoring a subject’s attention. In the Speech and Hearing research group we are interested in learning how normal-hearing and hearing-impaired listeners use their eyes to capture ‘visual speech cues’ (e.g., lip movements) that help them understand speech. This can be achieved by tracking a user’s gaze direction while they watch videos of speech presented on a monitor.

The Department has access to a state-of-the-art wearable eye-tracking device, the Tobii Pro Glasses 2, which could potentially be used for these experiments. However, wearable eye-trackers measure gaze direction relative to the wearer’s head, so when used in a screen-based experiment they do not directly tell you whereabouts on the screen the user is looking. This problem can be solved with some additional computer vision software.

This project will develop software that allows the Tobii Pro Glasses to be used for screen-based experiments. This is essentially a video processing task that can be solved using computer vision techniques available in the OpenCV toolkit. The project will then demonstrate the effectiveness of the software by using it in some audio-visual speech perception experiments that we have planned.
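To illustrate the core computer vision step, the sketch below maps a gaze point from the glasses’ scene-camera coordinates onto monitor coordinates with a homography, assuming the four screen corners have already been located in the scene-camera image (e.g. using fiducial markers displayed at the corners). All coordinates and the screen resolution are hypothetical, and the real gaze point would come from the Tobii recording rather than being hard-coded.

```python
# Minimal sketch: map a scene-camera gaze point onto screen coordinates.
import numpy as np
import cv2

# Screen corners as seen by the glasses' scene camera (pixels), ordered
# top-left, top-right, bottom-right, bottom-left (hypothetical values).
scene_corners = np.float32([[412, 188], [901, 205], [887, 522], [398, 509]])

# The same corners in monitor coordinates (assuming a 1920x1080 display).
screen_corners = np.float32([[0, 0], [1920, 0], [1920, 1080], [0, 1080]])

# Homography from scene-camera pixels to screen pixels.
H, _ = cv2.findHomography(scene_corners, screen_corners)

# A gaze point reported in scene-camera pixels (hypothetical), mapped to the screen.
gaze_scene = np.float32([[[640, 360]]])                  # shape (1, 1, 2)
gaze_screen = cv2.perspectiveTransform(gaze_scene, H)[0, 0]
print("Gaze on screen (pixels):", gaze_screen)
```

In a full system the corner detection and the mapping would be repeated for every video frame, since the wearer’s head is free to move.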

If you are interested then please make an appointment to see me so that I can explain the problem in more detail.

Requirements

  • An interest in Computer Vision
  • Some Python programming experience

Background Reading

  • Tobii Glasses
  • Python OpenCV
  • Further information may appear on my website project pages. http://staffwww.dcs.shef.ac.uk/people/J.Barker//project-year/pgt-2017.html