MSc projects 2008-09

  • JPB-MSc-1: Scrabble Referee (Nimil Christopher)
  • JPB-MSc-2: Acoustic Guitar Hero
  • JPB-MSc-3: Rapid Adaptation of Visual Speech Models
  • JPB-MSc-4: Tool for Conducting Audio-Visual Speech Perception Experiments (Doranala Praveen)
  • JPB-MSc-5: Java Applet for Collecting a Huge Lip Image Database (Zhen Hua Dai)

The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-MSc-1: Scrabble Referee

Description

Scrabble is a word game in which two players take it in turns to place letter tiles on a board to produce words. The game uses a scoring system based on the value of the letter tiles and the positions in which they are placed. Games often result in disputes about whether scores have been correctly calculated, or whether words are spelt correctly.

The idea of this fun but challenging project would be to create a Scrabble `referee’. This would be a computer program that automatically checks the validity of each word and calculates the score. The key feature of the system is that it will use a webcam to passively watch the game and possibly a speech synthesizer to report the scores.

The project will involve using computer vision techniques to monitor the appearance of the game board, to work out when a move has been played, and to read the words off the board using optical character recognition. The problem can be broken down into a number of stages, each of which can be tackled using surprisingly simple strategies and some existing software libraries. The challenge of the project will be to put the pieces together into a system that works sufficiently reliably to be worth using.
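
As a flavour of how simple the individual stages can be, here is a minimal Java sketch of the move-detection step: it compares two webcam frames (assumed already cropped to the board region) and reports whether enough pixels have changed to suggest a move has been made. Both threshold values are illustrative guesses, not tested settings.

    import java.awt.image.BufferedImage;

    // Sketch: decide whether the board has changed between two webcam frames
    // by counting pixels whose brightness differs by more than a threshold.
    // A move could be taken as complete when the board changes and then stays
    // stable for a few seconds. Both thresholds below are illustrative guesses.
    public class BoardChangeDetector {

        private static final int DIFF_THRESHOLD = 40;       // per-pixel brightness difference
        private static final double CHANGE_FRACTION = 0.01; // fraction of pixels that must change

        /** Returns true if frame 'now' differs noticeably from frame 'before'. */
        public static boolean boardChanged(BufferedImage before, BufferedImage now) {
            int changed = 0;
            int total = before.getWidth() * before.getHeight();
            for (int y = 0; y < before.getHeight(); y++) {
                for (int x = 0; x < before.getWidth(); x++) {
                    int diff = Math.abs(brightness(before.getRGB(x, y))
                                      - brightness(now.getRGB(x, y)));
                    if (diff > DIFF_THRESHOLD) {
                        changed++;
                    }
                }
            }
            return changed > total * CHANGE_FRACTION;
        }

        /** Average of the R, G and B components of a packed ARGB pixel. */
        private static int brightness(int rgb) {
            int r = (rgb >> 16) & 0xff, g = (rgb >> 8) & 0xff, b = rgb & 0xff;
            return (r + g + b) / 3;
        }
    }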

Requirements

This project is suitable for MSc students on either the ACS or ASE MSc programmes. No previous computer vision experience is required, but good Java skills or some C/C++ programming experience are essential. C/C++ programmers will be able to use the OpenCV and GOCR libraries, which will make development easier.

Initial reading

  • Scrabble page on Wikipedia - here
  • Jain (1989) “Fundamentals of digital image processing”, Prentice Hall - a useful textbook that can be found in the library.
  • OpenCV computer vision library - see here.
  • Bradski and Kaehler (2008), “Learning OpenCV”, O’Reilly - details here.
  • GOCR - open-source character recognition software - here.


JPB-MSc-2: Acoustic Guitar Hero

Description

When learning to play an instrument it is usually very important to have a teacher who is able to tell you what you are doing wrong. Hearing your own mistakes is often difficult because so much attention is being spent on reading music and moving fingers. Further, without feedback, practice quickly becomes repetitious and dull. An interactive computer system is needed to make practice fun and to motivate the learner!

This project will build a system that uses acoustic analysis to perform the music teacher’s role. It will focus on fingerstyle acoustic guitar (because this is something I’ve been struggling to learn myself!). Fingerstyle (or finger-picking) guitar is a style of playing that uses the fingertips to pluck the strings and involves elements of country, blues and ragtime. Although it can involve strumming chords, most early exercises focus on learning to coordinate the movement of the right-hand fingers, which sequentially play patterns of individual strings. The difficulty for the beginner is in developing an even rhythm and getting a `clean’ and balanced sound from each string.

Although general music transcription and analysis is a challenging research area, this project should be able to achieve good results for a number of reasons: i) the system will not be transcribing an unknown piece of music, but instead, it will be comparing a performance against a known musical score; ii) practice exercises have a simple structure; iii) the exercises do not necessarily have to contain chords; iv) there is only one instrument playing and there will be little or no background noise.

The basic system might just check: i) the timing of the notes, scoring them according to how regular they are; and ii) the pitch of the notes, to make sure that the player is plucking the strings in the correct order. A more advanced version could also look at the amplitude of the notes and the sound quality to make sure that the notes are being played evenly and cleanly. The output could be in the form of a score presented at the end of the exercise, or better, a running indicator showing the quality as time progresses. The output could even be presented graphically and embedded into a game that would help motivate younger learners.
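
As a rough illustration of the pitch check, the Java sketch below estimates the pitch of a single plucked note by autocorrelation over a short mono buffer taken just after the note onset. It is a deliberately simple method (it can make octave errors), and the 80-1000 Hz search range is an assumption chosen to cover a guitar’s pitch range.

    // Sketch: estimate the pitch of a single plucked note by autocorrelation
    // over a short mono buffer taken just after the note onset. This simple
    // method can make octave errors; the 80-1000 Hz search range is an
    // assumption chosen to cover the pitch range of a guitar.
    public class PitchEstimator {

        public static double estimatePitchHz(double[] samples, double sampleRate) {
            int minLag = (int) (sampleRate / 1000.0); // highest pitch considered: 1000 Hz
            int maxLag = (int) (sampleRate / 80.0);   // lowest pitch considered: 80 Hz
            int bestLag = minLag;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int lag = minLag; lag <= maxLag; lag++) {
                double score = 0.0;
                for (int i = 0; i + lag < samples.length; i++) {
                    score += samples[i] * samples[i + lag];  // correlation at this lag
                }
                if (score > bestScore) {
                    bestScore = score;
                    bestLag = lag;
                }
            }
            return sampleRate / bestLag;  // best period (in samples) -> frequency in Hz
        }
    }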

Note: a good system would have real commercial appeal.

Requirements

This project is suitable for ACS and HLT students, or DataComms students with good programming skills. Experience from the Speech Processing and Speech Technology modules will be an advantage, but is not required.

Initial reading

  • Fingerstyle guitar page on Wikipedia - here.
  • Klapuri (2003) “Automatic transcription of music”, Proc. Stockholm Music Acoustics Conference, SMAC-03 - available here.
  • Bello, Monti and Sandler (2000) “Techniques for automatic music transcription”, Proc. International Symposium on Music Information Retrieval - available here.


JPB-MSc-3: Rapid Adaptation of Visual Speech Models

Description

The performance of a speech recognition system can be improved by adapting the model parameters to better fit the characteristics of the user. The standard algorithms (e.g. MAP adaptation and MLLR adaptation) adapt very slowly, requiring large amounts of data from the user. In recent years new algorithms have emerged that can adapt rapidly using very little data. One of the best known of these is the eigenvoice technique. This technique has been shown to work very well for acoustic speech models but has not previously been tested on visual speech models (i.e. as used in automatic lip-reading systems).
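
To give a feel for the technique, the Java sketch below shows the geometry of the eigenvoice idea: a speaker’s model parameters, stacked into a single `supervector’, are approximated as a mean supervector plus a weighted sum of eigenvoices. In a real system the weights are estimated by maximum likelihood (MLED) from small amounts of adaptation data; assuming an orthonormal eigenvoice basis here, they reduce to simple dot products.

    // Sketch of the eigenvoice geometry: a speaker's model parameters, stacked
    // into one long "supervector" s, are approximated as the mean supervector
    // plus a weighted sum of K eigenvoices:
    //     s ~= mean + w[0]*e[0] + ... + w[K-1]*e[K-1]
    // In a real system the weights are estimated by maximum likelihood (MLED)
    // from small amounts of adaptation data; here, assuming an orthonormal
    // eigenvoice basis, they reduce to dot products.
    public class Eigenvoice {

        /** Project (s - mean) onto each eigenvoice to obtain adaptation weights. */
        public static double[] estimateWeights(double[] s, double[] mean,
                                               double[][] eigenvoices) {
            double[] w = new double[eigenvoices.length];
            for (int k = 0; k < eigenvoices.length; k++) {
                for (int i = 0; i < s.length; i++) {
                    w[k] += (s[i] - mean[i]) * eigenvoices[k][i];
                }
            }
            return w;
        }

        /** Rebuild an adapted supervector from the mean and the weights. */
        public static double[] adapt(double[] mean, double[][] eigenvoices, double[] w) {
            double[] s = mean.clone();
            for (int k = 0; k < eigenvoices.length; k++) {
                for (int i = 0; i < s.length; i++) {
                    s[i] += w[k] * eigenvoices[k][i];
                }
            }
            return s;
        }
    }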

This exciting project will apply the eigenvoice model adaptation technique to models of visual speech. This has not been attempted before and, if successful, could lead to publishable results.

The project is ambitious but it is manageable because it will be building on the output of previous related research conducted at Sheffield. The project will employ the audio-visual Grid corpus – a large audio-visual speech database that has been collected at Sheffield in recent years. The project will also make use of visual features and speaker-dependent models that have been constructed from this data as part of a recent EPSRC research grant.

Requirements

The project will involve using HTK and will best suit a student who is comfortable using Linux and understands the basic principles of writing shell scripts. Some experience of MATLAB will also be helpful. Students who have taken the Speech Technology module are preferred.

Note, this project is suitable for HLT students or ACS students with an interest in speech processing.

Initial reading

  • R. Kuhn, J.-C. Junqua, P. Nguyen and N. Niedzielski (2000), “Rapid speaker adaptation in eigenvoice space”, IEEE Trans. Speech and Audio Processing, Vol. 8, No. 6, 695-707 here
  • P. C. Woodland (2001), “Speaker adaptation for continuous density HMMs: A review”, invited lecture, Proc. Adaptation-2001, 11-19.
  • P. S. Aleksic and A. K. Katsaggelos (2004), “Comparison of low- and high-level visual features for audio-visual continuous automatic speech recognition”, Proc. ICASSP 2004, Vol. 5, 917-920 here
  • M. Cooke, J. Barker, S. Cunningham and X. Shao (2006), “An audio-visual corpus for speech perception and automatic speech recognition”, Journal of the Acoustical Society of America, 120(5), 2421-2424 here


JPB-MSc-4: Tool for Conducting Audio-Visual Speech Perception Experiments

Description

This project will design and build a tool suitable for testing people’s ability to lip-read in noisy environments.

It is well known that normal listeners are able to use their eyes to help them understand speech when there is a lot of background noise. This ability can be measured by asking subjects to listen to carefully controlled sentences and report the words that they hear. By counting how many words they recognise correctly you can get an estimate of the `intelligibility’ of the speech. These experiments are then repeated using audio only, or synchronised audio and video of the talker’s face. In order to run such experiments efficiently, an easy-to-use and reliable software front-end is needed to present the sentences to the listener.
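
The scoring itself is straightforward; the Java sketch below counts how many of a sentence’s keywords appear in a listener’s typed response. Word order is ignored here, and the per-sentence keyword lists are assumed to be given by the experiment design.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch: score one trial of an intelligibility test by counting how many
    // of the target sentence's keywords appear in the listener's response.
    // Word order is ignored; the per-sentence keyword lists are assumed given.
    public class IntelligibilityScorer {

        /** Returns the number of target keywords the listener reported. */
        public static int score(String[] targetKeywords, String response) {
            Set<String> reported = new HashSet<String>(
                    Arrays.asList(response.trim().toLowerCase().split("\\s+")));
            int correct = 0;
            for (String keyword : targetKeywords) {
                if (reported.contains(keyword.toLowerCase())) {
                    correct++;
                }
            }
            return correct;
        }
    }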

This project will design and build a tool for running these intelligibility tests. Although the tool has a simple function, it has quite strict requirements, which make this an interesting challenge. For example, it must be easy to set up and highly reliable, it needs a very simple interface, and it needs to be able to play precisely synchronised audio and video data.

If successful, the project will also use the tool to run a set of AV perception experiments that will examine the effect of different qualities of lip video on the intelligibility of speech in so-called ‘non-stationary’ noise conditions.

Requirements

This is a software `design and build’ project. Students from either SSIT or ASE are preferred. It may also be of interest to HLT students with a general interest in audio-visual speech perception.

Initial reading

  • M. Cooke, J. Barker, S. Cunningham and X. Shao (2006), “An audio-visual corpus for speech perception and automatic speech recognition”, Journal of the Acoustical Society of America, 120(5), 2421-2424 here

  • Audiovisual Speech Web-Lab - tutorial and demonstrations of audiovisual speech perception effects - here.



JPB-MSc-5: Java Applet for Collecting a Huge Lip Image Database

Description

The aim of this project is to use the internet to collect a huge database of images of people’s lips. These images could then be used to understand and model the variability in lip appearance across individuals.

The project would use a signed applet that would run on a client machine and use the client’s webcam to capture several images of the lower half of the user’s face. These images would be uploaded to a server along with some general information such as the user’s age, gender, etc.
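
The upload step might look something like the Java sketch below, which posts one captured JPEG together with the metadata to a collection server. The endpoint URL and parameter names are hypothetical, the server would need a matching script to receive the data, and the applet itself would have to be signed before it could touch the webcam.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: post one captured JPEG plus some metadata to the collection
    // server. The URL and parameter names are hypothetical; the server would
    // need a matching script, and the applet itself would have to be signed.
    public class LipImageUploader {

        public static void upload(byte[] jpegBytes, int age, String gender)
                throws Exception {
            URL url = new URL("http://example.org/lipdb/upload?age=" + age
                              + "&gender=" + gender);  // hypothetical endpoint
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "image/jpeg");
            OutputStream out = conn.getOutputStream();
            out.write(jpegBytes);                       // the captured image
            out.close();
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                throw new RuntimeException("Upload failed: HTTP "
                                           + conn.getResponseCode());
            }
            conn.disconnect();
        }
    }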

In order to motivate people to run the application some sort of `fun’ element needs to be involved. Perhaps the system could return some score to the user, or compare the collected image with images taken from celebrities. The system will also need to do some verification to make sure that the images are genuine.

This is a challenging project. Issues around accessing a local webcam from a Java applet running in a browser will need to be resolved. The project requires a student who has good Java programming experience and is knowledgeable about Java-based internet applications.

Requirements

This is a software `design and build’ project. Students from either SSIT or ASE are preferred. Confidence in Java programming is essential.

Initial reading

  • Please come and speak to me if you wish to take this project.