3rd year projects 2008-09

  • JPB-UG-1: Video-input for Chess Playing Robot (David Jones)
  • JPB-UG-2: Improving a Video-Based Darts Match Analyser (Rob White)
  • JPB-UG-3: Active Shape Modelling for Lip Tracking
  • JPB-UG-4: Video-Based Speech Detection (Tom Hanusiak)

The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail, I encourage you to email me to arrange a meeting.


JPB-UG-1: Video-input for Chess Playing Robot

Description

Imagine playing chess against your computer using a standard chess board. Such a system would require three components: i) a mechanism to allow the computer to move the pieces; ii) a chess engine to work out the next move; iii) a computer vision system to allow the computer to work out what move the human has made. This project will concentrate on the final component. The project will capture data from a camera looking down on the board. It will then need to analyse the video input to detect when a move has been made, and to work out which piece has been moved from where to where. This sounds like a difficult vision problem, but it can be made simpler by using knowledge of the game itself (i.e. by knowing which moves are possible at each stage). The project will then use an open-source chess engine (many are available) to generate the computer’s move. The computer could then either instruct the human to update the board (e.g. using speech synthesis, “Move my bishop to square E4”) or could be used to control a robot arm to move the piece.
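To make the role of game knowledge concrete, the sketch below (in Java; an illustration of one possible approach, not existing project code) infers the human's move by comparing per-square board occupancy before and after the turn. It assumes a hypothetical upstream vision stage has already classified each square from the overhead camera image.

    /**
     * Minimal sketch of move inference from board occupancy. The Square
     * classification per board cell is assumed to come from a separate,
     * hypothetical vision stage.
     */
    public class MoveInference {
        enum Square { EMPTY, WHITE, BLACK }

        /** Returns the move in coordinate form, e.g. "e2e4", or null if unclear. */
        static String inferMove(Square[][] before, Square[][] after, Square humanColour) {
            String from = null, to = null;
            for (int r = 0; r < 8; r++) {
                for (int c = 0; c < 8; c++) {
                    if (before[r][c] == humanColour && after[r][c] == Square.EMPTY)
                        from = name(r, c);   // vacated square
                    else if (before[r][c] != humanColour && after[r][c] == humanColour)
                        to = name(r, c);     // newly occupied square (or capture)
                }
            }
            return (from != null && to != null) ? from + to : null;
        }

        static String name(int row, int col) {
            // row 0 is rank 8 (top of the overhead image), column 0 is file 'a'
            return "" + (char) ('a' + col) + (8 - row);
        }
    }

Note that special moves such as castling change more than two squares; this is exactly where checking each candidate against the chess engine's list of legal moves resolves the ambiguity.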

This project will require good programming skills. The project may make use of the OpenCV computer vision library (http://www.intel.com/research/mrl/research/opencv/). Knowledge of C/C++ (or the willingness to learn) will be an advantage.

Requirements

  • good Java skills or some C++ programming experience

Reading

  • Wikipedia (Chess engines) - Useful information about chess engines.
  • OpenCV (http://www.intel.com/research/mrl/research/opencv/) - A state-of-the-art library for computer vision applications
  • Gonzalez and Woods, Digital Image Processing, Addison-Wesley Pub. Co., Reading, Massachusetts, 1992 (or any other similar textbook)
  • Brunelli, Mich and Modena (1999), A Survey of Video Indexing, Journal of Visual Communication and Image Representation


JPB-UG-2: Improving a Video-Based Darts Match Analyser

Description

This challenging project will build on a successful project, run this year, that ‘watches’ a recording of a darts match and recovers a dart-by-dart account of how the score progresses. That project built a system that exploits the fact that TV darts coverage is designed to be easy to follow and is very predictable. The system is composed of several stages. First it segments the match footage into video shots. Shots involving close-ups of the board are detected by looking for the characteristic colours that appear in a dartboard. By looking at the differences between subsequent video frames the system then detects the precise moment of arrival of a dart. Once a dart arrival has been detected, simple shape analysis is used to reveal where on the board the point of the dart has landed; the shape of the region where the dart has struck is used to work out its point value without having to read the numbers on the board. Each of these stages has been implemented using a technique that works fairly well; however, at each stage there is plenty of room for improvement. This project will focus on one or two of the stages and aim to improve their robustness, and hence the reliability of the complete system. Many ideas that can be used to get started are contained in this year’s project report.
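To give a flavour of the dart-arrival stage, here is a minimal Java sketch of the frame-differencing idea (illustrative only, not the existing system's code; all thresholds are assumptions that would need tuning on the match footage). A dart strike changes a small, localised patch of the image between consecutive frames, whereas a shot cut or camera move changes most of it.

    /** Sketch: flag a dart arrival from the change between consecutive frames. */
    public class DartArrivalDetector {
        static final int PIXEL_THRESHOLD = 30;       // assumed per-pixel change threshold
        static final double MIN_FRACTION = 0.0005;   // below this: sensor noise
        static final double MAX_FRACTION = 0.02;     // above this: shot cut / camera move

        /** prev and curr are grey-level frames with values 0-255. */
        static boolean dartArrived(int[][] prev, int[][] curr) {
            int changed = 0, total = prev.length * prev[0].length;
            for (int y = 0; y < prev.length; y++)
                for (int x = 0; x < prev[0].length; x++)
                    if (Math.abs(curr[y][x] - prev[y][x]) > PIXEL_THRESHOLD)
                        changed++;
            double fraction = (double) changed / total;
            return fraction > MIN_FRACTION && fraction < MAX_FRACTION;
        }
    }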

The project will employ recordings of world series darts matches that will be provided on DVD.

This project will require good programming skills. The project may make use of the OpenCV computer vision library (http://www.intel.com/research/mrl/research/opencv/). Knowledge of C/C++ (or the willingness to learn) will be an advantage.

Requirements

  • good Java skills or some C++ programming experience

Reading

  • Wikipedia (darts) - Lots of general information about the game.
  • OpenCV (http://www.intel.com/research/mrl/research/opencv/) - A state-of-the-art library for computer vision applications
  • Gonzalez and Woods, Digital Image Processing, Addison-Wesley Pub. Co., Reading, Massachusetts, 1992 (or any other similar textbook)
  • Brunelli, Mich and Modena (1999), A Survey of Video Indexing, Journal of Visual Communication and Image Representation


JPB-UG-3: Active Shape Modelling for Lip Tracking

Description

Lip tracking is the task of following the outline of a speaker’s lips through a sequence of video frames. This task is an important component of many audio-visual speech processing applications – including audio-visual speech recognition. The most successful lip tracking systems employ a technique known as active shape modelling (ASM). This technique employs a statistical model of the shape (and possibly appearance) of the speaker’s lips that has been learnt from a small number of video frames in which the lip outlines have been traced by hand. The tracking system then examines the video and employs an iterative search to find a sequence of smoothly changing lip shapes that fit this model well.
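The heart of the technique is compact enough to sketch. The Java fragment below (a hypothetical illustration, not supplied project code) shows the ASM constraint step: a candidate lip shape is projected onto the learnt modes of variation and each coefficient is clamped to plus or minus three standard deviations, so the iterative search can only ever produce plausible lip shapes. The mean, modes and eigenvalues are assumed to come from PCA over the hand-traced training frames.

    /** Sketch of the ASM shape constraint: x' = mean + P b, with b clamped. */
    public class ShapeModel {
        double[] mean;         // mean shape, length 2n for n landmarks (x1,y1,...,xn,yn)
        double[][] modes;      // P: one column per mode of variation (2n x k)
        double[] eigenvalues;  // variance explained by each mode (length k)

        /** Project a candidate shape into the model and reconstruct it, clamping
         *  each coefficient to +/- 3 standard deviations of its mode. */
        double[] constrain(double[] shape) {
            int k = eigenvalues.length, d = mean.length;
            double[] b = new double[k];
            for (int j = 0; j < k; j++)            // b = P^T (x - mean)
                for (int i = 0; i < d; i++)
                    b[j] += modes[i][j] * (shape[i] - mean[i]);
            for (int j = 0; j < k; j++) {          // clamp b_j to +/- 3 sqrt(lambda_j)
                double limit = 3.0 * Math.sqrt(eigenvalues[j]);
                b[j] = Math.max(-limit, Math.min(limit, b[j]));
            }
            double[] out = mean.clone();           // x' = mean + P b
            for (int i = 0; i < d; i++)
                for (int j = 0; j < k; j++)
                    out[i] += modes[i][j] * b[j];
            return out;
        }
    }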

This project aims to build a Java-based demonstration of the ASM technique. For evaluation purposes the project will use part of a large corpus of audio-visual speech data that has recently been collected at Sheffield. Much of the necessary preprocessing has already been performed allowing the project to have a running start.

The references below provide an overview of Cootes’ ASM technique that will underpin the project (do not be put off by the maths - at heart the idea is quite simple and it works equally well whether you understand the maths or not!). For further background see Cootes’ web page.

Requirements

  • COM3400 and good Java programming skills

Reading

  • T. F. Cootes, C. J. Taylor, D. H. Cooper and J. Graham (1995), Active Shape Models - Their Training and Application, Computer Vision and Image Understanding, 61(1), pp. 38-59
  • T. F. Cootes and C. J. Taylor (2001), Statistical Models of Appearance for Medical Image Analysis and Computer Vision, Proc. SPIE Medical Imaging
  • I. Matthews, T. F. Cootes and J. A. Bangham (2002), Extraction of Visual Features for Lipreading, IEEE Trans. Pattern Analysis and Machine Intelligence, 24(2), pp. 198-213
  • D. Cristinacce and T. F. Cootes (2004), A Comparison of Shape Constrained Facial Feature Detectors, Proc. Int. Conf. on Face and Gesture Recognition


JPB-UG-4: Video-Based Speech Detection

Description

Automatic detection of whether or not someone is speaking has useful applications in speech recognition and telecommunications. There are many audio-based techniques for speech detection, but these techniques can be unreliable in the presence of background noise. Speech may also be detected using visual lip movement information. However, the problem is not as easy as it may first seem. A naive solution would simply detect lip motion and assume that the person is speaking if and only if their lips are moving. The trouble is that people often move their lips while not speaking. A better solution would need to discriminate between the kinds of lip movement that accompany speech and the lip movements that occur during non-speech periods.
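As one concrete starting point, the Java sketch below (purely illustrative; the window length, frame rate and threshold are assumptions to be tuned on the corpus) goes a step beyond raw motion detection: it measures how much the mouth opening fluctuates over a short window, since speech produces sustained open/close cycles whereas, say, a slowly forming smile is a single gradual change.

    /** Sketch: classify frames as speech from fluctuation of the mouth opening. */
    public class SpeechDetector {
        static final int WINDOW = 25;               // frames, ~1 s at an assumed 25 fps
        static final double VAR_THRESHOLD = 4.0;    // illustrative; tune on real data

        /** lipHeight[t] = mouth opening (pixels) at frame t, from an upstream lip tracker. */
        static boolean[] detect(double[] lipHeight) {
            boolean[] speaking = new boolean[lipHeight.length];
            for (int t = 0; t + WINDOW <= lipHeight.length; t++) {
                double sum = 0, sumSq = 0;
                for (int i = t; i < t + WINDOW; i++) {
                    sum += lipHeight[i];
                    sumSq += lipHeight[i] * lipHeight[i];
                }
                double m = sum / WINDOW;
                double var = sumSq / WINDOW - m * m;   // variance of opening in window
                if (var > VAR_THRESHOLD)
                    for (int i = t; i < t + WINDOW; i++)
                        speaking[i] = true;
            }
            return speaking;
        }
    }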

For evaluation purposes the project will use part of a large corpus of audio-visual speech data that has recently been collected at Sheffield. The audio signal will be used to determine when the speaker starts and stops talking. The aim of the project will be to attempt to estimate these start and stop times using only the video data. Much of the necessary video and audio preprocessing has already been performed allowing the project to have a running start.

Requirements

  • COM3400 and good Java programming skills
