3rd year projects 2007-08

  • JPB-UG-1: Video-Based Darts Match Analysis
  • JPB-UG-2: Audio-Based Tennis Match Indexing (Chris Hutton)
  • JPB-UG-3: Active Shape Modelling for Lip Tracking
  • JPB-UG-4: Video-Based Speech Detection

The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-UG-1: Video-Based Darts Match Analysis

Description

This challenging project will attempt to build a system that ‘watches’ a recording of a darts match and recovers a dart-by-dart account of how the score progresses. Although this sounds ambitious, it should be quite doable because TV darts coverage is designed to be easy to follow and is very predictable. For example, there is usually a close-up view showing the moment each dart hits the board, and these close-ups are quite easy to detect due to the characteristic dart board pattern. By looking at the differences between successive video frames it will be possible to reliably detect the precise moment of arrival of a dart. Once an arrival has been detected, some simple shape analysis will reveal where on the board the point of the dart has landed. By using the angles of the boundaries between the board regions it will then be possible to work out the point value of a dart without having to read the numbers on the board. Reliably dealing with darts that land very close to the wires is likely to be the most challenging part.

A dart-by-dart record of a series of games will have interesting applications. For example, if the system is run over enough matches it will be possible to compute reliable player performance statistics. The timing of the events can also be used to build an index of the match, allowing the viewer to navigate the video in a user-friendly way, e.g. to step through it throw by throw.
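
To make the two key steps concrete, here is a minimal plain-Java sketch of frame differencing for arrival detection and of scoring a dart from its angle alone. The class and method names are hypothetical, and it assumes the close-up frames and the board centre and orientation have already been located:

    import java.awt.image.BufferedImage;

    public class DartAnalysisSketch {

        // Sector values of a standard dart board, clockwise from the top (20) wedge.
        private static final int[] SECTORS = {
            20, 1, 18, 4, 13, 6, 10, 15, 2, 17, 3, 19, 7, 16, 8, 11, 14, 9, 12, 5
        };

        // Sum of absolute grey-level differences between two frames. During an
        // otherwise static close-up, a sharp spike in this value marks the
        // moment a dart arrives.
        public static long frameDifference(BufferedImage a, BufferedImage b) {
            long sum = 0;
            for (int y = 0; y < a.getHeight(); y++) {
                for (int x = 0; x < a.getWidth(); x++) {
                    int ga = a.getRGB(x, y) & 0xFF;   // one channel as a cheap grey proxy
                    int gb = b.getRGB(x, y) & 0xFF;
                    sum += Math.abs(ga - gb);
                }
            }
            return sum;
        }

        // Map a dart tip position, given relative to the board centre in pixels,
        // to its sector value using only the angle - no digit reading needed.
        // Radius handling (doubles, trebles, bulls) is omitted.
        public static int sectorValue(double dx, double dy) {
            // Angle clockwise from 12 o'clock; image y grows downwards.
            double angle = Math.toDegrees(Math.atan2(dx, -dy));
            if (angle < 0) angle += 360.0;
            // Each wedge spans 18 degrees and the 20 wedge is centred on 0.
            return SECTORS[(int) (((angle + 9.0) % 360.0) / 18.0)];
        }
    }

The sector lookup works because the twenty wedge values appear in a fixed clockwise order on every standard board, so the angle of the tip relative to the board centre identifies the wedge; the radius would then distinguish singles, doubles, trebles and the bulls.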

The project will employ recordings of world series darts matches that will be provided on DVD.

This project will require good programming skills. The project may make use of the OpenCV computer vision library (http://www.intel.com/research/mrl/research/opencv/). Knowledge of C/C++ (or the willingness to learn) will be an advantage.

Requirements

  • good Java skills or some C++ programming experience

Reading

  • Wikipedia (darts) - Lots of general information about the game.
  • OpenCV (http://www.intel.com/research/mrl/research/opencv/) - A state-of-the-art library for computer vision applications.
  • Gonzalez and Woods, Digital Image Processing, Addison-Wesley, Reading, Massachusetts, 1992 (or any other similar textbook).
  • Brunelli, Mich and Modena (1999), A survey of video indexing, Journal of Visual Communication and Image Representation.



JPB-UG-2: Audio-Based Tennis Match Indexing

Description

This project will build and test a system that automatically places ‘point’, ‘game’ and ‘set’ index points into recordings of tennis matches. The indexing will work by analysing the audio track: using pattern processing techniques, the system will detect the characteristic sound of a ball being struck. The match can then be partitioned into points by looking for sequences of these events. By analysing the timing between points it should be possible to perform a higher-level analysis, i.e. to split the match into games and sets. The project will also build a simple viewing tool that uses the index points to allow the user to navigate the video in a user-friendly way, e.g. skipping forward game by game. Additionally, the index data can be used to collect match statistics, e.g. average length of a rally, longest rally, longest game, shortest game, etc.
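
As a flavour of the detection step, the sketch below flags short, sharp bursts of audio energy as candidate ball strikes. It is plain Java; the names, frame size and energy heuristic are illustrative assumptions rather than the required design:

    import java.util.ArrayList;
    import java.util.List;

    public class BallStrikeSketch {

        // Return sample positions where the short-time energy jumps well above
        // a slowly adapting noise floor - candidate ball-strike events.
        public static List<Integer> detectStrikes(short[] samples, int sampleRate) {
            final int frame = sampleRate / 100;        // 10 ms analysis frames
            final double ratio = 8.0;                  // a burst must be 8x the floor
            int nFrames = samples.length / frame;
            double[] energy = new double[nFrames];
            for (int f = 0; f < nFrames; f++) {
                double e = 0;
                for (int i = f * frame; i < (f + 1) * frame; i++) {
                    e += (double) samples[i] * samples[i];
                }
                energy[f] = e;
            }
            List<Integer> hits = new ArrayList<>();
            double floor = nFrames > 0 ? energy[0] : 0.0;  // seed the noise floor
            for (int f = 1; f < nFrames; f++) {
                if (energy[f] > ratio * Math.max(floor, 1.0)) {
                    hits.add(f * frame);               // candidate strike, in samples
                } else {
                    // Only quiet frames update the floor, so bursts do not inflate it.
                    floor = 0.95 * floor + 0.05 * energy[f];
                }
            }
            return hits;
        }
    }

The gaps between the returned positions would then drive the higher-level analysis: short gaps group strikes into rallies, longer silences separate points, and the scoring structure of tennis groups points into games and sets.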

Requirements

  • COM3400 and Java programming skills

Reading

  • Wikipedia (tennis) - Lots of general information about the game.
  • Dufaux et al., “Automatic sound detection and recognition for noisy environment”, Proc. EUSIPCO 2002.
  • Whalen, Detection of Signals in Noise, Academic Press, 1971 (St George’s Library).
  • Poor, An Introduction to Signal Detection and Estimation, Springer-Verlag, 1988.



JPB-UG-3: Active Shape Modelling for Lip Tracking

Description

Lip tracking is the task of following the outline of a speaker’s lips through a sequence of video frames. This task is an important component of many audio-visual speech processing applications, including audio-visual speech recognition. The most successful lip tracking systems employ a technique known as active shape modelling (ASM). This technique employs a statistical model of the shape (and possibly appearance) of the speaker’s lips that has been learnt from a small number of video frames in which the lip outlines have been traced by hand. The tracking system then examines the video and employs an iterative search to find a sequence of smoothly changing lip shapes that fit the model well.
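
The heart of the technique can be sketched in a few lines of Java. The fragment below uses hypothetical names and assumes the mean shape, the modes of variation (eigenvectors) and their eigenvalues have already been learnt from the hand-traced frames; it applies the standard plausibility constraint from the Cootes et al. paper listed below, clamping each shape parameter to plus or minus three standard deviations:

    public class ShapeModelSketch {

        private final double[] mean;        // mean shape: (x1, y1, ..., xn, yn)
        private final double[][] modes;     // one row per mode of variation
        private final double[] eigenvalues; // variance explained by each mode

        public ShapeModelSketch(double[] mean, double[][] modes, double[] eigenvalues) {
            this.mean = mean;
            this.modes = modes;
            this.eigenvalues = eigenvalues;
        }

        // Project a candidate outline into the model (b = P^T (x - mean)),
        // clamp each parameter to +/- 3 standard deviations, and reconstruct
        // (x = mean + P b). The result is the nearest lip shape the model
        // considers plausible.
        public double[] constrain(double[] shape) {
            double[] b = new double[modes.length];
            for (int m = 0; m < modes.length; m++) {
                for (int j = 0; j < mean.length; j++) {
                    b[m] += modes[m][j] * (shape[j] - mean[j]);
                }
                double limit = 3.0 * Math.sqrt(eigenvalues[m]);
                b[m] = Math.max(-limit, Math.min(limit, b[m]));
            }
            double[] constrained = mean.clone();
            for (int m = 0; m < modes.length; m++) {
                for (int j = 0; j < mean.length; j++) {
                    constrained[j] += modes[m][j] * b[m];
                }
            }
            return constrained;
        }
    }

Each iteration of the tracking search moves the landmark points towards nearby image evidence and then applies a constraint step like this one, so the search can only ever produce lip-like shapes.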

This project aims to build a Java-based demonstration of the ASM technique. For evaluation purposes the project will use part of a large corpus of audio-visual speech data that has recently been collected at Sheffield. Much of the necessary preprocessing has already been performed allowing the project to have a running start.

The references below provide an overview of Cootes’ ASM technique that will underpin the project (do not be put off by the maths - at heart the idea is quite simple and it works equally well whether you understand the maths or not!). For further background see Cootes’ web page.

Requirements

  • COM3400 and good Java programming skills

Reading

  • T.F. Cootes, C.J. Taylor, D.H. Cooper and J. Graham (1995), Active Shape Models - their training and application, Computer Vision and Image Understanding 61(1), pp. 38-59.
  • T.F. Cootes and C.J. Taylor (2001), Statistical models of appearance for medical image analysis and computer vision, Proc. SPIE Medical Imaging.
  • I. Matthews, T.F. Cootes and J.A. Bangham (2002), Extraction of visual features for lipreading, IEEE PAMI 24(2), pp. 198-213.
  • D. Cristinacce and T.F. Cootes (2004), A comparison of shape constrained facial feature detectors, Proc. Int. Conf. on Face and Gesture Recognition.



JPB-UG-4: Video-Based Speech Detection

Description

Automatic detection of whether or not someone is speaking has useful applications in speech recognition and telecommunications. There are many audio-based techniques for speech detection, but they can be unreliable in the presence of background noise. Speech may also be detected using visual lip-movement information. However, the problem is not as easy as it may first seem. A naive solution would simply detect lip motion and assume that the person is speaking if and only if their lips are moving. However, people often move their lips while not speaking, so a better solution must discriminate between the kind of lip movements that accompany speech and the lip movements that occur during non-speech periods.
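
To illustrate the discrimination idea, the sketch below labels each video frame from the variability of the mouth opening: speech makes the opening oscillate rapidly, whereas a smile or a yawn changes it slowly. It is plain Java, and the feature, window size and threshold are illustrative assumptions that presume a lip tracker already supplies a per-frame mouth-opening measurement:

    public class SpeechMotionSketch {

        // Label each frame speaking/non-speaking from the variance of the
        // frame-to-frame change in mouth opening within a window around it.
        public static boolean[] detectSpeech(double[] opening, int window, double threshold) {
            boolean[] speaking = new boolean[opening.length];
            for (int t = 0; t < opening.length; t++) {
                int lo = Math.max(0, t - window / 2);
                int hi = Math.min(opening.length, t + window / 2 + 1);
                int n = hi - lo - 1;                  // number of frame-to-frame deltas
                if (n <= 0) continue;
                double mean = 0;
                for (int i = lo + 1; i < hi; i++) mean += opening[i] - opening[i - 1];
                mean /= n;
                double var = 0;
                for (int i = lo + 1; i < hi; i++) {
                    double d = (opening[i] - opening[i - 1]) - mean;
                    var += d * d;
                }
                speaking[t] = (var / n) > threshold;  // rapid oscillation suggests speech
            }
            return speaking;
        }
    }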

For evaluation purposes the project will use part of a large corpus of audio-visual speech data that has recently been collected at Sheffield. The audio signal will be used to determine when the speaker starts and stops talking. The aim of the project will be to attempt to estimate these start and stop times using only the video data. Much of the necessary video and audio preprocessing has already been performed allowing the project to have a running start.
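
The comparison against the audio-derived reference could then be as simple as the following, assuming both the reference and the video-based estimate have been converted to per-frame speaking/non-speaking labels:

    public class SpeechEvalSketch {

        // Fraction of frames on which the video-based estimate agrees with
        // the audio-derived reference labels.
        public static double frameAccuracy(boolean[] reference, boolean[] estimate) {
            int n = Math.min(reference.length, estimate.length);
            if (n == 0) return 0.0;
            int correct = 0;
            for (int i = 0; i < n; i++) {
                if (reference[i] == estimate[i]) correct++;
            }
            return (double) correct / n;
        }
    }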

Requirements

  • COM3400 and good Java programming skills
