MSc projects 2002-03

  • JPB-MSc-1: Robust lip parameterisation for automatic lipreading
  • JPB-MSc-2: A tool for semi-automatic facial feature segmentation (Weiping Hu)

The project descriptions below are intended only as starting points. If you wish to discuss possibilities in greater detail, I encourage you to email me to arrange a meeting.


JPB-MSc-1: Robust lip parameterisation for automatic lipreading

Description

Automatic lipreading systems can work well if users make an effort to keep their head stationary while speaking. However, these systems are often not robust under natural speaking conditions where there may be significant head movement. This project will attempt to address this problem.

The project will employ a new audio-visual speaker database (CUAVE) that has been specifically designed to investigate the problem of moving talkers. The designers of the database have distributed a set of baseline results comparing the performance of various types of visual feature (Patterson et al., PDF). Best performance under stationary-talker conditions is achieved using simple DCT-based features. However, these features are very sensitive to the scale and angle of the face, and recognition performance consequently degrades when the speaker's head is moving. The project will examine techniques for normalising the angle and scale of the face image prior to computation of the DCT lip features, with the aim of improving on Patterson's baseline recognition results.
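As a rough illustration of the normalisation idea, the C++ sketch below uses two eye-centre coordinates, here assumed to be supplied by some existing face tracker, to cancel head roll and scale before taking a plain 2-D DCT over a canonical mouth patch. The patch geometry and parameter choices are illustrative assumptions, not taken from Patterson et al.

    // Hypothetical sketch: normalise for head roll/scale, then extract
    // low-frequency 2-D DCT coefficients as visual features.
    #include <cmath>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<float> pix;                      // grey-level pixels, row major
        float at(int x, int y) const {               // zero outside the frame
            if (x < 0 || y < 0 || x >= w || y >= h) return 0.0f;
            return pix[y * w + x];
        }
    };

    // Resample an N x N mouth patch so that the eye axis becomes horizontal
    // and the inter-ocular distance maps to a fixed patch size. (lx,ly) and
    // (rx,ry) are the left and right eye centres from a face tracker.
    Image normalise(const Image& src, float lx, float ly,
                    float rx, float ry, int N)
    {
        float angle = std::atan2(ry - ly, rx - lx);  // head roll
        float dist  = std::hypot(rx - lx, ry - ly);  // head scale
        float scale = dist / N;                      // source pixels per patch pixel
        float c = std::cos(angle), s = std::sin(angle);
        // Centre the patch one inter-ocular distance below the eye midpoint
        // (an arbitrary placement chosen for illustration).
        float cx = 0.5f * (lx + rx) - s * dist;
        float cy = 0.5f * (ly + ry) + c * dist;

        Image out;
        out.w = N; out.h = N; out.pix.assign(N * N, 0.0f);
        for (int v = 0; v < N; ++v)
            for (int u = 0; u < N; ++u) {
                float du = (u - N / 2) * scale, dv = (v - N / 2) * scale;
                // Rotate patch coordinates back into image coordinates;
                // nearest-neighbour sampling keeps the sketch short.
                float x = cx + c * du - s * dv;
                float y = cy + s * du + c * dv;
                out.pix[v * N + u] =
                    src.at((int)std::lround(x), (int)std::lround(y));
            }
        return out;
    }

    // Plain 2-D DCT-II over the normalised patch; the K x K block of
    // low-frequency coefficients forms the visual feature vector.
    std::vector<float> dct_features(const Image& p, int K)
    {
        const float PI = 3.14159265f;
        std::vector<float> feat;
        for (int v = 0; v < K; ++v)
            for (int u = 0; u < K; ++u) {
                float sum = 0.0f;
                for (int y = 0; y < p.h; ++y)
                    for (int x = 0; x < p.w; ++x)
                        sum += p.at(x, y)
                             * std::cos((2 * x + 1) * u * PI / (2 * p.w))
                             * std::cos((2 * y + 1) * v * PI / (2 * p.h));
                feat.push_back(sum);
            }
        return feat;
    }

Because the patch is always resampled into the same canonical frame, the DCT coefficients should be far less sensitive to head roll and distance from the camera than features computed on the raw image.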

The data is stored in MPEG-2 format and the project will require using either libmpeg3 (distributed with Linux) or the Java Media Framework for decoding the data.
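For the libmpeg3 route, a minimal decoding loop might look like the sketch below. The calls follow libmpeg3's documented interface, but signatures have changed between releases, so they should be checked against the installed libmpeg3.h.

    // Minimal sketch of stepping through an MPEG-2 file with libmpeg3.
    #include <cstdio>
    #include <libmpeg3.h>

    int main(int argc, char** argv)
    {
        if (argc < 2) {
            std::fprintf(stderr, "usage: %s file.mpg\n", argv[0]);
            return 1;
        }

        mpeg3_t* file = mpeg3_open(argv[1]);
        if (!file || !mpeg3_has_video(file)) return 1;

        int  w = mpeg3_video_width(file, 0);
        int  h = mpeg3_video_height(file, 0);
        long n = mpeg3_video_frames(file, 0);

        // libmpeg3 decodes into an array of row pointers; the documentation
        // advises allocating a few spare bytes per row.
        unsigned char*  frame = new unsigned char[h * (w * 3 + 16)];
        unsigned char** rows  = new unsigned char*[h];
        for (int y = 0; y < h; ++y) rows[y] = frame + y * (w * 3 + 16);

        for (long i = 0; i < n; ++i) {
            mpeg3_read_frame(file, rows, 0, 0, w, h, w, h, MPEG3_RGB888, 0);
            // ... pass the decoded RGB frame to the feature extractor here ...
        }

        delete [] rows;
        delete [] frame;
        mpeg3_close(file);
        return 0;
    }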

This project requires good maths and good C/C++ or Java programming skills.

Resources

  • C++ or Java; PC

Prerequisites

  • Good maths and programming skills (preferably C/C++)
  • COM6460 Speech Technology

Initial reading

  • Patterson et al., “Moving-talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus” PDF


JPB-MSc-2: A tool for semi-automatic facial feature segmentation

Description

Manually segmented facial features (e.g. lips, eyes, faces) are useful in the development of the statistical models employed in automatic lipreading systems. However, manual segmentation is a laborious process. This project aims to build a semi-automatic tool that will speed up the process.

The segmentation tool will operate as follows. The user will sketch the approximate boundaries of the desired feature (e.g. the speaker's lips) on the initial frame of the video. The tool will then calculate and display a refined estimate of the feature boundaries. It will then use a suitable technique to try to track the feature boundaries from frame to frame. The user will be able to step through each video frame, displaying the boundary estimates overlaid on the video data. Robust automatic tracking is difficult to achieve as errors can quickly accumulate. To prevent such problems, the user will be allowed to intervene at any stage to hand-correct poor boundary estimates before proceeding.
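One simple way to realise the refine-and-track steps is a greedy active contour ("snake"), sketched below; Zhong et al. describe a fuller interactive system. This is a hypothetical illustration, not a prescription of the method to be used.

    // Greedy active-contour refinement: each control point moves to the
    // nearby position that best trades off contour smoothness against
    // attraction to image edges.
    #include <cmath>
    #include <vector>

    struct Image {
        int w, h;
        std::vector<float> pix;                      // grey-level frame, row major
        float at(int x, int y) const {               // zero outside the frame
            if (x < 0 || y < 0 || x >= w || y >= h) return 0.0f;
            return pix[y * w + x];
        }
    };

    struct Point { int x, y; };

    // Gradient magnitude: strong intensity edges attract the contour.
    static float edge_strength(const Image& im, int x, int y)
    {
        float gx = im.at(x + 1, y) - im.at(x - 1, y);
        float gy = im.at(x, y + 1) - im.at(x, y - 1);
        return std::sqrt(gx * gx + gy * gy);
    }

    // Move each control point to the spot in a small search window that
    // minimises (continuity energy) - (edge energy). Seeding the contour
    // from the previous frame's result and iterating a few passes gives a
    // basic frame-to-frame tracker the user can correct when it drifts.
    void refine(const Image& im, std::vector<Point>& snake,
                int win = 3, float alpha = 1.0f, float beta = 10.0f)
    {
        const int n = static_cast<int>(snake.size());
        for (int i = 0; i < n; ++i) {
            const Point prev = snake[(i + n - 1) % n];
            const Point next = snake[(i + 1) % n];
            Point best = snake[i];
            float best_e = 1e30f;
            for (int dy = -win; dy <= win; ++dy)
                for (int dx = -win; dx <= win; ++dx) {
                    int x = snake[i].x + dx, y = snake[i].y + dy;
                    // Continuity: stay near the midpoint of the neighbours.
                    float mx = 0.5f * (prev.x + next.x) - x;
                    float my = 0.5f * (prev.y + next.y) - y;
                    float e = alpha * (mx * mx + my * my)
                            - beta  * edge_strength(im, x, y);
                    if (e < best_e) { best_e = e; best = Point{x, y}; }
                }
            snake[i] = best;
        }
    }

The user's hand corrections fit naturally into this scheme: a corrected point simply replaces the tracked one before the next frame is refined.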

The tool will be applied to the CUAVE audio-visual speech database. This data is stored in MPEG-2 format and the project will require using either libmpeg3 (distributed with Linux) or the Java Media Framework for decoding the data.

This project requires good C/C++ or Java programming skills.

Resources

  • C++ or Java; PC

Prerequisites

  • Good programming skills (preferably C/C++)

Initial reading

  • Zhong et al., “Interactive Tracker - A semi-automatic video object tracking and segmentation system” PDF
  • Patterson et al., “Moving-talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus” PDF