- JPB-UG-1: A digital theremin (Sel Vin)
- JPB-UG-2: A chroma-key tool for producing noisy audio-visual speech data (Richard King)
- JPB-UG-3: Automatic lecture note taker (George Weller)
- JPB-UG-4: Television watching assistant (Antranik Kasparian)
- JPB-UG-5: Innovative applications in web-based retrieval (Gary Barton)
JPB-UG-1: A digital theremin
Description
This project aims to build software that manipulates a synthesised audio output using images from a web cam or similar device. The basic system will use image recognition techniques to detect a coloured dot held by the user. Sound is generated according to the position and colour of this dot as seen by the web cam: movement up and down could control volume, and movement left and right could control pitch. A graphical interface could allow the user to configure how different movements control the sound produced.
The basic system allows for a number of possible expansions. Functions could be written to detect specific movements such as circles and lines, triggering a special effect when recognised, and users could be given the ability to record their own movement triggers. The system could also be extended to recognise the user's bare hand instead of a coloured dot, so that sound is generated when a specific hand gesture is detected, for example a fist or a thumbs-up.
The system may be expanded to take additional inputs from MIDI devices as controls for further manipulating the sound. At another level, the system could also take an analogue microphone input and output variations of the sound according to the input received.
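A minimal sketch of the core idea, assuming frames arrive as packed 0xRRGGBB pixel arrays; the "redness" score, the red dot colour, and the logarithmic pitch map are all illustrative choices, not part of the project specification:

```java
// Sketch: find a coloured (here, red) dot in a frame and map its
// normalised position to pitch (left-right) and volume (up-down).
public class ThereminMap {
    // x in [0,1] maps to a frequency between fMin and fMax on a log scale,
    // so equal hand movements give equal musical intervals.
    static double pitchHz(double x, double fMin, double fMax) {
        return fMin * Math.pow(fMax / fMin, x);
    }

    // y in [0,1] (bottom to top) maps directly to amplitude, clamped.
    static double volume(double y) {
        return Math.min(1.0, Math.max(0.0, y));
    }

    // Locate the "reddest" pixel in a w-by-h frame of packed 0xRRGGBB
    // values. Returns {x, y} normalised to [0,1] (y measured upwards),
    // or null if nothing sufficiently red is present.
    static double[] findDot(int[] pixels, int w, int h) {
        int bestX = -1, bestY = -1, bestScore = 40; // minimum redness (tunable)
        for (int yPix = 0; yPix < h; yPix++) {
            for (int xPix = 0; xPix < w; xPix++) {
                int p = pixels[yPix * w + xPix];
                int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                int score = r - Math.max(g, b); // redder than green and blue
                if (score > bestScore) { bestScore = score; bestX = xPix; bestY = yPix; }
            }
        }
        if (bestX < 0) return null;
        return new double[]{ bestX / (double) (w - 1), 1.0 - bestY / (double) (h - 1) };
    }
}
```

In a real system the dot position would be smoothed over successive frames before driving the synthesiser, to avoid audible jitter.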
JPB-UG-2: A chroma-key tool for producing noisy audio-visual speech data
Description
Chroma-key is a common technique used in the film and television industry to artificially place a background behind an actor/actress. The actor/actress is first filmed standing in front of a uniformly coloured screen (usually blue or green) and then the video is digitally edited to replace the background colour with an arbitrary scene.
This project will construct a tool that employs the chroma-key technique to produce data suitable for testing an audio-visual speech recognition system. The project will use the CUAVE audio-visual speech database, which contains recordings of 32 different speakers reciting connected digit strings while standing in front of a green screen. Chroma-key and audio mixing will be employed to produce 'noisy' conditions in which the speakers appear to be standing in busy natural scenes (e.g. on street corners, in restaurants, etc.).
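The per-pixel replacement at the heart of chroma-keying can be sketched as follows, assuming packed 0xRRGGBB pixels; the green-dominance test and threshold value are illustrative, and a production tool would also soften the matte at edges:

```java
// Sketch: chroma-key compositing. A pixel is treated as part of the
// green screen when its green channel dominates both red and blue by
// more than a threshold; such pixels are replaced from the background.
public class ChromaKey {
    static boolean isKey(int rgb, int threshold) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return g - Math.max(r, b) > threshold;
    }

    // Replace key-coloured pixels in the foreground frame with the
    // corresponding background pixels. Both frames must be the same size.
    static int[] composite(int[] foreground, int[] background, int threshold) {
        int[] out = new int[foreground.length];
        for (int i = 0; i < foreground.length; i++) {
            out[i] = isKey(foreground[i], threshold) ? background[i] : foreground[i];
        }
        return out;
    }
}
```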
The data is stored in MPEG-2 format and the project will require using either libmpeg3 (distributed with linux) or the Java Media Framework for decoding the data.
This project requires good maths and good C/C++ or Java programming skills.
Requirements
- Java or C/C++ programming skills
Reading
- Patterson et al., (2002) CUAVE: A new audio-visual database for multimodal human-computer interface research, Proc. ICASSP 2002.
JPB-UG-3: Automatic lecture note taker
Description
Every day hundreds of students sit in lecture halls transcribing the contents of projected transparencies into handwritten notes. This project aims to replace the student with a digital camcorder and a bit of software!
A camera focused on the white board will be used to capture the lecture. Standard video processing techniques will be used to reduce the video of the lecture to a series of stills showing each component transparency. Text identification techniques will be used to locate the regions of text. Finally optical character recognition (OCR) software will be used to transcribe each text region.
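The first stage, reducing the video to one still per transparency, can be sketched with simple frame differencing; the pixel-difference and changed-fraction thresholds below are illustrative and would need tuning against real lecture footage:

```java
// Sketch: detect transparency changes by comparing successive frames.
// A new slide is declared when the fraction of pixels whose grey level
// changes by more than pixelDelta exceeds changeFraction.
public class SlideChange {
    // Simple average-of-channels grey level for a packed 0xRRGGBB pixel.
    static int grey(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    static boolean isNewSlide(int[] prev, int[] curr,
                              int pixelDelta, double changeFraction) {
        int changed = 0;
        for (int i = 0; i < prev.length; i++) {
            if (Math.abs(grey(prev[i]) - grey(curr[i])) > pixelDelta) changed++;
        }
        return changed > changeFraction * prev.length;
    }
}
```

A practical version would also require the change to persist for several frames, so that a lecturer walking in front of the screen is not mistaken for a new transparency.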
The system can be developed using video data already available on the web, and if successful can be tested on a real DCS lecture.
The project will make use of existing OCR software, but will develop the other components from scratch.
Requirements
- good java or C/C++ programming skills, some maths
Reading
- He, Liu and Zhang (2003) Why take notes? Use the whiteboard capture system, Proc. ICASSP 2003
- Brunelli, Mich and Modena (1999) A Survey of video indexing, J. of Visual Communication and Image Representation
- Smith and Kanade (1995) Video skimming for quick browsing based on audio and image characterization, Carnegie Mellon University technical report CMU-CS-95-186
JPB-UG-4: Television watching assistant
Description
This project aims to build a ‘Television Watching Assistant’. This assistant will read the name captions that often appear when people are interviewed on television, and then search the internet for web pages related to the person appearing on the screen.
The project will develop standard video processing techniques to identify captions, and will make use of existing OCR software to read them.
Recently a set of APIs to the full Google web search engine was released by Google, enabling developers to freely access Google web search from their programs. This project will make use of the Google API to serve the TV viewer web pages relevant to the person being captioned.
Although this is a challenging project, it can be made manageable by a few simplifications. For example, a given program will use a consistent caption style, so a program-specific version of the software can exploit prior knowledge of the style and positioning of captions to make their detection easier. Prior knowledge of the colour, size and font of a program's captioning style will also make the OCR more reliable.
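Exploiting a known caption position can be sketched as a fixed-region crop followed by binarisation before handing the region to the OCR software; the region coordinates and the brightness threshold below are illustrative placeholders, not values from any real program:

```java
// Sketch: extract a fixed caption region from a frame and binarise it
// for OCR, assuming light caption text on a darker background.
public class CaptionCrop {
    // Copy the ch-by-cw region with top-left corner (x0, y0) out of a
    // w-pixel-wide frame of packed 0xRRGGBB values.
    static int[] crop(int[] frame, int w, int x0, int y0, int cw, int ch) {
        int[] out = new int[cw * ch];
        for (int y = 0; y < ch; y++)
            for (int x = 0; x < cw; x++)
                out[y * cw + x] = frame[(y0 + y) * w + (x0 + x)];
        return out;
    }

    // Binarise: pixels brighter than the threshold become 1 (text),
    // everything else 0 (background).
    static int[] binarise(int[] pixels, int threshold) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            int grey = (((p >> 16) & 0xFF) + ((p >> 8) & 0xFF) + (p & 0xFF)) / 3;
            out[i] = grey > threshold ? 1 : 0;
        }
        return out;
    }
}
```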
Requirements
- good java or C/C++ programming skills
Reading
- Brunelli, Mich and Modena (1999) A Survey of video indexing, J. of Visual Communication and Image Representation
- Smith and Kanade (1995) Video skimming for quick browsing based on audio and image characterization, Carnegie Mellon University technical report CMU-CS-95-186
JPB-UG-5: Innovative applications in web-based retrieval
Description
Google has become well-known as a large, effective and efficient web search engine. Recently a set of APIs to the full Google web search engine was released by Google, enabling developers to access Google web search from their programs.
This project is concerned with developing innovative applications using this API. There is plenty of scope for new ideas here, but to set you thinking some possibilities include:
- tracking the focus of a set of queries over time
- finding “communities of documents”
- development of new user interfaces for web searching
Some other ideas are listed on Google’s web pages.
The project is suited to students who like learning new things, have strong programming skills, and have some idea that they would like to try out. Nothing like an index of 2 billion web pages has been available to programmers previously, so there is clear scope for innovation. As a start see the reference by Brin and Page.
Requirements
- programming skills: the Google Web APIs support various languages, including Java.
Reading
- S. Brin and L. Page (1998) The anatomy of a large-scale hypertextual web search engine, Proceedings of WWW-7.