- JPB-UG-1: A digital theremin (Sel Vin)
- JPB-UG-2: A chroma-key tool for producing noisy audio-visual speech data (Richard King)
- JPB-UG-3: Automatic lecture note taker (George Weller)
- JPB-UG-4: Television watching assistant (Antranik Kasparian)
- JPB-UG-5: Innovative applications in web-based retrieval (Gary Barton)
JPB-UG-1: A digital theremin
Description
This project aims to build software that manipulates a synthesised audio output using images from a web cam or similar device. The basic system will use image recognition techniques to detect a coloured dot held by the user. Sound is generated according to the position and colour of this dot as seen by the web cam: movement up and down could control volume, and movement left and right could control pitch. A graphical interface could allow the user to configure how different movements control the sound produced.
The basic system allows for a number of possible expansions. Functions could be written to detect specific movements such as circles and lines, triggering a special effect when recognised, and users could be given the ability to record their own movement triggers. The system could also be extended to recognise the user's bare hand instead of a coloured dot, so that sound is generated when a specific hand gesture is detected, for example a fist or a thumbs-up.
The system may be expanded to take additional inputs from MIDI devices as controls for further manipulating the sound. At another level, the system could also take an analogue microphone input and output variations of the sound according to the input received.
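A minimal sketch of the core idea, assuming frames arrive as packed 0xRRGGBB pixel arrays; the "redness" score, the red dot colour, and the logarithmic pitch map are all illustrative choices, not part of the project specification:

```java
// Sketch: find a coloured (here, red) dot in a frame and map its
// normalised position to pitch (left-right) and volume (up-down).
public class ThereminMap {
    // x in [0,1] maps to a frequency between fMin and fMax on a log scale,
    // so equal hand movements give equal musical intervals.
    static double pitchHz(double x, double fMin, double fMax) {
        return fMin * Math.pow(fMax / fMin, x);
    }

    // y in [0,1] (bottom to top) maps directly to amplitude, clamped.
    static double volume(double y) {
        return Math.min(1.0, Math.max(0.0, y));
    }

    // Locate the "reddest" pixel in a w-by-h frame of packed 0xRRGGBB
    // values. Returns {x, y} normalised to [0,1] (y measured upwards),
    // or null if nothing sufficiently red is present.
    static double[] findDot(int[] pixels, int w, int h) {
        int bestX = -1, bestY = -1, bestScore = 40; // minimum redness (tunable)
        for (int yPix = 0; yPix < h; yPix++) {
            for (int xPix = 0; xPix < w; xPix++) {
                int p = pixels[yPix * w + xPix];
                int r = (p >> 16) & 0xFF, g = (p >> 8) & 0xFF, b = p & 0xFF;
                int score = r - Math.max(g, b); // redder than green and blue
                if (score > bestScore) { bestScore = score; bestX = xPix; bestY = yPix; }
            }
        }
        if (bestX < 0) return null;
        return new double[]{ bestX / (double) (w - 1), 1.0 - bestY / (double) (h - 1) };
    }
}
```

In a real system the dot position would be smoothed over successive frames before driving the synthesiser, to avoid audible jitter.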
JPB-UG-2: A chroma-key tool for producing noisy audio-visual speech data
Description
Chroma-key is a common technique used in the film and television industry to artificially place a background behind an actor/actress. The actor/actress is first filmed standing in front of a uniformly coloured screen (usually blue or green) and then the video is digitally edited to replace the background colour with an arbitrary scene.
This project will construct a tool that employs the chroma-key technique to produce data suitable for testing an audio-visual speech recognition system. The project will use the CUAVE audio-visual speech database, which contains recordings of 32 different speakers reciting connected digit strings while standing in front of a green screen. Chroma-key and audio mixing will be employed to produce 'noisy' conditions in which the speakers appear to be standing in busy natural scenes (e.g. on street corners, in restaurants, etc.).
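The per-pixel replacement at the heart of chroma-keying can be sketched as follows, assuming packed 0xRRGGBB pixels; the green-dominance test and threshold value are illustrative, and a production tool would also soften the matte at edges:

```java
// Sketch: chroma-key compositing. A pixel is treated as part of the
// green screen when its green channel dominates both red and blue by
// more than a threshold; such pixels are replaced from the background.
public class ChromaKey {
    static boolean isKey(int rgb, int threshold) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return g - Math.max(r, b) > threshold;
    }

    // Replace key-coloured pixels in the foreground frame with the
    // corresponding background pixels. Both frames must be the same size.
    static int[] composite(int[] foreground, int[] background, int threshold) {
        int[] out = new int[foreground.length];
        for (int i = 0; i < foreground.length; i++) {
            out[i] = isKey(foreground[i], threshold) ? background[i] : foreground[i];
        }
        return out;
    }
}
```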
The data is stored in MPEG-2 format and the project will require using either libmpeg3 (distributed with linux) or the Java Media Framework for decoding the data.
This project requires good maths and good C/C++ or Java programming skills.
Requirements
- Java or C/C++ programming skills
Reading
- Patterson et al., (2002) CUAVE: A new audio-visual database for multimodal human-computer interface research, Proc. ICASSP 2002.
JPB-UG-3: Automatic lecture note taker
Description
Every day hundreds of students sit in lecture halls transcribing the contents of projected transparencies into handwritten notes. This project aims to replace the student with a digital camcorder and a bit of software!
A camera focused on the white board will be used to capture the lecture. Standard video processing techniques will be used to reduce the video of the lecture to a series of stills showing each component transparency. Text identification techniques will be used to locate the regions of text. Finally optical character recognition (OCR) software will be used to transcribe each text region.
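The first stage, reducing the video to one still per transparency, can be sketched with simple frame differencing; the pixel-difference and changed-fraction thresholds below are illustrative and would need tuning against real lecture footage:

```java
// Sketch: detect transparency changes by comparing successive frames.
// A new slide is declared when the fraction of pixels whose grey level
// changes by more than pixelDelta exceeds changeFraction.
public class SlideChange {
    // Simple average-of-channels grey level for a packed 0xRRGGBB pixel.
    static int grey(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    static boolean isNewSlide(int[] prev, int[] curr,
                              int pixelDelta, double changeFraction) {
        int changed = 0;
        for (int i = 0; i < prev.length; i++) {
            if (Math.abs(grey(prev[i]) - grey(curr[i])) > pixelDelta) changed++;
        }
        return changed > changeFraction * prev.length;
    }
}
```

A practical version would also require the change to persist for several frames, so that a lecturer walking in front of the screen is not mistaken for a new transparency.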
The system can be developed using video data already available on the web, and if successful can be tested on a real DCS lecture.
The project will make use of existing OCR software, but will develop the other components from scratch.
Requirements
- good java or C/C++ programming skills, some maths
Reading
- He, Liu and Zhang (2003) Why take notes? Use the whiteboard capture system, Proc. ICASSP 2003
- Brunelli, Mich and Modena (1999) A Survey of video indexing, J. of Visual Communication and Image Representation
- Smith and Kanade (1995) Video skimming for quick browsing based on audio and image characterization, Carnegie Mellon University technical report CMU-CS-95-186
JPB-UG-4: Television watching assistant
Description
This project aims to build a ‘Television Watching Assistant’. This assistant will read the name captions that often appear when people are interviewed on television, and then search the internet for web pages related to the person appearing on the screen.
The project will develop standard video processing techniques to identify captions, and will make use of existing OCR software to read them.
Recently a set of APIs to the full Google web search engine was released by Google, enabling developers to freely access Google web search from their programs. This project will make use of the Google API to serve the TV viewer web pages relevant to the person being captioned.
Although this is a challenging project, it can be made manageable by a few simplifications. For example, a given program will use a consistent caption style, so a program-specific version of the software can exploit prior knowledge of the style and positioning of captions to make their detection easier. Prior knowledge of the colour, size and font of a program's captioning style will also make the OCR more reliable.
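Exploiting a known caption position can be sketched as a fixed-region crop followed by binarisation before handing the region to the OCR software; the region coordinates and the brightness threshold below are illustrative placeholders, not values from any real program:

```java
// Sketch: extract a fixed caption region from a frame and binarise it
// for OCR, assuming light caption text on a darker background.
public class CaptionCrop {
    // Copy the ch-by-cw region with top-left corner (x0, y0) out of a
    // w-pixel-wide frame of packed 0xRRGGBB values.
    static int[] crop(int[] frame, int w, int x0, int y0, int cw, int ch) {
        int[] out = new int[cw * ch];
        for (int y = 0; y < ch; y++)
            for (int x = 0; x < cw; x++)
                out[y * cw + x] = frame[(y0 + y) * w + (x0 + x)];
        return out;
    }

    // Binarise: pixels brighter than the threshold become 1 (text),
    // everything else 0 (background).
    static int[] binarise(int[] pixels, int threshold) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            int grey = (((p >> 16) & 0xFF) + ((p >> 8) & 0xFF) + (p & 0xFF)) / 3;
            out[i] = grey > threshold ? 1 : 0;
        }
        return out;
    }
}
```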
Requirements
- good java or C/C++ programming skills
Reading
- Brunelli, Mich and Modena (1999) A Survey of video indexing, J. of Visual Communication and Image Representation
- Smith and Kanade (1995) Video skimming for quick browsing based on audio and image characterization, Carnegie Mellon University technical report CMU-CS-95-186
JPB-UG-5: Innovative applications in web-based retrieval
Description
Google has become well-known as a large, effective and efficient web search engine. Recently a set of APIs to the full Google web search engine was released by Google, enabling developers to access Google web search from their programs.
This project is concerned with developing innovative applications using this API. There is plenty of scope for new ideas here, but to set you thinking some possibilities include:
- tracking the focus of a set of queries over time
- finding “communities of documents”
- development of new user interfaces for web searching
Some other ideas are listed on Google’s web pages.
The project is suited to students who like learning new things, have strong programming skills, and have some idea that they would like to try out. Nothing like an index of 2 billion web pages has been available to programmers previously, so there is clear scope for innovation. As a start see the reference by Brin and Page.
Requirements
- programming skills: the Google Web APIs support various languages, including Java.
Reading
- S. Brin and L. Page (1998) The anatomy of a large-scale hypertextual web search engine, Proceedings of WWW-7.