3rd year projects 2011 - 12

  • JPB-UG-1: Scissors, Paper, Stone with Microsoft Kinect (Lewis Morley)
  • JPB-UG-1: Hand-drawn GUIs (Radina Kalpakova)
  • JPB-UG-3: Voice alignment for automated dialogue replacement (Zhe Wei)
  • JPB-UG-4: Spot-the-difference solver (Vaclav Hudec)
  • JPB-UG-5: Auto-tune detection (Domonic Ellam)

The project descriptions below are only intended as starting points. If you wish to discuss possibilities in greater detail I encourage you to email me to arrange a meeting.


JPB-UG-1: Scissors, Paper, Stone with Microsoft Kinect

Description

‘Scissors, Paper, Stone’ (or ‘Rock, Paper, Scissors’ as it is known in the US) is a ‘hand game’ in which players simultaneously make one of three specific hand gestures representing scissors, paper or stone (see images above). The winner is decided by the simple rules: scissors beats paper, paper beats stone, stone beats scissors. If the two players choose the same gesture the round is repeated.

This project will use a Microsoft Kinect sensor and gesture recognition techniques to write a program that can play the game against a human opponent.
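The rules stated above are simple enough to capture in a small lookup table. The following is a minimal sketch of the game logic only; it assumes that the gesture recogniser (not shown) has already labelled each hand as one of three strings. The names `BEATS` and `decide_round` are hypothetical and chosen purely for illustration.

```python
# Each gesture maps to the gesture it beats, following the rules in the description.
BEATS = {"scissors": "paper", "paper": "stone", "stone": "scissors"}


def decide_round(player_a: str, player_b: str) -> str:
    """Return 'a' or 'b' for the winning player, or 'repeat' if the gestures match."""
    if player_a not in BEATS or player_b not in BEATS:
        raise ValueError("gesture must be 'scissors', 'paper' or 'stone'")
    if player_a == player_b:
        return "repeat"  # same gesture: the round is played again
    return "a" if BEATS[player_a] == player_b else "b"
```

For example, `decide_round("paper", "stone")` returns `'a'`, while two identical gestures return `'repeat'`.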

Requirements

  • An interest in Computer Vision and good programming skills.

Initial reading



JPB-UG-1: Hand-drawn GUIs

Description

This challenging project will explore the use of computer vision techniques to allow users to interact with a simple GUI sketched on a piece of paper.

Please contact me for further details.

Requirements

  • An interest in Computer Vision and good programming skills.

Initial reading



JPB-UG-3: Voice alignment for automated dialogue replacement

Description

Automated dialogue replacement (ADR) is a process used in the film industry in which an actor’s dialogue, originally captured during filming, is replaced with dialogue that has been re-recorded by the same actor in a recording studio. This is done because it is often difficult to record good quality audio on set. It also makes it easier to cleanly replace the dialogue when dubbing into foreign languages. However, the re-recorded audio has to precisely match the timing of the original audio so that it fits the lip movements in the film. Many actors find this difficult, i.e. it is hard to concentrate on the timing of the delivery while simultaneously expressing the meaning of the lines with emotional depth. Actors often have to spend a long time in the studio to produce an acceptable result, which can be very costly.

This project will produce a system that subtly manipulates the timing of the ADR recording so that it matches the originally recorded speech, i.e. if the timing is not spot on, the system will stretch or compress parts of the utterance to improve the fit. This will mean that actors do not have to worry quite so much about the timing and will be free to concentrate on giving a good performance. There are already a few commercial systems that can do this job extremely well, but they are extremely expensive and the details of the algorithms they employ are closely guarded. However, the basic principles are fairly straightforward, so while it is not expected that the project will be able to compete with the quality of commercial systems, it should be possible to produce something of ‘passable’ quality.
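Before any stretching or compressing can happen, the system needs to know which moments in the ADR recording correspond to which moments in the original. One standard way to find such a correspondence is dynamic time warping (DTW). The description does not say which method the project should use, so the sketch below is only an illustration. It assumes each recording has already been reduced to a sequence of one number per frame (for example frame energy); the function name `dtw_path` and the choice of feature are hypothetical.

```python
def dtw_path(ref, new):
    """Align two 1-D feature sequences with dynamic time warping.

    Returns (total_cost, path), where path is a list of (ref_index, new_index)
    pairs running from the first frames to the last frames of both sequences.
    """
    n, m = len(ref), len(new)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of ref[:i] with new[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(ref[i - 1] - new[j - 1])
            cost[i][j] = dist + min(cost[i - 1][j - 1],  # advance both
                                    cost[i - 1][j],      # advance ref only
                                    cost[i][j - 1])      # advance new only

    # Walk back from the end, always taking the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path
```

Where the path pairs several frames of the new recording with a single reference frame, that part of the new recording is too slow and would need compressing; the opposite pattern indicates a part that needs stretching. The time-scale modification of the audio itself is a separate step not shown here.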

Requirements

  • The Speech Processing and Speech Technology modules should be useful.

Initial reading



JPB-UG-4: Spot-the-difference solver

Description

This computer vision project will attempt to solve ‘spot the difference’ puzzles (e.g. see above). The idea is that the user should be able to hold a puzzle in front of a USB-camera, and the program will produce an image of the puzzle in which the differences have been circled.

The task may appear straightforward but producing a robust solution will be difficult because the system will need to ignore irrelevant differences caused by irregular lighting, lens distortion and sensor noise. A further difficulty arises when deciding what constitutes a single difference. For example, in the puzzle above, the large sheep has textured wool in the image on the left but not in the image on the right – this is clearly meant to be a single difference, but at a low level this difference involves the omission of many separate line segments in the right-hand image.
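As a starting point, the naive approach can be sketched as: threshold the pixel-wise difference between the two pictures, then group neighbouring changed pixels into regions. The sketch below assumes the two pictures have already been located, aligned and converted to equally sized greyscale grids (lists of rows of integers); the name `difference_regions` and the threshold value are hypothetical.

```python
from collections import deque


def difference_regions(img_a, img_b, threshold=30):
    """Return bounding boxes (top, left, bottom, right) of connected changed regions."""
    rows, cols = len(img_a), len(img_a[0])
    # A pixel counts as changed only if the difference exceeds the threshold,
    # which suppresses small variations such as sensor noise.
    changed = [[abs(img_a[r][c] - img_b[r][c]) > threshold for c in range(cols)]
               for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not changed[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 8-connected changed pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and changed[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

This grouping is purely local, so a difference made up of many separate line segments (as in the sheep example) would be reported as many regions. Merging nearby regions into a single difference is the harder problem the project would need to address.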

Requirements

  • An interest in computer vision and pattern classification.

Initial reading



JPB-UG-5: Auto-tune detection

Description

Auto-tune is a patented audio processor created by Antares Audio Technologies that corrects the pitch of notes during vocal or instrumental performances. It is increasingly used in the record industry. In some cases it is used overtly to produce a ‘mechanical effect’, e.g. in the Cher track ‘Believe’. More commonly it is used to subtly correct pitch imperfections with the aim of making singers sound more accomplished than they actually are, e.g. its use on X-factor, which has recently sparked much controversy.

This project will produce a system that analyses music and attempts to detect whether auto-tune has been used. This may be possible because auto-tuned music is often too perfect and transitions between notes can be unnaturally abrupt. This unnaturalness can be detected by examining the output of pitch detection algorithms. Because it may be impossible to know for certain whether auto-tune has been used on any particular track (performers and producers tend to be rather coy about its use), the project will instead look for statistical differences between pitch track data recovered from albums released before and after auto-tune became mainstream.
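One candidate statistic for ‘too perfect’ pitch is how far each pitch estimate lies from the nearest note of the equal-tempered scale, measured in cents (hundredths of a semitone). The sketch below is an illustration rather than a prescribed method: it assumes a pitch tracker (not shown) has produced one frequency per frame in Hz, with 0 marking unvoiced frames, and that tuning is relative to A = 440 Hz. The function names are hypothetical.

```python
import math


def cents_from_nearest_semitone(freq_hz, ref_hz=440.0):
    """Signed distance in cents from freq_hz to the nearest equal-tempered note."""
    semitones = 12.0 * math.log2(freq_hz / ref_hz)
    return 100.0 * (semitones - round(semitones))


def mean_abs_deviation_cents(pitch_track_hz):
    """Average absolute deviation from the note grid over voiced frames."""
    voiced = [f for f in pitch_track_hz if f > 0]  # skip unvoiced frames (assumed 0)
    if not voiced:
        raise ValueError("pitch track contains no voiced frames")
    return sum(abs(cents_from_nearest_semitone(f)) for f in voiced) / len(voiced)
```

A heavily corrected performance would be expected to give a value close to zero, whereas natural singing usually drifts around the target notes. Whether this statistic actually separates albums from before and after auto-tune became mainstream is exactly what the project would have to test.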

Requirements

  • An interest in audio processing.

Initial reading

  • Auto-Tune on Wikipedia
  • Antares Audio Technologies website
  • Chapter 2, “Computational Auditory Scene Analysis: Principles, Algorithms and Applications”, Eds. Wang and Brown, IEEE Press/Wiley-Interscience