About the book
Human listeners are able to perceptually segregate one sound source from an acoustic mixture, such as a single voice from a mixture of other voices and music at a busy cocktail party. How can we engineer "machine listening" systems that achieve this perceptual feat?
Albert Bregman's book Auditory Scene Analysis, published in 1990, draws an analogy between the perception of auditory scenes and visual scenes, and describes a coherent framework for understanding the perceptual organization of sound. His account has stimulated much interest in computational studies of hearing. Such studies are motivated in part by the demand for practical sound separation systems, which have many applications including noise-robust automatic speech recognition, hearing prostheses, and automatic music transcription. This emerging field has become known as computational auditory scene analysis (CASA).
Computational Auditory Scene Analysis: Principles, Algorithms and Applications provides a comprehensive and coherent account of the state of the art in CASA, in terms of the underlying principles, the algorithms and system architectures that are employed, and the potential applications of this exciting new technology. The text is written at a level that will be accessible to graduate students and researchers from related science and engineering disciplines. The extensive bibliography accompanying each chapter will also make this book a valuable reference source.
The book is edited by DeLiang Wang (The Ohio State University, USA) and Guy J. Brown (University of Sheffield, UK), with the following contributing authors:
- Jon Barker (University of Sheffield, UK)
- Alain de Cheveigné (Université Paris 5, France)
- Daniel P. W. Ellis (Columbia University, USA)
- Albert S. Feng (University of Illinois at Urbana-Champaign, USA)
- Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST), Japan)
- Douglas L. Jones (University of Illinois at Urbana-Champaign, USA)
- Kalle Palomäki (Helsinki University of Technology, Finland)
- Richard Stern (Carnegie Mellon University, USA)
For more information, or to purchase the book, see amazon.com or amazon.co.uk.
Software
Software for producing some of the figures in the book can be downloaded by clicking here.
Chapter Resources
Chapter 1. Fundamentals of Computational Auditory Scene Analysis by DeLiang Wang and Guy J. Brown
Software
Matlab m-files to produce the following figures are included in the chapter1 directory of the software archive:
- chap1fig6.m - both panels of figure 1.6
- chap1fig7a.m - left panel of figure 1.7
- chap1fig7b.m - right panel of figure 1.7
- chap1fig9a.m - left panel of figure 1.9
- chap1fig9b.m - right panel of figure 1.9
Chapter 2. Multiple F0 estimation by Alain de Cheveigné
Supplementary material will be available shortly
Chapter 3. Feature-based speech segregation by DeLiang Wang
Supplementary material will be available shortly
Chapter 4. Model-based scene analysis by Daniel P. W. Ellis
Supplementary material will be available shortly
Chapter 5. Binaural sound localization by Richard M. Stern, Guy J. Brown and DeLiang Wang
Papers and software downloads are available from Richard Stern's web pages for binaural hearing.
Chapter 6. Localization-based grouping by Albert S. Feng and Douglas L. Jones
Audio demonstrations
Demonstration of the two-sensor FMV system of Lockwood et al. The test condition comprised a target source at 0 degrees azimuth and four interferers at -20, +20, -60 and +60 degrees azimuth. Relative to the combined interference, the overall SNR was -4 dB at the position of the two-sensor array, which was 1 m away from all the sound sources.
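As a rough illustration of what an overall SNR of -4 dB means for a target mixed with four interferers, the sketch below sums the interferers and scales the sum so that the target-to-interference power ratio hits a requested value. It is a single-channel simplification and not the procedure used to produce the demonstration: it ignores source azimuth and the two-sensor geometry, the signals are white-noise stand-ins, and the names mix_at_snr and measure_snr_db are hypothetical.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(target, interferers, snr_db):
    """Sum the interferers, then scale the sum so that
    10*log10(P_target / P_noise) equals snr_db (hypothetical helper)."""
    # Combined interference: sample-by-sample sum over all interferers.
    noise = [sum(samples) for samples in zip(*interferers)]
    # Solve P_target / (gain^2 * P_noise) = 10^(snr_db / 10) for gain.
    gain = math.sqrt(power(target) / (power(noise) * 10 ** (snr_db / 10)))
    scaled = [gain * n for n in noise]
    mixture = [t + n for t, n in zip(target, scaled)]
    return mixture, scaled


def measure_snr_db(target, noise):
    """SNR in dB between a target and a noise signal of equal length."""
    return 10 * math.log10(power(target) / power(noise))


# Assumption: white Gaussian noise stands in for the target and the
# four interferers; real demonstrations would use recorded signals.
random.seed(0)
n_samples = 8000
target = [random.gauss(0, 1) for _ in range(n_samples)]
interferers = [[random.gauss(0, 1) for _ in range(n_samples)]
               for _ in range(4)]

mixture, noise = mix_at_snr(target, interferers, -4.0)
achieved = measure_snr_db(target, noise)
print(f"achieved SNR: {achieved:.2f} dB")
```

At -4 dB the combined interference carries roughly 2.5 times the power of the target, so the target is weaker than the interference it must be separated from.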
Chapter 7. Reverberation by Guy J. Brown and Kalle J. Palomäki
Audio demonstrations
Harmonic dereverberation (HERB). The following two sets of sound files show the performance of the HERB algorithm for a male and female voice. The female sound files were used to generate figure 7.9. Many thanks to Tomohiro Nakatani for supplying these sound files and allowing us to distribute them.
Chapter 8. Analysis of musical audio signals by Masataka Goto
Supplementary material for this chapter is available here.
Chapter 9. Robust automatic speech recognition by Jon Barker
Supplementary material will be available shortly
Chapter 10. Neural and perceptual modeling by Guy J. Brown and DeLiang Wang
Software
See the MAD demonstrations wangNeuron and wangNetwork for simulations of a single neural oscillator and a small oscillator network, respectively.
Matlab m-files to produce the following figures are included in the chapter10 directory of the software archive:
- chap10figs7and8.m - all panels of figure 10.7, and panels B and C of figure 10.8
Corpora
The SPIN corpus was developed at the University of Illinois but is not currently available through the web. Those interested in purchasing the corpus can contact:
Mark Joseph
Department of Speech and Hearing Science
University of Illinois at Urbana-Champaign
Champaign, IL 61820
Email: markjos@uiuc.edu
Phone: 217-333-2230
Links
Software tools
- DSAM - Development system for auditory modelling (University of Essex)
- Malcolm Slaney's auditory toolbox
- MAD - Matlab auditory demonstrations (University of Sheffield)
- AIM - Auditory Image Model (University of Cambridge)
- Roomsim - Matlab application for sound spatialisation in a rectangular room
Web sites about auditory perception and CASA
- www.auditory.org - home page for the AUDITORY list, an email list for the discussion of organizational aspects of auditory perception
Research groups
Personal web pages
If you are working in the field of CASA and would like your web page added to this list, please email us.
- Barker, Jon (University of Sheffield)
- Brown, Guy J. (University of Sheffield)
- de Cheveigné, Alain (Université Paris 5)
- Cooke, Martin (University of Sheffield)
- Ellis, Dan (Columbia University)
- Feng, Albert (Beckman Institute/University of Illinois)
- Goto, Masataka (AIST, Japan)
- Irino, Toshio (Wakayama University)
- Jones, Douglas L. (University of Illinois at Urbana-Champaign)
- Kawahara, Hideki (Wakayama University)
- Lyon, Richard F. (Dick)
- Okuno, Hiroshi (Kyoto University)
- Palomäki, Kalle (Helsinki University of Technology)
- Pichevar, Ramin (Communications Research Centre, Ottawa/University of Sherbrooke)
- Patterson, Roy (University of Cambridge)
- Slaney, Malcolm (Yahoo! Research)
- Stern, Richard (Carnegie Mellon University)
- Wang, DeLiang (Ohio State University)
- Wrigley, Stuart (University of Sheffield)
Last updated by Guy on 23/1/2013