About the book | Software | Chapter Resources | Corpora | Links

About the book

Human listeners are able to perceptually segregate one sound source from an acoustic mixture, such as a single voice from a mixture of other voices and music at a busy cocktail party. How can we engineer "machine listening" systems that achieve this perceptual feat?

Cover of the Wiley/IEEE Press book on Computational Auditory Scene Analysis, edited by DeLiang Wang and Guy J. Brown

Albert Bregman's book Auditory Scene Analysis, published in 1990, draws an analogy between the perception of auditory scenes and visual scenes, and describes a coherent framework for understanding the perceptual organization of sound. His account has stimulated much interest in computational studies of hearing. Such studies are motivated in part by the demand for practical sound separation systems, which have many applications including noise-robust automatic speech recognition, hearing prostheses, and automatic music transcription. This emerging field has become known as computational auditory scene analysis (CASA).

Computational Auditory Scene Analysis: Principles, Algorithms and Applications provides a comprehensive and coherent account of the state-of-the art in CASA, in terms of the underlying principles, the algorithms and system architectures that are employed, and the potential applications of this exciting new technology. The text is written at a level that will be accessible to graduate students and researchers from related science and engineering disciplines. The extensive bibliography accompanying each chapter will also make this book a valuable reference source.

The book is edited by DeLiang Wang (The Ohio State University, USA) and Guy J. Brown (University of Sheffield, UK), with the following contributing authors:

Jon Barker (University of Sheffield, UK)
Alain de Cheveigné (Université Paris 5, France)
Daniel P. W. Ellis (Columbia University, USA)
Albert S. Feng (University of Illinois at Urbana-Champaign, USA)
Masataka Goto (National Institute of Advanced Industrial Science and Technology (AIST), Japan)
Douglas L. Jones (University of Illinois at Urbana-Champaign, USA)
Kalle Palomäki (Helsinki University of Technology, Finland)
Richard Stern (Carnegie Mellon University, USA)

For more information, and to purchase, see amazon.com or amazon.co.uk

Software

Software for producing many of the figures in the book is available for download below:

code.zip [156 KB]

This will unzip into a directory structure that contains a common library (which you should add to your Matlab path) and several directories containing Matlab m-files for specific chapters.

Chapter Resources

Chapter 1. Fundamentals of computational auditory scene analysis
by DeLiang Wang and Guy J. Brown

Software

Matlab m-files to produce the following figures are included in the chapter1 directory of the software archive:

chap1fig6.m - both panels of figure 1.6
chap1fig7a.m - left panel of figure 1.7
chap1fig7b.m - right panel of figure 1.7
chap1fig9a.m - left panel of figure 1.9
chap1fig9b.m - right panel of figure 1.9

Matlab software for generating a cochleagram and resynthesizing an audio waveform from the cochleagram are available here (under "Cochleagram Analysis and Resynthesis").

Chapter 2. Multiple F0 estimation
by Alain de Cheveigné

Supplementary material will be available shortly.

Chapter 3. Feature-based speech segregation
by DeLiang Wang

Software

Feature extraction. Matlab software for onset/offset detection and envelope extraction is available for download below:
onsetOffset.zip [7.5KB]
envelopeExtraction.zip [6.1KB]
Auditory segmentation. Software for cross-channel correlation based auditory segmentation is available here (under "Speech Segregation Based on Oscillatory Correlation").
Software for onset/offset based segmentation is available here (under "Auditory Segmentation").
Voiced speech segregation. Software for the Hu-Wang model of voiced speech segregation is available here (under "Voiced Speech Segregation").

Audio demonstrations

An audio demo of the Hu-Wang system for monaural voiced speech segregation is available here.

Chapter 4. Model-based scene analysis
by Daniel P. W. Ellis

Supplementary material will be available shortly.

Chapter 5. Binaural sound localization
by Richard M. Stern, Guy J. Brown and DeLiang Wang

Papers and software downloads are available from Richard Stern's web pages for binaural hearing.

Chapter 6. Localization-based grouping
by Albert S. Feng and Douglas L. Jones

Audio demonstrations

Demonstration of the two-sensor FMV system of Lockwood et al. The test condition comprised a target source at 0 degrees azimuth and four interferers at -20, +20, -60 and +60 degrees azimuth. The combined noise had an overall SNR of -4 dB at the position of the two-sensor array, at 1m away from all the sound sources.

Unprocessed [2.7 MB WAV]
Processed [2.7 MB WAV]

Chapter 7. Reverberation
by Guy J. Brown and Kalle J. Palomäki

Audio demonstrations

Harmonic dereverberation (HERB). The following two sets of sound files show the performance of the HERB algorithm for a male and female voice. The female sound files were used to generate figure 7.9. Many thanks to Tomohiro Nakatani for supplying these sound files and allowing us to distribute them.

Female speaker: dry [50KB WAV], reverberated [50KB WAV], dereverberated [50KB WAV]
Male speaker: dry [52KB WAV], reverberated [52KB WAV], dereverberated [52KB WAV]

Chapter 8. Analysis of musical audio signals
by Masataka Goto

Supplementary material for this chapter is available here.

Chapter 9. Robust automatic speech recognition
by Jon Barker

Software

Matlab m-files to produce the following figures are included in chapter9 of the software archive:

demo1.m - makes a colour version of figure 9.2
demo2.m - an interactive version of figure 9.3

This directory also includes figure 9.1 in electronic form (fig1.eps).

Jon Barker's CASA toolkit (CTK) is available here.

Chapter 10. Neural and perceptual modeling
by Guy J. Brown and DeLiang Wang

Software

See the MAD demonstrations wangNeuron and wangNetwork for simulations of a single neural oscillator and a small oscillator network, respectively.

Matlab m-files to produce the following figures are included in the chapter10 directory of the software archive:

chap10figs7and8.m - all panels of figure 10.7, and panels B and C of figure 10.8

The complete C source code for the Wang-Brown model of speech segregation based on oscillatory correlation is available here (under "Speech Segregation Based on Oscillatory Correlation").

Corpora

TIMIT
NOISEX
HINT
TiDigits
BKB-SIN
Aurora
NOIZEUS
ShATR
ICSI meetings corpus
RWC music database
KEMAR HRTFs recorded by Gardner and Martin, MIT
Sound mixtures from Martin Cooke's PhD
Signals used in the book Visual Representations of Speech Signals, edited by Cooke, Beet and Crawford
INTERSPEECH 2006 speech separation challenge
Guoning Hu's corpus of 100 nonspeech sounds

Links

Software tools

DSAM - Development system for auditory modelling (University of Essex)
Malcolm Slaney's auditory toolbox
MAD - Matlab auditory demonstrations (University of Sheffield)
AIM - Auditory Image Model (University of Cambridge)
Roomsim - Matlab application for sound spatialisation in a rectangular room
CTK - Jon Barker's CASA toolkit

Web sites about auditory perception and CASA

www.auditory.org - home page for the AUDITORY list, an email list for the discussion of organizational aspects of auditory perception

Research groups

Centre for the Neural Basis of Hearing, University of Cambridge
Speech and Hearing Research Group, University of Sheffield
Perception and Neurodynamics Laboratory, The Ohio State University
Laboratory for the Recognition and Organization of Speech and Audio (LabROSA), Columbia University
CMU Robust Speech Recognition Group, Carnegie Mellon University
NECOTIS (Neurocomputational and Intelligent Signal Processing Research group), Université de Sherbrooke

Personal web pages

If you are working in the field of CASA and would like your web page added to this list, please email us.

Barker, Jon (University of Sheffield)
Brown, Guy J. (University of Sheffield)
de Cheveigné, Alain (Université Paris 5)
Cooke, Martin (University of Sheffield)
Ellis, Dan (Columbia University)
Feng, Albert (Beckman Institute/University of Illinois)
Goto, Masataka (AIST, Japan)
Irino, Toshio (Wakayama University)
Jones, Douglas L. (University of Illinois at Urbana-Champaign)
Kawahara, Hideki (Wakayama University)
Lazzaro, John (UC Berkeley)
Lyon, Richard F. (Dick)
Okuno, Hiroshi (Kyoto University)
Palomäki, Kalle (Helsinki University of Technology)
Patterson, Roy (University of Cambridge)
Pichevar, Ramin (Communications Research Centre, Ottawa/University of Sherbrooke)
Rouat, Jean (Université de Montréal, Université de Sherbrooke)
Slaney, Malcolm (Yahoo! Research)
Stern, Richard (Carnegie Mellon University)
Wang, DeLiang (Ohio State University)
Wrigley, Stuart (University of Sheffield)

Last updated by Guy on 28/6/2007 The keywords for this site are computational auditory scene analysis, CASA, auditory perception, machine hearing, Guy Brown, DeLiang Wang, Wiley, IEEE press, book, textbook, auditory model, sound separation, independent component analysis, ICA, matlab, source code, sound example, audio demonstration, hearing aid, Albert Bregman

About the book

Software

Chapter Resources

Chapter 1. Fundamentals of computational auditory scene analysisby DeLiang Wang and Guy J. Brown

Software

Chapter 2. Multiple F0 estimationby Alain de Cheveigné

Chapter 3. Feature-based speech segregationby DeLiang Wang

Software

Audio demonstrations

Chapter 4. Model-based scene analysisby Daniel P. W. Ellis

Chapter 5. Binaural sound localizationby Richard M. Stern, Guy J. Brown and DeLiang Wang

Chapter 6. Localization-based groupingby Albert S. Feng and Douglas L. Jones

Audio demonstrations

Chapter 7. Reverberationby Guy J. Brown and Kalle J. Palomäki

Audio demonstrations

Chapter 8. Analysis of musical audio signalsby Masataka Goto

Chapter 9. Robust automatic speech recognitionby Jon Barker

Software

Chapter 10. Neural and perceptual modelingby Guy J. Brown and DeLiang Wang

Software

Corpora

Links

Software tools

Web sites about auditory perception and CASA

Research groups

Personal web pages

Chapter 1. Fundamentals of computational auditory scene analysis
by DeLiang Wang and Guy J. Brown

Chapter 2. Multiple F0 estimation
by Alain de Cheveigné

Chapter 3. Feature-based speech segregation
by DeLiang Wang

Chapter 4. Model-based scene analysis
by Daniel P. W. Ellis

Chapter 5. Binaural sound localization
by Richard M. Stern, Guy J. Brown and DeLiang Wang

Chapter 6. Localization-based grouping
by Albert S. Feng and Douglas L. Jones

Chapter 7. Reverberation
by Guy J. Brown and Kalle J. Palomäki

Chapter 8. Analysis of musical audio signals
by Masataka Goto

Chapter 9. Robust automatic speech recognition
by Jon Barker

Chapter 10. Neural and perceptual modeling
by Guy J. Brown and DeLiang Wang