Autocorrelogram - A Visual Display of Sound Periodicity
Description
A C++ implementation of the autocorrelogram (ACG) model employed in Ma et al. (2007).
The autocorrelogram, or simply correlogram, is a visual display of sound periodicity and an important representation of auditory temporal activity that combines spectral and temporal information. It is normally defined as a three-dimensional volumetric function, mapping a frequency channel of an auditory periphery model, temporal autocorrelation delay (or lag), and time to the amount of periodic energy in that channel at that delay and time. Sound periodicity is well represented in the correlogram: if the original sound contains a signal that is approximately periodic, such as voiced speech, then each frequency channel excited by that signal will have a high similarity to itself delayed by the period of repetition. Primarily because it is well suited to detecting signal periodicity, the ACG model is widely regarded as the preferred computational representation of early sound processing in the auditory system.
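The "high similarity to itself delayed by the period" noted above can be illustrated in a few lines of C++ (a minimal sketch, not part of the released package; the test signal and the lags are chosen purely for illustration):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Normalised similarity between x[n] and x[n + lag] over the usable overlap.
double normalisedCorrelation(const std::vector<double>& x, int lag) {
    double num = 0.0, e0 = 0.0, e1 = 0.0;
    for (std::size_t n = 0; n + lag < x.size(); ++n) {
        num += x[n] * x[n + lag];
        e0  += x[n] * x[n];
        e1  += x[n + lag] * x[n + lag];
    }
    return num / std::sqrt(e0 * e1);
}

int main() {
    const double PI = 3.14159265358979323846;
    const int fs = 8000;         // sample rate (Hz)
    const double f0 = 200.0;     // periodic tone at 200 Hz -> period of 40 samples
    std::vector<double> x(400);
    for (std::size_t n = 0; n < x.size(); ++n)
        x[n] = std::sin(2.0 * PI * f0 * n / fs);

    std::printf("similarity at the period (lag 40): %.3f\n", normalisedCorrelation(x, 40));
    std::printf("similarity at a quarter period (lag 10): %.3f\n", normalisedCorrelation(x, 10));
    return 0;
}
```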
Fig. 1. Autocorrelogram for a clean speech signal spoken by a female speaker. The autocorrelogram is sampled across time and displayed as a series of 2-dimensional graphs.
The autocorrelogram is normally sampled across time to produce a series of 2-dimensional graphs, in which frequency and autocorrelation delay are displayed on orthogonal axes (see Fig. 1). All of the frequency channels respond to a periodic signal at the rate of its fundamental frequency (F0), and this can be emphasised by summing the correlogram over all frequency channels, producing a "summary correlogram". The position of the largest peak in the summary correlogram corresponds to the F0 of the periodic sound source.
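As a rough sketch of this step (not code from the ACG package: the frame layout, the function name estimateF0 and the pitch search range are illustrative assumptions), a summary correlogram and an F0 estimate could be obtained as follows:

```cpp
#include <vector>

// acgFrame[c][tau]: periodic energy in channel c at autocorrelation delay tau,
// for one time frame. Layout and names are assumptions made for this sketch.
double estimateF0(const std::vector<std::vector<double>>& acgFrame,
                  int sampleRate, int minLag, int maxLag) {
    const int numLags = static_cast<int>(acgFrame.front().size());
    std::vector<double> summary(numLags, 0.0);

    // Summary correlogram: sum the correlogram over all frequency channels.
    for (const auto& channel : acgFrame)
        for (int tau = 0; tau < numLags; ++tau)
            summary[tau] += channel[tau];

    // The largest peak within a plausible pitch range gives the fundamental period.
    int bestLag = minLag;
    for (int tau = minLag; tau <= maxLag && tau < numLags; ++tau)
        if (summary[tau] > summary[bestLag])
            bestLag = tau;

    return static_cast<double>(sampleRate) / bestLag;  // period in samples -> F0 in Hz
}
```

For signals sampled at 8 kHz, a search range of roughly 20 to 160 samples covers fundamental frequencies between about 50 and 400 Hz.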
Implementation
This ACG model filters signals with a gammatone auditory filterbank and computes a short-time autocorrelation on the output of each filter using a 30 ms Hann window. The implementation is efficient because the fast Fourier transform (FFT) can be exploited to compute the autocorrelations, but a side effect is that the correlation at longer autocorrelation delays is attenuated, owing to the narrowing of the effective window. To compensate for this effect a normalised form of autocorrelation is used.
At a given time t, the autocorrelation A(j, t, τ) for channel j with a delay τ is given by

A(j, t, \tau) = \frac{\sum_{n=0}^{K-1} h(j, t+n)\, w(n)\; h(j, t+n+\tau)\, w(n+\tau)}{\sqrt{\sum_{n=0}^{K-1} h^2(j, t+n)\, w^2(n)\; \sum_{n=0}^{K-1} h^2(j, t+n+\tau)\, w^2(n+\tau)}}

where h(j, t) is the output of the gammatone filterbank for channel j at time t and w is a local Hann window of width K time steps; w is taken to be zero outside its K samples, so delayed terms with n + τ ≥ K vanish, which is the window-narrowing effect noted above. More details about this implementation can be found in Ma et al. (2007).
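Read literally, the equation can be evaluated sample by sample. The sketch below is an unoptimised illustration with names chosen for this page (the released code computes the same quantity via the FFT); it takes one channel of gammatone output h, a frame start t, a delay tau and a Hann window w of width K:

```cpp
#include <cmath>
#include <vector>

// Normalised autocorrelation A(j, t, tau) for one channel, frame and delay, evaluated
// directly from the equation above. h: gammatone filterbank output for channel j;
// t: frame start (samples); tau: autocorrelation delay (samples); w: local Hann window
// of width K samples. Requires t + w.size() <= h.size().
double normalisedACF(const std::vector<double>& h, int t, int tau,
                     const std::vector<double>& w) {
    const int K = static_cast<int>(w.size());
    double num = 0.0, energy = 0.0, energyShifted = 0.0;
    for (int n = 0; n < K; ++n) {
        const double a = h[t + n] * w[n];
        // w is zero outside its K samples, so delayed terms with n + tau >= K vanish.
        const double b = (n + tau < K) ? h[t + n + tau] * w[n + tau] : 0.0;
        num           += a * b;
        energy        += a * a;
        energyShifted += b * b;
    }
    return num / std::sqrt(energy * energyShifted + 1e-12);  // guard against silence
}
```

For example, with the 30 ms Hann window and a signal sampled at 8 kHz, K is 240 samples.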
Source code
C++ source code is available here. The program takes a WAV audio file as input and saves the ACG output in a binary format: 32-bit floating-point numbers preceded by a 12-byte header of three integers, which in sequence give the maximum delay, the number of channels and the number of frames. A Matlab script "read_acg.m" is provided to read ACG data files in this format. See the README file for more details.
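As an illustration of the file format (not a replacement for read_acg.m, which remains the authoritative description of the data layout), a minimal C++ reader could look like the sketch below. It assumes native byte order, 32-bit header integers, and that the floats are stored frame by frame with numChannels × maxDelay values per frame; these layout details are assumptions made for the sketch.

```cpp
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

int main(int argc, char** argv) {
    if (argc != 2) { std::fprintf(stderr, "usage: %s acg_file\n", argv[0]); return 1; }

    std::ifstream in(argv[1], std::ios::binary);
    if (!in) { std::fprintf(stderr, "cannot open %s\n", argv[1]); return 1; }

    // 12-byte header: maximum delay, number of channels, number of frames.
    std::int32_t maxDelay = 0, numChannels = 0, numFrames = 0;
    in.read(reinterpret_cast<char*>(&maxDelay), sizeof(maxDelay));
    in.read(reinterpret_cast<char*>(&numChannels), sizeof(numChannels));
    in.read(reinterpret_cast<char*>(&numFrames), sizeof(numFrames));

    // Assumed data size: numFrames * numChannels * maxDelay 32-bit floats.
    std::vector<float> acg(static_cast<std::size_t>(maxDelay) * numChannels * numFrames);
    in.read(reinterpret_cast<char*>(acg.data()),
            static_cast<std::streamsize>(acg.size() * sizeof(float)));
    if (!in) { std::fprintf(stderr, "truncated ACG file\n"); return 1; }

    std::printf("max delay %d, %d channels, %d frames\n", maxDelay, numChannels, numFrames);
    return 0;
}
```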
The fftw3 package (http://www.fftw.org) is required to compute FFTs. Compiled fftw3 libraries for Linux, Linux 64 and Cygwin are included in the source code package. On a Pentium IV 2.0 GHz Linux machine the implementation runs 10 times faster than real time when computing 32-channel ACGs for signals sampled at 8 kHz.
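The FFT route relies on the fact that the autocorrelation of a windowed, zero-padded frame is the inverse transform of its power spectrum. The sketch below is a minimal illustration using the public fftw3 API (the function name, padding strategy and window handling are our assumptions, not code from the ACG package); it computes the windowed autocorrelation that forms the numerator of the equation in the Implementation section, and the normalisation terms can then be accumulated from running sums of the squared windowed samples.

```cpp
#include <fftw3.h>
#include <cmath>
#include <vector>

// Unnormalised windowed autocorrelation of one K-sample frame for lags 0..maxLag.
std::vector<double> fftAutocorrelation(const std::vector<double>& frame, int maxLag) {
    const int K = static_cast<int>(frame.size());
    const int N = 2 * K;  // zero-padded length, avoids circular wrap-around

    double* in = static_cast<double*>(fftw_malloc(sizeof(double) * N));
    fftw_complex* spec =
        static_cast<fftw_complex*>(fftw_malloc(sizeof(fftw_complex) * (N / 2 + 1)));

    const double PI = 3.14159265358979323846;
    for (int n = 0; n < N; ++n) {
        // Hann-windowed frame followed by zeros.
        const double w = (n < K) ? 0.5 - 0.5 * std::cos(2.0 * PI * n / (K - 1)) : 0.0;
        in[n] = (n < K) ? frame[n] * w : 0.0;
    }

    fftw_plan fwd = fftw_plan_dft_r2c_1d(N, in, spec, FFTW_ESTIMATE);
    fftw_execute(fwd);

    // Power spectrum; its inverse transform is the autocorrelation.
    for (int k = 0; k < N / 2 + 1; ++k) {
        const double re = spec[k][0], im = spec[k][1];
        spec[k][0] = re * re + im * im;
        spec[k][1] = 0.0;
    }

    fftw_plan inv = fftw_plan_dft_c2r_1d(N, spec, in, FFTW_ESTIMATE);
    fftw_execute(inv);

    std::vector<double> acf(maxLag + 1);
    for (int tau = 0; tau <= maxLag; ++tau)
        acf[tau] = in[tau] / N;  // undo FFTW's unnormalised inverse transform

    fftw_destroy_plan(fwd);
    fftw_destroy_plan(inv);
    fftw_free(in);
    fftw_free(spec);
    return acf;
}
```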
References
- Licklider, J. (1951) A duplex theory of pitch perception. Experientia, 7:128--134.
- Slaney, M. and Lyon, R. (1990) A perceptual pitch detector. In Proc. IEEE ICASSP’90, pages 357--360, Albuquerque.
- Ma, N., Green, P., Barker, J. and Coy, A. (2007) Exploiting correlogram structure for robust speech recognition with multiple speech sources. Speech Communication, 49 (12):874--891.
Acknowledgements
This autocorrelogram model was modified from a model originally suggested by Guy Brown.