The CHiME Corpus
CHiME is collecting a corpus of domestic audio recorded using a binaural manikin. Recordings are being made in the living rooms, dining rooms and kitchens of a couple of real homes. In addition to the audio recordings, room impulse responses are being sampled at a systematic set of positions in each room. These responses will allow external speech recordings to be mixed into the background audio in a realistic yet carefully controllable manner.
Once complete, the corpus will be made freely available for research use.
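The mixing step described above can be illustrated with a minimal sketch. It assumes a mono speech recording, a binaural impulse response pair (one per ear) of equal length, and a two-channel background segment, all as plain lists of floats at the same sample rate. The function names, the SNR-based gain rule and the offset parameter are illustrative assumptions, not part of the CHiME tools.

```python
import math


def convolve(signal, impulse_response):
    """Direct-form linear convolution; output has len(signal) + len(ir) - 1 samples."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def mean_power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, ir_left, ir_right, bg_left, bg_right, snr_db, offset=0):
    """Spatialise mono speech with a binaural IR pair and add it to the background.

    The speech is scaled so that its power, averaged over both ears, sits
    snr_db decibels above the power of the background it overlaps
    (an assumed definition of SNR for this sketch).
    Returns (mixed_left, mixed_right, gain).
    """
    wet = [convolve(speech, ir_left), convolve(speech, ir_right)]
    mixed = [list(bg_left), list(bg_right)]
    n = len(wet[0])
    if len(wet[1]) != n:
        raise ValueError("left and right impulse responses must have equal length")
    if offset < 0 or offset + n > min(len(mixed[0]), len(mixed[1])):
        raise ValueError("background is too short for the reverberated speech")

    speech_power = (mean_power(wet[0]) + mean_power(wet[1])) / 2.0
    noise_power = (mean_power(mixed[0][offset:offset + n])
                   + mean_power(mixed[1][offset:offset + n])) / 2.0
    if speech_power == 0.0:
        raise ValueError("speech is silent; SNR is undefined")

    gain = math.sqrt(noise_power * 10.0 ** (snr_db / 10.0) / speech_power)
    for ch in range(2):
        for k in range(n):
            mixed[ch][offset + k] += gain * wet[ch][k]
    return mixed[0], mixed[1], gain
```

Because the same gain is applied to both ears, the interaural level and time differences encoded in the impulse responses are preserved. The direct-form convolution is quadratic in cost; for impulse responses of realistic length an FFT-based convolution would be the usual choice.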
Recording Set Up

The CHiME recording setup consists of a pair of B&K microphones mounted in a B&K anthropometric manikin (head and torso) and connected to a MacBook Pro, which records 96 kHz binaural signals directly to disk via a MOTU A-D unit. Impulse responses are measured with the same equipment using Farina's sine sweep method. The sine sweeps are played through a B&K artificial mouth in order to simulate the directivity of natural speech.
Recordings have been made in a number of sessions sampling roughly a week's worth of family activity in each room. Impulse responses for each room have been measured at a range of distances and angles with respect to the B&K head.
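For readers unfamiliar with the sine sweep method, the following is a minimal sketch of its usual formulation: an exponential sine sweep is played into the room, and the recording is convolved with an amplitude-weighted, time-reversed copy of the sweep to recover the impulse response. The function names, the frequency range, the duration and the sample rate are illustrative assumptions; the actual CHiME measurement parameters are not given here.

```python
import math


def exponential_sweep(f_start, f_end, duration, sample_rate):
    """Sine sweep whose instantaneous frequency rises exponentially from f_start to f_end."""
    n_samples = int(round(duration * sample_rate))
    log_ratio = math.log(f_end / f_start)
    scale = 2.0 * math.pi * f_start * duration / log_ratio
    return [
        math.sin(scale * (math.exp((n / sample_rate) * log_ratio / duration) - 1.0))
        for n in range(n_samples)
    ]


def inverse_filter(sweep, f_start, f_end):
    """Time-reversed sweep with an envelope that decays by f_start / f_end.

    The envelope is proportional to the instantaneous frequency, which
    compensates for the sweep spending more time at low frequencies.
    """
    n_samples = len(sweep)
    log_ratio = math.log(f_end / f_start)
    return [
        sweep[n_samples - 1 - m] * math.exp(-(m / n_samples) * log_ratio)
        for m in range(n_samples)
    ]


def convolve(a, b):
    """Direct-form linear convolution (quadratic cost; fine for short signals)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def estimate_impulse_response(recorded, inverse):
    """Deconvolve a recorded sweep; the direct sound appears near index len(inverse) - 1."""
    return convolve(recorded, inverse)


if __name__ == "__main__":
    # Toy "room": a pure delay of 10 samples with gain 0.5 (hypothetical values).
    sweep = exponential_sweep(50.0, 1500.0, 0.2, 4000)
    inverse = inverse_filter(sweep, 50.0, 1500.0)
    recorded = [0.0] * 10 + [0.5 * v for v in sweep]
    response = estimate_impulse_response(recorded, inverse)
    peak = max(range(len(response)), key=lambda i: abs(response[i]))
    print("peak index:", peak, "delay estimate:", peak - (len(sweep) - 1))
```

The result is unnormalised, so absolute levels would need a calibration factor. One commonly cited attraction of the exponential sweep is that harmonic distortion products appear ahead of the linear response after deconvolution and can be windowed out.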
Audio Samples

The figure to the right shows a time-frequency representation of a twenty-second segment of data that has already been recorded. This acoustically cluttered example illustrates some of the huge challenges presented by the target scenario. The speech is embedded in a noise background that, although quasi-stationary, can change abruptly in response to unpredictable events occurring in the room (doors opening, appliances being turned on or off). On top of the background there are abrupt impact noises, such as footsteps and doors banging, that can mask even highly energetic portions of the speech signal. There may be multiple speakers in the room producing overlapping speech, and not all speech will be directed at the system.
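The kind of time-frequency representation used for the figure is not stated. As one common possibility, the sketch below computes a short-time Fourier transform magnitude for a single channel using only the standard library. The frame length, hop size and Hann window are assumptions chosen for illustration.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_len // 2."""
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(n_bins):
            # Naive DFT of one bin; an FFT would be used for real recordings.
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

Bin k corresponds to a frequency of k * sample_rate / frame_len. In a display of this kind, a quasi-stationary background shows up as energy that persists across frames, while impact noises appear as short broadband vertical stripes.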
Some audio examples are available here.
Further details
A technical report providing details of the recordings will appear shortly.