RESPITE/SPHEAR Workshop

Les Marecottes, 13/14 Sept 99

Location & Joining Instructions

PROGRAMME

Delegates

Bochum

Karsten Lehn

Christos Tsakostas

Daimler-Chrysler

Fritz Class

Joan Mari

FPMs

Christophe Ris

ICP

Herve Glotin

ICSI

Dan Ellis

IDIAP

Andrew Morris

Herve Bourlard

Astrid Hagen

Christopher Kermorvant

Keele

Bill Ainsworth

MATRA

Catherine Glorion

Philip Lockwood

Patras

John Mourjopoulos

Joerg Buchholz

Sheffield

Phil Green

Martin Cooke

Ascencion Vizinho

Jon Barker

Agenda

Introduction

Coordinator's Report

Research Updates

Discussion Groups

SPHEAR Steering Committee

RESPITE Steering Committee

Report Back & Planning

Coordinator's Report

SPHEAR

1st Periodic Payment

Arrived in Sheffield in July & has recently been distributed to partners. The delay was caused by our querying a deduction of around 8 KEuro from what we were expecting. In fact, the way the finances work is:

* In the end we get paid what we have spent, on the basis of annual cost statements.

* We get an initial advance, which is gradually clawed back by deduction from yearly periodic payments.

This claw-back, plus some over-budget overhead claims, caused the discrepancy. More details in the SPHEAR Steering Cttee.

Mid-Term Review

The SPHEAR mid-term review will take place in Bochum on Monday 28th Feb, following our workshop on Saturday 26th and Sunday 27th. Our Project Officer, Christiane Bernard, and an expert reviewer will be present.

The agenda has to be fixed 2 months in advance.

A mid term report has to be written & circulated 1 month in advance.

The review occupies a whole day, and features

* The coordinator's report (1 hour!),

* A tour de table of all scientists-in-charge and task leaders - 5 minute presentations.

* A ten minute presentation by each young researcher of his work and experiences.

Each of the young researchers will also be asked to fill out a confidential questionnaire during the meeting.

A copy of the mid term review guidelines is on the private page of the SPHEAR web site.

Three-monthly Reports

The next is due at the end of October. In the one after that, at the end of December, I'll need material for the mid term report.

RESPITE

Management Reports

I've agreed to give the RESPITE project officer, Antonio Sanfilippo, 2-monthly management reports. The first of these was extended to cover the first 6 months & submitted at the end of July (& is on the private page). The next is due at the end of September.

These reports should cover `Main achievements, activities (incl. manpower) and events in the reporting period including where appropriate problems, delays, etc. encountered.'

Monthly Reports

We said we'd review whether the monthly reports are worthwhile at this meeting...

Year 1 Report

... and cost statements due end December.

Annual Review

RESPITE now comes under Unit 4 of DG XIII, Human Language Technologies, in Luxembourg. They do their reviews like so:

* There is an annual event in Luxembourg at which they review all their projects.

* It takes place early February.... no date fixed yet.

* Each project review takes about 4 hours.

* Each project is represented by the coordinator and 1-2 consortium members.

* There are 2-3 reviewers

International Workshop

The RESPITE programme says we will organise an International Workshop around month 27 (March 2001).

Web Sites

It really is important to keep our web sites up-to-date and lively.

Research Updates

Herve Bourlard

Overview of SPHEAR/RESPITE work at IDIAP

This presentation will summarise the main research topics undertaken by IDIAP for the SPHEAR and RESPITE projects. There will then be a brief introduction to each person's main research activities to date, and a list of publications.

Herve Bourlard

Non-stationary multi-channel processing, towards robust and adaptive ASR.

Summary of main points from Tampere keynote

Christopher Kermorvant

MUSE: unsupervised model based on-line equalization

MUSE (MUlti-path Stochastic Equalization) offers a general framework for integrating equalization functions into classical HMM-based modeling. MUSE is based on the following idea: associate an equalization function with each possible state sequence hypothesized during the decoding process, and compute both the equalization function parameters and the best path with a Maximum Likelihood or Maximum A Posteriori criterion. MUSE has been implemented in HTK for the case of Bias Removal. Long-term statistics and a Maximum A Posteriori criterion have been introduced.
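The idea of estimating one equalization function per hypothesized path can be illustrated with a minimal sketch. It is not the HTK implementation mentioned above: it assumes an additive bias, single diagonal Gaussians per state, an explicit list of candidate paths instead of a Viterbi search, and it ignores transition probabilities. All function names are hypothetical.

```python
import math

def ml_bias(frames, means, variances):
    """ML estimate of an additive bias b along one state path,
    assuming x_t = clean_t + b with clean_t ~ N(mean_t, var_t)."""
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x, mu, var in zip(frames, means, variances):
        for d in range(dim):
            num[d] += (x[d] - mu[d]) / var[d]
            den[d] += 1.0 / var[d]
    return [n / d for n, d in zip(num, den)]

def log_likelihood(frames, means, variances, bias):
    """Log-likelihood of the bias-corrected frames along one path."""
    total = 0.0
    for x, mu, var in zip(frames, means, variances):
        for d in range(len(x)):
            diff = x[d] - bias[d] - mu[d]
            total += -0.5 * (math.log(2.0 * math.pi * var[d]) + diff * diff / var[d])
    return total

def best_path_with_bias(frames, states, candidate_paths):
    """Give each candidate path its own ML bias, then keep the path
    (and bias) with the highest likelihood."""
    best = None
    for path in candidate_paths:
        means = [states[s][0] for s in path]
        variances = [states[s][1] for s in path]
        bias = ml_bias(frames, means, variances)
        score = log_likelihood(frames, means, variances, bias)
        if best is None or score > best[0]:
            best = (score, path, bias)
    return best

# Toy data: two states (mean, variance); frames follow path 0,0,1 with bias 1.0
states = {0: ([0.0], [1.0]), 1: ([5.0], [1.0])}
frames = [[1.0], [1.0], [6.0]]
score, path, bias = best_path_with_bias(frames, states, [[0, 0, 1], [0, 1, 1], [1, 1, 1]])
```

In this toy case the path that matches the data up to a constant offset wins, and its bias estimate recovers that offset.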

Andrew Morris

HMM/RBFs: Combining the advantages of likelihood with posteriors based ASR (1 slide)

There are a number of traditional ML adaptation methods, and more recent missing data compensation methods, which can be applied to likelihood based ASR but cannot be applied to a posteriors based HMM/ANN hybrid. RBF networks have a hidden layer which outputs Gaussian mixture based likelihoods, and a further layer which outputs posteriors. These may therefore permit us to combine the advantages of both likelihood and posteriors based approaches. One step RBF training (using e.g. HTK) is preferable to two step training.
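The two-layer structure described here can be sketched as follows. This is only an illustration: it assumes diagonal-covariance Gaussian mixtures in the hidden layer and an output layer that simply applies Bayes' rule with class priors, whereas a trained RBF output layer would generally learn its own weights. The names `diag_gaussian` and `rbf_forward` are hypothetical.

```python
import math

def diag_gaussian(x, mean, var):
    """Density of a diagonal-covariance Gaussian at x."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def rbf_forward(x, classes):
    """Hidden layer: per-class mixture likelihoods p(x|q).
    Output layer (assumed here to be Bayes' rule): posteriors P(q|x)."""
    likelihoods = [
        sum(w * diag_gaussian(x, m, v) for w, m, v in c["mixture"])
        for c in classes
    ]
    joint = [c["prior"] * l for c, l in zip(classes, likelihoods)]
    total = sum(joint)
    posteriors = [j / total for j in joint]
    return likelihoods, posteriors

# Two toy classes, each with a one-component mixture: (weight, mean, variance)
classes = [
    {"prior": 0.5, "mixture": [(1.0, [0.0], [1.0])]},
    {"prior": 0.5, "mixture": [(1.0, [1.0], [1.0])]},
]
likelihoods, posteriors = rbf_forward([0.5], classes)
```

Because both the likelihoods and the posteriors are available from one forward pass, likelihood-domain methods can act on the hidden layer while decoding uses the output layer.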

Andrew Morris

Extensions to the full combination decomposition for multiband ASR (1 slide)

Present results for the full combination multiband approach have used either static expert weights, which should be optimal but only for clean speech, or else noise adaptive weights, which do not take any account of the static weighting. Here it is shown very briefly how decomposition of the fullband posteriors using two latent variables instead of just one leads to a simple method for combining static with adaptive weights.
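For orientation, the basic weighted combination of experts can be sketched as below. It assumes one expert per subset of sub-bands (including the empty set, which would return the class priors) and a single weight per expert. It does not implement the two-latent-variable decomposition of the talk, whose details are not given here. Function names are hypothetical.

```python
from itertools import combinations

def band_subsets(num_bands):
    """All 2**num_bands subsets of band indices, one per expert."""
    bands = range(num_bands)
    return [frozenset(c) for r in range(num_bands + 1) for c in combinations(bands, r)]

def full_combination(expert_posteriors, weights):
    """Weighted sum of expert posteriors; weights are normalised to sum to 1."""
    total_w = sum(weights.values())
    num_classes = len(next(iter(expert_posteriors.values())))
    combined = [0.0] * num_classes
    for subset, post in expert_posteriors.items():
        w = weights[subset] / total_w
        for q in range(num_classes):
            combined[q] += w * post[q]
    return combined

# Toy example with 2 bands and 2 classes (posterior values are made up)
subsets = band_subsets(2)
expert_posteriors = {
    frozenset(): [0.5, 0.5],      # no reliable band: class priors
    frozenset({0}): [0.8, 0.2],
    frozenset({1}): [0.4, 0.6],
    frozenset({0, 1}): [0.7, 0.3],
}
weights = {s: 1.0 for s in subsets}
combined = full_combination(expert_posteriors, weights)
```

Static weights would be fixed in advance, while noise adaptive weights would be re-estimated from the signal at each frame.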

Katrin Keller

Combining wavelet domain hidden-markov trees (WHMTs) with hidden markov models

Wavelet coefficients with their inherent multiresolution characteristics could be advantageous for ASR. Furthermore, the modeling of time/frequency correlations can improve recognition accuracy. The integration of those two approaches was investigated by developing a new modeling structure that uses WHMTs on top of HMMs.

Astrid Hagen

Some weight estimation experiments

A short presentation of two new expert combination weight estimation methods: one based on Fletcher's "product of errors rule", the other on local maximum likelihood. Results so far have been somewhat negative but could improve after some small changes to the implementation details.

Astrid Hagen

Recent full combination multiband (FCM) results with DC car noise, factory noise and with cheating

DC car noise has proved to be a lot more challenging than Noisex92 car noise. DC car noise results are compared with results for Noisex factory noise and with cheating. Cheating results show great potential for any FCM based method with a suitably intelligent system for expert weighting.

Christos Tsakostas

Precedence Effect & streams

Results from previous experiments in our laboratory indicated that the "precedence effect" (P.E.) operates on different streams. In this study we took experiments from "auditory scene analysis" and adapted them in order to study the P.E. The results confirmed our initial observations.

Herve Glotin (+ Frederic Berthommier, Emmanuel Tessier)

CASA labelling versus SNR estimation: study of the localisation feature

Two different approaches to cocktail party speech recognition are compared. The task is to apply speech recognition to a stereo database composed of overlapping speech, and to improve baseline recognition scores using the localisation feature. Processing the localisation cue allows information to be extracted about the relative level (i.e. the SNR) in time-frequency regions. The two models are described. This information is either passed to a multistream recogniser as labelling information or used to segregate the two concurrent sources, which are then recognised. Results obtained with these two models are shown, with some discussion of the similarities and differences between them.

Bill Ainsworth

Effects of filtered noise on the perception of voiced plosives.

Dan Ellis

Hybrid-connectionist and multistream systems for the AURORA task

I will describe the baseline hybrid-connectionist system we have implemented for the AURORA noisy digits task. Following successes with similar tasks, we experimented with a multistream approach, combining conventional PLP and the novel modulation spectrogram features at the posterior-probability level. We then looked at several techniques to exploit the benefits of this approach within the standard HTK Gaussian-mixture-model system.

I will also briefly describe some work going on in data-driven multifeature design for differing conditions, and multiband pronunciation modeling.
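Combining two feature streams at the posterior-probability level can be sketched as follows. The rule shown, a weighted average of log posteriors followed by renormalisation, is one common choice and is assumed here for illustration; the abstract does not say which rule was used. The function name and the posterior values are hypothetical.

```python
import math

def combine_streams(post_a, post_b, weight_a=0.5):
    """Merge two per-frame posterior vectors by a weighted average in the
    log domain, then renormalise so the result sums to 1."""
    eps = 1e-12  # floor to avoid log(0)
    log_mix = [
        weight_a * math.log(max(a, eps)) + (1.0 - weight_a) * math.log(max(b, eps))
        for a, b in zip(post_a, post_b)
    ]
    peak = max(log_mix)  # subtract the peak for numerical stability
    unnorm = [math.exp(v - peak) for v in log_mix]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Toy frame: posteriors from a PLP-based net and an MSG-based net (made-up values)
plp_post = [0.9, 0.1]
msg_post = [0.6, 0.4]
merged = combine_streams(plp_post, msg_post)
```

The merged posteriors can then be passed to the decoder in place of either single-stream output.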

Jon Barker

The CASA Toolkit - A progress report

The RESPITE Computational Auditory Scene Analysis (CASA) Toolkit was conceived to provide a flexible, extensible and consistent framework for the development of CASA systems and to allow their testing on large speech corpora. It was hoped that the provision of such a framework would also facilitate the smooth integration of existing CASA software components contributed by the various RESPITE partners.

Since the first RESPITE meeting much work has been done on the development of this software. The user interface and software core are now basically complete and the toolkit is approaching its first release. This talk will give an overview of the system's design, focusing on the following aspects:

i) Flexibility - describing the block processing paradigm which allows potentially complex processes to be constructed from sequences of simpler inbuilt processing primitives.

ii) Ease of use - afforded by a simple inbuilt scripting language.

iii) Extensibility - how the toolkit may be consistently extended through additions to the existing library of processing blocks.

iv) Cost - considerations of cost in terms of both computation and memory.
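The block processing paradigm in (i) and the extensibility in (iii) can be illustrated with a short sketch. This is not the toolkit's actual API or scripting language: the class names and the three example blocks are hypothetical, chosen only to show simple primitives being chained into a larger process.

```python
class Block:
    """Base class: a block turns one list of samples into another."""
    def process(self, samples):
        raise NotImplementedError

class Gain(Block):
    def __init__(self, factor):
        self.factor = factor
    def process(self, samples):
        return [self.factor * s for s in samples]

class HalfWaveRectify(Block):
    def process(self, samples):
        return [max(s, 0.0) for s in samples]

class FrameEnergy(Block):
    """Sum of squares over non-overlapping frames of frame_len samples."""
    def __init__(self, frame_len):
        self.frame_len = frame_len
    def process(self, samples):
        n = self.frame_len
        return [
            sum(s * s for s in samples[i:i + n])
            for i in range(0, len(samples) - n + 1, n)
        ]

class Chain(Block):
    """A sequence of blocks that is itself a block, so chains can nest."""
    def __init__(self, *blocks):
        self.blocks = blocks
    def process(self, samples):
        for block in self.blocks:
            samples = block.process(samples)
        return samples

# Extending the library means adding another Block subclass
pipeline = Chain(Gain(2.0), HalfWaveRectify(), FrameEnergy(2))
energies = pipeline.process([1.0, -1.0, 0.5, 0.5])  # [4.0, 2.0]
```

Making `Chain` a `Block` is the design choice that lets complex processes be reused as single primitives inside larger ones.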

In the coming months development work will concentrate on `populating' the toolkit with a library of basic CASA processing blocks. It is hoped that feedback from this meeting will aid in drawing up a list of CASA algorithms to be supplied in the early releases of the toolkit.