Introduction
Coordinator's Report
Research Updates
Discussion Groups
SPHEAR Steering Committee
RESPITE Steering Committee
Report Back & Planning
* In the end we get paid what we have spent, on the basis of annual cost statements.
* We get an initial advance, which is gradually clawed back by deduction from yearly periodic payments.
This claw-back, plus some over-budget overhead claims, caused the discrepancy. More details will be given in the SPHEAR Steering Committee meeting.
The agenda has to be fixed 2 months in advance.
A mid-term report has to be written and circulated 1 month in advance.
The review occupies a whole day, and features
* The coordinator's report (1 hour!),
* A tour de table of all scientists-in-charge and task leaders - 5 minute presentations.
* A ten-minute presentation by each young researcher on their work and experiences.
Each of the young researchers will also be asked to fill out a confidential questionnaire during the meeting.
A copy of the mid-term review guidelines is on the private page of the SPHEAR web site.
These reports should cover "Main achievements, activities (incl. manpower) and events in the reporting period including where appropriate problems, delays, etc. encountered."
* There is an annual event in Luxembourg at which they review all their projects.
* It takes place in early February; no date has been fixed yet.
* Each project review takes about 4 hours.
* Each project is represented by the coordinator and 1-2 consortium members.
This presentation will summarise the main research topics undertaken by IDIAP for the SPHEAR and RESPITE projects. There will then be a brief introduction to each person's main research activities to date, and a list of publications.
Non-stationary multi-channel processing, towards robust and adaptive ASR.
Summary of main points from Tampere keynote
MUSE: unsupervised model based on-line equalization
MUSE (MUlti-path Stochastic Equalization) offers a general framework for integrating equalization functions into classical HMM-based modeling. MUSE is based on the following idea: associate an equalization function with each possible state sequence hypothesized during the decoding process, and compute both the equalization function parameters and the best path with a Maximum Likelihood or Maximum A Posteriori criterion. MUSE has been implemented in HTK for the case of bias removal. Long-term statistics and a Maximum A Posteriori criterion have also been introduced.
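The bias-removal case of the MUSE idea can be illustrated with a toy sketch: each hypothesised state sequence gets its own Maximum Likelihood bias estimate, and the path is scored on the bias-compensated observations. This is a simplification under stated assumptions, not the HTK implementation: one diagonal Gaussian per state, no transition probabilities, and an explicit list of candidate paths instead of a real decoder. All function names are hypothetical.

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Per-frame log-density of diagonal Gaussians; x, mean, var have shape (T, D)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def ml_bias(x, means, variances, path):
    """ML estimate of an additive bias b for one state sequence:
    maximises sum_t log N(x_t - b; mu_{s_t}, var_{s_t}) over b."""
    mu = means[path]        # (T, D) state means along the path
    var = variances[path]   # (T, D) state variances along the path
    return np.sum((x - mu) / var, axis=0) / np.sum(1.0 / var, axis=0)

def muse_best_path(x, means, variances, candidate_paths):
    """Give every hypothesised path its own bias and keep the best-scoring one.
    Returns (log-likelihood, path, bias)."""
    best = None
    for path in candidate_paths:
        path = np.asarray(path)
        b = ml_bias(x, means, variances, path)
        score = gaussian_loglik(x - b, means[path], variances[path]).sum()
        if best is None or score > best[0]:
            best = (score, path, b)
    return best
```

With two 1-D states at means 0 and 5 and observations `[2, 2, 7, 7]`, the path `[0, 0, 1, 1]` wins with an estimated bias of 2, because the bias fitted to that path explains the offset exactly.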
HMM/RBFs: Combining the advantages of likelihood-based and posteriors-based ASR (1 slide)
There are a number of traditional ML adaptation methods, and more recent missing data compensation methods, which can be applied to likelihood-based ASR but cannot be applied to a posteriors-based HMM/ANN hybrid. RBF networks have a hidden layer which outputs Gaussian-mixture-based likelihoods, and a further layer which outputs posteriors. They may therefore allow us to combine the advantages of both likelihood-based and posteriors-based approaches. One-step RBF training (using e.g. HTK) is preferable to two-step training.
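A minimal forward-pass sketch of the two-layer idea: Gaussian kernels give class likelihoods, which are then turned into posteriors. Two assumptions should be flagged: the output layer here is exact Bayes' rule with fixed priors rather than trained weights, and missing-data handling is shown only as marginalisation of diagonal Gaussians (dropping unreliable dimensions). The names are hypothetical.

```python
import numpy as np

def hidden_layer(x, centres, variances, present=None):
    """Log Gaussian kernel densities for one frame x of shape (D,).
    centres, variances: (K, D). Dimensions where `present` is False are
    marginalised out, which for diagonal Gaussians means dropping them."""
    if present is None:
        present = np.ones_like(x, dtype=bool)
    var = variances[:, present]
    diff2 = (x[present] - centres[:, present]) ** 2 / var
    return -0.5 * np.sum(np.log(2 * np.pi * var) + diff2, axis=1)

def rbf_forward(x, centres, variances, mix_weights, kernel_class, priors, present=None):
    """Return (class log-likelihoods, class posteriors) for frame x.
    kernel_class[k] is the class that kernel k belongs to; mix_weights
    are assumed to sum to one within each class."""
    log_k = hidden_layer(x, centres, variances, present)
    log_lik = np.empty(len(priors))
    for q in range(len(priors)):
        a = np.log(mix_weights[kernel_class == q]) + log_k[kernel_class == q]
        m = a.max()
        log_lik[q] = m + np.log(np.exp(a - m).sum())  # log-sum-exp over the mixture
    log_joint = np.log(priors) + log_lik
    post = np.exp(log_joint - log_joint.max())
    return log_lik, post / post.sum()
```

The same network thus exposes likelihoods, to which missing-data or ML-adaptation techniques can be applied, and posteriors for hybrid decoding.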
Extensions to the full combination decomposition for multiband ASR (1 slide)
Results so far for the full combination multiband approach have used either static expert weights, which should be optimal but only for clean speech, or noise-adaptive weights, which take no account of the static weighting. Here it is shown very briefly how decomposing the fullband posteriors using two latent variables instead of just one leads to a simple method for combining static with adaptive weights.
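To make the weighting concrete, here is a sketch of the full combination rule P(q|X) = sum_b w_b P(q|X_b) over band-subset experts. The way static and adaptive weights are merged below is one possible reading of the two-latent-variable decomposition, not necessarily the one in the talk: a hypothetical second latent variable c in {static, adaptive} with P(c = static) = lam is marginalised out, giving w_b = lam * w_static_b + (1 - lam) * w_adaptive_b. Whether the empty subset (a prior-only expert) is included is also an assumption; it is left out here.

```python
import itertools
import numpy as np

def all_band_subsets(n_bands):
    """Every non-empty subset of band indices, i.e. the experts of a full-combination system."""
    return [s for r in range(1, n_bands + 1)
            for s in itertools.combinations(range(n_bands), r)]

def combined_weights(static_w, adaptive_w, lam):
    """Marginalise the hypothetical regime variable c with P(c=static) = lam."""
    return lam * np.asarray(static_w, dtype=float) + (1.0 - lam) * np.asarray(adaptive_w, dtype=float)

def full_combination(expert_posteriors, weights):
    """P(q|X) = sum_b w_b P(q|X_b); each row of expert_posteriors is one expert."""
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0), "expert weights must sum to one"
    return weights @ np.asarray(expert_posteriors, dtype=float)
```

With 4 bands there are 15 non-empty subsets, so 15 experts. Because both weight sets sum to one, any lam in [0, 1] again gives a valid weighting.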
Combining wavelet-domain hidden Markov trees (WHMTs) with hidden Markov models
Wavelet coefficients, with their inherent multiresolution characteristics, could be advantageous for ASR. Furthermore, modeling time/frequency correlations can improve recognition accuracy. The integration of these two approaches was investigated by developing a new modeling structure that uses WHMTs on top of HMMs.
Some weight estimation experiments
A short presentation of two new expert combination weight estimation methods: one based on Fletcher's "product of errors rule", the other on local maximum likelihood. Results so far have been somewhat negative but could improve after some small changes to the implementation details.
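Fletcher's product-of-errors rule says the error when listening through several bands is the product of the single-band errors. The sketch below applies it to band-subset experts. Turning predicted accuracies into normalised expert weights is a hypothetical choice made for illustration only, and the abstract does not say this is how the presented method works.

```python
import itertools
import math

def predicted_subset_error(band_errors, subset):
    """Product-of-errors rule: error for a set of bands = product of per-band errors."""
    return math.prod(band_errors[b] for b in subset)

def product_of_errors_weights(band_errors):
    """Hypothetical weighting: each non-empty band subset is weighted by its
    predicted accuracy (1 - predicted error), normalised to sum to one."""
    n = len(band_errors)
    subsets = [s for r in range(1, n + 1) for s in itertools.combinations(range(n), r)]
    acc = {s: 1.0 - predicted_subset_error(band_errors, s) for s in subsets}
    total = sum(acc.values())
    return {s: a / total for s, a in acc.items()}
```

For per-band errors of 0.5 and 0.4, the two-band expert is predicted to make an error of 0.2 and so gets the largest weight. If the per-band errors were estimated frame by frame in noise, the weights would become noise-adaptive.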
Recent full combination multiband (FCM) results with DC car noise, factory noise and with cheating
DC car noise has proved to be much more challenging than Noisex92 car noise. DC car noise results are compared with results for Noisex factory noise and with cheating results. The cheating results show great potential for any FCM-based method with a suitably intelligent system for expert weighting.
Results from previous experiments in our laboratory indicated that the "precedence effect" (P.E.) operates on different streams. In this study we took experiments from "auditory scene analysis" and adapted them in order to study the P.E. The results confirmed our initial observations.
Herve Glotin (+ Frederic Berthommier, Emmanuel Tessier)
CASA labelling versus SNR estimation: study of the localisation feature
Two different approaches to cocktail-party speech recognition are compared. The task is to recognise speech from a stereo database composed of overlapping utterances, and to improve on the baseline recognition scores by using the localisation feature. Processing the localisation cue allows us to extract information about the relative level (i.e. the SNR) in each time-frequency region. The two models are described. This information is either passed to a multistream recogniser as labelling information, or used to segregate the two concurrent sources, which are then recognised. Results obtained with these two models are shown, with some discussion of the similarities and differences between them.
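A toy sketch of the two uses of a localisation cue described above, with several stated assumptions. The cue used here is the interaural level difference (ILD) per time-frequency cell (the talk may rely on a different cue). The target is assumed to lie on the left, and a 0 dB threshold is a hypothetical choice. Labels are either summarised as per-band reliabilities for a multistream recogniser, or used as a binary mask to segregate the target.

```python
import numpy as np

def ild_db(left_mag, right_mag, eps=1e-12):
    """Interaural level difference per time-frequency cell, in dB; inputs are (F, T) magnitudes."""
    return 20.0 * np.log10((left_mag + eps) / (right_mag + eps))

def label_cells(left_mag, right_mag, threshold_db=0.0):
    """True where a cell is judged target-dominated (target assumed on the left)."""
    return ild_db(left_mag, right_mag) > threshold_db

def band_reliability(labels):
    """Labelling use: fraction of target-dominated frames in each frequency band,
    which a multistream recogniser could use as a per-stream weight."""
    return labels.mean(axis=1)

def segregate(mix_mag, labels):
    """Segregation use: keep only target-dominated cells before recognition."""
    return np.where(labels, mix_mag, 0.0)
```

The labelling route keeps the mixture intact and only reweights streams, while the segregation route discards cells outright. This is one way to see why the two approaches can behave differently.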
Effects of filtered noise on the perception of voiced plosives.
Hybrid-connectionist and multistream systems for the AURORA task
I will describe the baseline hybrid-connectionist system we have implemented for the AURORA noisy digits task. Following successes with similar tasks, we experimented with a multistream approach, combining conventional PLP and the novel modulation spectrogram features at the posterior-probability level. We then looked at several techniques to exploit the benefits of this approach within the standard HTK Gaussian-mixture-model system.
I will also briefly describe some work going on in data-driven multifeature design for differing conditions, and multiband pronunciation modeling.
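The posterior-level combination of PLP and modulation-spectrogram streams can be sketched as follows. Averaging log posteriors and renormalising each frame is an assumed combination rule, since the abstract does not specify one. The second function shows one hypothetical way of feeding posteriors to a Gaussian-mixture system: log posteriors with the per-dimension mean removed, used as features.

```python
import numpy as np

def combine_log_posteriors(*streams, eps=1e-10):
    """Combine per-frame posteriors from several streams (each of shape (T, Q))
    by averaging their logs, then renormalise each frame to sum to one."""
    logs = np.mean([np.log(np.asarray(s) + eps) for s in streams], axis=0)
    logs -= logs.max(axis=1, keepdims=True)   # numerical safety before exp
    p = np.exp(logs)
    return p / p.sum(axis=1, keepdims=True)

def posterior_features(posteriors, eps=1e-10):
    """Log posteriors with per-dimension mean removed, as input features for a GMM system."""
    f = np.log(posteriors + eps)
    return f - f.mean(axis=0, keepdims=True)
```

Averaging in the log domain amounts to a normalised geometric mean, so a class needs support from both streams to keep a high combined posterior.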
The CASA Toolkit - A progress report
The RESPITE Computational Auditory Scene Analysis (CASA) Toolkit was conceived to provide a flexible, extensible and consistent framework for the development of CASA systems and to allow their testing on large speech corpora. It was hoped that the provision of such a framework would also facilitate the smooth integration of existing CASA software components contributed by the various RESPITE partners.
Since the first RESPITE meeting much work has been done on the development of this software. The user interface and software core are now basically complete, and the toolkit is approaching its first release. This talk will give an overview of the system's design, focusing on the following aspects:
i) Flexibility - describing the block processing paradigm which allows potentially complex processes to be constructed from sequences of simpler inbuilt processing primitives.
ii) Ease of use - afforded by a simple inbuilt scripting language.
iii) Extensibility - how the toolkit may be consistently extended through additions to the existing library of processing blocks.
iv) Cost - considerations of cost in terms of both computation and memory.
In the coming months, development work will concentrate on "populating" the toolkit with a library of basic CASA processing blocks. It is hoped that feedback from this meeting will help in drawing up a list of CASA algorithms to be supplied in the early releases of the toolkit.
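The block processing paradigm can be pictured with a small sketch in which every primitive shares one interface, and a compound process is itself a block built from a sequence of simpler ones. This is written in Python purely for illustration. The class names (`Block`, `Frame`, `Energy`, `Threshold`, `Pipeline`) are hypothetical and are not the toolkit's actual interface or scripting language.

```python
from abc import ABC, abstractmethod
import numpy as np

class Block(ABC):
    """A processing primitive: consumes one array and produces another."""
    @abstractmethod
    def process(self, data: np.ndarray) -> np.ndarray: ...

class Frame(Block):
    """Split a 1-D signal into non-overlapping frames of `size` samples."""
    def __init__(self, size: int):
        self.size = size
    def process(self, data):
        n = len(data) // self.size
        return data[: n * self.size].reshape(n, self.size)

class Energy(Block):
    """Mean-square energy of each frame."""
    def process(self, data):
        return np.mean(data ** 2, axis=1)

class Threshold(Block):
    """Boolean mask: True where the input exceeds `level`."""
    def __init__(self, level: float):
        self.level = level
    def process(self, data):
        return data > self.level

class Pipeline(Block):
    """A compound process built from a sequence of simpler blocks.
    Being a Block itself, it can be nested inside other pipelines."""
    def __init__(self, *blocks: Block):
        self.blocks = blocks
    def process(self, data):
        for block in self.blocks:
            data = block.process(data)
        return data
```

For example, `Pipeline(Frame(4), Energy(), Threshold(0.5))` turns a signal into a per-frame activity mask. Extending the library in this scheme means adding one more `Block` subclass, which mirrors the extensibility point above.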