RESPITE/SPHEAR Workshop

Les Marecottes, 13/14 Sept 99


Location & Joining Instructions


PROGRAMME

 

 

Delegates

Bochum
Karsten Lehn
Christos Tsakostas
Daimler-Chrysler
Fritz Class
Joan Mari
FPMs
Christophe Ris
ICP
Herve Glotin
ICSI
Dan Ellis
IDIAP
Andrew Morris
Herve Bourlard

Astrid Hagen

Christopher Kermorvant

Keele
Bill Ainsworth
MATRA
Catherine Glorion
Philip Lockwood
Patras
John Mourjopoulos
Joerg Buchholz
Sheffield
Phil Green
Martin Cooke

Ascencion Vizinho

Jon Barker

Agenda

  • Introduction
  • Coordinator's Report
  • Research Updates
  • Discussion Groups
  • SPHEAR Steering Committee
  • RESPITE Steering Committee
  • Report Back & Planning
  • Coordinator's Report

    SPHEAR

    RESPITE

    Research Updates

    Herve Bourlard
    Overview of SPHEAR/RESPITE work at IDIAP

    This presentation will summarise the main research topics undertaken by IDIAP for the SPHEAR and RESPITE projects. There will then be a brief introduction to each person's main research activities to date, and a list of publications.

    Herve Bourlard

    Non-stationary multi-channel processing, towards robust and adaptive ASR.

    Summary of main points from Tampere keynote

    Christopher Kermorvant

    MUSE: unsupervised model based on-line equalization

    MUSE (MUlti-path Stochastic Equalization) offers a general framework for integrating equalization functions into classical HMM-based modeling. MUSE is based on the following idea: associate an equalization function with each possible state sequence hypothesized during the decoding process, and compute both the equalization function parameters and the best path with a Maximum Likelihood or Maximum A Posteriori criterion. MUSE has been implemented in HTK for the case of Bias Removal. Long-term statistics and a Maximum A Posteriori criterion have been introduced.
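
The idea of pairing each hypothesized path with its own equalization function can be illustrated with a minimal sketch. It assumes an additive bias, diagonal-covariance Gaussian states, a Maximum Likelihood criterion and a small explicit list of candidate paths. The function names are hypothetical and this is not the HTK implementation mentioned above.

```python
import math

def ml_bias(frames, means, variances):
    """ML additive bias b for one path, assuming frames[t] - b ~ N(means[t], variances[t])."""
    dim = len(frames[0])
    num = [0.0] * dim
    den = [0.0] * dim
    for x, mu, var in zip(frames, means, variances):
        for d in range(dim):
            num[d] += (x[d] - mu[d]) / var[d]  # precision-weighted residual
            den[d] += 1.0 / var[d]             # accumulated precision
    return [n / d for n, d in zip(num, den)]

def log_likelihood(frames, means, variances, bias):
    """Log-likelihood of the bias-compensated frames along one path."""
    total = 0.0
    for x, mu, var in zip(frames, means, variances):
        for d in range(len(x)):
            diff = x[d] - bias[d] - mu[d]
            total -= 0.5 * (math.log(2.0 * math.pi * var[d]) + diff * diff / var[d])
    return total

def best_equalized_path(frames, candidate_paths):
    """Estimate one bias per candidate path and return (score, path index, bias) of the best."""
    best = None
    for index, (means, variances) in enumerate(candidate_paths):
        bias = ml_bias(frames, means, variances)
        score = log_likelihood(frames, means, variances, bias)
        if best is None or score > best[0]:
            best = (score, index, bias)
    return best

# Hypothetical one-dimensional example with two candidate state sequences.
frames = [[1.5], [2.5]]
path_a = ([[1.0], [2.0]], [[1.0], [1.0]])  # (state means, state variances) per frame
path_b = ([[1.0], [3.0]], [[1.0], [1.0]])
score, index, bias = best_equalized_path(frames, [path_a, path_b])
```

A real decoder would interleave the bias estimation with the path search rather than enumerate complete paths; the enumeration here only keeps the example short.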

    Andrew Morris

    HMM/RBFs: Combining the advantages of likelihood with posteriors based ASR (1 slide)
     
     

    There are a number of traditional ML adaptation methods, and more recent missing data compensation methods, which can be applied to likelihood-based ASR but cannot be applied to a posteriors-based HMM/ANN hybrid. RBF networks have a hidden layer which outputs Gaussian-mixture-based likelihoods, and a further layer which outputs posteriors. They may therefore permit us to combine the advantages of both likelihood-based and posteriors-based approaches. One-step RBF training (using e.g. HTK) is preferable to two-step training.
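
The two-layer structure described here can be sketched as follows. This is an illustration only: it assumes diagonal-covariance mixtures and an output layer that simply applies Bayes' rule with fixed class priors, whereas a trained RBF output layer would have learned weights. All names are hypothetical.

```python
import math

def diag_gaussian(x, mean, var):
    """Density of a diagonal-covariance Gaussian at x."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p -= 0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def hidden_layer(x, class_mixtures):
    """Per-class mixture likelihoods p(x | class); each mixture is a list of (weight, mean, var)."""
    return [sum(w * diag_gaussian(x, m, v) for w, m, v in mixture)
            for mixture in class_mixtures]

def output_layer(likelihoods, priors):
    """Posteriors P(class | x) from likelihoods and priors via Bayes' rule."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical two-class example in one dimension.
mixtures = [
    [(1.0, [0.0], [1.0])],
    [(0.5, [3.0], [1.0]), (0.5, [5.0], [1.0])],
]
likelihoods = hidden_layer([0.0], mixtures)
posteriors = output_layer(likelihoods, [0.5, 0.5])
```

Because the likelihoods remain available at the hidden layer, likelihood-domain techniques can in principle act there while the output layer still delivers posteriors.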

    Andrew Morris

    Extensions to the full combination decomposition for multiband ASR (1 slide)

    Results to date for the full combination multiband approach have used either static expert weights, which should be optimal but only for clean speech, or else noise-adaptive weights, which do not take any account of the static weighting. Here it is shown very briefly how decomposing the fullband posteriors using two latent variables instead of just one leads to a simple method for combining static with adaptive weights.
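
For orientation, the single-latent-variable form of the decomposition can be sketched as a weighted sum of expert posteriors over every subset of sub-bands. The sketch assumes one expert per subset (including the empty subset) and weights that sum to one; the two-latent-variable extension is not reproduced here, and all names and numbers are hypothetical.

```python
from itertools import combinations

def band_subsets(n_bands):
    """All subsets of band indices, from the empty set up to the full band."""
    return [c for r in range(n_bands + 1) for c in combinations(range(n_bands), r)]

def full_combination(expert_posteriors, weights):
    """Weighted sum of per-subset class posteriors; both arguments are keyed by subset."""
    n_classes = len(next(iter(expert_posteriors.values())))
    combined = [0.0] * n_classes
    for subset, posteriors in expert_posteriors.items():
        w = weights[subset]
        for k in range(n_classes):
            combined[k] += w * posteriors[k]
    return combined

# Hypothetical two-band, two-class example with equal weights.
subsets = band_subsets(2)
experts = {
    (): [0.5, 0.5],       # no reliable band: assumed to fall back to the priors
    (0,): [0.9, 0.1],
    (1,): [0.2, 0.8],
    (0, 1): [0.7, 0.3],
}
weights = {s: 1.0 / len(subsets) for s in subsets}
combined = full_combination(experts, weights)
```

Static weights would be fixed in advance for each subset, while adaptive weights would be re-estimated from the incoming signal.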

    Katrin Keller

    Combining wavelet-domain hidden Markov trees (WHMTs) with hidden Markov models

    Wavelet coefficients with their inherent multiresolution characteristics could be advantageous for ASR. Furthermore, the modeling of time/frequency correlations can improve recognition accuracy. The integration of those two approaches was investigated by developing a new modeling structure that uses WHMTs on top of HMMs.

    Astrid Hagen

    Some weight estimation experiments

    A short presentation of two new expert combination weight estimation methods: one based on Fletcher's "product of errors rule", the other on local maximum likelihood. Results so far have been somewhat negative but could improve after some small changes to the implementation details.

    Astrid Hagen

    Recent full combination multiband (FCM) results with DC car noise, factory noise and with cheating

    DC car noise has proved to be a lot more challenging than Noisex92 car noise. DC car noise results are compared with results for Noisex factory noise and with cheating. Cheating results show great potential for any FCM-based method with a suitably intelligent system for expert weighting.

    Christos Tsakostas

    Precedence Effect & streams

    Results from previous experiments in our laboratory indicated that the "precedence effect" (P.E.) operates on different streams. In this study we adapted experiments from "auditory scene analysis" in order to study the P.E. The results confirmed our initial observations.

    Herve Glotin (+ Frederic Berthommier, Emmanuel Tessier)

    CASA labelling versus SNR estimation: study of the localisation feature

    Two different approaches to cocktail party speech recognition are compared. The task is to recognise speech in a stereo database composed of overlapping utterances, and to improve on baseline recognition scores using the localisation feature. Processing the localisation cue makes it possible to extract information about the relative level (i.e. the SNR) in time-frequency regions. The two models are described. This information is either passed to a multistream recogniser as labelling information, or used to segregate the two concurrent sources, which are then recognised. Results obtained with the two models are shown, with some discussion of the similarities and differences between them.

    Bill Ainsworth

    Effects of filtered noise on the perception of voiced plosives.

    Dan Ellis

    Hybrid-connectionist and multistream systems for the AURORA task

    I will describe the baseline hybrid-connectionist system we have implemented for the AURORA noisy digits task. Following successes with similar tasks, we experimented with a multistream approach, combining conventional PLP and the novel modulation spectrogram features at the posterior-probability level. We then looked at several techniques to exploit the benefits of this approach within the standard HTK Gaussian-mixture-model system.
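
As an illustration of what combining streams at the posterior-probability level can look like, the sketch below averages log posteriors across streams and renormalises. This particular rule is an assumption made for the example; the abstract does not state which combination rule was used, and the function name and numbers are hypothetical.

```python
import math

def combine_streams(stream_posteriors, floor=1e-10):
    """Average log posteriors across streams for one frame, then renormalise."""
    n_classes = len(stream_posteriors[0])
    log_sum = [0.0] * n_classes
    for posteriors in stream_posteriors:
        for k, p in enumerate(posteriors):
            log_sum[k] += math.log(max(p, floor))  # floor avoids log(0)
    mean_log = [s / len(stream_posteriors) for s in log_sum]
    peak = max(mean_log)
    unnormalised = [math.exp(v - peak) for v in mean_log]  # shift for numerical stability
    total = sum(unnormalised)
    return [u / total for u in unnormalised]

# Hypothetical per-frame posteriors from two feature streams.
plp_stream = [0.7, 0.2, 0.1]
msg_stream = [0.6, 0.3, 0.1]
merged = combine_streams([plp_stream, msg_stream])
```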

    I will also briefly describe some work going on in data-driven multifeature design for differing conditions, and multiband pronunciation modeling.

    Jon Barker

    The CASA Toolkit - A progress report

    The RESPITE Computational Auditory Scene Analysis (CASA) Toolkit was conceived to provide a flexible, extensible and consistent framework for the development of CASA systems and to allow their testing on large speech corpora. It was hoped that the provision of such a framework would also facilitate the smooth integration of existing CASA software components contributed by the various RESPITE partners.

    Since the first RESPITE meeting, much work has been done on the development of this software. The user interface and software core are now basically complete and the toolkit is approaching its first release. This talk will give an overview of the system's design, focusing on the following aspects:

    i) Flexibility - describing the block processing paradigm which allows potentially complex processes to be constructed from sequences of simpler inbuilt processing primitives.

    ii) Ease of use - afforded by a simple inbuilt scripting language.

    iii) Extensibility - how the toolkit may be consistently extended through additions to the existing library of processing blocks.

    iv) Cost - considerations of cost in terms of both computation and memory.
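
The block processing paradigm in (i) and the extension route in (iii) can be pictured with a small sketch. It assumes blocks that transform one frame at a time and are chained in sequence. The class and function names are hypothetical and do not reflect the toolkit's actual interface or scripting language.

```python
class Block:
    """A processing primitive that wraps a function applied to one frame."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def process(self, frame):
        return self.func(frame)

class Chain(Block):
    """A composite block built from a sequence of simpler blocks."""
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = list(blocks)

    def process(self, frame):
        for block in self.blocks:
            frame = block.process(frame)
        return frame

def run(block, frames):
    """Push every frame through the block and collect the outputs."""
    return [block.process(frame) for frame in frames]

# Hypothetical primitives; adding a new one extends the library without touching Chain.
rectify = Block("rectify", lambda frame: [abs(v) for v in frame])
energy = Block("energy", lambda frame: sum(v * v for v in frame))
frame_energy = Chain("frame_energy", [rectify, energy])
outputs = run(frame_energy, [[1, -2], [0, 3]])
```

Since a chain is itself a block, chains can be nested to build more complex processes from simpler ones.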

    In the coming months, development work will concentrate on 'populating' the toolkit with a library of basic CASA processing blocks. It is hoped that feedback from this meeting will aid in drawing up a list of CASA algorithms to be supplied in the early releases of the toolkit.