ICML 2008 Tutorial: Dimensionality Reduction the Probabilistic Way
Neil D. Lawrence
Goals and Summary
The main focus of this tutorial will be probabilistic interpretations of dimensionality reduction. It is intended to complement the tutorial given by Lawrence Saul at NIPS 2005 on "Spectral Methods for Dimensionality Reduction". Its particular focus will be probabilistic approaches to dimensionality reduction based on generative models. These approaches have become increasingly popular in graphics and vision through the Gaussian Process Latent Variable Model. However, these methods also have a history that is perhaps less widely known amongst the newer generation of researchers, in particular the Generative Topographic Mapping and Density Networks. This tutorial will ground these methods by unifying them in the context of probabilistic latent variable models. This will involve an introduction to these approaches through the mechanism of probabilistic PCA, then a discussion of density networks leading into the generative topographic mapping. Finally, the dual interpretation of probabilistic PCA and its extension to the GP-LVM will be given. Throughout the tutorial we will develop intuition about the methods with an ongoing set of example data sets, with a particular focus on motion capture data. Motion capture is a nice example to use because it is easy for the human eye to tell when samples from the model are realistic.
One aspect of the tutorial will be the difference between the probabilistic approaches and the more commonly applied spectral approaches. In particular we will emphasize the distance preservation characteristics of the probabilistic approaches: local distances in the data are not necessarily preserved in the latent space. This contrasts with spectral algorithms, which typically aim to preserve such local distances. These different characteristics mean that probabilistic approaches complement the spectral approaches, but they bring their own range of associated problems, in particular local minima in the optimization space. Heuristics for avoiding these local minima will also be discussed.
Slides are now available here.
Code is available here.
Tutorial Outline
Motivation
- Why do we need to do dimensionality reduction?
Some motivation based on the difficulties of high dimensional spaces.
Probabilistic Dimensionality Reduction
- Introduction of the probabilistic approach to dimensionality reduction and latent variable models.
Introduction based around probabilistic PCA.
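For orientation, a minimal statement of the probabilistic PCA model of Tipping and Bishop is given below (this sketch is not in the abstract itself, and the notation is mine):

    \mathbf{y}_n = \mathbf{W}\mathbf{x}_n + \boldsymbol{\mu} + \boldsymbol{\epsilon}_n, \qquad \mathbf{x}_n \sim \mathcal{N}(\mathbf{0}, \mathbf{I}), \qquad \boldsymbol{\epsilon}_n \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}).

Marginalising the latent variables gives the marginal likelihood

    p(\mathbf{y}_n) = \mathcal{N}\!\left(\mathbf{y}_n \mid \boldsymbol{\mu},\ \mathbf{W}\mathbf{W}^{\top} + \sigma^{2}\mathbf{I}\right),

whose maximum likelihood solution for W spans the principal subspace, recovering standard PCA as \sigma^2 \to 0.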
Spectral Methods and Probabilistic Methods Comparison
- How the distance preservation is different.
PCA maps both ways, but non-linear algorithms "map" in only one direction.
The nature of the distance preservation in spectral methods implies that there could be a smooth mapping from data space to latent space. In probabilistic approaches, by construction, the mapping is from the latent space to the data space. Discussion of classical scaling and how it relates to techniques such as LLE, Isomap and Maximum Variance Unfolding. Why probabilistic methods are different, what their disadvantages are (e.g. local minima) and what their advantages are (e.g. ease of dealing with missing and noisy data).
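As a concrete reference point for the spectral side of the comparison, here is a minimal Python sketch of classical scaling (classical multidimensional scaling), the construction that Isomap and related methods build on; the function name and interface are illustrative only.

import numpy as np

def classical_scaling(D2, q=2):
    """Classical scaling: embed n points given an n x n matrix of squared distances D2."""
    n = D2.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    B = -0.5 * H @ D2 @ H                    # double-centred similarity matrix
    evals, evecs = np.linalg.eigh(B)         # eigenvalues returned in ascending order
    idx = np.argsort(evals)[::-1][:q]        # keep the q largest eigenvalues
    scale = np.sqrt(np.maximum(evals[idx], 0.0))
    return evecs[:, idx] * scale             # latent coordinates, one row per point

# With squared Euclidean distances this recovers the PCA coordinates of the data.
Y = np.random.randn(100, 5)
D2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
X = classical_scaling(D2, q=2)

Isomap replaces the Euclidean distances with graph (geodesic) distances. The probabilistic methods discussed next do not construct an embedding from distances at all; instead they define a mapping from the latent space to the data space.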
Density Networks and GTM
- Density Networks
Problems with propagating latent densities through non-linear models, and their solution via importance sampling (a sketch of this approximation follows the GTM entry below).
- Generative Topographic Mappings
Algorithm details and examples. Starting with the Density Network approach we will evolve it into the GTM, which can be interpreted either as importance sampling with a uniformly spaced grid or as an EM algorithm.
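To make the connection concrete (again, the notation here is mine): a density network approximates the marginal likelihood of a data point y by sampling the latent variable from its prior,

    p(\mathbf{y}) = \int p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x})\, \mathrm{d}\mathbf{x} \approx \frac{1}{S} \sum_{s=1}^{S} \mathcal{N}\!\left(\mathbf{y} \mid f(\mathbf{x}_s; \mathbf{W}),\ \sigma^{2}\mathbf{I}\right), \qquad \mathbf{x}_s \sim p(\mathbf{x}),

where f is a non-linear mapping with parameters W. The GTM replaces the random samples x_s with a fixed, uniformly spaced grid in a low-dimensional latent space and takes f to be a generalised linear mapping over RBF basis functions, so the marginal likelihood becomes a constrained mixture of Gaussians that can be fitted by EM.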
GP-LVM
- Dual Probabilistic PCA
Back to mathematical details for the proof of dual probabilistic PCA. The main idea of this section will be to show that we can marginalise the mapping instead of the latent variables (a sketch of this step appears at the end of this section).
- Gaussian Processes
A quick refresher on Gaussian processes: not much maths, but a reminder of what priors over functions are.
- Gaussian Process Latent Variable Models
Review of work on the GP-LVM with a focus on applications: tracking, robotics, etc.
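As a preview of the key step (once more with my own notation): collect the centred data into an N x D matrix Y and the latent points into an N x q matrix X. Probabilistic PCA places a Gaussian prior on the rows of X and marginalises them; dual probabilistic PCA instead places independent Gaussian priors on the elements of the mapping W and marginalises those, giving

    p(\mathbf{Y} \mid \mathbf{X}) = \prod_{d=1}^{D} \mathcal{N}\!\left(\mathbf{y}_{:,d} \mid \mathbf{0},\ \mathbf{X}\mathbf{X}^{\top} + \sigma^{2}\mathbf{I}\right),

where y_{:,d} is the d-th column of Y. Optimising this likelihood with respect to X again recovers PCA. The matrix XX^T + \sigma^2 I is the covariance of a linear Gaussian process evaluated at the latent points, so replacing it with a non-linear covariance function K(X, X), for example an RBF kernel, yields the GP-LVM: a product of Gaussian processes, one per data dimension, mapping from the latent space to the data space.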
Brief Bio of Speaker
Neil Lawrence is a Senior Research Fellow in the School of Computer Science at the University of Manchester, U.K. Prior to this appointment he was a Senior Lecturer in the Department of Computer Science at the University of Sheffield, U.K., where he was head of the Machine Learning Research Group. His main research interest is machine learning through probabilistic models. He is interested in both the algorithmic side of these models and their application in areas such as computational systems biology, health, speech, vision and graphics.
His PhD was awarded in 2000 from the Computer Lab at the University of Cambridge. He then spent a year at Microsoft Research, Cambridge before moving to Sheffield in 2001 and then to Manchester in 2007.