Neil Lawrence ML@SITraN

This page describes examples of how to use the Gaussian Process Software (GP).
The GP software can be downloaded here.
Current release is 0.137.
As well as downloading the GP software you need to obtain the toolboxes specified below. These can be downloaded using the same password you get from registering for the GP software.
Toolbox   Version
NETLAB    3.3
NDLUTIL   0.162
PRIOR     0.22
MLTOOLS   0.138
MOCAP     0.136
OPTIMI    0.132
DATASETS  0.1371
KERN      0.226
NOISE     0.141
Updates to allow deconstruction of model files when writing to disk (gpWriteResult, gpLoadResult, gpDeconstruct, gpReconstruct).
Updates for running a GPLVM/GP using the data's inner product matrix for Interspeech synthesis demos.
Examples transferred from the Oxford toolbox; the variational approximation of Titsias added as an option with 'dtcvar'.
Changes to allow compatibility with SGPLVM and NCCA toolboxes.
Changes to allow more flexibility in optimisation of beta.
Various minor changes for enabling back constraints in hierarchical GPLVM models.
Changes include the use of the optimiDefaultConstraint('positive') to obtain the function to constrain beta to be positive (which now returns 'exp' rather than 'negLogLogit' which was previously the default). Similarly default optimiser is now given by a command in optimiDefaultOptimiser.
The first version to be spun out of the FGPLVM toolbox. The corresponding FGPLVM toolbox is version 0.15.
Release 0.1 splits away the Gaussian process section of the FGPLVM toolbox into this separate toolbox.
The GPLVM C++ software is available from here.
The IVM C++ software is available from here.
The MATLAB IVM toolbox is available here.
The original MATLAB GPLVM toolbox is available here.
This example shows how points that look like they come from a function can be sampled from a Gaussian distribution. The sample is 25 dimensional and is from a Gaussian with a particular covariance.
>> demGpSample
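The idea behind the demo can be sketched in a few lines of NumPy (an illustrative translation, not the toolbox's MATLAB code; the grid, inverse width, and seed below are arbitrary choices):

```python
import numpy as np

# Illustrative NumPy sketch, not the toolbox's MATLAB code: sample a
# 25-dimensional Gaussian whose covariance is an RBF kernel on a 1-D grid.
def rbf_kernel(x, inverse_width=10.0, variance=1.0):
    # k(x, x') = variance * exp(-0.5 * inverse_width * (x - x')^2)
    sq_dist = (x[:, None] - x[None, :]) ** 2
    return variance * np.exp(-0.5 * inverse_width * sq_dist)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 25)
K = rbf_kernel(x) + 1e-8 * np.eye(25)          # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(25), K)   # one "function-like" sample
```

Drawing several such vectors f and plotting them against x gives the smooth, function-like curves the demo shows.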
Gaussian processes are about conditioning a Gaussian distribution on the training data to make the test predictions. To illustrate this process, we can look at the joint distribution over two variables.
>> demGpCov2D([1 2])
Gives the joint distribution for f_{1} and f_{2}. The plots show the joint distributions as well as the conditional for f_{2} given f_{1}.
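The conditioning step the demo visualises is just the standard Gaussian conditional; a minimal sketch, with an assumed joint covariance and observed value:

```python
import numpy as np

# Sketch of the conditioning idea: given a joint Gaussian over (f1, f2),
# condition f2 on an observed f1. The joint covariance and observed value
# below are assumed for illustration.
K = np.array([[1.0, 0.8],
              [0.8, 1.0]])
f1_observed = 0.5

# Standard Gaussian conditioning formulas:
cond_mean = K[1, 0] / K[0, 0] * f1_observed     # E[f2 | f1]
cond_var = K[1, 1] - K[1, 0] ** 2 / K[0, 0]     # var[f2 | f1]
```

The stronger the correlation between f_{1} and f_{2}, the smaller the conditional variance, which is the effect the second plot illustrates.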
>> gpSample('rbf', 10, [1 1], [-3 3], 1e5)
will give 10 samples from an RBF covariance function with a parameter vector given by [1 1] (inverse width 1, variance 1) across the range -3 to 3 on the x-axis. The random seed will be set to 1e5.
>> gpSample('rbf', 10, [16 1], [-3 3], 1e5)
is similar, but the inverse width is now set to 16 (length scale 0.25).
Other covariance functions can be sampled; an interesting one is the MLP covariance, which is non-stationary and can produce point-symmetric functions.
>> gpSample('mlp', 10, [100 100 1], [-1 1], 1e5)
gives 10 samples from the MLP covariance function where the "bias variance" is 100 (basis functions are centered around the origin with a standard deviation of 10) and the "weight variance" is 100.
>> gpSample('mlp', 10, [100 1e-16 1], [-1 1], 1e5)
gives 10 samples from the MLP covariance function where the "bias variance" is approximately zero (basis functions are centered on the origin) and the "weight variance" is 100.
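For reference, one common form of the MLP (neural network) covariance on scalar inputs is sketched below in NumPy. The parameter ordering (weight variance, bias variance, variance) is an assumption matching the text; this is illustrative rather than a copy of the toolbox implementation:

```python
import numpy as np

# One common form of the MLP (neural network) covariance on scalar inputs;
# parameter ordering (weight variance, bias variance, variance) is assumed,
# not copied from the toolbox.
def mlp_kernel(x, xp, weight_variance=100.0, bias_variance=100.0, variance=1.0):
    inner = weight_variance * x * xp + bias_variance
    norm_x = weight_variance * x * x + bias_variance + 1.0
    norm_xp = weight_variance * xp * xp + bias_variance + 1.0
    return variance * (2.0 / np.pi) * np.arcsin(inner / np.sqrt(norm_x * norm_xp))
```

Because the kernel depends on the inputs themselves and not just their difference, it is non-stationary, unlike the RBF covariance above.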
Gaussian processes are nonparametric models. They are specified by their covariance function and a mean function. When combined with data observations a posterior Gaussian process is induced. The demos below show samples from that posterior.
>> gpPosteriorSample('rbf', 5, [1 1], [-3 3], 1e5)
and
>> gpPosteriorSample('rbf', 5, [16 1], [-3 3], 1e5)
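The posterior these demos sample from can be sketched with the standard noiseless conditioning formulas; the kernel parameters, data, and jitter below are arbitrary illustrative choices:

```python
import numpy as np

# Sketch of noiseless GP posterior sampling with the standard conditioning
# formulas; kernel, data and jitter are arbitrary illustrative choices.
def rbf(a, b, inverse_width=1.0, variance=1.0):
    return variance * np.exp(-0.5 * inverse_width * (a[:, None] - b[None, :]) ** 2)

rng = np.random.default_rng(1)
x_train = np.array([-2.0, 0.0, 1.5])
y_train = np.array([0.5, -1.0, 0.3])
x_test = np.linspace(-3.0, 3.0, 50)

K = rbf(x_train, x_train) + 1e-8 * np.eye(3)   # jitter for stability
K_star = rbf(x_test, x_train)
K_ss = rbf(x_test, x_test)

mean = K_star @ np.linalg.solve(K, y_train)            # posterior mean
cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)     # posterior covariance
sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(50))
```

Each call to multivariate_normal with this mean and covariance produces one posterior sample passing (up to jitter) through the training points.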
This simple demonstration plots, consecutively, an increasing number of data points, followed by an interpolated fit through the data points using a Gaussian process. This is a noiseless system, and the data is sampled from a GP with a known covariance function. The curve is then recovered with minimal uncertainty after only nine data points are included. The code is run with
>> demInterpolation
The regression demo very much follows the format of the interpolation demo. Here the difference is that the data is sampled with noise. Fitting a model with noise means that the regression will not necessarily pass right through each data point. The code is run with
>> demRegression
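The difference from the interpolation case is only that a noise variance is added to the training covariance, so the fitted mean need not pass through the data; a minimal sketch with assumed data and noise level:

```python
import numpy as np

# Sketch of GP regression with observation noise: the assumed noise
# variance enters the training covariance, so the fitted mean no longer
# interpolates the data exactly. Data and noise level are illustrative.
def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

sigma2 = 0.1                                    # assumed noise variance
x_train = np.array([-1.0, 0.0, 2.0])
y_train = np.array([1.0, 0.2, -0.5])

K_noisy = rbf(x_train, x_train) + sigma2 * np.eye(3)
alpha = np.linalg.solve(K_noisy, y_train)

# Posterior mean evaluated back at the training inputs:
fit_at_train = rbf(x_train, x_train) @ alpha
```

With sigma2 set to zero this reduces to the interpolation case, where the fit passes through every data point.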
One of the advantages of Gaussian processes over pure kernel interpretations of regression is the ability to select the hyperparameters of the kernel automatically. The demo
>> demOptimiseGp
shows a series of plots of a Gaussian process with different length scales fitted to six data points. For each plot there is a corresponding plot of the log likelihood. The log likelihood peaks for a length scale equal to 1. This was the length scale used to generate the data.
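The quantity being plotted and optimised is the log marginal likelihood; a sketch that evaluates it on a grid of length scales for synthetic data (the data generation, grid, noise level, and seed are arbitrary choices):

```python
import numpy as np

# Sketch: evaluate the GP log marginal likelihood over a grid of length
# scales. The data are synthetic (drawn with length scale 1); the grid,
# noise level and seed are arbitrary illustrative choices.
def rbf(a, b, lengthscale):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / lengthscale) ** 2)

def log_marginal_likelihood(x, y, lengthscale, noise=1e-4):
    K = rbf(x, x, lengthscale) + noise * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(x) * np.log(2 * np.pi))

rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 6)
y = rng.multivariate_normal(np.zeros(6), rbf(x, x, 1.0) + 1e-4 * np.eye(6))

scales = [0.1, 0.5, 1.0, 2.0, 8.0]
lls = [log_marginal_likelihood(x, y, s) for s in scales]
```

In the demo the analogue of this curve is maximised over the length scale, recovering a value near the one used to generate the data.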
As a simple example of regression for real data we consider a motion capture data set. The data is from Ohio State University. In the example script we perform Gaussian process regression with time as the input and the x, y, z position of the marker attached to the left ankle as the outputs. To demonstrate the behaviour of the model when the marker is lost, we remove data from part of the sequence. This code can be run with
>> demStickGp1
The code will optimise the hyperparameters and show plots of the posterior process through the training data and the missing test points.
The result of the script is given in the plot below.
Notice how the error bars are tight except in the region where the training data is missing and beyond the end of the training data.
The sparse approximation used in this toolbox is based on the Sparse Pseudo-input Gaussian Process model described by Snelson and Ghahramani. Also provided are the extensions suggested by Quiñonero-Candela and Rasmussen. They provide a unifying terminology for describing these approximations which we shall use in what follows.
There are three demos provided for Gaussian process regression in 1D. They each use a different form of likelihood approximation. The first demonstration uses the "projected latent variable" approach first described by Csato and Opper and later used by Seeger et al. In the terminology of Quiñonero-Candela and Rasmussen (QR terminology) this is known as the "deterministic training conditional" (DTC) approximation.
To use this approximation the following script can be run.
>> demSpgp1dGp1
The result of the script is given in the plot below.
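The low-rank structure underlying the DTC approximation can be sketched as the Nyström-style term Q_ff = K_fu K_uu^{-1} K_uf built from a set of inducing inputs (chosen by hand here; this is a conceptual sketch, not the toolbox code):

```python
import numpy as np

# Conceptual sketch of the low-rank term behind the DTC approximation:
# Q_ff = K_fu K_uu^{-1} K_uf, built from hand-chosen inducing inputs u.
# The kernel, inputs and jitter are illustrative, not the toolbox code.
def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x = np.linspace(-3.0, 3.0, 20)      # training inputs
u = np.array([-2.0, 0.0, 2.0])      # inducing inputs

K_uu = rbf(u, u) + 1e-8 * np.eye(3)
K_fu = rbf(x, u)
Q_ff = K_fu @ np.linalg.solve(K_uu, K_fu.T)   # rank <= len(u)
```

DTC uses Q_ff in place of the full training covariance K_ff, which is what makes the computation scale with the number of inducing inputs rather than the number of data points.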
The second demonstration uses the improved approximation suggested by Snelson and Ghahramani, known in QR terminology as the fully independent training conditional (FITC). To try this approximation run the following script
>> demSpgp1dGp2
The result of the script is given on the left of the plot below.
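The FITC approximation adds to the low-rank term Q_ff = K_fu K_uu^{-1} K_uf a diagonal correction that restores the exact prior variances; a self-contained sketch with a hand-chosen kernel and inducing inputs (illustrative, not the toolbox code):

```python
import numpy as np

# Conceptual sketch of the FITC covariance: the low-rank DTC-style term
# Q_ff = K_fu K_uu^{-1} K_uf plus a diagonal correction restoring the
# exact prior variances. Kernel and inducing inputs are illustrative.
def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2)

x = np.linspace(-3.0, 3.0, 20)
u = np.array([-2.0, 0.0, 2.0])

K_ff = rbf(x, x)
K_uu = rbf(u, u) + 1e-8 * np.eye(3)
K_fu = rbf(x, u)
Q_ff = K_fu @ np.linalg.solve(K_uu, K_fu.T)
fitc_cov = Q_ff + np.diag(np.diag(K_ff - Q_ff))
```

The diagonal correction is what distinguishes FITC from DTC: the marginal prior variance at every training input matches the full GP exactly.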
At the Sheffield Gaussian Process Round Table Lehel Csato pointed out that the Bayesian Committee Machine of Schwaighofer and Tresp can also be viewed within the same framework. This idea is formalised in Quiñonero-Candela and Rasmussen's review. This approximation is known as the "partially independent training conditional" (PITC) in QR terminology. To try this approximation run the following script
>> demSpgp1dGp3
The result of the script is given on the right of the plot above.
Finally we can compare these results to the result from the full Gaussian process on the data with the correct hyperparameters. To do this the following script can be run.
>> demSpgp1dGp4
The result of the script is given in the plot below.