
2. Boltzmann Machines and Deep Belief Networks

The equations can be seen at http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DBNEquations.

All the code files are located in $PLEARNDIR/plearn_learners/online.

2.1 Architecture

2.1.1 Restricted Boltzmann Machines

A Restricted Boltzmann Machine (RBM) is composed of two different layers of units, with weighted connections between them.

The layers are modelled by the RBMLayer class, while the connections are represented by RBMConnection. Different sub-classes implement the multiple types of layers and connections. RBMLayer and RBMConnection both inherit from OnlineLearningModule.

An RBM can therefore be considered as a structure containing two instances of RBMLayer and one of RBMConnection, but there is no class modelling an RBM for the moment.
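For reference, the joint energy that the two layers and the connection define can be sketched in a few lines of NumPy. This is the standard RBM energy, not PLearn code; all names (visible bias b, hidden bias c, weight matrix W) are illustrative:

```python
import numpy as np

def rbm_energy(v, h, W, b, c):
    """Standard RBM energy: E(v, h) = -b'v - c'h - h'Wv."""
    return -(b @ v) - (c @ h) - (h @ W @ v)

# Tiny toy example: 2 visible and 2 hidden units, zero biases.
W = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.zeros(2)
c = np.zeros(2)
v = np.array([1.0, 1.0])
h = np.array([1.0, 0.0])

energy = rbm_energy(v, h, W, b, c)
```

The biases live in the layers and the weight matrix in the connection, which matches the split between RBMLayer and RBMConnection described above.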

2.1.2 Deep Belief Networks

A Deep Belief Network (DBN) is a learning algorithm; it is therefore implemented as a PLearner, namely DeepBeliefNet.

It is composed of stacked RBMs. The units of a layer are shared between two RBMs, hence the need to dissociate layers from connections. A DeepBeliefNet containing $n$ unit layers (including input and output layers) will typically contain $n$ instances of RBMLayer and $n-1$ instances of RBMConnection.

The training is usually done one layer at a time, each layer being trained as an RBM. See part [*] for the detailed explanation.
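To make the counting concrete, here is a minimal Python sketch (layer sizes are hypothetical, chosen only for illustration) of how many layers and connections a 3-layer DBN holds:

```python
# Hypothetical unit-layer sizes for a 3-layer DBN (input, hidden, output).
sizes = [784, 500, 200]

n_layers = len(sizes)  # one RBMLayer per size

# One RBMConnection between each pair of consecutive layers,
# each holding a (next_size x current_size) weight matrix.
connection_shapes = [(sizes[i + 1], sizes[i]) for i in range(n_layers - 1)]
```

With $n = 3$ unit layers this gives the expected $n - 1 = 2$ connections.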

There are no functions for sampling from the learned probability distribution yet; they might be added at some point.

2.2 Code Components

Both RBMLayer and RBMConnection inherit from OnlineLearningModule, so they have deterministic fprop(...) and bpropUpdate(...) functions that can be chained.

2.2.1 RBMLayer

This class models a set of (usually independent) units, some of their intrinsic parameters, and their current state.

RBMLayer stores:

The methods are:

And from the OnlineLearningModule interface:

There are different types of units (binomial, Gaussian, even groups of units representing a multinomial distribution, etc.), so this class has several derived sub-classes, which may store additional information (like a quadratic parameter and the standard deviation for a Gaussian unit) and use it in the accumulate{Pos,Neg}Stats(...) and update() methods.

List of known sub-classes:

2.2.2 RBMConnection

This class represents a linear transformation (not an affine one: the bias is stored in the RBMLayer), used to compute one layer's activations given the other layer's values.

RBMConnection stores (and has to update):

The different sub-classes store the parameters of the linear transformation in different ways, along with the statistics used to update those parameters (usually named [paramname]_pos_stats and [paramname]_neg_stats).

The methods are:

And from the OnlineLearningModule interface:

input_grad = weights' * output_grad;
weights -= learning_rate * output_grad * input';
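The two pseudocode lines above can be checked with a small self-contained NumPy sketch (sizes and names are assumptions chosen for illustration, not the actual PLearn implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.normal(size=(3, 4))   # output_size x input_size
input_ = rng.normal(size=4)
output_grad = rng.normal(size=3)
learning_rate = 0.1

W0 = weights.copy()                 # kept only to verify the update below

# input_grad = weights' * output_grad
input_grad = weights.T @ output_grad

# weights -= learning_rate * output_grad * input'
weights -= learning_rate * np.outer(output_grad, input_)
```

The gradient with respect to the input uses the transposed weight matrix, while the weight update is the outer product of the output gradient and the input.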

List of known subclasses, and their parameters:

2.3 Code Samples

2.3.1 Propagation in an RBM

In the simple case of a Restricted Boltzmann Machine, we have two instances of RBMLayer (input and hidden) and one instance of RBMConnection (rbmc) linking them.

Getting the expected value of the hidden layer into hidden_exp, given one input sample input_sample, is easy:

input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
hidden_exp << hidden.expectation;

If we want a sample hidden_sample instead, the code is:

input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.generateSample();
hidden_sample << hidden.sample;
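Outside PLearn, the same up propagation can be sketched with NumPy for a toy binomial RBM (a stand-in for the actual classes; the sigmoid expectation corresponds to computeExpectation() on a binomial layer, and all sizes are assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
weights = rng.normal(scale=0.1, size=(3, 4))  # hidden x input
hidden_bias = np.zeros(3)

input_sample = np.array([1.0, 0.0, 1.0, 0.0])

# setAsDownInput / getAllActivations: linear part of the up propagation.
activations = weights @ input_sample + hidden_bias
# computeExpectation: binomial units apply a sigmoid to their activation.
hidden_exp = sigmoid(activations)
# generateSample: Bernoulli draw from the expectation.
hidden_sample = (rng.random(3) < hidden_exp).astype(float)
```

The comments map each line back to the corresponding call in the PLearn snippets above.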

2.3.2 Step of Contrastive Divergence in an RBM

One step of contrastive divergence learning (with only one example, input_sample) in the same RBM would be:

// positive phase
input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
hidden.generateSample();
input.accumulatePosStats( input.sample );
rbmc.accumulatePosStats( input.sample, hidden.expectation );
hidden.accumulatePosStats( hidden.expectation );

// down propagation
rbmc.setAsUpInput( hidden.sample );
input.getAllActivations( rbmc );
input.generateSample();

// negative phase
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
input.accumulateNegStats( input.sample );
rbmc.accumulateNegStats( input.sample, hidden.expectation );
hidden.accumulateNegStats( hidden.expectation );

// update
input.update();
rbmc.update();
hidden.update();

Note: it has been shown empirically that convergence is better when hidden.expectation is used in the statistics instead of hidden.sample.

Alternatively, the update(..., ...) variant can be used.
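The full step above can be mirrored in a self-contained NumPy sketch of one CD-1 update on a toy binomial RBM (sizes, names, and the explicit bias updates are assumptions for illustration; in PLearn the layers own their biases and update them in update()):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
W = rng.normal(scale=0.1, size=(3, 4))  # hidden x visible weights
b = np.zeros(4)                         # visible biases
c = np.zeros(3)                         # hidden biases
lr = 0.1

v0 = np.array([1.0, 0.0, 1.0, 1.0])     # input_sample

# Positive phase: up propagation, positive statistics.
h0_exp = sigmoid(W @ v0 + c)
h0 = (rng.random(3) < h0_exp).astype(float)

# Down propagation: reconstruct the visible layer from the hidden sample.
v1_exp = sigmoid(W.T @ h0 + b)
v1 = (rng.random(4) < v1_exp).astype(float)

# Negative phase: up propagation from the reconstruction.
h1_exp = sigmoid(W @ v1 + c)

# Update: difference between positive and negative statistics
# (using hidden expectations rather than samples, per the note above).
W += lr * (np.outer(h0_exp, v0) - np.outer(h1_exp, v1))
b += lr * (v0 - v1)
c += lr * (h0_exp - h1_exp)
```

Each phase corresponds to one block of the PLearn snippet: positive phase, down propagation, negative phase, and update.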

2.3.3 Learning in a DBN

Instead of having only one RBM, let's consider three sequential layers (input, hidden, output) and two connections (rbmc_ih between input and hidden, rbmc_ho between hidden and output).

They form a (small) DBN.

We first train the first RBM, formed by (input, rbmc_ih, hidden), as shown previously, ignoring the other elements. Then we freeze the parameters of input and rbmc_ih, and train the second RBM, formed by (hidden, rbmc_ho, output), taking the output of the first one as input.

One step of this second phase (with only one example, input_sample) will look like:

// propagation to hidden
input.sample << input_sample;
rbmc_ih.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc_ih );
hidden.computeExpectation(); // we use mean-field approximation

// positive phase
rbmc_ho.setAsDownInput( hidden.expectation );
output.getAllActivations( rbmc_ho );
output.computeExpectation();
output.generateSample();
hidden.accumulatePosStats( hidden.expectation );
rbmc_ho.accumulatePosStats( hidden.expectation, output.expectation );
output.accumulatePosStats( output.expectation );

// down propagation
rbmc_ho.setAsUpInput( output.sample );
hidden.getAllActivations( rbmc_ho );
hidden.generateSample();

// negative phase
rbmc_ho.setAsDownInput( hidden.sample );
output.getAllActivations( rbmc_ho );
output.computeExpectation();
hidden.accumulateNegStats( hidden.sample );
rbmc_ho.accumulateNegStats( hidden.sample, output.expectation );
output.accumulateNegStats( output.expectation );

// update
hidden.update();
rbmc_ho.update();
output.update();
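The same greedy layer-wise scheme can be sketched end to end in NumPy (a toy model with small hypothetical sizes; as in the PLearn code, the frozen first RBM's mean-field hidden expectations feed the second RBM):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr, rng):
    """One CD-1 update on a toy sigmoid RBM; returns the updated parameters."""
    h0_exp = sigmoid(W @ v0 + c)
    h0 = (rng.random(c.shape) < h0_exp).astype(float)
    v1_exp = sigmoid(W.T @ h0 + b)
    v1 = (rng.random(b.shape) < v1_exp).astype(float)
    h1_exp = sigmoid(W @ v1 + c)
    W = W + lr * (np.outer(h0_exp, v0) - np.outer(h1_exp, v1))
    b = b + lr * (v0 - v1)
    c = c + lr * (h0_exp - h1_exp)
    return W, b, c

rng = np.random.default_rng(4)
sizes = [4, 3, 2]  # input, hidden, output (toy sizes)
W_ih = rng.normal(scale=0.1, size=(sizes[1], sizes[0]))
W_ho = rng.normal(scale=0.1, size=(sizes[2], sizes[1]))
b_in, b_hid, b_out = np.zeros(sizes[0]), np.zeros(sizes[1]), np.zeros(sizes[2])

data = rng.integers(0, 2, size=(10, sizes[0])).astype(float)

# Phase 1: train the first RBM (input, W_ih, hidden).
for v in data:
    W_ih, b_in, b_hid = cd1_step(v, W_ih, b_in, b_hid, 0.1, rng)

# Phase 2: freeze W_ih; its mean-field hidden expectations become
# the inputs of the second RBM (hidden, W_ho, output).
for v in data:
    h_exp = sigmoid(W_ih @ v + b_hid)
    W_ho, b_hid, b_out = cd1_step(h_exp, W_ho, b_hid, b_out, 0.1, rng)
```

Phase 2 mirrors the PLearn snippet above: the up propagation through the frozen connection uses the expectation (mean-field approximation), and only the second RBM's parameters are updated.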


2.4 The DeepBeliefNet Class

To be continued...