The equations can be found at http://www.iro.umontreal.ca/~lisa/twiki/bin/view.cgi/Public/DBNEquations.
All the code files are located in $PLEARNDIR/plearn_learners/online.
A Restricted Boltzmann Machine (RBM) is composed of two different layers of units, with weighted connections between them.
The layers are modelled by the RBMLayer class, while the connections are represented by RBMConnection. Different sub-classes implement the multiple types of layers and connections. RBMLayer and RBMConnection both inherit from OnlineLearningModule.
An RBM can therefore be considered as a structure containing two instances of RBMLayer and one of RBMConnection, but there is no class modelling an RBM for the moment.
A Deep Belief Network (DBN) is a learning algorithm, therefore contained in a PLearner, namely DeepBeliefNet.
It is composed of stacked RBMs. The units of a layer are shared between two RBMs, hence the need to dissociate layers from connections. A DeepBeliefNet containing n unit layers (including input and output layers) will typically contain n instances of RBMLayer and n-1 instances of RBMConnection.
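As a rough illustration of this decomposition (plain Python, not PLearn code; the names layer_sizes, biases and weights are made up for the sketch), a DBN with n unit layers holds n bias vectors, one per RBMLayer, and n-1 weight matrices, one per RBMConnection between consecutive layers:

import numpy as np

# Illustrative layer sizes of a small DBN: input, hidden, output.
layer_sizes = [784, 500, 100]   # n = 3 unit layers
n = len(layer_sizes)

# One bias vector per unit layer (held by each RBMLayer)...
biases = [np.zeros(size) for size in layer_sizes]

# ...and one weight matrix per pair of consecutive layers (held by each
# RBMConnection), of shape (up_size, down_size): n - 1 of them.
weights = [np.zeros((layer_sizes[i + 1], layer_sizes[i])) for i in range(n - 1)]

assert len(biases) == n and len(weights) == n - 1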
The training is usually done one layer at a time, each layer being trained as an RBM; the procedure is detailed in the examples below.
There are no functions for sampling from the learned probability distribution yet; they might be added at some point.
Both classes inherit from OnlineLearningModule, so they have deterministic fprop(...) and bpropUpdate(...) functions that can be chained.
This class models a set of (usually independent) units, some of their intrinsic parameters, and their current state.
RBMLayer stores:
- its size (the number of units);
- the bias of each unit (bias);
- the current state of its units: the activations received from an RBMConnection, their expectation and their sample;
- the accumulated statistics used for learning (bias_pos_stats, bias_neg_stats, pos_count and neg_count);
- the learning_rate.
The methods are:
- getAllActivations( rbmc ): gets the units' activation values from the RBMConnection rbmc;
- computeExpectation(): computes the expected value of each unit, given its activation;
- generateSample(): generates a sample of each unit, given its activation;
- accumulatePosStats( pos_values ):
    bias_pos_stats += pos_values; pos_count++;
- accumulateNegStats( neg_values ):
    bias_neg_stats += neg_values; neg_count++;
- update():
    bias -= learning_rate * (bias_pos_stats/pos_count - bias_neg_stats/neg_count);
    // then reset the statistics
    bias_pos_stats.clear(); bias_neg_stats.clear(); pos_count = 0; neg_count = 0;

And from the OnlineLearningModule interface:
- fprop( input, output ):
    output = sigmoid( -(input + bias) );
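For readers who want a runnable equivalent, here is a minimal NumPy sketch of what such a layer does; this is not the PLearn API, and the class and method names (BinomialLayerSketch, accumulate_pos_stats, ...) are invented for the illustration:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class BinomialLayerSketch:
    """Illustrative sketch of a binomial layer's bias, statistics and update."""

    def __init__(self, size, learning_rate=0.01):
        self.bias = np.zeros(size)
        self.learning_rate = learning_rate
        self.bias_pos_stats = np.zeros(size)
        self.bias_neg_stats = np.zeros(size)
        self.pos_count = 0
        self.neg_count = 0

    def fprop(self, inp):
        # output = sigmoid( -(input + bias) ), as in the pseudo-code above
        return sigmoid(-(inp + self.bias))

    def accumulate_pos_stats(self, pos_values):
        self.bias_pos_stats += pos_values
        self.pos_count += 1

    def accumulate_neg_stats(self, neg_values):
        self.bias_neg_stats += neg_values
        self.neg_count += 1

    def update(self):
        self.bias -= self.learning_rate * (
            self.bias_pos_stats / self.pos_count
            - self.bias_neg_stats / self.neg_count)
        # reset the statistics
        self.bias_pos_stats[:] = 0.0
        self.bias_neg_stats[:] = 0.0
        self.pos_count = 0
        self.neg_count = 0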
There are different types of units (binomial, Gaussian, even groups of units representing a multinomial distribution, etc.), so this class has several derived sub-classes, which may store more information (such as a quadratic parameter and the standard deviation for a Gaussian unit) and use it in the accumulate{Pos,Neg}Stats(...) and update() methods.
List of known sub-classes:
This class represents a linear transformation (not affine! the bias is in the RBMLayer), used to compute one layer's activations given the other layer's values.
RBMConnection stores (and has to update):
- the last input vector set by setAsDownInput(...) or setAsUpInput(...) (input_vec), and a flag (up) telling which layer it comes from;
- the counts of accumulated statistics (pos_count and neg_count);
- the learning_rate.
Each sub-class stores in its own way the parameters of the linear transformation, and the statistics used to update them (usually named [paramname]_pos_stats and [paramname]_neg_stats).
The methods are (pseudo-code given here for a connection storing a full weight matrix weights):
- setAsDownInput( input ) / setAsUpInput( input ): store input in input_vec, and remember in the up flag which layer it comes from;
- the method computing the activations of length units of the other layer, starting at index start (used by RBMLayer::getAllActivations(...)):
    if( up ): // input_vec comes from the down layer
        for i = start to start+length-1:
            activations[i-start] += sum_j weights(i,j) * input_vec[j]
    else: // input_vec comes from the up layer
        for j = start to start+length-1:
            activations[j-start] += sum_i weights(i,j) * input_vec[i]
- accumulatePosStats( down_values, up_values ):
    weights_pos_stats += up_values * down_values'; pos_count++;
- accumulateNegStats( down_values, up_values ):
    weights_neg_stats += up_values * down_values'; neg_count++;
- update():
    weights -= learning_rate * (weights_pos_stats/pos_count - weights_neg_stats/neg_count);
    // then reset the statistics
    weights_pos_stats.clear(); weights_neg_stats.clear(); pos_count = 0; neg_count = 0;

And from the OnlineLearningModule interface:
- fprop( input, output ):
    output = weights * input;
- bpropUpdate( input, output, input_grad, output_grad ):
    input_grad = weights' * output_grad;
    weights -= learning_rate * output_grad * input';
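Similarly, here is a hedged NumPy sketch of a connection parametrized by a full weight matrix (again not the PLearn API; all names are invented, and the activation computation is written with a matrix product instead of the explicit loops above):

import numpy as np

class MatrixConnectionSketch:
    """Illustrative sketch; weights has shape (up_size, down_size)."""

    def __init__(self, down_size, up_size, learning_rate=0.01):
        rng = np.random.default_rng(0)
        self.weights = 0.01 * rng.standard_normal((up_size, down_size))
        self.learning_rate = learning_rate
        self.weights_pos_stats = np.zeros_like(self.weights)
        self.weights_neg_stats = np.zeros_like(self.weights)
        self.pos_count = 0
        self.neg_count = 0
        self.input_vec = None
        self.up = True

    def set_as_down_input(self, v):   # input comes from the down layer
        self.input_vec, self.up = v, True

    def set_as_up_input(self, v):     # input comes from the up layer
        self.input_vec, self.up = v, False

    def compute_activations(self):
        # activations of the *other* layer, given input_vec
        if self.up:
            return self.weights @ self.input_vec   # up-layer activations
        return self.weights.T @ self.input_vec     # down-layer activations

    def accumulate_pos_stats(self, down_values, up_values):
        self.weights_pos_stats += np.outer(up_values, down_values)
        self.pos_count += 1

    def accumulate_neg_stats(self, down_values, up_values):
        self.weights_neg_stats += np.outer(up_values, down_values)
        self.neg_count += 1

    def update(self):
        self.weights -= self.learning_rate * (
            self.weights_pos_stats / self.pos_count
            - self.weights_neg_stats / self.neg_count)
        # reset the statistics
        self.weights_pos_stats[:] = 0.0
        self.weights_neg_stats[:] = 0.0
        self.pos_count = 0
        self.neg_count = 0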
List of known subclasses, and their parameters:
In the simple case of a Restricted Boltzmann Machine, we have two instances of RBMLayer (input and hidden) and one of RBMConnection (rbmc) linking both of them.
Getting the expected value of the hidden layer into hidden_exp, given one input sample input_sample, is easy:
input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
hidden_exp << hidden.expectation;
If we want a sample hidden_sample instead:
input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.generateSample();
hidden_sample << hidden.sample;
One step of contrastive divergence learning (with only one example, input_sample) in the same RBM would be:
// positive phase
input.sample << input_sample;
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
hidden.generateSample();
input.accumulatePosStats( input.sample );
rbmc.accumulatePosStats( input.sample, hidden.expectation );
hidden.accumulatePosStats( hidden.expectation );

// down propagation
rbmc.setAsUpInput( hidden.sample );
input.getAllActivations( rbmc );
input.generateSample();

// negative phase
rbmc.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc );
hidden.computeExpectation();
input.accumulateNegStats( input.sample );
rbmc.accumulateNegStats( input.sample, hidden.expectation );
hidden.accumulateNegStats( hidden.expectation );

// update
input.update();
rbmc.update();
hidden.update();
Note: it was empirically shown that the convergence is better if we use hidden.expectation instead of hidden.sample in the statistics.
Alternatively, a variant update(..., ...) taking the positive and negative values directly could be used instead of accumulating statistics first.
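The same CD-1 step can also be written as a self-contained NumPy sketch. This is only an illustration of the pseudo-code above, not the PLearn API; it assumes (an assumption, since computeExpectation() is not spelled out above) that the expectation of a binomial unit is sigmoid( -(activation + bias) ), consistent with the fprop formula of RBMLayer, and it uses the same sign convention for the updates:

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Illustrative sizes and parameters (names are made up for this sketch).
n_visible, n_hidden = 6, 4
learning_rate = 0.05
W = 0.01 * rng.standard_normal((n_hidden, n_visible))  # (up, down), like weights(i,j)
b_visible = np.zeros(n_visible)
b_hidden = np.zeros(n_hidden)

def expectation(activation, bias):
    # assumed convention, following fprop: sigmoid( -(activation + bias) )
    return sigmoid(-(activation + bias))

input_sample = (rng.random(n_visible) > 0.5).astype(float)

# positive phase
h_exp = expectation(W @ input_sample, b_hidden)
h_sample = (rng.random(n_hidden) < h_exp).astype(float)
pos_w = np.outer(h_exp, input_sample)      # up_values * down_values'
pos_v, pos_h = input_sample, h_exp

# down propagation
v_exp = expectation(W.T @ h_sample, b_visible)
v_sample = (rng.random(n_visible) < v_exp).astype(float)

# negative phase
h_exp_neg = expectation(W @ v_sample, b_hidden)
neg_w = np.outer(h_exp_neg, v_sample)
neg_v, neg_h = v_sample, h_exp_neg

# update (pos_count = neg_count = 1 here, so no division by the counts;
# parameters are decreased by pos - neg, as in the document's convention)
W         -= learning_rate * (pos_w - neg_w)
b_visible -= learning_rate * (pos_v - neg_v)
b_hidden  -= learning_rate * (pos_h - neg_h)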
Instead of having only one RBM, let's consider three sequential layers (input, hidden, output) and two connections: rbmc_ih between input and hidden, and rbmc_ho between hidden and output.
We first train the first RBM, formed by (input, rbmc_ih, hidden), as shown previously, ignoring the other elements. Then, we freeze the parameters of input and rbmc_ih, and train the second RBM, formed by (hidden, rbmc_ho, output), taking the output of the first one as its input.
One step of this second phase (with only one example, input_sample) will look like:
// propagation to hidden
input.sample << input_sample;
rbmc_ih.setAsDownInput( input.sample );
hidden.getAllActivations( rbmc_ih );
hidden.computeExpectation(); // we use the mean-field approximation

// positive phase
rbmc_ho.setAsDownInput( hidden.expectation );
output.getAllActivations( rbmc_ho );
output.computeExpectation();
output.generateSample();
hidden.accumulatePosStats( hidden.expectation );
rbmc_ho.accumulatePosStats( hidden.expectation, output.expectation );
output.accumulatePosStats( output.expectation );

// down propagation
rbmc_ho.setAsUpInput( output.sample );
hidden.getAllActivations( rbmc_ho );
hidden.generateSample();

// negative phase
rbmc_ho.setAsDownInput( hidden.sample );
output.getAllActivations( rbmc_ho );
output.computeExpectation();
hidden.accumulateNegStats( hidden.sample );
rbmc_ho.accumulateNegStats( hidden.sample, output.expectation );
output.accumulateNegStats( output.expectation );

// update
hidden.update();
rbmc_ho.update();
output.update();
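As before, here is a self-contained NumPy sketch of this second phase (illustration only, not the PLearn API, under the same assumed expectation convention as above): W1 and the input bias, learned in the first phase, stay frozen; only the second RBM's parameters (W2 and the hidden and output biases) are updated.

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def expectation(activation, bias):
    # assumed convention, following fprop: sigmoid( -(activation + bias) )
    return sigmoid(-(activation + bias))

# Illustrative sizes; W1 and b_input come from the first training phase.
n_in, n_hid, n_out = 6, 4, 3
learning_rate = 0.05
W1 = 0.01 * rng.standard_normal((n_hid, n_in))    # frozen (rbmc_ih)
b_input = np.zeros(n_in)                          # frozen
W2 = 0.01 * rng.standard_normal((n_out, n_hid))   # trained (rbmc_ho)
b_hidden = np.zeros(n_hid)                        # trained (shared hidden layer)
b_output = np.zeros(n_out)                        # trained

input_sample = (rng.random(n_in) > 0.5).astype(float)

# propagation to hidden (mean-field: keep the expectation, no sampling)
hid_exp = expectation(W1 @ input_sample, b_hidden)

# positive phase on the second RBM
out_exp = expectation(W2 @ hid_exp, b_output)
out_sample = (rng.random(n_out) < out_exp).astype(float)
pos_w2, pos_h, pos_o = np.outer(out_exp, hid_exp), hid_exp, out_exp

# down propagation
hid_exp_down = expectation(W2.T @ out_sample, b_hidden)
hid_sample = (rng.random(n_hid) < hid_exp_down).astype(float)

# negative phase
out_exp_neg = expectation(W2 @ hid_sample, b_output)
neg_w2, neg_h, neg_o = np.outer(out_exp_neg, hid_sample), hid_sample, out_exp_neg

# update: only the second RBM's parameters move
W2       -= learning_rate * (pos_w2 - neg_w2)
b_hidden -= learning_rate * (pos_h - neg_h)
b_output -= learning_rate * (pos_o - neg_o)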
To be continued...