Solved – When should we use Gibbs sampling in a deep belief network? Before or after fine-tuning?


Gibbs sampling lets us generate (sample) an input vector from a deep belief network (DBN). Training a DBN for a supervised learning task involves two steps: greedy unsupervised pre-training and supervised fine-tuning.

After which step should we do Gibbs sampling? After we fine-tune our network to solve the given supervised problem?
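For reference, a DBN is built from a stack of restricted Boltzmann machines (RBMs), and Gibbs sampling in a single binary RBM alternates between sampling every hidden unit given the visible units and every visible unit given the hidden units. The following is a minimal pure-Python sketch under that assumption; the names `W`, `b`, `c`, `hidden_probs`, `visible_probs` and `gibbs_chain` are illustrative choices for this example, not the API of any library.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_bernoulli(probs, rng):
    # Set each unit to 1 with its given probability, otherwise 0.
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, c):
    # p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    # p(v_i = 1 | h) = sigmoid(b_i + sum_j W[i][j] * h_j)
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def gibbs_chain(v0, W, b, c, steps, rng):
    """Run `steps` rounds of block Gibbs sampling v -> h -> v in one RBM."""
    v = list(v0)
    for _ in range(steps):
        h = sample_bernoulli(hidden_probs(v, W, c), rng)
        v = sample_bernoulli(visible_probs(h, W, b), rng)
    return v


# Usage with a small random RBM (6 visible units, 4 hidden units).
rng = random.Random(0)
W = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(6)]
b, c = [0.0] * 6, [0.0] * 4
sample = gibbs_chain([0] * 6, W, b, c, steps=100, rng=rng)
```

In a DBN, this chain is run only in the top-level RBM; the layers below are used once, top-down, to carry the sample down to the input layer.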

Best Answer

It depends on which fine-tuning algorithm you are using:

Back-propagation / Initialising a DNN

If you are using back-propagation to fine-tune for the supervised task, then you are no longer working with a Deep Belief Network but with a Deep Neural Network (DNN). This is common practice, followed by many people including Erhan et al. Appending an output layer and discarding the "downward biases" (the visible-unit biases, which are only needed when generating top-down) turns the network into a DNN, which can then be trained with back-propagation. This process is referred to as "using the DBN to initialise the weights of a DNN", but I always think of it as "converting the DBN to a DNN". It is also the method used in many libraries, including the Deep Learning Toolbox.
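The conversion described above can be sketched as follows. This is a minimal illustration, assuming each pre-trained RBM is stored as a dict with hypothetical keys `W` (n_visible x n_hidden), `b_vis` and `b_hid`; the functions `dbn_to_dnn`, `forward` and `random_rbm` are made up for this example and do not mirror the Deep Learning Toolbox or any other library. Back-propagation itself is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def affine(x, W, b):
    # y_j = b_j + sum_i x_i * W[i][j]
    return [b[j] + sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(b))]


def dbn_to_dnn(rbms, n_classes, rng):
    """Turn a stack of pre-trained RBMs into feed-forward layers.

    Upward weights and hidden biases are copied; the visible ("downward")
    biases are dropped because a feed-forward net never reconstructs inputs.
    """
    layers = [([row[:] for row in r["W"]], r["b_hid"][:]) for r in rbms]
    n_top = len(rbms[-1]["b_hid"])
    # Appended output layer for the supervised task, randomly initialised.
    W_out = [[rng.gauss(0.0, 0.01) for _ in range(n_classes)] for _ in range(n_top)]
    layers.append((W_out, [0.0] * n_classes))
    return layers


def forward(layers, x):
    # Sigmoid hidden layers followed by a softmax output layer.
    for W, b in layers[:-1]:
        x = [sigmoid(a) for a in affine(x, W, b)]
    W, b = layers[-1]
    return softmax(affine(x, W, b))


def random_rbm(n_vis, n_hid, rng):
    # Hypothetical stand-in for a pre-trained RBM, for the demo only.
    return {"W": [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)],
            "b_vis": [0.0] * n_vis, "b_hid": [0.0] * n_hid}


rng = random.Random(0)
rbms = [random_rbm(8, 5, rng), random_rbm(5, 3, rng)]
dnn = dbn_to_dnn(rbms, n_classes=4, rng=rng)
probs = forward(dnn, [1, 0, 1, 1, 0, 0, 1, 0])
```

From here, all layers would be trained jointly with ordinary back-propagation. Because `b_vis` is discarded and only upward weights remain, nothing in `dnn` defines a top-down distribution to sample from.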

A DNN is not a generative architecture, so you cannot use Gibbs sampling to generate an input vector from it. Thus, if you would like to generate some "dreams", you must do so before converting to a DNN and fine-tuning. But then the generated vectors will not be of any particular class, since at that point the network has no label information with which to steer or classify them.
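To make the "dreams" concrete: in a DBN that has only been greedily pre-trained, one way to draw a sample is to run Gibbs sampling in the top-level RBM from a random start and then make a single top-down pass, using the tied pre-training weights and the visible ("downward") biases of each lower RBM. The sketch below assumes the same hypothetical dict layout (`W`, `b_vis`, `b_hid`) as before, with `rbms[0]` at the bottom; `generate_dream` and `random_rbm` are illustrative names, not a library API.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def generate_dream(rbms, gibbs_steps, rng):
    """Draw one input-layer sample from a greedily pre-trained DBN."""
    top = rbms[-1]
    # 1. Gibbs sampling in the top-level RBM, starting from a random state.
    v = [rng.randint(0, 1) for _ in range(len(top["b_vis"]))]
    for _ in range(gibbs_steps):
        h = sample_bernoulli(hidden_probs(v, top["W"], top["b_hid"]), rng)
        v = sample_bernoulli(visible_probs(h, top["W"], top["b_vis"]), rng)
    # 2. One top-down pass; this needs each lower RBM's visible biases.
    probs, state = [float(u) for u in v], v
    for r in reversed(rbms[:-1]):
        probs = visible_probs(state, r["W"], r["b_vis"])
        state = sample_bernoulli(probs, rng)
    # Return input-unit probabilities (e.g. pixel intensities for display).
    return probs


def random_rbm(n_vis, n_hid, rng):
    # Hypothetical stand-in for a pre-trained RBM, for the demo only.
    return {"W": [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)],
            "b_vis": [0.0] * n_vis, "b_hid": [0.0] * n_hid}


rng = random.Random(0)
rbms = [random_rbm(12, 6, rng), random_rbm(6, 4, rng)]
dream = generate_dream(rbms, gibbs_steps=200, rng=rng)
```

Nothing in this procedure refers to a class, which is why such samples cannot be asked to belong to one.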


Generative Fine-Tuning algorithms

On the other hand, if you are using a generative fine-tuning algorithm, such as the Up-Down algorithm, then you always have a DBN and can always generate vectors at the input layer. You should do so after fine-tuning: the generated vectors will more closely resemble actual inputs (section 5 of [3]), because improving the generative model is exactly what this kind of fine-tuning does.


Supervised DBN Training

Figure 1 from Hinton et al., A Fast Learning Algorithm for Deep Belief Nets

If you are using an architecture in which the labels are provided as part of the input data during training, as in the original paper by Hinton et al. (see in particular Figure 1, reproduced above, where the 10 label units and the 28x28 pixel units are both inputs), then you can generate inputs of a specified class. Hold the label units constant and run Gibbs sampling repeatedly in the top layer until it stabilises, then generate the path back down to the other input (the image pixels). This is how Hinton et al. and others are able to generate the good images shown below (Figure 8 from the aforementioned paper).

This can be done before or after fine-tuning, but of course the results will be better after fine-tuning (with a generative DBN algorithm such as Up-Down).


Figure 8 from Hinton et al., A Fast Learning Algorithm for Deep Belief Nets
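The class-conditional procedure from the Supervised DBN Training section can be sketched by clamping the label units during the top-level Gibbs sampling. This is a minimal sketch, not the code used by Hinton et al.: it assumes the hypothetical dict layout (`W`, `b_vis`, `b_hid`) used above and, as a layout choice made only for this example, that the label units are the last `n_labels` visible units of the top RBM, after the penultimate-layer units. `generate_class` and `random_rbm` are illustrative names.

```python
import math
import random


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sample_bernoulli(probs, rng):
    return [1 if rng.random() < p else 0 for p in probs]


def hidden_probs(v, W, c):
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    return [sigmoid(b[i] + sum(W[i][j] * h[j] for j in range(len(h))))
            for i in range(len(b))]


def generate_class(rbms, label, n_labels, gibbs_steps, rng):
    """Sample an input of class `label` from a DBN trained with label inputs.

    Assumes (for this sketch) that the top RBM's visible layer is
    [penultimate units..., label units...], with the labels last.
    """
    top = rbms[-1]
    n_pen = len(top["b_vis"]) - n_labels
    one_hot = [1 if k == label else 0 for k in range(n_labels)]
    pen = [rng.randint(0, 1) for _ in range(n_pen)]
    for _ in range(gibbs_steps):
        v = pen + one_hot  # label units are clamped on every step
        h = sample_bernoulli(hidden_probs(v, top["W"], top["b_hid"]), rng)
        # Resample only the penultimate units; the labels stay fixed.
        pen = sample_bernoulli(visible_probs(h, top["W"], top["b_vis"])[:n_pen], rng)
    # Top-down pass from the penultimate layer to the input (pixel) layer.
    probs, state = [float(u) for u in pen], pen
    for r in reversed(rbms[:-1]):
        probs = visible_probs(state, r["W"], r["b_vis"])
        state = sample_bernoulli(probs, rng)
    return probs


def random_rbm(n_vis, n_hid, rng):
    # Hypothetical stand-in for a pre-trained RBM, for the demo only.
    return {"W": [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)],
            "b_vis": [0.0] * n_vis, "b_hid": [0.0] * n_hid}


rng = random.Random(0)
# Bottom RBM: 12 "pixels" -> 6 units; top RBM: 6 units + 3 labels -> 5 units.
rbms = [random_rbm(12, 6, rng), random_rbm(6 + 3, 5, rng)]
image = generate_class(rbms, label=2, n_labels=3, gibbs_steps=200, rng=rng)
```

With untrained random weights the output is meaningless; the point is the control flow: labels held fixed inside the top-level chain, then one top-down pass to the pixels.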
