Solved – Basic confusion about Restricted Boltzmann Machines (RBM)

Tags: neural-networks, restricted-boltzmann-machine

As I understand it, the standard restricted Boltzmann machine (RBM) has stochastic binary visible and hidden units. The joint probability of the visible and hidden units is given by the Boltzmann factor familiar from statistical physics:

$$ P(v,h) = \frac{e^{-E(v,h)}}{Z} $$
where the energy and partition function are given by
$$ E(v,h) = -\left(\sum_i a_i v_i + \sum_j b_j h_j + \sum_{i,j} v_i W_{ij} h_j\right) $$
$$ Z = \sum_{\text{configurations}} e^{-E(v,h)} $$

A particular configuration consists of two sets of binary vectors, $(v,h)$, and the sum over all configurations then corresponds to summing over all possible such pairs.
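For a small enough RBM, this sum over configurations can be carried out by brute force. Here is a minimal sketch for a hypothetical 2-visible, 2-hidden RBM with arbitrary made-up parameters, enumerating all $2^2 \times 2^2$ binary pairs to compute $Z$ and verify that the joint probabilities sum to one:

```python
import itertools
import numpy as np

# Hypothetical tiny RBM: 2 visible and 2 hidden binary units,
# with arbitrary illustrative parameters.
a = np.array([0.1, -0.2])        # visible biases a_i
b = np.array([0.3, 0.0])         # hidden biases b_j
W = np.array([[0.5, -0.1],
              [0.2, 0.4]])       # weights W_ij

def energy(v, h):
    # E(v,h) = -(sum_i a_i v_i + sum_j b_j h_j + sum_ij v_i W_ij h_j)
    return -(a @ v + b @ h + v @ W @ h)

# Partition function: sum of e^{-E(v,h)} over every binary (v, h) pair.
configs = [np.array(c) for c in itertools.product([0, 1], repeat=2)]
Z = sum(np.exp(-energy(v, h)) for v in configs for h in configs)

def joint_prob(v, h):
    return np.exp(-energy(v, h)) / Z

# Sanity check: probabilities over all configurations sum to 1.
total = sum(joint_prob(v, h) for v in configs for h in configs)
```

The enumeration has $2^{n_v + n_h}$ terms, which is exactly why $Z$ is intractable for realistic RBMs and training relies on sampling instead.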

There is another type of RBM, known as a Gaussian RBM, which makes use of continuous units, so that $(v,h)$ are real valued. Clearly the sum over configurations must now be modified.

Consider now the MNIST data set, where the visible units correspond to integer-valued pixel values ranging from 0 to 255. As is, these visible units do not work with either RBM algorithm. One solution would be to expand the 256-valued discrete vectors into larger binary vectors, and to then use the first RBM. Another solution would be to divide the pixels by 255, so that they then lie within the unit interval. They could then be taken to be real, and the Gaussian RBM applied.
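The two preprocessing options above can be sketched in a few lines of NumPy. The image here is a random stand-in for an MNIST digit (a hypothetical example, not real data):

```python
import numpy as np

# Hypothetical 28x28 "MNIST" image with integer pixels in 0..255.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(28, 28))

# Option 1: rescale to [0, 1] so pixels can be treated as real-valued
# visible units (for a Gaussian RBM).
scaled = img / 255.0

# Option 2: hard-binarize by thresholding, giving strictly binary
# visible units (for a binary RBM).
binary = (img > 127).astype(np.uint8)
```

Note that expanding each pixel into a one-hot 256-valued vector, as mentioned above, would blow the input up from 784 to 784 × 256 units, so in practice thresholding or rescaling is what one sees.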

My confusion is that I have found a few cases, such as

http://www.pyimagesearch.com/2014/06/23/applying-deep-learning-rbm-mnist-using-python/
https://gist.github.com/dwf/359323

where the data was rescaled to lie within [0,1] AND the RBM was taken to have stochastic binary units. Could someone please explain to me why this is acceptable?

Best Answer

Have a look at section 13.2 of Hinton's *A Practical Guide to Training Restricted Boltzmann Machines*, around equation 17, or a similar and better description in Salakhutdinov's *Learning Deep Generative Models*, section 2.2:

http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf
http://www.cs.cmu.edu/~rsalakhu/papers/annrev.pdf

The Gaussian RBM assumes you have real-valued visible units in the interval $[0,1]$ (as with normalized MNIST) and some variance $\sigma^2$. In principle you would have to infer $\sigma^2$, but for practical purposes it is chosen prior to training the model; in some cases the variance is fixed at 0.01.

The tutorial you mentioned uses the scikit-learn class BernoulliRBM, which accepts floats in $[0,1]$ as input and treats each value as the probability that the corresponding binary visible unit is on (see the fit function on GitHub). So what they do is allowed and OK, it's just handled behind the scenes :) Hope this helps!
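Concretely, feeding rescaled data to BernoulliRBM looks like this. The array X below is a random stand-in for flattened, normalized MNIST images (an assumption for illustration; the real tutorial loads the actual dataset):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Hypothetical stand-in for scaled MNIST: 100 samples of 784 floats in [0, 1].
rng = np.random.default_rng(0)
X = rng.random((100, 784))

# BernoulliRBM accepts floats in [0, 1], interpreting each value as the
# probability that the corresponding binary visible unit is on.
rbm = BernoulliRBM(n_components=64, learning_rate=0.05,
                   n_iter=5, random_state=0)
rbm.fit(X)

# Hidden-unit activation probabilities for the first sample.
H = rbm.transform(X[:1])
```

transform returns the conditional probabilities $P(h_j = 1 \mid v)$, so its entries lie in $[0,1]$ rather than being sampled binary states.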

Patric
