Solved – Restricted Boltzmann machines – Free Energy

restricted-boltzmann-machine

I am reading a deep learning introduction on RBMs.

It mentions that the formula below has two terms, which are referred to as the positive and negative phase. The first term is described as increasing the probability of the training data (by reducing the corresponding free energy), while the second term decreases the probability of samples generated by the model.

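$-\frac{\partial\log p(x)}{\partial\theta} = \frac{\partial\mathcal{F}(x)}{\partial\theta} - \sum_{\tilde{x}} p(\tilde{x})\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta}$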

Can someone share the intuition on why the first term increases the probability of the training data (because a higher free energy is linked to a higher likelihood of the training data?) and why the second term is linked to the probability of the samples?

Best Answer

What we want to achieve by training the RBM is a solution for which the training patterns are very likely and all other patterns are unlikely. We therefore take small steps in the direction that increases the probability $p(x)$ of a training pattern $x$.

For easier understanding, let's look at the negative of the equation you posted (both sides multiplied by $-1$):

$\frac{\partial\log p(x)}{\partial\theta} = -\frac{\partial\mathcal{F}(x)}{\partial\theta} + \sum_{\tilde{x}} p(\tilde{x})\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta}$
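As a short sketch of where this gradient comes from, assume the usual energy-based form $p(x) = e^{-\mathcal{F}(x)}/Z$ with partition function $Z = \sum_{\tilde{x}} e^{-\mathcal{F}(\tilde{x})}$. The symbol $Z$ does not appear in the equation above and is introduced here only for illustration. Under that assumption:

$$
\begin{aligned}
\log p(x) &= -\mathcal{F}(x) - \log Z \\
\frac{\partial\log Z}{\partial\theta} &= \frac{1}{Z}\sum_{\tilde{x}} e^{-\mathcal{F}(\tilde{x})}\left(-\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta}\right) = -\sum_{\tilde{x}} p(\tilde{x})\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta} \\
\frac{\partial\log p(x)}{\partial\theta} &= -\frac{\partial\mathcal{F}(x)}{\partial\theta} - \frac{\partial\log Z}{\partial\theta} = -\frac{\partial\mathcal{F}(x)}{\partial\theta} + \sum_{\tilde{x}} p(\tilde{x})\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta}
\end{aligned}
$$

The first term comes from the unnormalized probability of $x$ itself, and the second comes entirely from the normalizer $Z$.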

We want to increase the probability $p(x)$, so in training we move uphill along this gradient (gradient ascent). Because optimization is conventionally phrased as minimization, one usually negates the equation and performs gradient descent on $-\log p(x)$ instead. For the intuition behind the equation, let's stay with this form, in which we maximize the probability.

For one given training example $x$, the positive phase $-\frac{\partial\mathcal{F}(x)}{\partial\theta}$ is the direction in which the free energy of $x$ decreases. Because $p(x)$ is proportional to $e^{-\mathcal{F}(x)}$, a lower free energy corresponds to a higher probability, not a higher free energy. So the positive phase increases the probability of the current training example $x$ by lowering its free energy.

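The negative phase $\sum_{\tilde{x}} p(\tilde{x})\frac{\partial\mathcal{F}(\tilde{x})}{\partial\theta}$ is a sum over all possible values $\tilde{x}$, i.e. all possible configurations. For each configuration, one calculates the direction in which its free energy increases, i.e. its probability decreases. Each of these directions is weighted by the probability $p(\tilde{x})$ that the model currently assigns to the pattern $\tilde{x}$. The negative phase thus makes all possible patterns a bit less likely, and it pushes hardest on the patterns the model currently considers most probable. In practice the sum over all configurations is intractable, so it is approximated with samples drawn from the model, which is why the second term is described as decreasing the probability of generated samples.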
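To make the two phases concrete, here is a minimal NumPy sketch that checks the gradient numerically on a toy RBM small enough to enumerate every visible configuration. It assumes binary visible and hidden units with the common free energy $\mathcal{F}(v) = -b^\top v - \sum_j \log(1 + e^{c_j + W_j v})$. The sizes, the random parameters, and all function names are made up for illustration and are not taken from the tutorial.

```python
import itertools
import numpy as np

def free_energy(v, W, b, c):
    # Assumed form: F(v) = -b.v - sum_j log(1 + exp(c_j + W_j . v))
    return -v @ b - np.sum(np.logaddexp(0.0, c + W @ v))

def dF_dW(v, W, c):
    # dF/dW_ji = -sigmoid(c_j + W_j . v) * v_i
    h_prob = 1.0 / (1.0 + np.exp(-(c + W @ v)))
    return -np.outer(h_prob, v)

def all_visible(n_visible):
    # Every binary visible configuration (only feasible for tiny models)
    return [np.array(bits, dtype=float)
            for bits in itertools.product([0, 1], repeat=n_visible)]

def log_prob(x, W, b, c):
    # log p(x) = -F(x) - log Z, with Z summed over all configurations
    neg_F = np.array([-free_energy(v, W, b, c) for v in all_visible(len(x))])
    return -free_energy(x, W, b, c) - np.logaddexp.reduce(neg_F)

def phases(x, W, b, c):
    configs = all_visible(len(x))
    neg_F = np.array([-free_energy(v, W, b, c) for v in configs])
    p = np.exp(neg_F - np.logaddexp.reduce(neg_F))  # model probabilities
    positive = -dF_dW(x, W, c)                      # lowers F(x)
    negative = sum(p_v * dF_dW(v, W, c) for p_v, v in zip(p, configs))
    return positive, negative

def finite_difference_grad(x, W, b, c, eps=1e-6):
    # Central differences of log p(x) with respect to each weight
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[idx] += eps
        W_minus[idx] -= eps
        grad[idx] = (log_prob(x, W_plus, b, c)
                     - log_prob(x, W_minus, b, c)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(2, 3))  # 2 hidden units, 3 visible units
b = rng.normal(scale=0.5, size=3)
c = rng.normal(scale=0.5, size=2)
x = np.array([1.0, 0.0, 1.0])           # one hypothetical training pattern

positive, negative = phases(x, W, b, c)
grad = positive + negative              # d log p(x) / dW
numeric = finite_difference_grad(x, W, b, c)
print(np.allclose(grad, numeric, atol=1e-6))

W_new = W + 0.1 * grad                  # one small gradient ascent step
print(log_prob(x, W_new, b, c) > log_prob(x, W, b, c))
```

The first check compares the positive plus negative phase against a finite-difference gradient of $\log p(x)$, and the second checks that one small ascent step raises $\log p(x)$. Enumerating all configurations only works for a toy model like this one; it is used here purely so that the negative phase can be computed exactly.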