Good tutorial for Restricted Boltzmann Machines (RBM)


I’m studying the Restricted Boltzmann Machine (RBM) and am having trouble understanding how the derivatives of the log-likelihood with respect to the RBM parameters are calculated. Even though many research papers on RBMs have been published, few of them spell out the steps of the derivation in detail. After searching online I was able to find them in this document:

  • Fischer, A., & Igel, C. (2012). An Introduction to Restricted Boltzmann Machines. In L. Alvarez et al. (Eds.): CIARP, LNCS 7441, pp. 14–36, Springer-Verlag: Berlin-Heidelberg. (pdf)

However, the details of this document are too advanced for me. Can somebody point me towards a good tutorial / set of lecture notes about RBM?

Edit: @David, the confusing section is shown below (equation 29 on page 26):

\begin{align}
\frac{\partial\ln\mathcal{L}(\theta|v)}{\partial w_{ij}} &= -\sum_h p(h|v)\frac{\partial E(v, h)}{\partial w_{ij}} + \sum_{v,h} p(v,h)\frac{\partial E(v,h)}{\partial w_{ij}} \\[5pt]
&= \sum_h p(h|v)h_iv_j - \sum_v p(v) \sum_h p(h|v)h_iv_j \\[5pt]
&= \color{orange}{\boxed{\color{black}{p(H_i=1|v)}}}v_j - \sum_v p(v) \color{orange}{\boxed{\color{black}{p(H_i=1|v)}}}v_j\; . \tag{29}
\end{align}

Best Answer

I know it is a little late, but maybe it helps. The first term of your equation is obtained in the following steps, where $\mathbf{h_{\_i}}$ denotes all hidden units except $h_i$:

\begin{align}
\sum_{\mathbf{h}} p(\mathbf{h} | \mathbf{v})h_iv_j &= v_j \sum_{h_1} \cdots \sum_{h_i} \cdots \sum_{h_n} p(h_1, \ldots, h_i, \ldots, h_n | \mathbf{v}) h_i \\[5pt]
&= v_j \sum_{h_i} \sum_{\mathbf{h_{\_i}}} p(h_i, \mathbf{h_{\_i}} | \mathbf{v}) h_i
\end{align}

In an RBM the hidden units are conditionally independent given the visible units (there are no hidden-to-hidden connections). Thus we can factorize the conditional joint probability distribution of the hidden states:

\begin{align}
&= v_j \sum_{h_i} \sum_{\mathbf{h_{\_i}}} p(h_i | \mathbf{v}) h_i \: p(\mathbf{h_{\_i}} | \mathbf{v}) \\[5pt]
&= v_j \sum_{h_i} p(h_i | \mathbf{v}) h_i \: \sum_{\mathbf{h_{\_i}}} p(\mathbf{h_{\_i}} | \mathbf{v})
\end{align}

The last sum equals $1$, since it runs over all states of the remaining hidden units. What is left is the first sum. Since $h_i$ only takes the values $1$ and $0$, the $h_i = 0$ term vanishes and we end up with:

$$ = v_j \: p(H_i = 1 | \mathbf{v}) $$
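If it helps to see the identity numerically, here is a minimal sketch that checks it by brute force on a toy RBM. It assumes the usual binary RBM energy $E(v,h) = -\sum_{i,j} w_{ij} h_i v_j - \sum_j b_j v_j - \sum_i c_i h_i$ and the standard result $p(H_i = 1 | \mathbf{v}) = \sigma(c_i + \sum_j w_{ij} v_j)$. The function names and the parameter values are made up for illustration only.

```python
import itertools
import math


def energy(v, h, W, b, c):
    """Assumed binary RBM energy: -(sum_ij W[i][j] h_i v_j + sum_j b_j v_j + sum_i c_i h_i)."""
    interaction = sum(W[i][j] * h[i] * v[j]
                      for i in range(len(h)) for j in range(len(v)))
    visible_bias = sum(b[j] * v[j] for j in range(len(v)))
    hidden_bias = sum(c[i] * h[i] for i in range(len(h)))
    return -(interaction + visible_bias + hidden_bias)


def brute_force_term(v, W, b, c, i, j):
    """Left-hand side: sum over all hidden states h of p(h|v) * h_i * v_j."""
    states = list(itertools.product([0, 1], repeat=len(c)))
    weights = [math.exp(-energy(v, h, W, b, c)) for h in states]
    z_v = sum(weights)  # normaliser of p(h|v) for this fixed v
    return sum((w / z_v) * h[i] * v[j] for h, w in zip(states, weights))


def closed_form_term(v, W, c, i, j):
    """Right-hand side: v_j * p(H_i = 1 | v), using the sigmoid form of p(H_i = 1 | v)."""
    activation = c[i] + sum(W[i][k] * v[k] for k in range(len(v)))
    return v[j] / (1.0 + math.exp(-activation))


# Arbitrary toy parameters (3 hidden units, 2 visible units), illustration only.
W = [[0.5, -1.2], [0.3, 0.8], [-0.7, 0.1]]
b = [0.2, -0.4]
c = [0.1, -0.3, 0.6]
v = [1, 1]

# Compare both sides for every (hidden i, visible j) pair.
for i in range(len(c)):
    for j in range(len(v)):
        lhs = brute_force_term(v, W, b, c, i, j)
        rhs = closed_form_term(v, W, c, i, j)
        assert math.isclose(lhs, rhs, rel_tol=1e-9)
```

The brute-force sum enumerates all $2^n$ hidden states, so it is only practical for a handful of hidden units. The factorization in the derivation is what lets the same quantity be computed with a single sigmoid per hidden unit.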