Solved – Good tutorial for Restricted Boltzmann Machines (RBM)


I’m studying the Restricted Boltzmann Machine (RBM) and am having trouble understanding how the log-likelihood is differentiated with respect to the parameters of the RBM. Even though a lot of research papers on RBMs have been published, none that I found gives the detailed steps of the derivatives. After searching online I was able to find them in this document:

  • Fischer, A., & Igel, C. (2012). An Introduction to Restricted Boltzmann Machines. In L. Alvarez et al. (Eds.): CIARP, LNCS 7441, pp. 14–36, Springer-Verlag: Berlin-Heidelberg. (pdf)

However, the details of this document are too advanced for me. Can somebody point me towards a good tutorial or set of lecture notes on RBMs?


Edit: @David, the confusing section is shown below (equation 29 on page 26):

\begin{align}
\frac{\partial\ln\mathcal{L}(\theta|v)}{\partial w_{ij}} &= -\sum_h p(h|v)\frac{\partial E(v, h)}{\partial w_{ij}} + \sum_{v,h} p(v,h)\frac{\partial E(v,h)}{\partial w_{ij}} \\[5pt]
&= \sum_h p(h|v)h_iv_j - \sum_v p(v) \sum_h p(h|v)h_iv_j \\[5pt]
&= \color{orange}{\boxed{\color{black}{p(H_i=1|v)}}}v_j - \sum_v p(v) \color{orange}{\boxed{\color{black}{p(H_i=1|v)}}}v_j\; . \tag{29}
\end{align}

Best Answer

I know it is a little late, but maybe this helps. The first term of your equation is obtained in the following steps. Write $\mathbf{h}_{-i}$ for the states of all hidden units except $h_i$:

\begin{align}
\sum_{\mathbf{h}} p(\mathbf{h} | \mathbf{v})h_iv_j &= v_j \sum_{h_1}\dots\sum_{h_i}\dots\sum_{h_n} p(h_1,\dots,h_i,\dots,h_n | \mathbf{v})\, h_i \\[5pt]
&= v_j \sum_{h_i} \sum_{\mathbf{h}_{-i}} p(h_i, \mathbf{h}_{-i} | \mathbf{v})\, h_i
\end{align}

The hidden units are conditionally independent given the visible units (an RBM has no connections between hidden units), so the conditional joint distribution over the hidden states factorizes:

\begin{align}
&= v_j \sum_{h_i} \sum_{\mathbf{h}_{-i}} p(h_i | \mathbf{v})\, h_i \: p(\mathbf{h}_{-i}|\mathbf{v}) \\[5pt]
&= v_j \sum_{h_i} p(h_i | \mathbf{v})\, h_i \: \sum_{\mathbf{h}_{-i}} p(\mathbf{h}_{-i}|\mathbf{v})
\end{align}

The last sum equals $1$, since it runs over all states of $\mathbf{h}_{-i}$. What is left is the sum over $h_i$. Since $h_i$ only takes the values $0$ and $1$, the $h_i = 0$ summand vanishes and we end up with:

$$ = v_j \: p(H_i = 1 | \mathbf{v}) $$
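If it helps to see the identity numerically, here is a minimal sketch that compares the brute-force sum over all hidden states with the closed form $p(H_i = 1 | \mathbf{v})\,v_j$ on a tiny RBM. It assumes the standard binary RBM energy $E(\mathbf{v},\mathbf{h}) = -\sum_{i,j} w_{ij}h_iv_j - \sum_j b_jv_j - \sum_i c_ih_i$, for which $p(H_i = 1 | \mathbf{v})$ is a sigmoid of $\sum_j w_{ij}v_j + c_i$. The function names, the layer sizes and the random parameters are made up for illustration and do not come from the paper.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h, W, b, c):
    # Assumed binary RBM energy; W[i, j] couples hidden unit i and visible unit j.
    return -(h @ W @ v) - b @ v - c @ h

def first_term_brute_force(v, W, b, c):
    # sum_h p(h|v) h_i v_j, enumerating all 2^n hidden states explicitly.
    n_hidden = W.shape[0]
    states = [np.array(s, dtype=float)
              for s in itertools.product([0, 1], repeat=n_hidden)]
    weights = np.array([np.exp(-energy(v, h, W, b, c)) for h in states])
    p_h_given_v = weights / weights.sum()  # p(h|v) is proportional to exp(-E(v, h))
    return sum(p * np.outer(h, v) for p, h in zip(p_h_given_v, states))

def first_term_closed_form(v, W, c):
    # p(H_i = 1 | v) v_j, using the factorized conditional.
    return np.outer(sigmoid(W @ v + c), v)

# Hypothetical toy model: 3 hidden units, 4 visible units, random parameters.
rng = np.random.default_rng(0)
n_hidden, n_visible = 3, 4
W = rng.normal(size=(n_hidden, n_visible))
b = rng.normal(size=n_visible)
c = rng.normal(size=n_hidden)
v = rng.integers(0, 2, size=n_visible).astype(float)

brute = first_term_brute_force(v, W, b, c)
closed = first_term_closed_form(v, W, c)
print(np.allclose(brute, closed))
```

Entry `[i, j]` of each matrix is the first term of the gradient with respect to $w_{ij}$. The enumeration is only feasible for a handful of hidden units, which is exactly why the factorized form matters in practice.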