Comparison between Helmholtz machines and Boltzmann machines

machine learning

Today I started reading about Helmholtz machines. So far they seem very similar to – though clearly not the same as – Boltzmann machines, and I feel that my learning process would be much easier if I clearly understood what the key differences were. I come from a statistical physics background and understand Boltzmann machines very well (I've developed several of my own variations on the Boltzmann machine concept for various purposes), so I'm really looking for a brief explanation of the basic idea behind Helmholtz machines, assuming prior knowledge of Boltzmann machines and stat mech, but not necessarily much knowledge about belief nets or other types of neural network. (Though I do understand the difference between directed and undirected models, which seems like it should be relevant.)

To be specific, I suppose my questions are: How do Helmholtz machines and Boltzmann machines relate to each other? Is one a special case of the other, or are they just different; if the latter, what is the key difference in the assumptions they're built on? Is the difference to do with the difference between directed and undirected models, and if so, how exactly does that difference translate into the two different architectures?

Best Answer

These networks are very different in architecture and in how they are trained. Many important details are covered in this paper, which I suggest as a starting point: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.9404&rep=rep1&type=pdf

A Helmholtz machine contains two networks, a bottom-up recognition network that takes the data as input and produces a distribution over hidden variables, and a top-down "generative" network that generates values of the hidden variables and the data itself.
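As a rough illustration of the two-network structure, here is a minimal NumPy sketch with a single hidden layer of binary stochastic units. The class name, layer sizes, sigmoid units and weight initialisation are all assumptions made for the example, not something prescribed by the papers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p, rng):
    # Draw independent binary values with the given "on" probabilities.
    return (rng.random(p.shape) < p).astype(float)

class HelmholtzMachine:
    """Hypothetical one-hidden-layer Helmholtz machine (illustrative only)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        # Recognition (bottom-up) parameters: data -> hidden.
        self.R = 0.1 * rng.standard_normal((n_visible, n_hidden))
        self.r_bias = np.zeros(n_hidden)
        # Generative (top-down) parameters: hidden -> data.
        self.G = 0.1 * rng.standard_normal((n_hidden, n_visible))
        self.g_bias = np.zeros(n_visible)
        # Generative prior over the hidden units (logits).
        self.h_prior = np.zeros(n_hidden)

    def recognize(self, v):
        # One bottom-up pass: q(h | v) is a factorised Bernoulli distribution.
        q = sigmoid(v @ self.R + self.r_bias)
        return sample_bernoulli(q, self.rng), q

    def generate(self):
        # One top-down pass: sample h from the prior, then v given h.
        h = sample_bernoulli(sigmoid(self.h_prior), self.rng)
        p = sigmoid(h @ self.G + self.g_bias)
        return sample_bernoulli(p, self.rng), h
```

Note that the recognition and generative weights are separate parameter sets. Nothing forces them to be transposes of each other, which is one visible difference from the single symmetric weight matrix of a Boltzmann machine.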

Boltzmann machines are much simpler and their units are just divided into 'visible' units, V, and 'hidden' units, H. The visible units are those which receive information from the 'environment', i.e. our training set is a set of binary vectors over the set V.

Helmholtz machines were created to be resilient to the noise that is always present in natural data, in the hope that, by learning economical representations of the data, the structure of the generative model would reasonably approximate the hidden structure of the data set.

The two phases of the Boltzmann machine contrast the statistics of the activations of the network when input patterns are presented with the statistics of the activations of the network when it is running ‘free’. This contrastive procedure involves substantial noise and is therefore slow.
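In stat-mech notation the contrast can be written out explicitly. The symbols below ($s_i$ for binary unit states, $w_{ij}$ for symmetric couplings, $b_i$ for biases, temperature set to 1) are my own choice of notation for this sketch:

$$E(\mathbf{s}) = -\sum_{i<j} w_{ij}\, s_i s_j - \sum_i b_i s_i, \qquad p(\mathbf{s}) = \frac{e^{-E(\mathbf{s})}}{Z}$$

$$\frac{\partial \log p(\mathbf{v})}{\partial w_{ij}} = \langle s_i s_j \rangle_{\text{clamped}} - \langle s_i s_j \rangle_{\text{free}}$$

The first average is taken with the visible units clamped to a training vector $\mathbf{v}$, the second with the network running freely. Both are estimated by sampling, and the gradient is the difference of two noisy estimates, which is where the noise mentioned above comes from.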

Helmholtz machines are trained using the so-called 'wake-sleep' algorithm ( http://www.gatsby.ucl.ac.uk/~dayan/papers/d2000a.pdf ), in which the wake and sleep phases are not contrastive. Rather, the recognition and generative models are each forced to chase the other.
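To make "chasing each other" concrete, here is a minimal sketch of one wake-sleep update for a machine with a single hidden layer of binary units. The function names, parameter dictionary keys, learning rate and sizes are assumptions for illustration. The updates are the simple local delta rules, written for one training vector at a time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)

def init_params(n_visible, n_hidden, rng):
    # Hypothetical parameter layout: R/rb recognition, G/gb/hb generative.
    return {
        "R": 0.1 * rng.standard_normal((n_visible, n_hidden)),
        "rb": np.zeros(n_hidden),
        "G": 0.1 * rng.standard_normal((n_hidden, n_visible)),
        "gb": np.zeros(n_visible),
        "hb": np.zeros(n_hidden),  # logits of the generative prior over h
    }

def wake_sleep_step(params, v, rng, lr=0.05):
    R, rb, G, gb, hb = (params[k] for k in ("R", "rb", "G", "gb", "hb"))

    # Wake phase: the recognition network picks h for a real data vector v;
    # the generative parameters are adjusted to make (v, h) more probable.
    h = bernoulli(sigmoid(v @ R + rb), rng)
    p_v = sigmoid(h @ G + gb)
    G += lr * np.outer(h, v - p_v)
    gb += lr * (v - p_v)
    hb += lr * (h - sigmoid(hb))

    # Sleep phase: the generative network "dreams" a pair (h_f, v_f);
    # the recognition parameters are adjusted to recover h_f from v_f.
    h_f = bernoulli(sigmoid(hb), rng)
    v_f = bernoulli(sigmoid(h_f @ G + gb), rng)
    q_h = sigmoid(v_f @ R + rb)
    R += lr * np.outer(v_f, h_f - q_h)
    rb += lr * (h_f - q_h)
    return params
```

Each phase trains one network using samples produced by the other, so there is no subtraction of two sampled statistics as in the Boltzmann machine rule. A typical loop would call `wake_sleep_step(params, v, rng)` once per training vector.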

The Helmholtz machine also bears an interesting relationship to the Boltzmann machine, which can be seen as an undirected belief net.

In the Boltzmann machine a potentially drawn-out process of Gibbs sampling is used to recognize and generate inputs, since there is nothing like the simple, one-pass, directed recognition and generative belief networks of the Helmholtz machine.
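For comparison, a minimal sketch of what that Gibbs sampling looks like for a general Boltzmann machine with 0/1 units is shown below. The function name and the `clamped` mask are assumptions for the example. `W` is assumed symmetric with a zero diagonal, and the temperature is fixed at 1.

```python
import numpy as np

def gibbs_sweep(s, W, b, rng, clamped=None):
    """One Gibbs sweep over binary units s (values 0.0 or 1.0), updated in place.

    W: symmetric coupling matrix with zero diagonal.
    b: bias vector.
    clamped: optional boolean mask of units held fixed (e.g. the visible units).
    """
    for i in rng.permutation(s.shape[0]):
        if clamped is not None and clamped[i]:
            continue
        # Conditional of unit i given all the others:
        # P(s_i = 1 | rest) = sigmoid(sum_j W_ij s_j + b_i)
        p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s + b[i])))
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s
```

"Recognition" here means clamping the visible units and sweeping repeatedly until the hidden units approach equilibrium. "Generation" means sweeping with nothing clamped. Either way many sweeps may be needed, whereas the Helmholtz machine does each with a single directed pass.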

Also, the Boltzmann machine learning rule performs true stochastic gradient ascent of the log likelihood using a contrastive procedure, which, confusingly, involves wake and sleep phases that are quite different from the wake and sleep phases of the wake-sleep algorithm.

So, to sum up, the two models are related in that both perform representation learning, but for many problems Helmholtz machines offer a tractable approximation.
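Since you come from stat mech, the sense in which the approximation is made may be clearest as a variational free energy, which is also where the name comes from. In notation of my own choosing, with $p$ the generative model and $q$ the recognition model:

$$F(\mathbf{v}) = \sum_{\mathbf{h}} q(\mathbf{h} \mid \mathbf{v}) \left[ -\log p(\mathbf{v}, \mathbf{h}) \right] + \sum_{\mathbf{h}} q(\mathbf{h} \mid \mathbf{v}) \log q(\mathbf{h} \mid \mathbf{v})$$

$$F(\mathbf{v}) = -\log p(\mathbf{v}) + \mathrm{KL}\!\left[\, q(\mathbf{h} \mid \mathbf{v}) \,\|\, p(\mathbf{h} \mid \mathbf{v}) \,\right] \;\ge\; -\log p(\mathbf{v})$$

Reading $-\log p(\mathbf{v}, \mathbf{h})$ as an energy, $F$ is the average energy minus the entropy under $q$. It upper-bounds the negative log likelihood, with equality when $q$ matches the true posterior. The wake phase lowers $F$ with respect to the generative parameters. The sleep phase instead reduces the KL divergence taken in the opposite direction, so wake-sleep as a whole is not exact gradient descent on $F$.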

For detailed and accessible reading, browse the works of Geoffrey Hinton, their "father".
