Solved – Mathematically modeling neural networks as graphical models

deep-learning, deep-belief-networks, graphical-model, markov-process, neural-networks

I am struggling to make the mathematical connection between a neural network and a graphical model.

In graphical models the idea is simple: the probability distribution factorizes according to the cliques in the graph, with the potentials usually being of the exponential family.
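To make that concrete (standard MRF notation, written out here for reference rather than quoted from any particular source), the factorization is

$$p(x) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C), \qquad \psi_C(x_C) = \exp\!\big(\theta_C^\top \phi_C(x_C)\big),$$

where $\mathcal{C}$ is the set of cliques, $\phi_C$ are the sufficient statistics of clique $C$, and $Z$ is the partition function.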

Is there an equivalent reasoning for a neural network? Can one express the probability distribution over the units (variables) in a Restricted Boltzmann machine or a CNN as a function of their energy, or the product of the energies between units?
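For reference, the textbook binary RBM is exactly of this energy-based form (standard formulation, not tied to any one implementation):

$$p(v, h) = \frac{1}{Z} e^{-E(v, h)}, \qquad E(v, h) = -a^\top v - b^\top h - v^\top W h,$$

so $e^{-E(v,h)}$ factorizes into pairwise potentials $\exp(v_i W_{ij} h_j)$ over the visible-hidden edges. That is precisely the clique factorization of a bipartite pairwise MRF, and the joint distribution is an exponential family with sufficient statistics $(v, h, v h^\top)$.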

Also, is the probability distribution modelled by an RBM or a deep belief network (e.g. one built with CNNs) in the exponential family?

I am hoping to find a text that formalizes the connection between these modern types of neural networks and statistics in the same way that Wainwright & Jordan did for graphical models with their Graphical Models, Exponential Families, and Variational Inference. Any pointers would be great.

Best Answer

Another good introduction to the subject is the CSC321 course at the University of Toronto and the neuralnets-2012-001 course on Coursera, both taught by Geoffrey Hinton.

From the video on Belief Nets:

Graphical models

Early graphical models used experts to define the graph structure and the conditional probabilities. The graphs were sparsely connected, and the focus was on performing correct inference, not on learning (the knowledge came from the experts).

Neural networks

For neural nets, learning was central. Hard-wiring the knowledge was not cool (OK, maybe a little bit). The knowledge came from learning the training data, not from experts. Neural networks did not aim for interpretability or sparse connectivity to make inference easy. Nevertheless, there are neural network versions of belief nets.


My understanding is that belief nets are usually too densely connected, and their cliques too large, to be interpretable. Belief nets use the sigmoid function to integrate inputs, while continuous graphical models typically use Gaussian potentials. The sigmoid makes the network easier to train, but harder to interpret probabilistically. I believe both are in the exponential family.
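To illustrate the sigmoid point, here is a minimal NumPy sketch (the toy sizes and random weights are my own assumptions, not anything from the course) of the conditionals and energy of a binary RBM, the model the question asks about. Each unit integrates its inputs through a logistic function of a linear combination:

import numpy as np

rng = np.random.default_rng(0)

# Toy RBM parameters (hypothetical sizes: 6 visible units, 3 hidden units).
W = rng.normal(scale=0.1, size=(6, 3))  # visible-hidden weights
a = np.zeros(6)                         # visible biases
b = np.zeros(3)                         # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_hidden_given_visible(v):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij)
    return sigmoid(b + v @ W)

def p_visible_given_hidden(h):
    # p(v_i = 1 | h) = sigmoid(a_i + sum_j W_ij h_j)
    return sigmoid(a + W @ h)

def energy(v, h):
    # E(v, h) = -a.v - b.h - v^T W h, with p(v, h) proportional to exp(-E(v, h))
    return -(a @ v) - (b @ h) - v @ W @ h

v = rng.integers(0, 2, size=6).astype(float)                    # random binary visible vector
h = (rng.random(3) < p_hidden_given_visible(v)).astype(float)   # sample hidden units given v
print("p(h=1|v):", p_hidden_given_visible(v))
print("E(v,h):  ", energy(v, h))

Alternating samples of h from p(h | v) and v from p(v | h) is the block Gibbs step used, for example, in contrastive-divergence training.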

I am far from an expert on this, but the lecture notes and videos are a great resource.
