Solved – Difference between neural network architectures

conv-neural-network, deep learning, neural networks, recurrent neural network, restricted-boltzmann-machine

Can someone explain to me the differences between the following neural network architectures? How do they differ in their prediction settings? Which networks are still used today, and why?

  • Multi-layered perceptron
  • Feedforward neural network
  • Recurrent neural network
  • Hopfield network
  • Restricted Boltzmann machine
  • Recursive neural network
  • Convolutional neural network

Best Answer

Fully answering this question would take many pages. Don't forget, Stack Exchange is not a textbook that someone reads to you.

  • Multi-layered perceptron (MLP): the neural networks that (arguably) started everything. They are strictly feed-forward (one directional), i.e. a node in one layer can only have connections to nodes in the next layer (no crazy stuff here), and all layers are fully connected. An MLP is therefore equivalent to a fully connected feed-forward neural network; both are directed graphs. Backprop is usually used to train these networks. Each neuron/node computes a dot product between its own weight vector and the input; the result is passed through a sigmoidal function, which makes the gradients easy to compute and gives rise to the backprop algorithm. A minimal forward-pass sketch is given after this list.
  • Recurrent neural networks (RNNs) are networks whose connections form directed cycles, essentially feeding a layer's output back into itself. This gives the network a (fixed-size) memory: the hidden state carries information across time steps. They are/were often used on problems that require such a "memory buffer", e.g. handwriting recognition. Training is usually performed by gradient descent, typically via backpropagation through time (the same principle as backprop). A single recurrent update is sketched after this list.
  • Hopfield network: can be seen as a (somewhat unofficial) form of RNN. It has only one layer, which directly provides the outputs. The nodes, however, are interconnected in a special way -- feedback nets (google it). One important point is that the neurons/nodes are binary, i.e. their states take only two values such as 0/1 or ±1. Training is usually performed by Hebbian learning; see the sketch after this list.
  • Restricted Boltzmann machines (RBMs) also usually take only binary input. An RBM can be described as a two-layer "network" (better: 'graph'). The first layer consists of visible units, i.e. the ones we observe; the second layer consists of hidden (latent) units, i.e. the ones we have to infer. These nets are trained using contrastive divergence (a mix of gradient descent and Gibbs sampling); a one-step (CD-1) sketch follows after this list. Note that the training procedure does not optimize the exact energy-based objective (I won't explain that here) but a different yet related one; in practice this works well. The power of these models lies in the fact that they can be stacked, i.e. one RBM on top of another, with each RBM trained separately. Research on RBMs and their development into stacked models was mainly driven by Geoffrey Hinton and his team. It can be categorized as a form of deep learning.
  • Recursive neural network: I have actually never worked with them, so I can't say much. The main idea is that the same set of weights is applied recursively over a structured input, e.g. the nodes of a parse tree, so the network composes a representation bottom-up. For a given input structure, the network can be unrolled and then trained in a regular fashion.
  • Convolutional neural network: these are nowadays the typical deep-learning architecture, so let's first discuss what 'deep' means. 'Deep' here essentially means having more and more layers in your model. Why didn't we do this before with MLPs? Well, backprop pushes the error the network has produced back towards the inputs, i.e. in reverse, using the derivatives w.r.t. all parameters. We said before that a non-linear transfer function is used in the neurons -- a sigmoidal function. The problem is that with many layers this function causes the gradient to vanish: the sigmoid squashes its input into [0,1] (or [-1,1] for tanh), its derivative is small, and pushing the signal through multiple such functions multiplies these small derivatives together. Sigmoids were therefore essentially replaced with rectified linear units (ReLU), which are zero from $-\infty$ to zero and grow linearly from zero to $+\infty$; this largely solved the vanishing-gradient issue (a small comparison is given after this list). Another problem was that it took quite a long time to train such networks on the computers back then. This was resolved by porting the problem to modern GPUs, which can train the most sophisticated nets these days in roughly a week and the simpler ones in less than a day.
  • CNN: So what is a convolutional neural network? In its simplest form it is a shallow MLP whose input is, most often, an image. Convolutional filters are slid across the image, and the resulting feature maps form the input to the next (second) layer; a minimal 2-D convolution is sketched after this list. Note: the weights of the convolutional filters are learned as well during training. These days CNNs are almost always used in deep architectures in combination with pooling layers and other tricks of the trade.
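
To make the MLP bullet concrete, here is a minimal NumPy sketch of the forward pass: each layer takes dot products of weight vectors with its input and squashes the result with a sigmoid. The layer sizes and random weights are purely illustrative, not from any particular library.

```python
import numpy as np

def sigmoid(x):
    """Sigmoidal transfer function, bounded in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forward(x, weights, biases):
    """Forward pass of a fully connected feed-forward network.

    Each neuron computes the dot product of its weight vector with the
    layer input, adds a bias, and passes the result through a sigmoid.
    """
    activation = x
    for W, b in zip(weights, biases):
        activation = sigmoid(W @ activation + b)
    return activation

# Toy example: 3 inputs -> 4 hidden units -> 2 outputs, random weights.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 3)), rng.normal(size=(2, 4))]
biases = [np.zeros(4), np.zeros(2)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), weights, biases))
```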
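For the RNN bullet, this is a minimal sketch of one recurrent update: the hidden state `h` is fed back into the network at every time step, which is exactly the "memory buffer" mentioned above. Dimensions, weights, and function names are toy assumptions.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent update: the hidden state carries information
    from previous time steps into the current one."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Toy sequence of 2-dimensional inputs, hidden state of size 3.
rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)), np.zeros(3)
h = np.zeros(3)
for x_t in rng.normal(size=(5, 2)):          # 5 time steps
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)    # the state is reused -> the cycle
print(h)
```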
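For the Hopfield bullet, a small sketch of Hebbian training and binary recall, using the common ±1 convention for the node states; the stored patterns and helper names are made up for illustration.

```python
import numpy as np

def hebbian_train(patterns):
    """Hebbian learning: accumulate outer(p, p) for each stored pattern
    (entries in {-1, +1}), with no self-connections on the diagonal."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, state, steps=10):
    """Binary updates: each node takes the sign of its weighted input."""
    state = state.copy()
    for _ in range(steps):
        for i in range(len(state)):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = hebbian_train(patterns)
noisy = np.array([1, -1, 1, -1, 1, 1])   # corrupted version of pattern 0
print(recall(W, noisy))                  # should settle back to pattern 0
```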
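For the RBM bullet, a sketch of a single contrastive-divergence step (CD-1) for a binary RBM: one up pass, one Gibbs reconstruction, and a gradient-descent-like update from the difference between the two phases. Sizes, learning rate, and function names are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_v, b_h, lr=0.1, rng=np.random.default_rng(0)):
    """One CD-1 step for a binary RBM (visible units v, hidden units h)."""
    # Up pass: infer hidden units from the observed (visible) units.
    p_h0 = sigmoid(W @ v0 + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Down pass: reconstruct visible units, then re-infer hidden probabilities.
    p_v1 = sigmoid(W.T @ h0 + b_v)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    p_h1 = sigmoid(W @ v1 + b_h)
    # Approximate gradient: positive phase minus negative (reconstruction) phase.
    W += lr * (np.outer(p_h0, v0) - np.outer(p_h1, v1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)
    return W, b_v, b_h

# Toy RBM: 6 visible, 3 hidden binary units, one observed binary vector.
rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(3, 6))
b_v, b_h = np.zeros(6), np.zeros(3)
v0 = np.array([1., 0., 1., 1., 0., 0.])
W, b_v, b_h = cd1_update(v0, W, b_v, b_h)
print(W)
```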
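For the vanishing-gradient point in the deep-learning bullet, a tiny comparison of the sigmoid and ReLU derivatives; the depth-20 numbers at the end are only a back-of-the-envelope illustration of why small per-layer factors kill the gradient.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # never larger than 0.25

def relu(x):
    return np.maximum(0.0, x)       # zero below 0, identity above

def relu_grad(x):
    return (x > 0).astype(float)    # exactly 1 wherever the unit is active

x = np.linspace(-5.0, 5.0, 11)
print(relu(x))
print(sigmoid_grad(x).max(), relu_grad(x).max())  # 0.25 vs 1.0
# Rough intuition for depth: a gradient scaled by at most 0.25 per layer
# shrinks to ~1e-12 after 20 layers, while a factor of 1 is preserved.
print(0.25 ** 20, 1.0 ** 20)
```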
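Finally, for the CNN bullet, a minimal 2-D convolution (strictly speaking cross-correlation, as most CNN libraries implement it) showing how one small filter is slid over an image to produce a feature map. In a real CNN the filter values are learned; here they are fixed only for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution: slide the same small filter over every
    position of the image and take a weighted sum at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])   # in a CNN these weights are learned
feature_map = conv2d(image, edge_filter)  # becomes the input to the next layer
print(feature_map.shape)                  # (4, 4)
```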

These explanations are far from complete but hopefully correct. If you want to understand this field, you will have to read a lot more than this.