Feed-forward ANNs allow signals to travel one way only: from input to output. There are no feedback loops; that is, the output of any layer does not affect that same layer. Feed-forward ANNs tend to be straightforward networks that associate inputs with outputs, and they are extensively used in pattern recognition. This type of organisation is also referred to as bottom-up or top-down.
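As a minimal sketch of the one-way signal flow just described (layer sizes, weights and the input vector are illustrative assumptions, not anything from the text):

```python
import numpy as np

# A minimal two-layer feed-forward pass: signals flow input -> hidden -> output,
# and no layer's output feeds back into itself.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)   # hidden layer activations
    y = sigmoid(W2 @ h + b2)   # output layer activations
    return y

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)  # 2 inputs -> 3 hidden units
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)  # 3 hidden -> 1 output
y = feedforward(np.array([0.5, -0.2]), W1, b1, W2, b2)
print(y.shape)  # (1,)
```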
Feedback (or recurrent or interactive) networks can have signals traveling in both directions by introducing loops in the network. Feedback networks are powerful and can get extremely complicated. Computations derived from earlier input are fed back into the network, which gives them a kind of memory. Feedback networks are dynamic; their 'state' is changing continuously until they reach an equilibrium point. They remain at the equilibrium point until the input changes and a new equilibrium needs to be found.
Feedforward neural networks are ideally suited for modeling relationships between a set of predictor or input variables and one or more response or output variables. In other words, they are appropriate for any functional mapping problem where we want to know how a number of input variables affect the output variable. The multilayer feedforward neural networks, also called multi-layer perceptrons (MLPs), are the most widely studied and used neural network models in practice.
A classic example of a feedback network is Hopfield's network. Its main use is as an associative memory: a device that accepts an input pattern and generates as output the stored pattern most closely associated with that input. The function of the associative memory is to recall the corresponding stored pattern and then produce a clean version of that pattern at the output. Hopfield networks are typically used for problems with binary pattern vectors, where the input pattern may be a noisy version of one of the stored patterns. In the Hopfield network, the stored patterns are encoded as the weights of the network.
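The store-and-recall behaviour just described can be sketched with a tiny Hopfield net over ±1 patterns: Hebbian outer products encode the patterns in the weights, and repeated threshold updates drive a noisy input back to the closest stored pattern. The patterns themselves are made-up examples.

```python
import numpy as np

# Two stored patterns with +/-1 entries (illustrative values).
patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])

# Hebbian storage: W is the sum of outer products, with no self-connections.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

def recall(state, steps=10):
    state = state.copy()
    for _ in range(steps):                    # synchronous threshold updates
        state = np.where(W @ state >= 0, 1, -1)
    return state

noisy = patterns[0].copy()
noisy[0] *= -1                                # flip one bit to simulate noise
print(recall(noisy))                          # recovers the first stored pattern
```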
Kohonen’s self-organizing maps (SOM) represent another neural network type that is markedly different from the feedforward multilayer networks. Unlike training in the feedforward MLP, SOM training is called unsupervised because there are no known target outputs associated with each input pattern. During training, the SOM processes the input patterns and learns to cluster or segment the data by adjusting its weights, which makes it an important neural network model for dimension reduction and data clustering. A two-dimensional map is typically created in such a way that the ordering of the interrelationships among inputs is preserved. The number and composition of clusters can be determined visually from the output distribution generated by the training process. With only input variables in the training sample, the SOM aims to learn or discover the underlying structure of the data.
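A rough sketch of the unsupervised weight adjustment just described: for each input, find the best-matching unit (BMU) on the map and pull it and its grid neighbours toward the input. The grid size, learning rate, neighbourhood radius and data are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 4, 4, 3
weights = rng.random((grid_h, grid_w, dim))   # one weight vector per map node
coords = np.stack(np.meshgrid(np.arange(grid_h), np.arange(grid_w),
                              indexing="ij"), axis=-1)

def train_step(x, lr=0.5, radius=1.5):
    dists = np.linalg.norm(weights - x, axis=-1)       # distance of x to every node
    bmu = np.unravel_index(np.argmin(dists), dists.shape)
    grid_d = np.linalg.norm(coords - np.array(bmu), axis=-1)
    h = np.exp(-(grid_d ** 2) / (2 * radius ** 2))     # neighbourhood function
    weights[:] += lr * h[..., None] * (x - weights)    # pull nodes toward x

# Unsupervised: the training sample contains input vectors only, no targets.
for x in rng.random((100, dim)):
    train_step(x)
```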
(The diagrams are from Dana Vrajitoru's C463 / B551 Artificial Intelligence web site.)
Fully answering this question would take many pages here. Don't forget, Stack Exchange is not a textbook that reads for you.
- Multi-layer perceptron (MLP): the neural networks that (probably) started everything. They are strictly feed-forward (one-directional), i.e. a node in one layer can only have connections to nodes in the next layer (no crazy stuff here). All layers are fully connected. This is equivalent to a feed-forward neural network; both are directed graphs. Backprop is usually used to train these networks. The neurons/nodes in this network compute a dot product of a weight vector belonging to the neuron with the input. The output is passed through a sigmoidal function, which later makes it easy to compute gradients and form the backprop algorithm.
- Recurrent neural networks (RNNs) are networks that form a directed cycle, essentially per layer, meaning that this kind of network has a (fixed-size) store of information. They are/were often used on problems that require this specific "memory buffer", e.g. handwriting recognition. Training is usually performed by gradient descent (the principle behind backprop) after unrolling the network through time.
- Hopfield network: can be seen as a (somewhat unofficial) form of RNN. It has only one layer, which already provides the outputs. The nodes, however, are interconnected in a special way -- feedback nets (google it). One important point is that the neurons/nodes are binary, i.e. they take only 1 or 0 (or ±1) as values. Training is usually performed by Hebbian learning.
- Restricted Boltzmann machines (RBMs) also usually take only binary input. An RBM can be described as a two-layer "network" (better: 'graph'). The first layer consists of visible units, i.e. we observe them; the second layer consists of hidden (latent) units, i.e. we have to infer them. These nets are trained using contrastive divergence (a mix of gradient descent and Gibbs sampling). Note that the training procedure does not optimize the exact energy function (I won't explain that here) but rather a different yet related objective; in practice this works well. The power of these models lies in the fact that they can be stacked, i.e. one RBM after another, with each trained separately. Research on RBMs and their development into stacked models was mainly carried out by Geoffrey Hinton and his team. This can be categorized as a form of deep learning.
- Recursive neural network: I actually never worked with them, so I probably can't say much about them. I believe the main idea is that the same network weights are applied recursively over a structured input, such as a parse tree, which lets the model capture hierarchical structure. These networks can be unrolled over that structure and then trained in a regular fashion.
- Convolutional neural network: usually a special kind of network in deep learning, so let's first discuss deep learning. 'Deep' here essentially means having more and more layers in your model. Why didn't we do this before with MLPs? Backprop pushes the error the network has produced back toward the inputs, i.e. in reverse, using the derivatives w.r.t. all parameters. We said before that a non-linear transfer function is used in the neurons -- a sigmoidal function. The problem is that with many layers this function causes the gradient to vanish: you push your signal through multiple sigmoidal functions, which are capped at [0,1] or [-1,1] and whose derivatives are small, so the chained gradient shrinks toward zero. Sigmoids were essentially replaced with rectified linear units (ReLU), which are zero from $-\infty$ to zero and grow linearly from zero to $+\infty$; that solved the issue of vanishing gradients. Another problem was that training such networks took a very long time on the computers of the day. This was resolved by porting the problem to modern GPUs, which these days can train the most sophisticated nets in roughly a week and the simpler ones in less than a day.
- CNN: So what is a convolutional neural network? In its simplest form it is a shallow MLP whose input is, most often, an image. Convolutional filters are computed over the image and feed the next (second) layer. Note: the weights of the convolutional filters are learned as well during training. These days CNNs are almost always used in deep architectures, in combination with pooling layers and other tricks of the trade.
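The dot-product-plus-sigmoid neuron from the MLP entry above can be sketched in a few lines; the sigmoid's derivative has the convenient closed form $\sigma'(z) = \sigma(z)(1-\sigma(z))$, which is exactly what backprop exploits. The weights, input and bias below are illustrative assumptions, not values from the text.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.4, -0.6, 0.2])   # the neuron's weight vector (made up)
x = np.array([1.0, 0.5, -1.0])   # an input vector (made up)
b = 0.1

z = w @ x + b          # dot product plus bias
a = sigmoid(z)         # activation
da_dz = a * (1 - a)    # the gradient term backprop reuses

print(a, da_dz)        # 0.5 0.25 (z happens to be exactly 0 here)
```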
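The "memory buffer" mentioned in the RNN entry is, in its simplest form, a hidden state updated at every time step from the current input and the previous state. A minimal sketch, with sizes and random weights chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
W_xh = rng.standard_normal((4, 2)) * 0.5   # input -> hidden
W_hh = rng.standard_normal((4, 4)) * 0.5   # hidden -> hidden (the recurrent loop)

def run(sequence):
    h = np.zeros(4)                        # state carried across time steps
    for x in sequence:
        h = np.tanh(W_xh @ x + W_hh @ h)   # new state depends on old state
    return h

seq = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
h_final = run(seq)
print(h_final.shape)  # (4,)
```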
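A rough sketch of one CD-1 (contrastive divergence) update for a tiny binary RBM, as outlined in the RBM entry: sample the hidden units from the data, reconstruct the visibles, resample the hiddens, and nudge the weights toward the difference of the two visible-hidden correlations. Layer sizes, the learning rate and the data vector are illustrative assumptions (biases are omitted for brevity).

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 6, 3, 0.1
W = rng.standard_normal((n_vis, n_hid)) * 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0):
    ph0 = sigmoid(v0 @ W)                         # P(h=1 | v0), the "data" phase
    h0 = (rng.random(n_hid) < ph0).astype(float)  # sample the hidden units
    pv1 = sigmoid(h0 @ W.T)                       # reconstruct the visibles
    v1 = (rng.random(n_vis) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)                         # the "model" phase
    return lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

v = np.array([1., 0., 1., 1., 0., 0.])            # one binary training vector
W += cd1_step(v)
```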
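The vanishing-gradient argument above can be made numerically concrete: the sigmoid's derivative is at most 0.25, so chaining it across many layers shrinks the gradient geometrically, whereas the ReLU's derivative is exactly 1 on its active side. A toy illustration (the 20-layer depth is an arbitrary choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.0          # pre-activation held at 0, where the sigmoid gradient peaks
grad = 1.0
for _ in range(20):            # 20 stacked sigmoid layers
    s = sigmoid(z)
    grad *= s * (1 - s)        # each layer multiplies in at most 0.25
print(grad)                    # 0.25**20 ~ 9e-13: effectively vanished

relu_grad = 1.0 ** 20          # ReLU derivative is 1 for positive inputs
print(relu_grad)               # 1.0: the gradient survives the depth
```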
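The filter computation from the CNN entry reduces, for a single filter and "valid" windows, to sliding dot products (strictly a cross-correlation, which is what most CNN libraries compute). The image and the edge-style filter below are made-up examples:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid cross-correlation of a 2-D image with a small kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

image = np.arange(16.0).reshape(4, 4)   # a linear intensity ramp
kernel = np.array([[1.0, -1.0],
                   [1.0, -1.0]])        # responds to horizontal intensity change
print(conv2d(image, kernel))            # every entry is -2.0 for this ramp
```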
Material for you:
- Books to read:
- Neural Networks for Pattern Recognition by Christopher M. Bishop -- everybody working with network structures such as the ones you asked about should have read this book.
- The Deep Learning Book by Ian Goodfellow, Yoshua Bengio and Aaron Courville (+ the community). This book is still in progress and can therefore be downloaded for free at this point in time: http://www.deeplearningbook.org/
These explanations are far from complete but hopefully correct. If you want to understand this field, you will have to read a lot more than this.
Convolutional neural nets apply a convolution to the data before using it in fully connected layers.
They are best used in cases where you want translation invariance, that is to say, you want features to be captured regardless of where they appear in the input sample.
Think of a picture with all sorts of animals in it. If you apply a convolutional neural net to classify whether there is a cat in the picture, it will identify the cat no matter where in the picture it appears (at the top, the bottom, left or right). This is very useful for image classification.
Recurrent neural nets are neural networks that keep state between input samples. They remember previous input samples and use those to help classify the current input sample.
They are mostly useful when the order of your data is important: for instance in speech (previous words help identify the current word), video (frames are ordered), and text processing.
Generally speaking, problems related to time-series data (data with a timestamp on them) are good candidates to be solved well with recurrent neural nets.