Solved – Multi-layer perceptron vs deep neural network

Tags: neural networks, perceptron

This is a question of terminology. Sometimes I see people refer to deep neural networks as "multi-layer perceptrons" — why is this? A perceptron, as I was taught, is a single-layer classifier (or regressor) with a binary threshold output, trained with a specific weight-update rule rather than back-propagation: if the perceptron's output doesn't match the target, we add the input vector to the weights or subtract it, depending on whether the perceptron produced a false negative or a false positive. It's quite a primitive machine learning algorithm, and the training procedure doesn't appear to generalize to the multi-layer case (at least not without modification). A deep neural network, by contrast, is trained via back-propagation, which uses the chain rule to propagate gradients of the cost function back through all of the weights of the network.
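To make the rule I mean concrete, here is a minimal NumPy sketch of it (my own illustration, not from any textbook; `train_perceptron` is just a made-up helper name):

```python
import numpy as np

def train_perceptron(X, y, epochs=100):
    """Classical perceptron rule: mistake-driven, no gradients, no backprop.

    X: array of shape (n_samples, n_features); y: labels in {-1, +1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if (w @ x_i + b) > 0 else -1  # binary threshold output
            if pred != y_i:
                w += y_i * x_i  # false negative: add input; false positive: subtract it
                b += y_i
                mistakes += 1
        if mistakes == 0:  # convergence is only guaranteed for separable data
            break
    return w, b
```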

So, the question is: is a "multi-layer perceptron" the same thing as a "deep neural network"? If so, why is this terminology used? It seems unnecessarily confusing. In addition, assuming the terminology is somewhat interchangeable, I've only seen the term "multi-layer perceptron" used for feed-forward networks made up of fully connected layers (no convolutional layers or recurrent connections). How broad is this terminology? Would one use the term "multi-layer perceptron" when referring to, for example, Inception net? How about for a recurrent network using LSTM modules, as used in NLP?

Best Answer

One can consider the multi-layer perceptron (MLP) to be a subset of deep neural networks (DNNs), but the two terms are often used interchangeably in the literature.

The assumption that perceptrons are named after their learning rule is incorrect. The classical "perceptron update rule" is just one of the ways a perceptron can be trained. The early rejection of neural networks had much to do with this very limitation: the rule depends on the hard-threshold output, which provides no gradient signal to pass back through hidden units, making it impossible to train networks with more than one layer.

Training networks by back-propagation instead requires differentiable activations, which led to replacing the hard threshold with squashing functions such as tanh and the sigmoid.
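As a small illustration (my own sketch, assuming NumPy): the sigmoid has a nonzero derivative everywhere, whereas the hard threshold's derivative is zero almost everywhere, leaving the chain rule nothing to propagate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4.0, 4.0, 9)
d_sigmoid = sigmoid(z) * (1.0 - sigmoid(z))  # nonzero everywhere, so the chain
                                             # rule can carry a gradient signal
d_threshold = np.zeros_like(z)               # hard threshold: derivative is 0
                                             # almost everywhere -- nothing to
                                             # propagate to earlier layers
```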

So, to answer the questions:

Is a "multi-layer perceptron" the same thing as a "deep neural network"?

MLPs are a subset of DNNs: a DNN may contain cycles, whereas an MLP is always feed-forward, i.e.,

A multi-layer perceptron (MLP) is a finite acyclic graph.
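As a toy illustration of this (not from the linked slides; the layer sizes are arbitrary choices of mine), here is a two-layer MLP forward pass in NumPy, where data flows strictly forward through the acyclic graph:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary sizes for illustration: 4 inputs, 8 hidden units, 2 outputs.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)

def mlp_forward(x):
    h = np.tanh(x @ W1 + b1)  # hidden layer with a squashing activation
    return h @ W2 + b2        # output layer; every edge points forward

y = mlp_forward(rng.normal(size=(1, 4)))  # input -> hidden -> output, no cycles
```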

why is this terminology used?

Much of the terminology in the scientific literature has to do with the trends of its time, and some of it simply catches on.

How broad is this terminology? Would one use the term "multi-layer perceptron" when referring to, for example, Inception net? How about for a recurrent network using LSTM modules, as used in NLP?

So yes: Inception, convolutional networks, ResNet, etc. are all MLPs in this sense, because there are no cycles between connections. Even if there are shortcut connections that skip layers, as long as they point in the forward direction, the network can be called a multi-layer perceptron. LSTMs and vanilla RNNs, however, have cyclic connections, so they cannot be called MLPs, though they are still a subset of DNNs.
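To make the distinction concrete, here is a toy sketch (hypothetical helper names, assuming NumPy) contrasting a forward shortcut connection with a recurrent one:

```python
import numpy as np

def residual_block(x, W):
    # Shortcut connection: the input skips the layer and is added back in.
    # Both paths still point forward, so the graph remains acyclic.
    return np.tanh(x @ W) + x

def rnn_step(x_t, h_prev, W_x, W_h):
    # Recurrent connection: the hidden state feeds back into the same units
    # at the next time step -- a cycle in the connection graph.
    return np.tanh(x_t @ W_x + h_prev @ W_h)
```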

This is my understanding of things. Please correct me if I am wrong.

Reference Links:

https://cs.stackexchange.com/questions/53521/what-is-difference-between-multilayer-perceptron-and-multilayer-neural-network

https://en.wikipedia.org/wiki/Multilayer_perceptron

https://en.wikipedia.org/wiki/Perceptron

http://ml.informatik.uni-freiburg.de/former/_media/teaching/ss10/05_mlps.printer.pdf