Solved – the difference between deep and shallow machine learning

algorithms, classification, deep learning, machine learning

There's a lot of information about deep learning but what features should an algorithm have for it to be classified as deep or shallow? Is shallow only related to neural networks?

Best Answer

The term "deep" comes from the neural network (NN) domain. It is a "soft" term and has no single, agreed-upon definition.

Neural networks similar to present-day models took off in 1986, when the work of Rumelhart, Hinton, and Williams popularized the backpropagation method and showed that NNs can approximate non-linear functions. In the 1990s NNs were in use but were frequently outperformed by other models, such as SVMs. Researchers recognized that the NNs of the time could not approximate very complex functions, and that adding more layers made training difficult, an issue known as the "vanishing gradient problem" (Hochreiter, 1991). Lack of computational power, together with better results from models with strong mathematical foundations (probabilistic models, SVMs, etc.), put NNs out to pasture. However, the quality of the classic models improved only very slowly.

In 2006 Hinton presented NNs with a larger number of layers, trained in a different way. Adding more layers made the NNs deeper. More powerful CPUs, and later GPUs, made it possible to show that deep NNs (DNNs) outperform classic models. In addition, a DNN solution was often simpler and more flexible, e.g. CNNs (first proposed in 1998) versus classic image processing. This encouraged researchers to focus on DNNs and to come up with further remedies for the vanishing gradient, such as the ReLU activation function. The quality improvement brought by DNNs was so large that it was often seen as a revolution, and the term deep learning (DL) was born.
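To make the vanishing gradient point more concrete, here is a minimal sketch rather than a real training setup. It assumes a toy chain of single-unit layers with all weights fixed at 1 and the same pre-activation at every layer, so the backpropagated gradient is just the local derivative raised to the power of the depth. The function names (`sigmoid_grad`, `relu_grad`, `chain_gradient`) are made up for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chain_gradient(local_grad, pre_activation, depth):
    # Assumption for illustration: every layer sees the same pre-activation
    # and has weight 1, so the chain rule reduces to a simple power.
    return local_grad(pre_activation) ** depth


for depth in (1, 5, 10, 20):
    sig = chain_gradient(sigmoid_grad, 0.0, depth)  # best case for sigmoid
    rel = chain_gradient(relu_grad, 1.0, depth)     # active ReLU unit
    print(f"depth={depth:2d}  sigmoid={sig:.3e}  relu={rel:.1f}")
```

Even in the best case for the sigmoid, the gradient shrinks by a factor of four per layer, while an active ReLU unit passes it through unchanged. Real networks also multiply by the weights at each layer, which this sketch deliberately ignores.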

Originally, "deep" meant that an NN contains more than one hidden layer. However, the term is now often applied to any modern, complex, high-quality solution.
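Under that original definition, the distinction can be sketched as a simple count of hidden layers. The snippet below is only an illustration in plain Python: `make_mlp`, `forward`, `hidden_layer_count`, and `is_deep` are hypothetical helper names, and the layer sizes are arbitrary.

```python
import random


def make_mlp(layer_sizes, seed=0):
    # One (weights, biases) pair per transition between consecutive layers.
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    # ReLU on the hidden layers, linear output layer.
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


def hidden_layer_count(layer_sizes):
    # Everything except the input and output layers counts as hidden.
    return len(layer_sizes) - 2


def is_deep(layer_sizes):
    # "Deep" in the original sense: more than one hidden layer.
    return hidden_layer_count(layer_sizes) > 1


shallow = [4, 8, 1]        # input, one hidden layer, output
deep = [4, 8, 8, 8, 1]     # input, three hidden layers, output

for sizes in (shallow, deep):
    output = forward(make_mlp(sizes), [0.1, 0.2, 0.3, 0.4])
    print(sizes, "deep" if is_deep(sizes) else "shallow", output)
```

Both networks are built and evaluated by the same code; the only difference is how many hidden layers sit between input and output.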

The effect of training a CNN has been illustrated by generating pictures of the layers' outputs, which were interpreted as showing that higher layers correspond to higher-level features. For example, early layers identify edges in a picture (low level), while later layers identify that the edges form the shape of a face (high level). The main point is that the CNN arrives at this "characterization" of features in an unsupervised way, i.e. nobody tells the network which features to look for. This is the reason many researchers expect the term deep learning to refer to models capable of unsupervised hierarchical feature extraction. However, deep learning is usually identified with DNNs, which is wrong in my opinion. NNs are flexible by nature, which made it easy to find deep models with them, but that is not a good reason to identify DL only with DNNs.
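As a rough illustration of what a low-level "edge" feature looks like, the sketch below applies a single 3x3 filter to a tiny image. Note the assumption: the kernel here is a hand-picked Sobel filter, whereas a CNN would learn its kernels from data. The function `conv2d_valid` and the toy image are made up for this example.

```python
def conv2d_valid(image, kernel):
    # Plain 2D cross-correlation without padding, the operation CNN layers use.
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy 4x6 image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-picked vertical-edge kernel (Sobel); a trained CNN would learn
# something similar in its first layer instead of being given it.
vertical_edge = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

response = conv2d_valid(image, vertical_edge)
for row in response:
    print(row)
```

The response is non-zero only where the filter window straddles the dark-to-bright boundary. Later layers of a CNN combine many such low-level responses into higher-level features such as shapes.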