What’s the difference between “deep learning” and multilevel/hierarchical modeling?


Is "deep learning" just another term for multilevel/hierarchical modeling?

I'm much more familiar with the latter than the former, but from what I can tell, the primary difference is not in their definition, but in how they are used and evaluated within their application domains.

It looks like a typical "deep learning" application has a larger number of nodes and uses a generic hierarchical form, whereas applications of multilevel modeling typically use hierarchical relationships that mimic the generative process being modeled. Using a generic hierarchy in an applied statistics (hierarchical modeling) domain would be regarded as an "incorrect" model of the phenomenon, whereas modeling a domain-specific hierarchy might be regarded as subverting the objective of building a generic deep learning machine.

Are these two things really the same machinery under two different names, used in two different ways?

Best Answer

Similarity

Fundamentally, both types of algorithms were developed to answer one general question in machine learning applications:

Given predictors (factors) $x_1, x_2, \ldots, x_p$, how can we incorporate the interactions between these factors in order to improve performance?

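One way is to simply introduce new predictors: $x_{p+1} = x_1x_2, x_{p+2} = x_1x_3, \ldots$ But this turns out to be a bad idea: the number of parameters becomes huge (there are already $p(p-1)/2$ pairwise products alone), and each new predictor captures only one very specific type of interaction.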
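A minimal sketch of this explicit-interaction approach, assuming NumPy; `add_pairwise_products` and `n_interaction_terms` are hypothetical helpers written only for illustration. The first builds the columns $x_i x_j$ directly; the second counts how many monomial terms (squares included) a full expansion up to a given degree would need, which shows how fast the parameter count grows with $p$.

```python
import itertools
import math

import numpy as np


# Hypothetical helpers for illustration, not from any library.
def add_pairwise_products(X):
    """Append the column x_i * x_j for every pair i < j of the original columns."""
    p = X.shape[1]
    pairs = list(itertools.combinations(range(p), 2))
    products = [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack([X] + products), pairs


def n_interaction_terms(p, max_degree):
    """Number of monomials of total degree 1..max_degree in p variables (squares included)."""
    return sum(math.comb(p + d - 1, d) for d in range(1, max_degree + 1))


X = np.arange(12.0).reshape(4, 3)  # 4 observations of x_1, x_2, x_3
X_expanded, pairs = add_pairwise_products(X)
print(X_expanded.shape, pairs)  # (4, 6) [(0, 1), (0, 2), (1, 2)]

# Growth of a full polynomial expansion with the number of predictors p
for p in (10, 100, 1000):
    print(p, n_interaction_terms(p, 2), n_interaction_terms(p, 3))
```

Already at $p = 100$ a full expansion up to degree three needs more than 170,000 coefficients, each describing one fixed product of inputs.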

Both multilevel modelling and deep learning algorithms answer this question by introducing a much smarter model of interactions, and from this point of view they are very similar.

Difference

Now let me give my understanding of the main conceptual difference between them. To explain it, let's look at the assumptions we make in each of the models:

Multilevel modelling:$^1$ the layers that reflect the data structure can be represented as a Bayesian hierarchical network. This network is fixed in advance and usually comes from knowledge of the application domain.
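To illustrate what a fixed structure means, here is a sketch of the simplest two-level model, a varying-intercept model $y_{ij} \sim N(\alpha_j, \sigma^2)$, $\alpha_j \sim N(\mu, \tau^2)$. It assumes NumPy and treats $\mu$, $\sigma^2$ and $\tau^2$ as known, which is a simplification (in practice they are estimated too). The group labels are supplied by the analyst and never learned; only parameters are estimated. `partial_pooling` is a hypothetical helper returning the normal-normal posterior means $\hat\alpha_j = (n_j \bar y_j/\sigma^2 + \mu/\tau^2)/(n_j/\sigma^2 + 1/\tau^2)$.

```python
import numpy as np


def partial_pooling(y, group, mu, sigma2, tau2):
    """Posterior mean of each group intercept alpha_j in the normal-normal model
        y_ij ~ N(alpha_j, sigma2),   alpha_j ~ N(mu, tau2),
    treating mu, sigma2 and tau2 as known (a simplifying assumption).
    Hypothetical helper for illustration."""
    estimates = {}
    for g in np.unique(group):
        y_g = y[group == g]
        w_data = y_g.size / sigma2   # precision of the group mean
        w_prior = 1.0 / tau2         # precision of the population-level prior
        estimates[str(g)] = float((w_data * y_g.mean() + w_prior * mu) / (w_data + w_prior))
    return estimates


y = np.array([1.0, 3.0, 10.0])
group = np.array(["a", "a", "b"])  # the hierarchy is given by the analyst, not learned
print(partial_pooling(y, group, mu=5.0, sigma2=4.0, tau2=4.0))
# {'a': 3.0, 'b': 7.5}
```

Group `b` has only one observation, so its estimate is pulled proportionally further toward $\mu = 5$ than group `a`'s.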

Deep Learning:$^2$ the data were generated by the interactions of many factors. The structure of interactions is not known, but can be represented as a layered factorisation: higher-level interactions are obtained by transforming lower-level representations.
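A sketch of such a layered factorisation, assuming NumPy, tanh activations and random (untrained) weights; the layer sizes and the names `net` and `mixed_partial` are hypothetical. Every first-layer unit mixes all inputs and every second-layer unit is built from those units, so the mixed derivative $\partial^2 f / \partial x_i \partial x_j$ is generically nonzero for every pair: no interaction is ruled out by construction, and which ones matter would be decided by training.

```python
import itertools

import numpy as np

rng = np.random.default_rng(1)
# Random, untrained weights; sizes are arbitrary illustrative choices.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # layer 1: 3 raw factors -> 4 units
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # layer 2: 4 units -> 2 units
w3 = rng.normal(size=2)                               # linear read-out


def net(x):
    h1 = np.tanh(W1 @ x + b1)   # lower-level representation of x_1, x_2, x_3
    h2 = np.tanh(W2 @ h1 + b2)  # higher-level representation built only from h1
    return float(w3 @ h2)


def mixed_partial(f, x, i, j, eps=1e-4):
    """Central finite-difference estimate of d^2 f / (dx_i dx_j) at x, for i != j."""
    e_i, e_j = np.zeros_like(x), np.zeros_like(x)
    e_i[i], e_j[j] = eps, eps
    return (f(x + e_i + e_j) - f(x + e_i - e_j)
            - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * eps ** 2)


x = np.array([0.3, -0.5, 0.8])
for i, j in itertools.combinations(range(3), 2):
    print(f"d2f/dx{i + 1}dx{j + 1} = {mixed_partial(net, x, i, j):+.4f}")
```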

The fundamental difference comes from the phrase "the structure of interactions is not known" in deep learning. We can assume some priors on the type of interaction, but the algorithm still determines all the interactions during the learning procedure. For multilevel modelling, on the other hand, we have to define the structure of interactions ourselves (afterwards we learn only the parameters of the model).
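To make the contrast concrete, here is a toy sketch on synthetic data (an assumption, not from the original answer) where the target depends only on $x_2 x_3$. The "fixed structure" is an ordinary least-squares fit on a design chosen in advance, $1, x_1, x_2, x_3, x_1x_2, x_1x_3$; a deliberately tiny one-hidden-layer tanh network trained by plain gradient descent stands in for a deep model. All names, sizes, the learning rate and the number of steps are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    X = rng.uniform(-1.0, 1.0, size=(n, 3))
    return X, X[:, 1] * X[:, 2]  # synthetic target: only the x2*x3 interaction


def fixed_design(X):
    x1, x2, x3 = X.T
    # Structure chosen in advance: 1, x1, x2, x3, x1*x2, x1*x3 (no x2*x3 column)
    return np.column_stack([np.ones(len(X)), x1, x2, x3, x1 * x2, x1 * x3])


X, y = make_data(1000)
X_test, y_test = make_data(1000)

coef, *_ = np.linalg.lstsq(fixed_design(X), y, rcond=None)
fixed_mse = np.mean((fixed_design(X_test) @ coef - y_test) ** 2)

# Learned structure: one tanh hidden layer fitted by full-batch gradient descent
h, lr = 16, 0.1
W1 = rng.normal(0.0, 1.0, size=(3, h))
b1 = rng.normal(0.0, 1.0, size=h)
w2 = rng.normal(0.0, 1.0 / np.sqrt(h), size=h)
b2 = 0.0
for _ in range(5000):
    A = np.tanh(X @ W1 + b1)               # hidden activations, shape (n, h)
    g = (A @ w2 + b2 - y) / len(y)         # gradient of 0.5 * MSE w.r.t. predictions
    dZ = np.outer(g, w2) * (1.0 - A ** 2)  # back-propagate through tanh (uses old w2)
    w2 -= lr * (A.T @ g)
    b2 -= lr * g.sum()
    W1 -= lr * (X.T @ dZ)
    b1 -= lr * dZ.sum(axis=0)

learned_mse = np.mean((np.tanh(X_test @ W1 + b1) @ w2 + b2 - y_test) ** 2)
print(f"held-out MSE  fixed: {fixed_mse:.4f}  learned: {learned_mse:.4f}")
```

Because the hand-specified design has no $x_2x_3$ column, its held-out error stays near the variance of the target, while the network can discover that interaction from the data. This is a toy comparison, not a benchmark.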

Examples

For example, let's assume we are given three factors $x_1, x_2, x_3$ and we define $\{x_1\}$ and $\{x_2, x_3\}$ as different layers.

In a multilevel regression, for example, we will get the interactions $x_1 x_2$ and $x_1 x_3$, but we will never get the interaction $x_2 x_3$. Of course, the results will be partly affected by the correlation of the errors, but that is not important for this example.
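Assuming $x_1$ sits at the upper level and each lower-level coefficient is linear in it, $y = \beta_0 + \beta_2 x_2 + \beta_3 x_3$ with $\beta_k = \gamma_{k0} + \gamma_{k1} x_1$, substitution gives $y = \gamma_{00} + \gamma_{01}x_1 + \gamma_{20}x_2 + \gamma_{21}x_1x_2 + \gamma_{30}x_3 + \gamma_{31}x_1x_3$ (plus error terms), so no $x_2x_3$ term can appear. Swapping the roles of the two layers yields the same set of terms. The hypothetical helper below performs this substitution symbolically.

```python
def multilevel_terms(upper, lower):
    """Fixed-effect terms implied by  y = b_0 + sum_k b_k * x_k,  where every coefficient
    b_0, b_k is assumed linear in the upper-level predictors. Hypothetical helper."""
    terms = [("1",)] + [(u,) for u in upper]  # intercept varies with the upper level
    for x in lower:
        terms.append((x,))                    # main effect of the lower-level predictor
        terms += [(u, x) for u in upper]      # its slope varies with each upper-level predictor
    return terms


terms = multilevel_terms(upper=["x1"], lower=["x2", "x3"])
print(terms)
# [('1',), ('x1',), ('x2',), ('x1', 'x2'), ('x3',), ('x1', 'x3')]
```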

In deep learning, for example in multilayered Restricted Boltzmann machines (RBMs) with two hidden layers and a linear activation function, we will have all possible polynomial interactions of degree less than or equal to three.

Common advantages and disadvantages

Multilevel modelling

(-) need to define the structure of interactions

(+) results are usually easier to interpret

(+) can apply statistical methods (evaluate confidence intervals, test hypotheses)

Deep learning

(-) requires a huge amount of data to train (and a lot of training time as well)

(-) results are usually impossible to interpret (the model acts as a black box)

(+) no expert knowledge required

(+) once well-trained, usually outperforms most other general methods (not application specific)

Hope this helps!