Is there a method for constructing decision trees that takes account of structured/hierarchical/multilevel predictors, that would allow me to impose domain knowledge or constraints on interactions for example?
Related Solutions
Similarity
Fundamentally both types of algorithms were developed to answer one general question in machine learning applications:
Given predictors (factors) $x_1, x_2, \ldots, x_p$, how can we incorporate the interactions between these factors in order to improve performance?
One way is to simply introduce new predictors: $x_{p+1} = x_1x_2, x_{p+2} = x_1x_3, \ldots$ But this proves to be a bad idea, because of the huge number of parameters and the very specific type of interactions it imposes.
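To make the parameter blow-up concrete, here is a minimal NumPy sketch (the array sizes are arbitrary, for illustration only) of adding all pairwise product features:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # 100 samples, p = 10 predictors

# Add one new column x_i * x_j for every unordered pair (i, j).
pairs = list(combinations(range(X.shape[1]), 2))
X_inter = np.column_stack([X] + [X[:, i] * X[:, j] for i, j in pairs])

# p predictors produce p*(p-1)/2 extra columns; with p = 100 this would
# already be 4950 new parameters for the pairwise terms alone.
print(X_inter.shape)  # (100, 55): 10 original + 45 pairwise products
```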
Both multilevel modelling and deep learning algorithms answer this question by introducing a much smarter model of interactions, and from this point of view they are very similar.
Difference
Now let me try to give my understanding of the great conceptual difference between them. To explain it, let's look at the assumptions that each model makes:
Multilevel modelling:$^1$ layers that reflect the data structure can be represented as a Bayesian hierarchical network. This network is fixed and usually comes from the application domain.
Deep Learning:$^2$ the data were generated by the interactions of many factors. The structure of interactions is not known, but can be represented as a layered factorisation: higher-level interactions are obtained by transforming lower-level representations.
The fundamental difference comes from the phrase "the structure of interactions is not known" in deep learning. We can place priors on the type of interaction, but the algorithm still determines all the interactions during the learning procedure. In multilevel modelling, on the other hand, we have to define the structure of interactions ourselves (afterwards we only learn the parameters of the model).
Examples
For example, let's assume we are given three factors $x_1, x_2, x_3$ and we define $\{x_1\}$ and $\{x_2, x_3\}$ as different layers.
In a multilevel regression, for example, we will get the interactions $x_1 x_2$ and $x_1 x_3$, but we will never get the interaction $x_2 x_3$. Of course, the results will partly be affected by the correlation of the errors, but that is not important for this example.
In deep learning, for example in a multilayer restricted Boltzmann machine (RBM) with two hidden layers and a linear activation function, we will get all possible polynomial interactions of degree at most three.
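As a toy illustration of the constraint itself (not of either model), a short Python sketch can enumerate which pairwise interactions survive when, as in the example, interactions are only allowed across layers:

```python
from itertools import combinations

layers = [{"x1"}, {"x2", "x3"}]  # the layer structure from the example

# Keep only interactions between factors that sit in *different* layers.
cross = {frozenset((a, b))
         for a, b in combinations(["x1", "x2", "x3"], 2)
         if not any(a in layer and b in layer for layer in layers)}

# x1*x2 and x1*x3 are allowed; x2*x3 is excluded (same layer).
print(sorted(sorted(p) for p in cross))  # [['x1', 'x2'], ['x1', 'x3']]
```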
Common advantages and disadvantages
Multilevel modelling
(-) need to define the structure of interactions
(+) results are usually easier to interpret
(+) can apply statistics methods (evaluate confidence intervals, check hypotheses)
Deep learning
(-) requires huge amount of data to train (and time for training as well)
(-) results are usually much harder to interpret (the model acts as a black box)
(+) no expert knowledge required
(+) once well-trained, usually outperforms most other general methods (not application specific)
Hope this helps!
When you have panel data, there are different tasks you can try to solve, e.g. time series classification/regression or panel forecasting, and for each task there are numerous approaches.
When you want to use machine learning methods for panel forecasting, there are a number of approaches:
Regarding your input data (X), treating units (e.g. countries, individuals, etc) as i.i.d. samples, you can
- bin the time series and treat each bin as a separate column, ignoring any temporal ordering, with equal bins for all units; the bin size could of course simply be the observed measurement interval, or you could downsample and aggregate into larger bins, then use standard machine learning algorithms for tabular data,
- or extract features from the time series of each unit and use each extracted feature as a separate column, again combined with standard tabular algorithms,
- or use specialised time series regression/classification algorithms, depending on whether you observe continuous or categorical time series data; this includes support vector machines with special kernels that compare time series to time series.
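The first two options above can be sketched in a few lines of NumPy (the panel dimensions, bin size, and summary statistics are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# Toy panel: 5 units, each with a length-12 time series (e.g. monthly data).
panel = rng.normal(size=(5, 12))

# Option 1: treat each time point (or aggregated bin) as a column.
# Here we aggregate into 4 bins of 3 time points each, per unit.
binned = panel.reshape(5, 4, 3).mean(axis=2)      # shape (5, 4)

# Option 2: extract summary features per unit, one column per feature.
features = np.column_stack([panel.mean(axis=1),
                            panel.std(axis=1),
                            panel.min(axis=1),
                            panel.max(axis=1)])   # shape (5, 4)

# Either matrix can now be fed to any standard tabular learner.
print(binned.shape, features.shape)
```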
Regarding your output data (y), if you want to forecast multiple time points in the future, you can
- fit a separate estimator for each step ahead that you want to forecast, always using the same input data,
- or fit a single estimator for the first step ahead and, in prediction, roll the input data forward in time, appending the first-step predictions to the observed input data to make the second-step predictions, and so on.
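The rolling strategy in the second bullet can be sketched with a plain least-squares one-step model in NumPy (the window length and forecast horizon are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200))  # one illustrative series

# Fit a single one-step-ahead linear model on lagged windows of length 3.
lag = 3
X = np.column_stack([series[i:len(series) - lag + i] for i in range(lag)])
X = np.column_stack([X, np.ones(len(X))])        # intercept column
y = series[lag:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Roll forward: append each prediction to the window to get the next step.
window = list(series[-lag:])
preds = []
for _ in range(5):                               # forecast 5 steps ahead
    x = np.append(window[-lag:], 1.0)
    nxt = float(x @ coef)
    preds.append(nxt)
    window.append(nxt)
print(preds)
```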
All of the approaches above essentially reduce the panel forecasting problem to a time series regression or tabular regression problem. Once your data is in that format, you can also append any time-invariant features for each unit.
Of course, there are other ways to solve the panel forecasting problem, for example classical forecasting methods like ARIMA adapted to panel data, or deep learning methods that can directly make sequence-to-sequence predictions.
Best Answer
If you have metric responses, there is the RE-EM tree by Sela and Simonoff (Machine Learning, 86, 169-207). The R package is called REEMtree. It is intended for panel data with random effects, but you should be able to use it for other hierarchically nested/multilevel data as well. If you are fine with including the domain expertise in a fixed-effects model, you can also use model-based recursive partitioning with the party::mob function.
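For readers without R, the alternating idea behind the RE-EM tree can be sketched in Python. This is a simplified caricature, not the Sela-Simonoff estimator: random intercepts are crudely estimated as group means of residuals, and scikit-learn's DecisionTreeRegressor stands in for the tree-fitting step.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
# Toy panel: 20 units, 10 observations each, with a unit-level random intercept.
n_units, n_obs = 20, 10
groups = np.repeat(np.arange(n_units), n_obs)
X = rng.normal(size=(n_units * n_obs, 3))
b_true = rng.normal(scale=2.0, size=n_units)          # true random intercepts
y = (np.where(X[:, 0] > 0, 3.0, -3.0)                 # fixed tree-like effect
     + b_true[groups]
     + rng.normal(size=len(groups)))

# Alternate between fitting a tree on y minus the current random-effect
# estimates, and re-estimating the random intercepts from the residuals.
b = np.zeros(n_units)
for _ in range(10):
    tree = DecisionTreeRegressor(max_depth=3).fit(X, y - b[groups])
    resid = y - tree.predict(X)
    b = np.array([resid[groups == g].mean() for g in range(n_units)])

print(np.corrcoef(b, b_true)[0, 1])  # estimates track the true intercepts
```

The point of separating the two steps is the same as in the REEMtree package: the tree captures the fixed structure while the group-level effects absorb the within-unit correlation, instead of the tree splitting on noise caused by it.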