Machine learning algorithms for panel data

Tags: cart, machine learning, panel data, r, svm

The question "Is there a method for constructing decision trees that takes account of structured/hierarchical/multilevel predictors?" mentions a panel data method for trees.

Are there specific panel data methods for support vector machines and neural networks? If so, could you cite some papers for the algorithms and (if available) R packages implementing them?

Best Answer

When you have panel data, there are different tasks you can try to solve, e.g. time series classification/regression or panel forecasting, and for each task there are numerous approaches.

When you want to use machine learning methods to solve panel forecasting, there are a number of approaches:

Regarding your input data (X), if you treat units (e.g. countries, individuals, etc.) as i.i.d. samples, you can

  • bin the time series and treat each bin as a separate column, ignoring any temporal ordering and using the same bins for all units. The bin size can simply be the observed measurement interval, or you can downsample by aggregating into larger bins. Then use standard machine learning algorithms for tabular data,
  • or extract features from the time series for each unit and use each extracted feature as a separate column, again combined with standard tabular algorithms,
  • or use specialised time series regression/classification algorithms, depending on whether you observe continuous or categorical time series data. This includes support vector machines with special kernels that compare time series with time series.
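
The first two reductions can be sketched in a few lines. This is only a minimal illustration in plain Python, assuming a toy panel stored as a dict that maps each unit to an equally long list of measurements. The names `panel`, `bin_series` and `extract_features`, and the choice of features, are made up for this example and do not come from any library.

```python
from statistics import mean, pstdev

# Hypothetical toy panel: one equally long series per unit.
panel = {
    "unit_a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "unit_b": [2.0, 2.0, 4.0, 4.0, 6.0, 6.0],
}

def bin_series(series, bin_size):
    """Aggregate consecutive observations into bins of `bin_size` by averaging."""
    return [mean(series[i:i + bin_size]) for i in range(0, len(series), bin_size)]

def extract_features(series):
    """Summarise a series with a few hand-picked features (an arbitrary choice)."""
    return [mean(series), pstdev(series), series[-1] - series[0]]

# Option 1: one column per bin, one row per unit.
X_binned = [bin_series(s, bin_size=2) for s in panel.values()]

# Option 2: one column per extracted feature, one row per unit.
X_features = [extract_features(s) for s in panel.values()]
```

Either `X_binned` or `X_features` is an ordinary table with one row per unit, so it can be passed to any standard tabular learner.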

Regarding your output data (y), if you want to forecast multiple time points in the future, you can

  • fit an estimator for each step ahead that you want to forecast, always using the same input data,
  • or fit a single estimator for the first step ahead and, in prediction, roll the input data forward in time: append the first-step predictions to the observed input data to make the second-step predictions, and so on.
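
These two strategies are often called the direct and the recursive strategy. The sketch below contrasts them under strong simplifying assumptions: the only input is the last observed value, the estimator is a hand-written one-variable least squares fit, and all units are pooled into one training set. The helpers `fit_line` and `make_pairs` and the toy `panel` are hypothetical and exist only for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x with a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

# Hypothetical toy panel; units are treated as i.i.d. and pooled.
panel = {
    "unit_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "unit_b": [10.0, 11.0, 12.0, 13.0, 14.0],
}

def make_pairs(panel, step):
    """Pair each observation with the value `step` periods later, over all units."""
    xs, ys = [], []
    for series in panel.values():
        xs += series[:-step]
        ys += series[step:]
    return xs, ys

horizon = 3

# Direct strategy: one estimator per step ahead, all fed the same input.
direct_models = [fit_line(*make_pairs(panel, h)) for h in range(1, horizon + 1)]

def forecast_direct(last_value):
    return [model(last_value) for model in direct_models]

# Recursive strategy: a single one-step estimator, fed its own predictions.
one_step = fit_line(*make_pairs(panel, 1))

def forecast_recursive(last_value):
    preds, x = [], last_value
    for _ in range(horizon):
        x = one_step(x)  # the prediction becomes the next input
        preds.append(x)
    return preds
```

On this perfectly linear toy data both strategies give the same forecasts. On real data they generally differ, since the recursive strategy feeds its own prediction errors back in, while the direct strategy needs one fitted model per horizon.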

All of the approaches above essentially reduce the panel forecasting problem to a time series regression or tabular regression problem. Once your data is in the time series or tabular regression format, you can also append any time-invariant features of the units.
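
Appending time-invariant features amounts to adding extra columns to each row. A minimal sketch, assuming the tabular rows and the static attributes are both keyed by a unit identifier (all names and values here are made up):

```python
# Hypothetical tabular rows, one per unit (e.g. produced by binning).
X = {"unit_a": [1.5, 3.5, 5.5], "unit_b": [2.0, 4.0, 6.0]}

# Hypothetical time-invariant attributes per unit (e.g. a group code and a size).
static = {"unit_a": [0.0, 41.2], "unit_b": [1.0, 52.7]}

# Concatenate the static columns onto each unit's row.
X_full = {unit: row + static[unit] for unit, row in X.items()}
```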

Of course, there are other ways to solve the panel forecasting problem, for example classical forecasting methods such as ARIMA adapted to panel data, or deep learning methods that make sequence-to-sequence predictions directly.