Solved – Time series prediction – what is the Autoregressive Tree model? (Python)

Tags: algorithms, autoregressive, data mining, modeling, time series

Our problem: model the evolution of a continuous variable over time.

I came across a paper presenting an approach for predicting the next values of a time series. Whereas the ARIMA model is more accurate for long-term prediction, the ARTXP model is preferred for inferring the next few values.

Microsoft's library of data mining algorithms implements ARTXP, a variation of the Autoregressive Tree model.

How does the algorithm work? Is there a Python implementation of this model?

Best Answer

We can refer to this paper; the explanation below summarizes its approach.

First, autoregressive models can be described as follows.

Model for time series

Given a temporal sequence of variables, $Y=(Y_{1},...,Y_{T})$, a time series is a sequence of values for these variables, $y=(y_{1},...,y_{T})$. If $f(\cdot|\cdot,\theta)$ denotes the conditional probability distribution that defines the model, with parameters $\theta$, we restrict ourselves to models of the form

$ p(y_{t}|y_{1},...,y_{t-1},\theta) = f(y_{t}|y_{t-p},...,y_{t-1},\theta)$

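The model is probabilistic, stationary (the same $f$ and $\theta$ apply at every time step), and has the p-Markov property ($y_{t}$ depends only on the previous $p$ values).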
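To make the p-Markov restriction concrete, here is a minimal sketch of how a series can be cut into (past window, next value) pairs, which is the only view of the data such a model needs. The helper name `lagged_pairs` is my own and does not come from the paper or any library.

```python
def lagged_pairs(y, p):
    """Return (window, target) pairs where window = (y[t-p], ..., y[t-1])
    and target = y[t], for every t that has p past values available."""
    return [(tuple(y[t - p:t]), y[t]) for t in range(p, len(y))]


pairs = lagged_pairs([1, 2, 3, 4, 5], p=2)
# pairs == [((1, 2), 3), ((2, 3), 4), ((3, 4), 5)]
```

Stationarity is what allows all of these pairs to be pooled to estimate a single $\theta$.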

Autoregressive Tree Model

First, an AR model is of the form

$f(y_{t}|y_{t-p},...,y_{t-1},\theta) = \mathit{N}( m + \sum_{j=1}^{p}b_{j}y_{t-j}, \sigma^{2}) $

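where $\mathit{N}(\mu,\sigma^{2})$ is the normal distribution with mean $\mu$ and variance $\sigma^{2}$, and $\theta = (m, b_{1},...,b_{p}, \sigma^{2})$ are the model parameters.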

That is, at each time step, the distribution of the value has a mean that depends linearly ('autoregressively') on the last p values of the series.
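As an illustration of the AR formula, the sketch below computes the predictive mean $m + \sum_{j} b_{j}y_{t-j}$ and the Gaussian log density of an observed value. The function names `ar_mean` and `ar_logpdf` are hypothetical, and the convention that `history[-1]` holds $y_{t-1}$ is an assumption of this sketch.

```python
import math


def ar_mean(history, m, b):
    """Predictive mean m + sum_j b[j-1] * y_{t-j}; history[-1] is y_{t-1}."""
    p = len(b)
    if len(history) < p:
        raise ValueError("need at least p past values")
    return m + sum(b[j] * history[-1 - j] for j in range(p))


def ar_logpdf(y_t, history, m, b, sigma2):
    """Log density of y_t under N(ar_mean(history, m, b), sigma2)."""
    mu = ar_mean(history, m, b)
    return -0.5 * (math.log(2 * math.pi * sigma2) + (y_t - mu) ** 2 / sigma2)


# AR(2) example: 0.5 + 0.5 * 3.0 + 0.25 * 2.0
mean = ar_mean([1.0, 2.0, 3.0], m=0.5, b=[0.5, 0.25])
```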

An ART model is a piecewise linear AR model, and can therefore be represented as a decision tree. Each non-leaf node holds a boolean test, and each leaf holds an AR model.

The mechanism is simple: the path taken down the tree depends on past values of the series, and the leaf that is reached provides the AR model used to predict the next value.

An AR model is a degenerate ART model with a single, trivially true decision node and a single AR leaf.
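The tree structure can be sketched in a few lines of Python. This is only an illustration of prediction with an already-built tree, not Microsoft's ARTXP code; the class names `Leaf` and `Split`, the function `art_predict`, and the choice of tests of the form "$y_{t-lag} \le$ threshold" are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    m: float              # intercept of the leaf's AR model
    b: Sequence[float]    # b[j-1] multiplies y_{t-j}


@dataclass
class Split:
    lag: int              # the boolean test looks at y_{t-lag}
    threshold: float
    left: "Node"          # followed when y_{t-lag} <= threshold
    right: "Node"         # followed otherwise


Node = Union[Leaf, Split]


def art_predict(node, history):
    """Walk the tree using past values, then apply the leaf's AR mean.
    history[-1] is y_{t-1}."""
    while isinstance(node, Split):
        node = node.left if history[-node.lag] <= node.threshold else node.right
    return node.m + sum(bj * history[-1 - j] for j, bj in enumerate(node.b))


tree = Split(lag=1, threshold=0.0,
             left=Leaf(m=1.0, b=[0.5]),
             right=Leaf(m=-1.0, b=[0.9]))
low = art_predict(tree, [2.0, -4.0])    # y_{t-1} = -4 <= 0, left leaf: 1.0 + 0.5 * (-4.0)
high = art_predict(tree, [2.0, 10.0])   # y_{t-1} = 10 > 0, right leaf: -1.0 + 0.9 * 10.0
```

In this representation a lone `Leaf` passed to `art_predict` behaves as a plain AR model.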

Advantages of ART models over AR models

  • ART models non-linearities in time series data
  • ART models periodicity in time series data
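To see how a tree captures a non-linearity, here is a rough sketch that fits a one-split ART with AR(1) leaves to a synthetic series that switches regime depending on the sign of the previous value. Note the assumptions: this is not the learning procedure from the paper nor Microsoft's ARTXP; it swaps in a plain exhaustive threshold search scored by squared error, and the names `fit_ar1` and `fit_one_split_art` as well as the toy series are made up for the example.

```python
def fit_ar1(pairs):
    """Least-squares fit of y_t = m + b * y_{t-1} on (y_{t-1}, y_t) pairs.
    Returns (m, b, sse)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx if sxx > 0 else 0.0
    m = mean_y - b * mean_x
    sse = sum((y - (m + b * x)) ** 2 for x, y in pairs)
    return m, b, sse


def fit_one_split_art(series, min_leaf=3):
    """Try every threshold on y_{t-1}; keep the split with the lowest total SSE."""
    pairs = sorted(zip(series[:-1], series[1:]))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal lag values
        left, right = fit_ar1(pairs[:i]), fit_ar1(pairs[i:])
        total = left[2] + right[2]
        if best is None or total < best["sse"]:
            best = {"threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
                    "left": left[:2], "right": right[:2], "sse": total}
    return best


# Toy regime-switching series (noise-free, made up for this example)
series = [1.0]
for _ in range(11):
    prev = series[-1]
    series.append(3 + 0.5 * prev if prev <= 0 else -2 - 0.5 * prev)

single = fit_ar1(list(zip(series[:-1], series[1:])))  # one global AR(1)
art = fit_one_split_art(series)                       # two AR(1) leaves
```

On this toy series the two-leaf fit recovers the two regimes, while the single AR(1) fit is left with a clearly non-zero residual. A real implementation would grow the tree recursively, use more lags, and guard against overfitting.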

Neural networks are an alternative to ART, but they are harder to interpret and/or more expensive to train.