Time Series – How to Determine Forecastability?


One important question forecasters face is whether a given series can be forecasted at all.

I stumbled on an article entitled "Entropy as an A Priori Indicator of Forecastability" by Peter Catt that uses Approximate Entropy (ApEn) as a relative measure to determine whether a given time series is forecastable.

The article says,

"Smaller ApEn values indicate a greater chance that a set of data will
be followed by similar data (regularity). Conversely, a larger value
of ApEn indicates a lower chance of similar data being repeated
(irregularity). Hence, larger values convey more disorder, randomness
and system complexity."

This is followed by the mathematical formulas for calculating ApEn. It is an interesting approach because it provides a numerical value that can be used to assess forecastability in a relative sense. I don't yet know exactly what Approximate Entropy means; I'm reading more about it.

There is an R package called pracma that lets you calculate ApEn. For illustrative purposes, I used three different time series and calculated ApEn for each.

  1. Series 1: The famous AirPassengers time series. It is highly deterministic and we should be able to forecast it easily.
  2. Series 2: The sunspot time series. It is very well defined but should be less forecastable than Series 1.
  3. Series 3: Random numbers. There is no way to forecast this series.

So if we calculate ApEn, the value for Series 1 should be less than the value for Series 2, which in turn should be much less than the value for Series 3.

Below is the R snippet that calculates ApEn for all three series.

> library("pracma")
> series1 <- approx_entropy(AirPassengers)
> series1
[1] 0.5157758
> series2 <- approx_entropy(sunspot.year)
> series2
[1] 0.762243
> series3 <- approx_entropy(rnorm(1:30))
> series3
[1] 0.1529609

This is not what I expected. The random series has a lower value than the well-defined AirPassengers series. Even if I increase the length of the random series to 100, I still get the following, which is less than the value for the well-defined Series 2 (sunspot.year).

> series3 <- approx_entropy(rnorm(1:100))
> series3
[1] 0.747275

Below are my questions:

  1. There are two parameters in calculating ApEn (m and r). How should they be determined? I used the defaults in the R code above.
  2. What am I doing incorrectly, such that ApEn comes out lower for random numbers than for a well-defined series such as sunspot.year?
  3. Should I deseasonalize/detrend the series and then estimate ApEn? The author, however, has applied ApEn directly to the series.
  4. Is there any other way to determine whether a series is forecastable?

Best Answer

The parameters m and r involved in calculating the approximate entropy (ApEn) of a time series are the window (sequence) length and the tolerance (filter value), respectively. In terms of m, r, and N (the number of data points), ApEn is defined as the "natural logarithm of the relative prevalence of repetitive patterns of length m as compared with those of length m + 1" (Balasis, Daglis, Anastasiadis & Eftaxias, 2011, p. 215):

$$ ApEn(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r), $$

where

$$ \Phi^m(r) = \frac{1}{N - m + 1} \sum_{i} \ln C^m_i(r). $$
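For reference, $C^m_i(r)$ is not spelled out in the quoted definition. In the standard formulation of ApEn (stated here from general knowledge, not taken from the cited chapter), it is the fraction of length-$m$ windows that lie within tolerance $r$ of the $i$-th window. The symbols $u$, $x_m$ and $d$ are notation assumed for this note:

$$ C^m_i(r) = \frac{\#\{\, j : d[x_m(i), x_m(j)] \le r \,\}}{N - m + 1}, \qquad d[x_m(i), x_m(j)] = \max_{k = 0, \dots, m-1} |u(i+k) - u(j+k)|, $$

where $u(1), \dots, u(N)$ is the series, $x_m(i) = (u(i), \dots, u(i+m-1))$, and $i, j$ run from $1$ to $N - m + 1$. A larger $r$ counts more windows as "similar", which raises every $C^m_i(r)$.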

Therefore, it appears that changing the tolerance r controls the granularity at which patterns in the series are treated as similar, and hence the resulting entropy value. Nevertheless, the default values for both m and r in the pracma package's approx_entropy function work fine here. The only fix needed to see the expected ordering of entropy values across the three time series (lower entropy for more well-defined series, higher entropy for more random data) is to increase the length of the random data vector:

 library(pracma)
 set.seed(10)
 all.series <- list(series1 = AirPassengers,
                    series2 = sunspot.year,
                    series3 = rnorm(500)) # <== size increased
 sapply(all.series, approx_entropy)
  series1   series2   series3 
  0.5157758 0.7622430 1.4741971 

The results are as expected: as the predictability of fluctuations decreases from the most deterministic series1 to the most random series3, the entropy increases accordingly: ApEn(series1) < ApEn(series2) < ApEn(series3).
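To make the roles of m, r and N more concrete, here is a minimal base-R sketch that transcribes the definition of ApEn directly. The function name apen_sketch is hypothetical, the default r = 0.2 * sd(x) is an assumption chosen for illustration, and the values it returns are not guaranteed to match those of pracma's approx_entropy, whose implementation details may differ.

```r
# Hypothetical helper: a direct transcription of ApEn(m, r, N) = Phi^m(r) - Phi^(m+1)(r).
# Not pracma's implementation; results may differ from approx_entropy().
apen_sketch <- function(x, m = 2, r = 0.2 * sd(x)) {
  x <- as.numeric(x)
  phi <- function(k) {
    # Each row of emb is one overlapping window of length k (N - k + 1 rows).
    emb <- embed(x, k)
    # Maximum (Chebyshev) distance between every pair of windows.
    d <- as.matrix(dist(emb, method = "maximum"))
    # C_i: share of windows within tolerance r of window i (self-match included).
    C <- rowSums(d <= r) / nrow(emb)
    mean(log(C))
  }
  phi(m) - phi(m + 1)
}

# Usage: the same white-noise draw, truncated to different lengths.
set.seed(10)
noise <- rnorm(500)
sapply(c(30, 100, 500), function(n) apen_sketch(noise[1:n]))
apen_sketch(AirPassengers)
```

Because every window matches itself, the logarithm is always finite, but that self-match also tends to pull the estimate toward zero when the series is short. That is one plausible reading of why a random vector of length 30 looked more "regular" than AirPassengers in the question.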

With regard to other measures of forecastability, you may want to check the mean absolute scaled error (MASE); see this discussion for more details. Forecastable component analysis also seems to be an interesting and new approach to determining the forecastability of time series. And, as one might expect, there is an R package for that as well: ForeCA.

library(ForeCA)
sapply(all.series,
       Omega, spectrum.control = list(method = "wosa"))
 series1   series2   series3 
 41.239218 25.333105  1.171738 

Here $\Omega$ is a measure of forecastability, defined on $[0, 1]$ and shown in the output above as a percentage, where $\Omega(\text{white noise}) = 0\%$ and $\Omega(\text{sinusoid}) = 100\%$.
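For the MASE suggestion, a minimal base-R sketch might look like the following. The function name mase_sketch, the 24-month holdout, and the choice of a seasonal naive forecast as the candidate are all assumptions made for illustration. MASE here is the mean absolute forecast error divided by the in-sample mean absolute error of a naive forecast at the given lag.

```r
# Hypothetical helper: mean absolute scaled error.
# scale = in-sample MAE of the naive forecast y[t] = y[t - period].
mase_sketch <- function(actual, forecast, train, period = 1) {
  scale <- mean(abs(diff(train, lag = period)))
  mean(abs(actual - forecast)) / scale
}

# Usage: hold out the last 24 months of AirPassengers (assumed split).
train <- as.numeric(window(AirPassengers, end = c(1958, 12)))
test  <- as.numeric(window(AirPassengers, start = c(1959, 1)))

# Candidate forecast: repeat the last observed year (seasonal naive).
snaive <- rep(tail(train, 12), length.out = length(test))

mase_sketch(test, snaive, train, period = 12)
```

A value below 1 means the candidate forecast beats the naive benchmark on average, and a value above 1 means it does worse. Unlike ApEn or $\Omega$, this requires actually fitting and evaluating a forecast, so it is an after-the-fact check and not an a priori indicator.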

References

Balasis, G., Daglis, I. A., Anastasiadis, A., & Eftaxias, K. (2011). Detection of dynamical complexity changes in Dst time series using entropy concepts and rescaled range analysis. In W. Liu and M. Fujimoto (Eds.), The Dynamic Magnetosphere, IAGA Special Sopron Book, Series 3, 211. doi:10.1007/978-94-007-0501-2_12. Springer. Retrieved from http://members.noa.gr/anastasi/papers/B29.pdf

Goerg, G. M. (2013). Forecastable component analysis. JMLR, W&CP (2) 2013: 64-72. Retrieved from http://machinelearning.wustl.edu/mlpapers/papers/goerg13
