Solved – Interpretation of spectral entropy of a time series

entropy, forecasting, information theory, spectral analysis, time series

The tsfeatures package for R has an entropy() function. The vignette for the package describes it as:

The spectral entropy is the Shannon entropy $$-\int_{-\pi}^{\pi} \hat{f}(\lambda)\log\hat{f}(\lambda)\, d\lambda$$

where $\hat{f}(\lambda)$ is an estimate of the spectral density of the data. This measures the “forecastability” of a time series, where low values indicate a high signal-to-noise ratio, and large values occur when a series is difficult to forecast.

The documentation for ForeCA::spectral_entropy(), which is used by the entropy() function, suggests the density is normalised such that
$$\int_{-\pi}^{\pi} f_x(\lambda)\, d\lambda = 1.$$
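
To make the definition concrete, here is my rough sketch of the computation. The estimator details (no tapering, no smoothing, rescaling by the log of the number of frequencies) are my assumptions for illustration and may not match what ForeCA actually does internally:

```r
# Rough sketch of the definition above: estimate the spectral density
# with a raw periodogram, normalise it to sum to 1, then take the
# Shannon entropy. ForeCA's internals (smoothing, tapering, scaling)
# may well differ -- this is only for intuition.
spectral_entropy_sketch <- function(x) {
  spec <- stats::spec.pgram(x, taper = 0, detrend = FALSE, plot = FALSE)$spec
  p <- spec / sum(spec)              # discrete density summing to 1
  -sum(p * log(p)) / log(length(p))  # entropy, rescaled to [0, 1]
}

set.seed(1)
spectral_entropy_sketch(rnorm(512))                       # white noise: near 1
spectral_entropy_sketch(sin(2 * pi * (1:512) * 10 / 512)) # pure sinusoid at a Fourier frequency: near 0
```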

I'm wondering what the most accurate interpretation of this calculated quantity is (I'm sure there's a good reason why 'forecastability' is in quotation marks). I've got a suspicion it's based on a narrow interpretation of what is forecastable.

  1. Is it correct to say that the spectral density is obtained using a (discrete?) Fourier transform on the time series?

  2. Would it be more accurate to say that this calculation measures 'forecastability' by testing whether the time series is a linear combination of signals at different frequencies and at different levels of power?

  3. What other assumptions are made? Surely this measure has very limited ability to account for path dependence and other complex/non-linear behavior, which may or may not be forecastable given a sufficiently sophisticated understanding and model?

Best Answer

Assuming you're limiting yourself to stationary processes:

  1. Usually that's how it's done, yes.

  2. No, it's not whether. It's really a matter of how few frequencies carry the power: the fewer, the better. Any stationary process can be written as a sum of Fourier frequencies with random weights; that's the spectral representation theorem. I would say this entropy is like any other entropy: the higher the value, the larger the average “surprise” ($-\log$ density), and in this case, the flatter the density. A completely flat periodogram is the periodogram of white noise, which is completely unpredictable (the sketch after this list illustrates the contrast).

  3. Regarding other assumptions, I'm not sure. I can't rule anything out with certainty, but I would venture to guess they're assuming stationarity.
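
To make point 2 concrete, here is a quick comparison using tsfeatures::entropy(). The series and parameter choices are purely illustrative, and the exact values depend on the density estimator the package uses:

```r
# Flatter spectrum -> higher spectral entropy -> harder to forecast.
# The series below are illustrative; exact values depend on the
# estimator defaults inside tsfeatures::entropy().
library(tsfeatures)

set.seed(42)
n <- 1000
white  <- rnorm(n)                                       # flat spectrum
ar     <- arima.sim(list(ar = 0.9), n = n)               # power concentrated at low frequencies
season <- sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 0.1)  # one dominant frequency

entropy(white)   # close to 1: nearly flat periodogram, unpredictable
entropy(ar)      # lower: persistence makes it partly forecastable
entropy(season)  # lowest: almost all power at a single frequency
```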