[Math] Empirical Kullback-Leibler divergence of two time series

information-theory, probability, statistical-inference, statistics, time-series

I have two vectors (time series) $x$ and $y$ of the same length (1200 elements). Both time series are stationary. I don't know the theoretical distributions of $x$ and $y$, so I think I have to use empirical distribution functions. I would like to calculate the relative entropy of these random variables. Do you have any ideas how to calculate the Kullback-Leibler divergence of two time series with different distributions?

Thanks a lot in advance!

Best Answer

Typically, one would define a window size $w$ and look at the distribution generated by considering symbols of the form $x_k^{k+w}$. The distributions on $X_1^w$ and $Y_1^w$ found this way can be plugged into the formula

$$\frac{1}{w}\sum_{x \in \mathcal{X}^w} P_w(x)\log\frac{P_w(x)}{Q_w(x)}$$

where $P$ was determined using $x$, and $Q$ using $y$. Note the normalisation by the window size, which gives you a per-symbol (single-letter) figure.
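To make the recipe concrete, here is a minimal Python sketch (my own illustration, not part of the original answer): it quantile-bins each real-valued series into a small alphabet, forms overlapping length-$w$ windows, and plugs the two empirical window distributions into the formula above. The function names, the binning scheme, and the default parameters are all assumptions made for illustration.

```python
# Minimal sketch of the windowed plug-in KL estimator described above.
# Names and parameter choices (quantile binning, n_bins, w) are assumptions.
from collections import Counter
import numpy as np

def discretize(series, n_bins=4):
    """Map a real-valued series to integer symbols via quantile bins."""
    edges = np.quantile(series, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(series, edges)

def window_counts(symbols, w):
    """Empirical distribution of overlapping length-w windows (as tuples)."""
    windows = [tuple(symbols[k:k + w]) for k in range(len(symbols) - w + 1)]
    counts = Counter(windows)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def windowed_kl(x, y, w=2, n_bins=4):
    """(1/w) * sum_s P_w(s) log(P_w(s)/Q_w(s)); infinite if some window of x
    never occurs in y."""
    P = window_counts(discretize(x, n_bins), w)
    Q = window_counts(discretize(y, n_bins), w)
    kl = 0.0
    for s, p in P.items():
        q = Q.get(s, 0.0)
        if q == 0.0:
            return float("inf")  # unseen window in y: divergence blows up
        kl += p * np.log(p / q)
    return kl / w
```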

The windowing is needed essentially because you don't know whether the symbols are i.i.d.; if you do know they are, you can forgo it. The window size should be significantly smaller than the length of the time series, otherwise the likelihood of getting symbols in one time series that don't occur in the other is pretty high, which means you'll either drop samples or have the divergence blow up, both of which are bad (one way of working around this is sketched below). There is an obvious tradeoff between raising and lowering the window size, and you'll probably need to do a bit of a literature survey if you want to pick an optimal one, but given a $1200$-step time series, $10$–$20$ gives you a fairly long range as well as a good number of samples.
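One way to avoid the blow-up from windows seen in $x$ but not in $y$ is to add a small pseudo-count to $Q_w$ and renormalise. This is a sketch under my own assumptions (the smoothing constant and the range of $w$ are arbitrary, and it reuses the `discretize` and `window_counts` helpers from the snippet above), not something prescribed in the answer:

```python
import numpy as np
# Reuses discretize() and window_counts() from the earlier sketch.

def smoothed_windowed_kl(x, y, w, n_bins=4, eps=1e-6):
    """Windowed KL with pseudo-count smoothing of Q to keep it finite."""
    P = window_counts(discretize(x, n_bins), w)
    Q = window_counts(discretize(y, n_bins), w)
    support = set(P) | set(Q)
    z_q = 1.0 + eps * len(support)  # renormalise Q after adding eps everywhere
    kl = 0.0
    for s, p in P.items():
        q = (Q.get(s, 0.0) + eps) / z_q
        kl += p * np.log(p / q)
    return kl / w

# Example sweep over window sizes to see the tradeoff directly:
# x = np.random.randn(1200); y = 0.5 + np.random.randn(1200)
# for w in (1, 2, 5, 10):
#     print(w, smoothed_windowed_kl(x, y, w))
```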

Edit: this stats.se link seems relevant, especially with respect to having to actually compute the horrible thing.
