Heuristically, the probability density function on $\{x_1, x_2, \ldots, x_n\}$ with maximum entropy turns out to be the one that corresponds to the least amount of knowledge of $\{x_1, x_2, \ldots, x_n\}$, in other words the uniform distribution.
Now, for a more formal proof consider the following:
A probability density function on $\{x_1, x_2, \ldots, x_n\}$ is a set of nonnegative real numbers $p_1, \ldots, p_n$ that add up to 1. Entropy is a continuous function of the $n$-tuples $(p_1, \ldots, p_n)$, and these points lie in a compact subset of $\mathbb{R}^n$, so there is an $n$-tuple where entropy is maximized. We want to show this occurs at $(1/n, \ldots, 1/n)$ and nowhere else.
Suppose the $p_j$ are not all equal, say $p_1 < p_2$. (Clearly $n\neq 1$.) We will find a new probability density with higher entropy. It then follows, since entropy is maximized at
some $n$-tuple, that entropy is uniquely maximized at the $n$-tuple with $p_i = 1/n$ for all $i$.
Since $p_1 < p_2$, for small positive $\varepsilon$ we have $p_1 + \varepsilon < p_2 -\varepsilon$. The entropy of $\{p_1 + \varepsilon, p_2 -\varepsilon,p_3,...,p_n\}$ minus the entropy of $\{p_1,p_2,p_3,...,p_n\}$ equals
$$-p_1\log\left(\frac{p_1+\varepsilon}{p_1}\right)-\varepsilon\log(p_1+\varepsilon)-p_2\log\left(\frac{p_2-\varepsilon}{p_2}\right)+\varepsilon\log(p_2-\varepsilon)$$
To complete the proof, we want to show this is positive for small enough $\varepsilon$. Rewrite the expression above as
$$-p_1\log\left(1+\frac{\varepsilon}{p_1}\right)-\varepsilon\left(\log p_1+\log\left(1+\frac{\varepsilon}{p_1}\right)\right)-p_2\log\left(1-\frac{\varepsilon}{p_2}\right)+\varepsilon\left(\log p_2+\log\left(1-\frac{\varepsilon}{p_2}\right)\right)$$
Recalling that $\log(1 + x) = x + O(x^2)$ for small $x$, the expression above equals
$$-\varepsilon-\varepsilon\log p_1 + \varepsilon + \varepsilon \log p_2 + O(\varepsilon^2) = \varepsilon\log(p_2/p_1) + O(\varepsilon^2)$$
which is positive when $\varepsilon$ is small enough since $p_1 < p_2$.
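As a quick numerical sanity check (not part of the proof), the following Python sketch verifies that shifting a little mass from the larger probability to the smaller one raises the entropy, and that the increase is roughly $\varepsilon\log(p_2/p_1)$. The particular distribution and value of $\varepsilon$ are arbitrary choices for illustration.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

p = np.array([0.1, 0.4, 0.2, 0.3])   # arbitrary example with p_1 < p_2
eps = 1e-3                            # small positive perturbation

q = p.copy()
q[0] += eps                           # p_1 + eps
q[1] -= eps                           # p_2 - eps

delta = entropy(q) - entropy(p)
print(delta)                          # positive
print(eps * np.log(p[1] / p[0]))      # ~ eps * log(p_2 / p_1), close to delta
```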
A less rigorous proof is the following:
Consider first the following Lemma:
Let $p(x)$ and $q(x)$ be continuous probability density functions on an interval
$I$ in the real numbers, with $p\geq 0$ and $q > 0$ on $I$. We have
$$-\int_I p\log p dx\leq -\int_I p\log q dx$$
if both integrals exist. Moreover, there is equality if and only if $p(x) = q(x)$ for all $x$.
Now, let $p$ be any probability density function on $\{x_1,...,x_n\}$, with $p_i = p(x_i)$. Letting $q_i = 1/n$ for all $i$,
$$-\sum_{i=1}^n p_i\log q_i = \sum_{i=1}^n p_i \log n=\log n$$
which is the entropy of $q$. The (discrete analogue of the) Lemma then gives $h(p) = -\sum_{i=1}^n p_i\log p_i \leq -\sum_{i=1}^n p_i\log q_i = \log n = h(q)$, with equality if and only if $p = q$, that is, if and only if $p$ is uniform.
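The same inequality is easy to check numerically. The snippet below is only an illustration (a random distribution on $n = 5$ points, not a proof): it compares $-\sum_i p_i\log p_i$ with $-\sum_i p_i\log q_i = \log n$ for the uniform $q$.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 5
p = rng.random(n)
p /= p.sum()                 # an arbitrary distribution on n points
q = np.full(n, 1.0 / n)      # the uniform distribution

h_p = -np.sum(p * np.log(p))       # entropy of p
cross = -np.sum(p * np.log(q))     # cross-entropy of p against q, equals log n
print(h_p, cross, np.log(n))       # h_p <= cross == log n
```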
Wikipedia also has a brief discussion of this topic.
Why can't we just compute the standard deviation?
Here's why. Let's compare the formulas for entropy and variance:
- $H(X) = - \sum\limits_x p(x) \, \log p(x) = - \mathbb E \, [ \log p(X) ]$
- $\text{var} (X) = \mathbb E \, \Big[(X - \mathbb E[X])^2 \Big]$
Note that entropy does not depend on the values that $X$ may take, only on the distribution itself, while variance does depend on those values. Moreover, variance requires the variable to be numeric, which is not the case for entropy. Both of these properties make entropy a good candidate for calculating information gain.
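To make the contrast concrete, here is a small illustration (the probabilities and value labellings are arbitrary choices): relabeling the values of $X$ changes the variance but leaves the entropy untouched, since entropy only sees the probabilities.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats; depends only on the probabilities p."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

def variance(values, p):
    """Variance of a discrete variable taking `values` with probabilities p."""
    mean = np.sum(p * values)
    return np.sum(p * (values - mean) ** 2)

probs = np.array([0.5, 0.3, 0.2])
values_a = np.array([1.0, 2.0, 3.0])      # one labelling of the outcomes
values_b = np.array([10.0, 200.0, -5.0])  # a different labelling, same probabilities

print(entropy(probs))                 # the same regardless of labelling
print(variance(values_a, probs))      # changes with the values
print(variance(values_b, probs))
```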
To get more insights into entropy and other information-theoretic measures, you may read this question on math.SE.
Best Answer
Not quite. That sum is actually over the different states of the variables involved. For example, if both $X$ and $Y$ are binary variables, then, since the sum runs over the three variables $(x_{n+1}, x_n, y_n)$, it will have $2^3 = 8$ terms. In general, if $X$ and $Y$ each have $k$ possible states, the summation will have $k^3$ terms.
The interpretation of the sum over states is not that different from the usual sum over states when calculating entropy or mutual information. The term corresponding to $(x_{n+1}, x_n, y_n)$ represents how much information the particular state $y_n$ provides about the particular future state $x_{n+1}$ of $X$ when the past state of $X$ is $x_n$. For more information, you can read Joe Lizier's work on local information measures.
The source of your confusion could be that if you want to estimate transfer entropy from a given time series, then you (typically) have to loop through the whole time series to estimate $p(x_{n+1}, x_n, y_n)$, usually by counting occurrences of all $(x_{n+1}, x_n, y_n)$ tuples. Then, once you have estimated your PDF, you have to go back to the $t(x|y)$ formula and calculate those $k^3$ terms I mentioned above.
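As a rough sketch of what that estimation loop might look like, the code below uses a plug-in counting estimator for binary series with history length 1; the function name `transfer_entropy` and the toy data are my choices for illustration, not a specific library API.

```python
import numpy as np
from collections import Counter

def transfer_entropy(x, y):
    """Plug-in estimate (in nats) of transfer entropy from y to x, history length 1."""
    # Count occurrences of (x_{n+1}, x_n, y_n) tuples along the series.
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))
    total = sum(triples.values())
    p_xyz = {k: v / total for k, v in triples.items()}

    # Marginals needed for the transfer-entropy formula.
    p_xx, p_xy, p_x = Counter(), Counter(), Counter()
    for (x1, x0, y0), p in p_xyz.items():
        p_xx[(x1, x0)] += p
        p_xy[(x0, y0)] += p
        p_x[x0] += p

    # Sum over all observed (x_{n+1}, x_n, y_n) states.
    te = 0.0
    for (x1, x0, y0), p in p_xyz.items():
        te += p * np.log((p * p_x[x0]) / (p_xx[(x1, x0)] * p_xy[(x0, y0)]))
    return te

# Toy example: y is random, and x partially copies the previous value of y.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=10_000)
x = np.where(rng.random(10_000) < 0.7, np.roll(y, 1), rng.integers(0, 2, size=10_000))
print(transfer_entropy(x, y))   # noticeably positive: y's past informs x's future
print(transfer_entropy(y, x))   # close to zero: x's past says nothing about y's future
```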