I may have an answer, borrowed from a non-entropy form of the calculation.
Reviewing http://scg.unibe.ch/archive/papers/Kuhn09aLogLikelihoodRatio.pdf (end of page 1, start of page 2), they mention:
"By multiplying ... with the signum of p2 − p1 we can further
distinguish between terms specific to the first corpus and ... the
second"
Signum is just a fancy way of asking "is the result greater than, less than, or equal to zero": it returns +1, -1, or 0 accordingly.
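A minimal sketch of that in Python (the helper name `signum` is just my own):

```python
def signum(x):
    """Return +1 if x > 0, -1 if x < 0, and 0 if x == 0."""
    return (x > 0) - (x < 0)
```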
Revisiting the original Contingency Table:
|               | Corpus A  | Corpus B  |
|---------------|-----------|-----------|
| Target Word   | k_11      | k_12      |
| Other Words   | k_21      | k_22      |
| Column totals | col1Total | col2Total |
Calculating p1 and p2:
- p1 = k_11 / col1Total
- p2 = k_12 / col2Total
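To make that concrete, here is a small sketch with made-up counts (two 1,000-word corpora, chosen to match the 20% / 10% example below):

```python
# Hypothetical counts for one target word in two 1,000-word corpora.
k_11, k_21 = 200, 800   # Corpus A: target word, other words
k_12, k_22 = 100, 900   # Corpus B: target word, other words

col1Total = k_11 + k_21
col2Total = k_12 + k_22

p1 = k_11 / col1Total   # 0.20 -> relative frequency in corpus A
p2 = k_12 / col2Total   # 0.10 -> relative frequency in corpus B

sign = (p2 > p1) - (p2 < p1)   # signum(p2 - p1): -1 here, since p2 < p1
```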
I believe multiplying by signum( p2 − p1 ) is just a fancy way of saying: if p2 < p1, multiply the answer by -1.0.
If a term is used 20% of the time in corpus A and only 10% of the time in corpus B, I believe the number should be positive. If its usage is higher in B than in A, then the number should be negative.
Staring at this, it seems like signum(p2 − p1) gives the opposite of that... but the Adrian Kuhn paper shows the equation in the form "−2 log λ", so maybe that minus sign flips it relative to what you start with in the Dunning model...
Or I'm otherwise confused about the meaning of +/-.
From http://ucrel.lancs.ac.uk/llwizard.html
- Positive = more prominent in A, "+ indicates overuse in A relative to B"
- Negative = more prominent in B, "- indicates underuse in A relative to B"
Well, a bit of progress at least:
- I have the sign changing between + and -.
- Now I just need to confirm which direction means what. ;-)
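Putting the pieces together, here is a sketch of how I currently read it: Dunning's log-likelihood ratio (G²) computed from the 2×2 table, then signed so that positive means overuse in corpus A, i.e. the Lancaster convention above. Note this uses sign(p1 − p2), the negation of the paper's literal signum(p2 − p1), so treat the direction as an assumption to verify; `signed_llr` is just my own helper name.

```python
import math

def signed_llr(k_11, k_12, k_21, k_22):
    """Signed log-likelihood ratio (Dunning's G2) for a 2x2 contingency table.

    k_11/k_12: target-word counts in corpus A / corpus B
    k_21/k_22: other-word counts in corpus A / corpus B
    Sign convention (assumed, per the Lancaster wizard):
    positive = overuse in A relative to B, negative = underuse in A.
    """
    observed = [k_11, k_12, k_21, k_22]
    row1, row2 = k_11 + k_12, k_21 + k_22
    col1, col2 = k_11 + k_21, k_12 + k_22
    total = row1 + row2
    expected = [row1 * col1 / total, row1 * col2 / total,
                row2 * col1 / total, row2 * col2 / total]

    g2 = 2.0 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

    p1, p2 = k_11 / col1, k_12 / col2
    return g2 if p1 > p2 else -g2 if p1 < p2 else 0.0

print(signed_llr(200, 100, 800, 900))   # 20% in A vs 10% in B -> positive
print(signed_llr(100, 200, 900, 800))   # 10% in A vs 20% in B -> negative
```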
I will use the same notation I used here: Mathematics behind classification and regression trees
Gini Gain and Information Gain ($IG$) are both impurity-based splitting criteria. The only difference is in the impurity function $I$:
- $\textit{Gini}: \mathit{Gini}(E) = 1 - \sum_{j=1}^{c}p_j^2$
- $\textit{Entropy}: H(E) = -\sum_{j=1}^{c}p_j\log p_j$
They are actually particular cases of a more general entropy measure (Tsallis entropy), parametrized by $\beta$:
$$H_\beta (E) = \frac{1}{\beta-1} \left( 1 - \sum_{j=1}^{c}p_j^\beta \right)$$
$\textit{Gini}$ is obtained with $\beta = 2$ and $H$ with $\beta \rightarrow 1$.
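A quick numeric check of that claim in Python (the three-class distribution and the helper names are made up for illustration):

```python
import math

def tsallis(p, beta):
    """Tsallis entropy: (1 - sum_j p_j^beta) / (beta - 1)."""
    return (1.0 - sum(pj ** beta for pj in p)) / (beta - 1.0)

def gini(p):
    """Gini impurity: 1 - sum_j p_j^2."""
    return 1.0 - sum(pj ** 2 for pj in p)

def shannon(p):
    """Shannon entropy (natural log): -sum_j p_j * log(p_j)."""
    return -sum(pj * math.log(pj) for pj in p if pj > 0)

p = [0.5, 0.3, 0.2]                        # made-up class distribution
print(gini(p), tsallis(p, 2.0))            # identical: beta = 2 gives Gini
print(shannon(p), tsallis(p, 1.0 + 1e-6))  # beta -> 1 recovers Shannon entropy
```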
The log-likelihood, also called $G$-statistic, is a linear transformation of Information Gain:
$$G\text{-statistic} = 2 \cdot |E| \cdot IG$$
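And a small sanity check of that identity on a made-up split of $|E| = 100$ examples into two children (natural logarithms throughout, so the factor is exactly $2 \cdot |E|$):

```python
import math

def entropy(counts):
    """Shannon entropy (natural log) of a list of class counts."""
    n = sum(counts)
    return -sum(c / n * math.log(c / n) for c in counts if c > 0)

parent = [60, 40]                 # class counts before the split
children = [[45, 5], [15, 35]]    # class counts in the two child nodes
n = sum(parent)

# Information Gain: parent impurity minus weighted child impurity.
ig = entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)

# G-statistic computed directly from observed vs. expected cell counts.
g = 0.0
for ch in children:
    for j, observed in enumerate(ch):
        expected = sum(ch) * parent[j] / n   # independence assumption
        if observed > 0:
            g += 2.0 * observed * math.log(observed / expected)

print(g, 2 * n * ig)   # the two numbers agree up to floating-point error
```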
Depending on the community (statistics/data mining), people prefer one measure or the other (related question here). They might be pretty much equivalent in the decision tree induction process. Log-likelihood might give higher scores to balanced partitions when there are many classes, though [Technical Note: Some Properties of Splitting Criteria. Breiman 1996].
Gini Gain can be nicer because it doesn't involve logarithms and you can find the closed form of its expected value and variance under a random-split assumption [Alin Dobra, Johannes Gehrke: Bias Correction in Classification Tree Construction. ICML 2001: 90-97]. It is not as easy for Information Gain (if you are interested, see here).
Best Answer
The same as above, from the same page: http://users.utu.fi/attenka/trent.R