I will use the same notation I used here: Mathematics behind classification and regression trees
Gini Gain and Information Gain ($IG$) are both impurity based splitting criteria. The only difference is in the impurity function $I$:
- $\textit{Gini}: \mathit{Gini}(E) = 1 - \sum_{j=1}^{c}p_j^2$
- $\textit{Entropy}: H(E) = -\sum_{j=1}^{c}p_j\log p_j$
They are actually particular cases of a more general entropy measure (Tsallis entropy), parametrized by $\beta$:
$$H_\beta (E) = \frac{1}{\beta-1} \left( 1 - \sum_{j=1}^{c}p_j^\beta \right)
$$
$\textit{Gini}$ is obtained with $\beta = 2$ and $H$ with $\beta \rightarrow 1$.
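As a quick numerical check (a minimal R sketch; the class proportions are made up for illustration), the Tsallis measure at $\beta = 2$ reproduces the Gini impurity exactly, and for $\beta$ close to $1$ it approaches the Shannon entropy with natural logarithms:

```r
# Made-up class proportions, purely for illustration
p <- c(0.5, 0.3, 0.2)

gini    <- function(p) 1 - sum(p^2)
shannon <- function(p) -sum(p * log(p))                      # natural log
tsallis <- function(p, beta) (1 - sum(p^beta)) / (beta - 1)

tsallis(p, 2)          # 0.62, identical to gini(p)
gini(p)                # 0.62
tsallis(p, 1 + 1e-6)   # ~1.0297, tends to shannon(p) as beta -> 1
shannon(p)             # 1.029653...
```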
The log-likelihood, also called $G$-statistic, is a linear transformation of Information Gain:
$$G\text{-statistic} = 2 \cdot |E| \cdot IG$$
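To see the relation numerically, here is a minimal R sketch with a made-up table of branch-by-class counts (the table `tab` and the `entropy` helper are illustrative assumptions, not from any package); with natural logarithms in the entropy, the likelihood-ratio $G$-statistic computed from the table equals $2 \cdot |E| \cdot IG$:

```r
# Made-up contingency table: rows = branches of the split, columns = classes
tab <- matrix(c(30, 10,
                 5, 25), nrow = 2, byrow = TRUE)
n <- sum(tab)                                    # |E|, the node size

entropy <- function(p) { p <- p[p > 0]; -sum(p * log(p)) }   # natural log

# Information Gain of the split = parent entropy - weighted child entropies
H_parent   <- entropy(colSums(tab) / n)
H_children <- sum(rowSums(tab) / n * apply(tab, 1, function(r) entropy(r / sum(r))))
IG <- H_parent - H_children

# G-statistic (likelihood-ratio statistic) computed directly from the table
expected <- outer(rowSums(tab), colSums(tab)) / n
G <- 2 * sum(tab * log(tab / expected))          # assumes no zero cells

all.equal(G, 2 * n * IG)                         # TRUE
```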
Depending on the community (statistics/data mining), people prefer one measure or the other (Related question here). In practice they are pretty much equivalent in the decision tree induction process, although log-likelihood tends to give higher scores to balanced partitions when there are many classes [Technical Note: Some Properties of Splitting Criteria. Breiman 1996].
Gini Gain can be nicer because it involves no logarithms, and closed forms for its expected value and variance under a random-split assumption are known [Alin Dobra, Johannes Gehrke: Bias Correction in Classification Tree Construction. ICML 2001: 90-97]. It is not as easy for Information Gain (if you are interested, see here).
The randomForest package in R by A. Liaw is a port of the original code: a mix of translated C code, some remaining Fortran code, and R wrapper code.
To decide the overall best split across break points and across the mtry candidate variables, the code uses a scoring function similar to Gini gain:
$GiniGain(N,X)=Gini(N)-\frac{\lvert N_{1} \rvert }{\lvert N \rvert }Gini(N_{1})-\frac{\lvert N_{2} \rvert }{\lvert N \rvert }Gini(N_{2})$
Where $X$ is a given feature, $N$ is the node on which the split is to be made, and $N_{1}$ and $N_{2}$ are the two child nodes created by splitting $N$. $\lvert . \rvert $ is the number of elements in a node.
And $Gini(N)=1-\sum_{k=1}^{K}p_{k}^2$, where $K$ is the number of target classes and $p_{k}$ is the proportion of class $k$ in the node.
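A direct R transcription of these formulas might look as follows (a minimal sketch of the textbook definition, not the randomForest C code; the labels are made up):

```r
# Gini impurity of a node, given its vector of class labels
gini <- function(y) 1 - sum((table(y) / length(y))^2)

# Textbook Gini gain of splitting node N (labels y) into children y1 and y2
gini_gain <- function(y, y1, y2) {
  gini(y) - length(y1) / length(y) * gini(y1) - length(y2) / length(y) * gini(y2)
}

# Illustrative node of 9 labels and one candidate split
y  <- c("T", "T", "T", "T", "F", "F", "F", "F", "F")
y1 <- y[1:4]          # left child
y2 <- y[5:9]          # right child
gini_gain(y, y1, y2)  # 0.4938..., a perfectly pure split of this node
```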
However, the scoring function actually applied is not exactly this one, but an equivalent, more computationally efficient version: $Gini(N)$ and $\lvert N \rvert$ are constant across all compared splits and are therefore omitted.
Let us also inspect how the contribution of a child node to the score is computed. For child node 2 (node 1 is analogous):
$$\frac{\lvert N_{2} \rvert }{\lvert N \rvert }Gini(N_{2}) \propto \lvert N_{2} \rvert \, Gini(N_{2}) = \lvert N_{2} \rvert \left( 1-\sum_{k=1}^{K}p_{2,k}^2 \right) = \lvert N_{2} \rvert - \sum_{k=1}^{K}\frac{nclass_{2,k}^2}{\lvert N_{2} \rvert}$$
where $nclass_{2,k}$ is the count of target class $k$ in child node 2 and $p_{2,k} = nclass_{2,k}/\lvert N_{2} \rvert$. Notice that one factor of $\lvert N_{2} \rvert$ cancels between numerator and denominator.
Since $\lvert N_{1} \rvert + \lvert N_{2} \rvert = \lvert N \rvert$ is also constant across the compared splits, these constant terms can be dropped, and the best-split decision reduces to maximizing the node-size-weighted sum of squared class prevalences:
score =
$$\lvert N_{1} \rvert \sum_{k=1}^{K}p_{1,k}^2 + \lvert N_{2} \rvert \sum_{k=1}^{K}p_{2,k}^2 = \lvert N_{1} \rvert \sum_{k=1}^{K}\frac{nclass_{1,k}^2}{\lvert N_{1} \rvert^2} + \lvert N_{2} \rvert \sum_{k=1}^{K}\frac{nclass_{2,k}^2}{\lvert N_{2} \rvert^2}$$
$$= \sum_{k=1}^{K}\frac{nclass_{1,k}^2}{\lvert N_{1} \rvert} + \sum_{k=1}^{K}\frac{nclass_{2,k}^2}{\lvert N_{2} \rvert} = \text{numerator}_1/\text{denominator}_1 + \text{numerator}_2/\text{denominator}_2$$
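A small R sketch of this simplified score (again just the algebra above, not the actual C routine) shows that it picks the same break point as the full Gini gain:

```r
# Same textbook definitions as in the sketch above
gini <- function(y) 1 - sum((table(y) / length(y))^2)
gini_gain <- function(y, y1, y2)
  gini(y) - length(y1) / length(y) * gini(y1) - length(y2) / length(y) * gini(y2)

# Simplified score: per child, (sum of squared class counts) / (child size)
split_score <- function(y1, y2)
  sum(table(y1)^2) / length(y1) + sum(table(y2)^2) / length(y2)

# Labels ordered by some candidate feature; score every break point
y      <- c("T", "F", "T", "T", "F", "F", "T", "F", "F")
breaks <- 1:(length(y) - 1)
scores <- sapply(breaks, function(b) split_score(y[1:b], y[-(1:b)]))
gains  <- sapply(breaks, function(b) gini_gain(y, y[1:b], y[-(1:b)]))
which.max(scores) == which.max(gains)   # TRUE: both pick the same break point
```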
The implementation also allows for class-wise up/down-weighting of samples. Just as importantly, when the implementation updates this modified Gini gain, moving a single sample from one node to the other is very efficient: the sample is subtracted from the numerator/denominator of one node and added to those of the other.
I wrote a prototype RF some months ago that naively recomputed the Gini gain from scratch for every break point, and it was slower :)
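The incremental bookkeeping might look roughly like the following hypothetical R sketch (the real implementation is C): as each sample crosses the break point, only one squared class count and the two node sizes change, so the numerators and denominators are patched rather than recomputed.

```r
# Sketch of incremental scoring: samples move left across the break point one
# at a time, and only class k's contribution to each numerator is adjusted.
incremental_scores <- function(y) {
  classes   <- sort(unique(y))
  left      <- setNames(integer(length(classes)), classes)   # counts in left child
  right     <- table(factor(y, levels = classes))            # counts in right child
  num_left  <- 0;            den_left  <- 0
  num_right <- sum(right^2); den_right <- length(y)
  scores    <- numeric(length(y) - 1)
  for (i in seq_len(length(y) - 1)) {
    k <- y[i]                                   # this sample moves to the left child
    num_left  <- num_left  + 2 * left[k]  + 1   # (c+1)^2 = c^2 + 2c + 1
    num_right <- num_right - 2 * right[k] + 1   # (c-1)^2 = c^2 - 2c + 1
    left[k]  <- left[k] + 1;  den_left  <- den_left + 1
    right[k] <- right[k] - 1; den_right <- den_right - 1
    scores[i] <- num_left / den_left + num_right / den_right
  }
  scores
}

y <- c("T", "F", "T", "T", "F", "F", "T", "F", "F")
incremental_scores(y)   # identical to scoring every break point from scratch
```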
If several splits score equally best, a random winner is picked.
This answer is based on inspecting the source file "randomForest.x.x.tar.gz/src/classTree.c", lines 209-250.
The Gini index here ($G$, say) just measures diversity or heterogeneity (or uncertainty, if you will) from the sum of squared category probabilities. If every value is in the same category, then the measure is $1 - 1^2 = 0$. If each of $n$ values is in a distinct category, then the measure is $1 - n(1/n)^2 = 1 - 1/n$. The complement is in some ways easier to think about, e.g. the reciprocal of the complement $1 / (1 - G)$ returns the "numbers equivalent", i.e. the equivalent number of equally common classes. Thus the extremes are clearly $1/1$ and $1/(1/n)$, i.e. $1$ and $n$.
Your columns $a_1$ and $a_2$ have 4 T and 5 F, and 5 T and 4 F, respectively, which I get to be the same index, namely $1 - (4/9)^2 - (5/9)^2 = 0.4938271605$; that is a ridiculous number of decimal places, but it suggests that you have a gross error for one column and a rounding error for the other. With your $a_3$ the principle does not change, as the index ignores the labels on the categories: whatever metric meaning they might have is not considered. By my calculation you get $1 - 5\,(1/9)^2 - 2\,(2/9)^2 = 0.8395061728$.
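For instance, in R (with `a1` and `a3` reconstructed from the counts above rather than your actual data):

```r
# Diversity form of the Gini index for a vector of category labels
gini_div <- function(x) {
  p <- table(x) / length(x)
  1 - sum(p^2)
}

a1 <- c(rep("T", 4), rep("F", 5))    # 4 T and 5 F
a3 <- c(1, 2, 3, 4, 5, 6, 6, 7, 7)   # 5 singleton categories, 2 pairs

gini_div(a1)             # 0.4938272  (= 40/81)
gini_div(a3)             # 0.8395062  (= 68/81)
1 / (1 - gini_div(a3))   # 6.23, the "numbers equivalent" of equally common classes
```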
Other names for this measure $G$ (or its complement, or the reciprocal of that) are Simpson, Herfindahl and repeat rate. Gini appears to have got there first, but its applications across ecology, economics, linguistics and many other fields are legion.