Information Gain and Mutual Information – Key Measures in Information Theory


Andrew Moore defines information gain as:

$IG(Y|X) = H(Y) - H(Y|X)$

where $H(Y|X)$ is the conditional entropy. However, Wikipedia calls the above quantity mutual information.
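
To make the first definition concrete, here is a minimal sketch (not part of the original question) that computes $H(Y)$, $H(Y|X)$, and their difference for a small, made-up discrete joint distribution; the values of `p_xy` are hypothetical and NumPy is assumed.

```python
# Minimal sketch: IG(Y|X) = H(Y) - H(Y|X) for a hypothetical discrete joint p(x, y).
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p (zero entries are skipped)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y); rows index X, columns index Y.
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])

p_x = p_xy.sum(axis=1)          # marginal p(x)
p_y = p_xy.sum(axis=0)          # marginal p(y)

h_y = entropy(p_y)
# Conditional entropy H(Y|X) = sum_x p(x) * H(Y | X = x)
h_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))

ig = h_y - h_y_given_x          # information gain, i.e. the quantity IG(Y|X) above
print(f"H(Y) = {h_y:.4f} bits, H(Y|X) = {h_y_given_x:.4f} bits, IG = {ig:.4f} bits")
```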

Wikipedia, on the other hand, defines information gain as the Kullback–Leibler divergence (also known as information divergence or relative entropy) between two probability distributions:

$D_{KL}(P \| Q) = H(P,Q) - H(P)$

where $H(P,Q)$ denotes the cross-entropy.
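
The identity $D_{KL}(P \| Q) = H(P,Q) - H(P)$ can be checked numerically; the sketch below (not from the original post, with made-up distributions and NumPy assumed) computes both sides for two small discrete distributions.

```python
# Minimal sketch: D_KL(P || Q) = H(P, Q) - H(P) for two hypothetical discrete distributions.
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) = -sum_i p_i * log2(q_i), requiring q_i > 0 wherever p_i > 0."""
    mask = p > 0
    return -np.sum(p[mask] * np.log2(q[mask]))

def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(P || Q) = sum_i p_i * log2(p_i / q_i)."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Hypothetical distributions over the same three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

h_p = cross_entropy(p, p)       # H(P) is the cross-entropy of P with itself
print(f"D_KL(P||Q)    = {kl_divergence(p, q):.4f} bits")
print(f"H(P,Q) - H(P) = {cross_entropy(p, q) - h_p:.4f} bits")  # same value
```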

These two definitions seem to be inconsistent with each other.

I have also seen other authors talking about two additional related concepts, namely differential entropy and relative information gain.

What is the precise definition of, and relationship between, these quantities? Is there a good textbook that covers them all?

  • Information gain
  • Mutual information
  • Cross entropy
  • Conditional entropy
  • Differential entropy
  • Relative information gain

Best Answer

I think that calling the Kullback-Leibler divergence "information gain" is non-standard.

The first definition is standard.

EDIT: However, $H(Y) - H(Y|X)$ can also be called mutual information.
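
As a quick illustration of why the two formulae in the question are related rather than inconsistent, the sketch below (my own check, with a hypothetical joint distribution and NumPy assumed) computes the mutual information both as $H(Y) - H(Y|X)$ and as the KL divergence between the joint distribution and the product of its marginals; the two numbers agree.

```python
# Minimal sketch: I(X;Y) = D_KL( p(x,y) || p(x)p(y) ) equals H(Y) - H(Y|X).
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution p(x, y).
p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

# Mutual information as a KL divergence between the joint and the product of marginals.
outer = np.outer(p_x, p_y)
mask = p_xy > 0
mi_kl = np.sum(p_xy[mask] * np.log2(p_xy[mask] / outer[mask]))

# Mutual information as H(Y) - H(Y|X), i.e. the "information gain" form.
h_y_given_x = sum(p_x[i] * entropy(p_xy[i] / p_x[i]) for i in range(len(p_x)))
mi_ig = entropy(p_y) - h_y_given_x

print(f"D_KL(p(x,y) || p(x)p(y)) = {mi_kl:.6f} bits")
print(f"H(Y) - H(Y|X)            = {mi_ig:.6f} bits")
```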

Note that I don't think you will find any scientific discipline with a truly standardized, precise, and consistent naming scheme, so you will always have to look at the formulae; they generally give you a better idea of what is meant.

Textbooks: see "Good introduction into different kinds of entropy".

Also: Cosma Shalizi: Methods and Techniques of Complex Systems Science: An Overview, chapter 1 (pp. 33--114) in Thomas S. Deisboeck and J. Yasha Kresh (eds.), Complex Systems Science in Biomedicine http://arxiv.org/abs/nlin.AO/0307015

Robert M. Gray: Entropy and Information Theory http://ee.stanford.edu/~gray/it.html

David MacKay: Information Theory, Inference, and Learning Algorithms http://www.inference.phy.cam.ac.uk/mackay/itila/book.html

also, "What is “entropy and information gain”?"