Information / mutual information does not depend on the possible values themselves; it depends only on their probabilities, so in that respect it is less sensitive. Distance correlation is more powerful and simpler to compute. For a comparison, see
http://www-stat.stanford.edu/~tibs/reshef/comment.pdf
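For concreteness, here is a minimal NumPy sketch of the sample distance correlation (the V-statistic version of Székely, Rizzo, and Bakirov's definition); the function name and the test data are my own choices, not anything prescribed by the linked comment:

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical (V-statistic) distance correlation of two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Pairwise absolute-difference distance matrices.
    a = np.abs(x[:, None] - x[None, :])
    b = np.abs(y[:, None] - y[None, :])
    # Double-center each: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # Sample distance covariance and distance variances.
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0

rng = np.random.default_rng(0)
x = rng.normal(size=500)
print(distance_correlation(x, x**2))   # clearly nonzero: detects y = x^2
print(np.corrcoef(x, x**2)[0, 1])      # near zero: Pearson misses it
```

The double-centering step is what lets the statistic pick up arbitrary, not merely linear, dependence.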
Cross-correlation assumes a linear relationship between the two sets of data, whereas mutual information assumes only that the value of one dataset says something about the value of the other.
So mutual information makes much weaker assumptions.
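A quick way to see the difference is on made-up data with a strong but purely nonlinear relationship; below, mutual information is estimated with a simple histogram plug-in (the bin count is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=10_000)
y = x**2 + 0.05 * rng.normal(size=x.size)   # strong but nonlinear dependence

# Linear (Pearson) correlation is blind to this relationship:
print(f"correlation: {np.corrcoef(x, y)[0, 1]: .3f}")   # ~ 0

# Histogram-based mutual information estimate (in nats):
joint, _, _ = np.histogram2d(x, y, bins=20)
pxy = joint / joint.sum()                    # joint pmf estimate
px = pxy.sum(axis=1, keepdims=True)          # marginal of x
py = pxy.sum(axis=0, keepdims=True)          # marginal of y
nz = pxy > 0                                 # skip empty cells to avoid log(0)
mi = np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))
print(f"mutual information: {mi: .3f}")      # clearly > 0
```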
A classic problem solved with mutual information is the alignment (registration) of two types of medical images, for example an ultrasound image and an X-ray image.
(The image types are usually called modalities, so the problem is known as multi-modal image registration.)
In both X-ray and ultrasound, a given material, say bone, produces a certain 'brightness' in the image. But while some materials appear bright in both modalities, for other materials (e.g. fat) it can be the opposite: bright in one image and dark in the other.
It is therefore not the case that bright parts of the X-ray image are also bright parts of the ultrasound image.
Consequently, mutual information is still a useful criterion for aligning the images, but cross-correlation is not, as the toy example below illustrates.
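Here is a deliberately simplified 1-D stand-in for that registration problem; the tissue labels, intensity mappings, and shift are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# A 1-D "scene" of tissue labels: 0 = bone, 1 = fat, 2 = soft tissue.
labels = rng.integers(0, 3, size=2000)

# The two modalities map the same tissue to different intensities
# (fat: bright in one, dark in the other -- a non-monotonic remapping).
intensity_a = np.array([0.9, 0.2, 0.5])[labels]   # toy "X-ray"
intensity_b = np.array([0.8, 0.9, 0.1])[labels]   # toy "ultrasound"

true_shift = 17
img_a = intensity_a[true_shift:true_shift + 1500]  # fixed image
moving = intensity_b                                # image to align

def mi(u, v, bins=8):
    """Histogram estimate of mutual information (in nats)."""
    pxy, _, _ = np.histogram2d(u, v, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

shifts = np.arange(0, 40)
corr = [np.corrcoef(img_a, moving[s:s + 1500])[0, 1] for s in shifts]
mis = [mi(img_a, moving[s:s + 1500]) for s in shifts]

print("correlation picks shift:", shifts[np.argmax(corr)])  # arbitrary
print("mutual info picks shift:", shifts[np.argmax(mis)])   # 17, the true shift
```

Because fat is bright in one "modality" and dark in the other, the correlation at the true alignment is actually slightly negative, so its peak lands at an essentially arbitrary shift; the mutual-information curve peaks cleanly at the true offset.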
Best Answer
Let's consider one fundamental concept of (linear) correlation, covariance (which is Pearson's correlation coefficient "un-standardized"). For two discrete random variables $X$ and $Y$ with probability mass functions $p(x)$, $p(y)$ and joint pmf $p(x,y)$ we have
$$\operatorname{Cov}(X,Y) = E(XY) - E(X)E(Y) = \sum_{x,y}p(x,y)xy - \left(\sum_xp(x)x\right)\cdot \left(\sum_yp(y)y\right)$$
$$\Rightarrow \operatorname{Cov}(X,Y) = \sum_{x,y}\left[p(x,y)-p(x)p(y)\right]xy$$
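As a numerical check of this identity, here is a short sketch with a made-up joint pmf (the numbers carry no special meaning):

```python
import numpy as np

# A small joint pmf over x in {0, 1} and y in {0, 1, 2} (made-up numbers).
pxy = np.array([[0.10, 0.20, 0.10],
                [0.15, 0.05, 0.40]])
xs = np.array([0.0, 1.0])
ys = np.array([0.0, 1.0, 2.0])

px = pxy.sum(axis=1)             # marginal pmf of X
py = pxy.sum(axis=0)             # marginal pmf of Y

# Cov(X,Y) = E(XY) - E(X)E(Y)
cov1 = np.sum(pxy * np.outer(xs, ys)) - (px @ xs) * (py @ ys)

# Cov(X,Y) = sum over x,y of [p(x,y) - p(x)p(y)] * x * y
cov2 = np.sum((pxy - np.outer(px, py)) * np.outer(xs, ys))

print(cov1, cov2)                # both print 0.1 (up to float round-off)
```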
The Mutual Information between the two is defined as
$$I(X,Y) = E\left (\ln \frac{p(x,y)}{p(x)p(y)}\right)=\sum_{x,y}p(x,y)\left[\ln p(x,y)-\ln p(x)p(y)\right]$$
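Evaluated on the same made-up pmf as above, the definition gives:

```python
import numpy as np

# Same made-up joint pmf as in the covariance sketch.
pxy = np.array([[0.10, 0.20, 0.10],
                [0.15, 0.05, 0.40]])
px = pxy.sum(axis=1, keepdims=True)
py = pxy.sum(axis=0, keepdims=True)

# I(X,Y) = sum over x,y of p(x,y) * [ln p(x,y) - ln p(x)p(y)]
indep = px @ py                      # product of marginals p(x)p(y)
mi = np.sum(pxy * (np.log(pxy) - np.log(indep)))
print(mi)   # approx. 0.129 nats; it would be 0 iff X and Y were independent
```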
Compare the two: each contains a point-wise "measure" of "the distance of the two rv's from independence", as expressed by the distance of the joint pmf from the product of the marginal pmfs: $\operatorname{Cov}(X,Y)$ has it as a difference of levels, $p(x,y)-p(x)p(y)$, while $I(X,Y)$ has it as a difference of logarithms, $\ln p(x,y)-\ln p(x)p(y)$.
And what do these measures do? In $\operatorname{Cov}(X,Y)$, they act as weights in a weighted sum of the products $xy$ of the two random variables. In $I(X,Y)$, they act as weights in a weighted sum of the joint probabilities $p(x,y)$.
So with $\operatorname{Cov}(X,Y)$ we look at what non-independence does to their product, while with $I(X,Y)$ we look at what non-independence does to their joint probability distribution.
Read the other way around, $I(X,Y)$ is the average value, under the joint pmf, of the logarithmic measure of distance from independence, while $\operatorname{Cov}(X,Y)$ is the sum of the level measure of distance from independence, weighted by the products $xy$ of the two rv's.
So the two are not antagonistic; they are complementary, describing different aspects of the association between two random variables. One could comment that mutual information "is not concerned" with whether the association is linear or not, while covariance may be zero even though the variables are still stochastically dependent.

On the other hand, covariance can be calculated directly from a data sample, without the need to actually know the probability distributions involved (since it is an expression of the distribution's moments), while mutual information requires knowledge of the distributions, whose estimation, if they are unknown, is a much more delicate and uncertain task than the estimation of covariance. The sketch below illustrates this trade-off.
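A rough illustration on synthetic data of why the estimation burden differs (the histogram estimator and the bin counts are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)

# Covariance: a direct moment estimate; no distribution needed.
print(f"sample covariance: {np.cov(x, y)[0, 1]: .3f}")

# MI: we must first estimate the pmfs, and the answer depends on how we do it.
def hist_mi(u, v, bins):
    pxy, _, _ = np.histogram2d(u, v, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

for bins in (5, 10, 20, 40):
    print(f"MI estimate with {bins:2d} bins: {hist_mi(x, y, bins): .3f}")
# The MI estimates drift upward with finer binning (finite-sample bias),
# while the covariance estimate required no such tuning choice at all.
```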