Why is the log of the mean so different from the mean of the logs?

logarithms, means

I am analysing some data consisting of two vectors of values, A and B. I compute the following two things:

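from math import log as ln
from statistics import mean

ratios, log_ratios = [], []
for x, y in zip(A, B):  # pair up corresponding elements of A and B
    ratios.append(x / y)

for x, y in zip(A, B):
    log_ratios.append(ln(x / y))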

Next I compute

log_mean = ln(mean(ratios))
mean_logs = mean(log_ratios)

In simple words, given my data I compute two mean values: a mean of logs, and a log of the mean. I noticed that those two values are very far apart, i.e.,

log_mean = 20.148329613107876
mean_logs = 1.6568702569456684

I did a small computation comparing what the equations look like. For the log of the mean I get
$$log\_mean = \log\left(\frac{\sum_{i=1}^n\frac{x_i}{y_i}}{n}\right) =
\log\left(\sum_{i=1}^n\frac{x_i}{y_i}\right) - \log(n)$$
whereas
$$ mean\_logs = \frac{\sum_{i=1}^n \log\left(\frac{x_i}{y_i}\right)}{n} =
\frac{\sum_{i=1}^n\left(\log(x_i) - \log(y_i)\right)}{n} $$

So I see that, mathematically, I obtain two different values. However, I'm having a hard time understanding intuitively why the difference is so big. Can someone help me a bit with it?

Best Answer

This is the AM-GM inequality in disguise.

You have, setting $a_i = \frac{x_i}{y_i}$, $$ \left( a_1 a_2\dots a_n \right)^{1/n} \leq \frac{a_1+\dots+ a_n}{n} \tag{AM-GM} $$ Taking the log on each side, $$ \frac{1}{n}\log \left( a_1 a_2\dots a_n \right) \leq \log \frac{a_1+\dots+ a_n}{n} $$ or, equivalently, $$ \frac{\log a_1 + \dots + \log a_n}{n} \leq \log \frac{a_1+\dots+ a_n}{n} \,. $$ (Note that you can also prove your inequality with Jensen's inequality, since $\log$ is concave).

In both cases (AM-GM or Jensen's inequality), you get that equality holds if, and only if, $a_1=\dots = a_n$. That tells you why you have that inequality, and why it's not an equality.
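As a quick numerical sanity check of the inequality and its equality case, here is a minimal sketch. The helper names `log_of_mean` and `mean_of_logs` and the two sample lists are made up for illustration; they are not the asker's data.

```python
import math
from statistics import mean

def log_of_mean(values):
    """Log of the arithmetic mean: log((a_1 + ... + a_n) / n)."""
    return math.log(mean(values))

def mean_of_logs(values):
    """Arithmetic mean of the logs, i.e. the log of the geometric mean."""
    return mean(math.log(v) for v in values)

# Hypothetical positive ratios a_i = x_i / y_i.
equal = [3.0, 3.0, 3.0, 3.0]      # all a_i identical
unequal = [1.0, 2.0, 4.0, 8.0]    # a_i differ

# Gap = log of mean minus mean of logs; it should never be negative.
gap_equal = log_of_mean(equal) - mean_of_logs(equal)
gap_unequal = log_of_mean(unequal) - mean_of_logs(unequal)

print("gap when all a_i are equal:", gap_equal)
print("gap when the a_i differ:   ", gap_unequal)
```

With identical values the gap should vanish (up to floating-point rounding), while any variation among the $a_i$ makes it strictly positive.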

As for why it's that "much" of a strict inequality in your case, intuitively you can think of it as a robust version of the above statement: "the more the $a_i$'s differ (the less balanced they are), the further this inequality is from an equality." (This is very handwavy, but true here.)
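The handwavy statement can be illustrated numerically. The sketch below is an assumption-laden toy, not the asker's data: it uses three hypothetical ratios $e^{-s}, 1, e^{s}$, whose mean of logs is always $0$, and increases the spread $s$. The function name `jensen_gap` is invented for this example.

```python
import math

def jensen_gap(values):
    """Return log(mean(values)) - mean(log(values)) for positive values."""
    n = len(values)
    log_of_mean = math.log(sum(values) / n)
    mean_of_logs = sum(math.log(v) for v in values) / n
    return log_of_mean - mean_of_logs

# Toy ratios exp(-s), 1, exp(s): the mean of logs stays at 0 for every s,
# while the arithmetic mean is dominated by the largest ratio as s grows.
spreads = [0.0, 1.0, 5.0, 20.0]
gaps = [jensen_gap([math.exp(-s), 1.0, math.exp(s)]) for s in spreads]

for s, g in zip(spreads, gaps):
    print(f"spread={s:5.1f}  gap={g:.4f}")
```

In this toy setting the gap grows roughly like $s - \log 3$ for large $s$, because the single largest ratio dominates the arithmetic mean. A log of the mean near 20 next to a mean of logs near 1.66 would be consistent with a few very large ratios $x_i / y_i$ among mostly moderate ones, though that is only a guess without seeing the data.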
