I analyse some data, where I have two vectors of values, vector A and vector B. I compute the following two things:
import math
from statistics import mean

ratios, log_ratios = [], []
for x, y in zip(A, B):
    ratios.append(x / y)
for x, y in zip(A, B):
    log_ratios.append(math.log(x / y))
Next I compute
log_mean = math.log(mean(ratios))
mean_logs = mean(log_ratios)
So in simple words, given my data I compute two mean values – a mean of logs, and a log of means. I noticed that those two values are very far apart, i.e.,
log_mean = 20.148329613107876
mean_logs = 1.6568702569456684
I did a small computation, comparing how the two expressions look, so for the log of the mean I get
$$log\_mean = \log\left(\frac{\sum_{i=1}^n\frac{x_i}{y_i}}{n}\right) =
\log\left(\sum_{i=1}^n\frac{x_i}{y_i}\right) - \log(n)$$
whereas
$$ mean\_log = \frac{\sum_{i=1}^n \log\left(\frac{x_i}{y_i}\right)}{n} =
\frac{\sum_{i=1}^n\left(\log(x_i) - \log(y_i)\right)}{n} $$
So I see that, mathematically, I obtain two different values; however, I'm having a hard time intuitively understanding why the difference is so big. Can someone help me a bit with it?
Best Answer
This is the AM-GM inequality in disguise.
You have, setting $a_i = \frac{x_i}{y_i}$, $$ \left( a_1 a_2\dots a_n \right)^{1/n} \leq \frac{a_1+\dots+ a_n}{n} \tag{AM-GM} $$ Taking the log on each side, $$ \frac{1}{n}\log \left( a_1 a_2\dots a_n \right) \leq \log \frac{a_1+\dots+ a_n}{n} $$ or, equivalently, $$ \frac{\log a_1 + \dots + \log a_n}{n} \leq \log \frac{a_1+\dots+ a_n}{n} \,. $$ (Note that you can also prove your inequality with Jensen's inequality, since $\log$ is concave).
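A quick numeric check of this log form of AM-GM, using a small hypothetical sample of ratios $a_i$ (the values are made up for illustration):

```python
import math
from statistics import mean

# Hypothetical ratios a_i = x_i / y_i (illustrative values only).
a = [0.5, 2.0, 10.0, 400.0]

mean_logs = mean(math.log(v) for v in a)  # (1/n) * sum_i log(a_i)
log_mean = math.log(mean(a))              # log of the arithmetic mean

# AM-GM after taking logs: mean of logs <= log of mean.
print(mean_logs, log_mean)
assert mean_logs <= log_mean
```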
In both cases (AM-GM or Jensen's inequality), you get that equality holds if, and only if, $a_1=\dots = a_n$. That tells you why you have an inequality, and why it's generally not an equality.
As to why the inequality is that "much" of a strict one in your case, intuitively you can think of a robust version of the above statement: the more the $a_i$'s differ from each other (the less balanced they are), the further this inequality will be from an equality. (This is very handwavy, but true here.) In particular, a single huge ratio can dominate the arithmetic mean, and hence $\log \frac{a_1+\dots+a_n}{n}$, while barely moving the mean of the logs.
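The "more spread, bigger gap" intuition can be sketched numerically. Below, two made-up samples of the same size are compared: one with nearly equal ratios and one where a single huge ratio dominates the arithmetic mean.

```python
import math
from statistics import mean

def gap(a):
    """log(mean(a)) - mean(log(a)); zero exactly when all a_i are equal."""
    return math.log(mean(a)) - mean(math.log(v) for v in a)

# Hypothetical samples for illustration.
balanced = [1.0, 1.1, 0.9, 1.05]        # nearly equal ratios
unbalanced = [0.001, 1.0, 5.0, 5000.0]  # one huge ratio dominates the mean

print(gap(balanced))    # tiny gap, close to 0
print(gap(unbalanced))  # gap of several units, as in the question
```

The dominating term `5000.0` drags `mean(a)` (and thus its log) far up, while contributing only one moderate `log` term to the mean of logs.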