[Math] Proving Minkowski’s inequality with homogenization

inequality, real-analysis

The standard proof of Minkowski's inequality in $L^p$ spaces via Hölder's inequality seems pretty unmotivated to me (see here: http://en.wikipedia.org/wiki/Minkowski%27s_inequality). Breaking up that factor into two parts and applying Hölder's inequality to each of them separately makes no geometric or intuitive sense to me. The following proof (which I came up with, though it is highly probable that it isn't in fact original) seems much better motivated:

We need to show that $(\int_{X} |f+g|^p d\mu)^{\frac{1}{p}} \le (\int_{X} |f|^p d\mu)^{\frac{1}{p}} +(\int_{X} |g|^p d\mu)^{\frac{1}{p}}$.

Let $A=(\int_{X} |f|^p d\mu)^{\frac{1}{p}}$, and $B=(\int_{X} |g|^p d\mu)^{\frac{1}{p}}$.

Then, assuming $A, B > 0$ and setting $f_1=\frac{f}{A}$ and $g_1=\frac{g}{B}$ (so that $\int_{X} |f_1|^p d\mu = \int_{X} |g_1|^p d\mu = 1$), we get the equivalent inequalities

$\int_{X} |Af_1+Bg_1|^p d\mu \le (A+B)^p$

$\int_{X} \left|\frac{Af_1+Bg_1}{A+B}\right|^p d\mu \le 1$

$\int_{X} \left|\frac{Af_1+Bg_1}{A+B}\right|^p d\mu \le \int_{X} \frac{A|f_1|^p+B|g_1|^p}{A+B} d\mu$

The last of these follows immediately from the convexity of $x \mapsto |x|^p$ for $p \ge 1$.
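To spell out that last step, here is a sketch under the assumptions $p \ge 1$ and $A, B > 0$, using the weights $\frac{A}{A+B}$ and $\frac{B}{A+B}$, which sum to 1. The triangle inequality followed by convexity of $t \mapsto t^p$ on $[0, \infty)$ gives, pointwise on $X$,

$$\left|\frac{A f_1 + B g_1}{A+B}\right|^p \le \left(\frac{A}{A+B}|f_1| + \frac{B}{A+B}|g_1|\right)^p \le \frac{A}{A+B}|f_1|^p + \frac{B}{A+B}|g_1|^p.$$

Integrating over $X$ and using $\int_X |f_1|^p d\mu = \int_X |g_1|^p d\mu = 1$, the right-hand side becomes

$$\frac{A}{A+B}\int_X |f_1|^p d\mu + \frac{B}{A+B}\int_X |g_1|^p d\mu = \frac{A+B}{A+B} = 1.$$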

Now, there are a few reasons that I find this more appealing. First, each of the steps is motivated. The first step is an attempt at homogenization, then after rewriting it in the form with the R.H.S.=1, it becomes very apparent how to finish it off using the convexity of $x^p$.

Second, a similar line of reasoning is used to prove Hölder's inequality, and is used in the following quite elegant proof of Cauchy-Schwarz for finite sequences:

To prove $(\sum a_i^2)(\sum b_i^2) \ge (\sum a_ib_i)^2$, we first homogenize by setting $\sum a_i^2=\sum b_i^2=1$. Then, after taking square roots, we see it is equivalent to the inequality $\frac{\sum a_i^2 + \sum b_i^2}{2} \ge \sum a_ib_i$, which is a direct result of AM-GM. Note that homogenization is used specifically so that we can replace the L.H.S. with something much stronger while keeping the R.H.S. the same, almost as if by magic (which was essentially what my proof amounted to).
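As a sketch of the AM-GM step, assuming real sequences normalized so that $\sum a_i^2 = \sum b_i^2 = 1$: each term satisfies $a_i b_i \le \frac{a_i^2 + b_i^2}{2}$, because $(a_i - b_i)^2 \ge 0$. Summing over $i$ gives

$$\sum a_i b_i \le \frac{\sum a_i^2 + \sum b_i^2}{2} = \frac{1 + 1}{2} = 1.$$

Applying the same bound with $-a_i$ in place of $a_i$ controls $-\sum a_i b_i$ as well, so $|\sum a_i b_i| \le 1$ and squaring is legitimate. For general nonzero sequences, dividing each $a_i$ by $\sqrt{\sum a_j^2}$ and each $b_i$ by $\sqrt{\sum b_j^2}$ reduces to this normalized case.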

Third, the equality case almost becomes trivial to see, since only one inequality was applied in the entire process, and that was at the very end of the proof. Further, it's very easy to see geometrically the idea of convexity being used here, so the equality case also seems natural.

My first question is whether the standard proof involving Hölder's inequality has merits I am missing that make it either more general or of more interest than the proof I presented.

My second question is whether the briefly summarized proof involving dual spaces presented on the same Wikipedia page is actually more general than my proof. That is, are there spaces in which we simply cannot appeal to a convexity argument and instead have to resort to the supremum proof? Or can my proof somehow be modified to prove the result in a more general class of spaces that encompasses all of those for which the supremum argument works?

Cheers,

Rofler

Best Answer

The excellent book The Cauchy-Schwarz Master Class has already been mentioned in the comments by Theo.

Since I cannot stand open questions to which a lot of people know the answer, I'll just summarize what is in chapter 9 of that book.

You're right that you can prove it the way you do. But when people take a course in measure and integration theory, Hölder's inequality is usually proven first and Minkowski's inequality afterwards. In that setting it can be instructive to use Hölder's inequality to prove Minkowski's inequality.

There is another advantage to this approach. We can quite easily deduce from the proof when equality arises. To see this let me quickly recall how the proof goes (this can be found in the book by Steele).

First write by using the triangle inequality

$$\sum_{k = 1}^n |x_k + y_k|^p \leq \sum_{k = 1}^n |x_k||x_k + y_k|^{p - 1} + \sum_{k = 1}^n |y_k||x_k + y_k|^{p - 1}.$$

We may now assume $p > 1$, since for $p = 1$ we are already done. We can apply Hölder's inequality to both of the terms on the right-hand side, so we find

$$\sum_{k = 1}^n |x_k||x_k + y_k|^{p - 1} \leq \left (\sum_{k = 1}^n |x_k|^p \right )^{1/p} \left (\sum_{k = 1}^n |x_k + y_k|^{p} \right )^{(p - 1)/p}$$

and

$$\sum_{k = 1}^n |y_k||x_k + y_k|^{p - 1} \leq \left (\sum_{k = 1}^n |y_k|^p \right )^{1/p} \left (\sum_{k = 1}^n |x_k + y_k|^{p} \right )^{(p - 1)/p}$$

Substituting these two bounds into the first inequality, and assuming that its left-hand side is non-zero, we can divide by $\displaystyle \left (\sum_{k = 1}^n |x_k + y_k|^{p} \right )^{(p - 1)/p}$ to obtain Minkowski's inequality.
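Written out, the division step looks as follows. This is a sketch in which $S = \sum_{k=1}^n |x_k + y_k|^p$ is shorthand introduced here (it is not notation from the book), and $S > 0$ is assumed. Combining the triangle-inequality step with the two Hölder bounds gives

$$S \leq \left[ \left (\sum_{k = 1}^n |x_k|^p \right )^{1/p} + \left (\sum_{k = 1}^n |y_k|^p \right )^{1/p} \right] S^{(p - 1)/p}.$$

Dividing by $S^{(p-1)/p}$ and using $1 - \frac{p-1}{p} = \frac{1}{p}$ yields

$$\left (\sum_{k = 1}^n |x_k + y_k|^p \right )^{1/p} = S^{1/p} \leq \left (\sum_{k = 1}^n |x_k|^p \right )^{1/p} + \left (\sum_{k = 1}^n |y_k|^p \right )^{1/p}.$$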

Fine. So now, if we have equality in Minkowski's inequality, the first inequality written here must also be an equality. This implies $|x_k + y_k| = |x_k| + |y_k|$ for all $1 \leq k \leq n$. Thinking about it for a bit, we can conclude that $x_k$ and $y_k$ must have the same sign for all $k$. Actually, there is no problem in assuming $x_k, y_k \geq 0$, because we can factor the minus sign out and it gets lost in the absolute value.

But equality in Minkowski's inequality also means that we have equality in the two lines where Hölder's inequality is used. Now you can recall what it means to have equality in Hölder's inequality. With $q = p/(p-1)$ the conjugate exponent, there exist $\lambda, \lambda' \geq 1$ such that, for all $k$, $$\lambda |x_k|^p = (|x_k + y_k|^{p - 1})^q = |x_k + y_k|^p \text{ and } \lambda' |y_k|^p = (|x_k + y_k|^{p - 1})^q = |x_k + y_k|^p.$$

Combining both equalities we get that $\frac{\lambda}{\lambda'} |x_k|^p = |y_k|^p$. So the equality case is easy to trace back through this proof.
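To make the conclusion explicit, here is a sketch under the reduction $x_k, y_k \geq 0$ made above, and assuming neither sequence is identically zero. Taking $p$-th roots shows that the two sequences are proportional:

$$y_k = c\, x_k \quad \text{for all } k, \qquad c = \left(\frac{\lambda}{\lambda'}\right)^{1/p} \geq 0.$$

Conversely, if $y_k = c\, x_k$ with $c \geq 0$, then equality does hold, since

$$\left (\sum_{k = 1}^n |x_k + y_k|^p \right )^{1/p} = (1 + c) \left (\sum_{k = 1}^n |x_k|^p \right )^{1/p} = \left (\sum_{k = 1}^n |x_k|^p \right )^{1/p} + \left (\sum_{k = 1}^n |y_k|^p \right )^{1/p}.$$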

But again, credit must be given where credit is due: This is just what is written in Steele in my own words. I don't think I'm plagiarizing because this method can be considered to be common knowledge.

So check out the book in the library or buy it, it is quite cheap for a math book and it contains fun exercises.
