Proving Jensen’s inequality for the general case starting from the finite case

jensen-inequality, lebesgue-integral, measure-theory, probability, solution-verification

For a finite convex combination, Jensen's inequality states: $$f\left(\sum_{i=1}^n a_i x_i\right) \leq \sum_{i=1}^n a_i f(x_i)$$ for a convex $f$, where $a_i \geq 0$ and $\sum_{i=1}^n a_i = 1$. Proving this is not so bad starting from the definition of a convex function: iteratively split the convex combination into a convex combination of just two points and apply the definition of convexity at each step.
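For instance, with $n = 3$ and $a_1 < 1$, one splitting step reads
$$f\left(a_1 x_1 + a_2 x_2 + a_3 x_3\right) \leq a_1 f(x_1) + (1-a_1)\, f\left(\frac{a_2}{1-a_1}\, x_2 + \frac{a_3}{1-a_1}\, x_3\right),$$
and since $\frac{a_2}{1-a_1} + \frac{a_3}{1-a_1} = 1$, the remaining two-point combination is handled by the definition of convexity directly; the general case follows by induction on $n$.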

For the case of a general probability measure: $$f(\mathbb E X) \leq \mathbb E f(X)$$ where $X$ is a real-valued random variable, the approach I know is quite different (show that a supporting affine lower bound of $f$ exists at $\mathbb E X$, then apply linearity of expectation), but I feel that the method for the finite case could be made to work here.

Here is my attempt, drawing inspiration from the construction of the Lebesgue integral:

Given a real-valued random variable $X$, let $X^+, X^-$ be such that $X = X^+ - X^-$, so that it suffices to show the claim for non-negative random variables (apply the definition of convexity to the convex combination $\mathbb P[X>0]\, X^+ + \left(1-\mathbb P[X>0]\right) X^-$). Let $X_j$ be a sequence of simple random variables (finite linear combinations of indicator functions on the sample space) dominated by $X$ such that $$\mathbb E X_j \rightarrow \sup \left\{\mathbb E Y: Y \leq X,\, Y\text{ is simple}\right\}$$
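To fix notation, a simple random variable here means one of the form $Y = \sum_{k=1}^m c_k \mathbf 1_{A_k}$ for finitely many constants $c_k$ and events $A_k$, with $$\mathbb E Y = \sum_{k=1}^m c_k\, \mathbb P(A_k);$$ for non-negative $X$, the supremum above (taken over non-negative simple $Y$) is the usual definition of the Lebesgue integral $\mathbb E X$.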

Now we have:
$$f\left(\mathbb E X\right) = f\left(\mathbb E X_j + o(1)\right) = f\left(\mathbb E X_j\right) + o(1) \leq \mathbb E f\left( X_j\right) + o(1)$$

where the first equality is by the definition of the sequence $X_j$, the second equality follows because $f$ is convex and hence continuous on the interior of its domain, and the inequality is the finite case of Jensen's inequality applied to the simple random variable $X_j$.

Next, I would like to claim that $X_j \xrightarrow{p} X$ and apply the continuous mapping theorem to complete the argument, but I'm not sure how to justify that claim.

I could use feedback on my reasoning as well as some help justifying the last claim.

EDIT: there is a simpler argument. Instead of defining $X_j$ as I have, I could instead define it as a sequence of simple random variables increasing pointwise to $X$ (for every outcome in the sample space). I had not realized that such a sequence can be defined in general, and I am not sure how to show that this is true. In that case we have:

$$\mathbb E f\left( X_j\right) = \mathbb E f\left( X\right) + o(1)$$

by the continuous mapping theorem.

Any feedback on this followup approach would be very appreciated.

Best Answer

I think your argument goes through fine if you choose the $X_j$ in a nice way. Don't just take any $X_j$; take $X_j$ nonnegative, simple, and increasing to $X$ with $X_j \to X$ a.s. (and hence in probability as well, though the need for the continuous mapping theorem goes away since monotone convergence applies). The existence of such an approximating sequence is usually part of the process of defining the Lebesgue integral: split $[0, k]$ into multiples of $2^{-k}$ and take $X_k$ to be the largest such multiple that is at most both $X$ and $k$. Then $0 \leq X_k \leq \min(X, k)$ and $X_k$ increases to $X$ pointwise.
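One standard explicit formula for this dyadic staircase, writing $\lfloor \cdot \rfloor$ for the floor function, is
$$X_k = \min\left(2^{-k}\left\lfloor 2^k X \right\rfloor,\; k\right),$$
which takes only the finitely many values $\{0,\, 2^{-k},\, 2\cdot 2^{-k},\, \ldots,\, k\}$, satisfies $X_k \leq X_{k+1}$, and increases to $X$ pointwise on $\{X \geq 0\}$.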

I would like to note, however, that proving the general case of Jensen directly is somewhat simpler than your extension of the finite case. For any affine function $L \leq f$, one has $L(E[X]) = E[L(X)] \leq E[f(X)]$; now take the sup over such lines (with rational slope and intercept, say, so the sup is over a countable family) to get $f(E[X]) \leq E[f(X)]$.
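Concretely, assuming $\mu = E[X]$ lies in the interior of the domain of $f$, let $m$ be any subgradient of $f$ at $\mu$ (e.g. a one-sided derivative, which exists since $f$ is convex). The supporting line $L(x) = f(\mu) + m(x - \mu)$ then satisfies $L \leq f$ and $L(\mu) = f(\mu)$, so a single line already gives
$$f(E[X]) = L(E[X]) = E[L(X)] \leq E[f(X)].$$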
