Jensen’s Inequality Proof for Conditional Expectation (Durrett)

conditional-probability, conditional-expectation, convex-analysis, jensen-inequality, probability-theory

I have some doubts after reading Durrett's proof of Jensen's inequality for conditional expectation.

The statement is that if $\varphi$ is convex and $E|X|, E|\varphi(X)|<\infty$, then $\varphi(E(X|\mathcal{F})) \leq E(\varphi(X)|\mathcal{F})$.

In his proof he worked through the case where $\varphi$ is non-linear (as the linear case is trivial). I also know that any convex function is the supremum of some collection of affine functions.

However, what I do not understand is why he lets $S = \{(a, b): a,b\in \mathbb{Q}, ax+b \leq \varphi(x), \forall x\}$ and writes $\varphi(x) = \sup\{ax + b:(a, b) \in S\}$. He also says that for each pair $(a, b)$ there is an exceptional set on which the inequality can fail, so we have to take the $\sup$ over a countable set (which, I suppose, is why he defines $S$ this way).

My questions are:

  1. Why is this representation of $\varphi$ (using only rational $a$ and $b$) sufficient to cover all possible non-linear convex functions?

  2. I understand that for a given $a$ and $b$ the event $E(\varphi(X) | \mathcal{F}) < aE(X | \mathcal{F}) + b$ may be non-empty, but it has measure 0 (which is why he stresses "a.s.").
    But why does taking the $\sup$ over a countable set solve the problem that there is an exceptional set for each $a, b$?

Here is the screenshot of the theorem and the proof; problematic statements are highlighted in blue:
[Image: screenshot of the theorem and its proof from Durrett]

Thank you very much.

Best Answer

  1. It is essentially because, for each interior point $x_0$ of the domain of $\varphi$, we can find $m$ such that $$ \varphi(x) \geq \varphi(x_0) + m(x - x_0). $$ Any such line is called a supporting line. Now, each supporting line can be approximated by the family of lines $y=ax+b$ parametrized by $S$.

    (Actually, the claim $\varphi(x) = \sup\{ax+b : (a, b) \in S \}$ may fail at points of discontinuity of $\varphi$. However, such points are necessarily endpoints of the domain of $\varphi$, and such cases can be treated separately.)

  2. Consider a simpler claim:

    If $X_n\leq Y$ a.s. for each $n$, then $\sup_n X_n\leq Y$ a.s.

    This is because $$ \mathbb{P}\bigl(\sup_n X_n > Y\bigr) = \mathbb{P}\Bigl(\bigcup_n \{X_n > Y\}\Bigr) \leq \sum_n \mathbb{P}(X_n > Y) = 0. $$ Now the highlighted part follows for exactly the same reason.
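
Spelled out for the proof in question, the same argument might be sketched as follows. The names $N_{a,b}$ and $N$ are introduced here only for illustration and are not Durrett's notation, and fixed versions of the two conditional expectations are assumed. For each $(a,b)\in S$ we have $aX+b\leq\varphi(X)$ pointwise, so linearity and monotonicity of conditional expectation give

$$ aE(X|\mathcal{F}) + b = E(aX+b\,|\,\mathcal{F}) \leq E(\varphi(X)|\mathcal{F}) \quad \text{a.s.} $$

Let $N_{a,b} = \{aE(X|\mathcal{F}) + b > E(\varphi(X)|\mathcal{F})\}$ be the exceptional set for the pair $(a,b)$, so $\mathbb{P}(N_{a,b}) = 0$. Since $S\subseteq\mathbb{Q}^2$ is countable,

$$ N = \bigcup_{(a,b)\in S} N_{a,b} \quad\text{satisfies}\quad \mathbb{P}(N) \leq \sum_{(a,b)\in S} \mathbb{P}(N_{a,b}) = 0. $$

Outside $N$ the inequality holds for every $(a,b)\in S$ simultaneously, so taking the supremum gives

$$ \varphi\bigl(E(X|\mathcal{F})\bigr) = \sup_{(a,b)\in S}\bigl\{aE(X|\mathcal{F}) + b\bigr\} \leq E(\varphi(X)|\mathcal{F}) \quad\text{on } N^c. $$

If the supremum were taken over an uncountable family of lines, the union of the exceptional sets would no longer be guaranteed to have probability zero (or even to be measurable), which is why the restriction to rational $a, b$ matters.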