I have some doubts about Durrett's proof of Jensen's inequality for conditional expectation.
The statement is that if $\varphi$ is convex and $E|X|, E|\varphi(X)|<\infty$, then $\varphi(E(X|\mathcal{F})) \leq E(\varphi(X)|\mathcal{F})$.
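As a quick sanity check of the statement, here is a minimal sketch on a finite probability space. It assumes (for illustration only) that $\mathcal{F}$ is generated by a finite partition, so that $E(\cdot|\mathcal{F})$ is just the average over each block. The die example, $\varphi(x) = x^2$, and the helper names `cond_exp` and `phi` are hypothetical choices, not taken from Durrett.

```python
from collections import defaultdict
from fractions import Fraction


def cond_exp(prob, values, block_of):
    """E(values | F) on a finite space, where F is generated by a partition.

    prob, values and block_of are dicts keyed by the outcomes omega;
    block_of[omega] names the partition block containing omega.
    Returns a dict mapping omega to E(values | F)(omega).
    """
    mass = defaultdict(Fraction)      # P(B) for each block B
    weighted = defaultdict(Fraction)  # E(values; B) for each block B
    for w, p in prob.items():
        mass[block_of[w]] += p
        weighted[block_of[w]] += p * values[w]
    # On each block, E(values | F) is the average of values over that block.
    return {w: weighted[block_of[w]] / mass[block_of[w]] for w in prob}


def phi(x):
    """Convex test function; x * x keeps Fraction arithmetic exact."""
    return x * x


# Fair die X(omega) = omega, with F = sigma({X is odd}).
prob = {w: Fraction(1, 6) for w in range(1, 7)}
X = {w: Fraction(w) for w in prob}
block_of = {w: w % 2 for w in prob}

lhs = {w: phi(v) for w, v in cond_exp(prob, X, block_of).items()}  # phi(E(X|F))
rhs = cond_exp(prob, {w: phi(v) for w, v in X.items()}, block_of)  # E(phi(X)|F)

for w in sorted(prob):
    print(f"omega={w}: phi(E(X|F))={lhs[w]}, E(phi(X)|F)={rhs[w]}")
```

On the odd block the left side is $3^2 = 9$ while the right side is $(1+9+25)/3 = 35/3$, and on the even block $16 \le 56/3$. Exact `Fraction` arithmetic is used so the comparison is not blurred by rounding.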
In his proof he worked through the case where $\varphi$ is non-linear (as the linear case is trivial). I also know that any convex function is the supremum of some collection of affine functions.
However, I do not understand the following step. He lets $S = \{(a, b): a,b\in \mathbb{Q},\ ax+b \leq \varphi(x)\ \forall x\}$ and writes $\varphi(x) = \sup\{ax + b:(a, b) \in S\}$. He also says that for each pair $(a, b)$ there is an exceptional null set on which the inequality $E(\varphi(X)|\mathcal{F}) \geq aE(X|\mathcal{F}) + b$ may fail, so we have to take the $\sup$ over a countable set (which, I suppose, is why he defined $S$ this way).
My questions are:
- Why is this definition of $\varphi$ (using only rational $a$ and $b$) enough to cover every non-linear convex function?
- I understand that for a given pair $(a, b)$ the event $\{E(\varphi(X) | \mathcal{F}) < aE(X | \mathcal{F}) + b\}$ may be nonempty, but it has probability $0$ (which is why he stresses "a.s."). But why does taking the $\sup$ over a countable set solve the problem that there is an exceptional set for each $(a, b)$?
Here is the screenshot of the theorem and the proof; problematic statements are highlighted in blue:
Thank you very much.
Best Answer
It is essentially because, for each interior point $x_0$ of the domain of $\varphi$, there is a slope $m$ such that $$ \varphi(x) \geq \varphi(x_0) + m(x - x_0) \quad \text{for all } x. $$ Such a line is called a supporting line. Each supporting line can in turn be approximated from below by lines $y=ax+b$ with $(a,b) \in S$, so restricting to rational $a$ and $b$ loses nothing when taking the $\sup$.
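To see this approximation in a case where no supporting line has a rational slope or intercept, here is a minimal sketch for $\varphi(x) = e^x$ (an example we chose, not Durrett's). For a rational slope $a > 0$, the tangent line with slope $a$ touches at $x = \ln a$ and has intercept $a - a\ln a$; rounding that intercept down to a rational $b$ keeps the line below $\varphi$, so $(a, b) \in S$. The helper names and the grid parameters `n` and `q` are hypothetical.

```python
import math
from fractions import Fraction


def rational_minorant(a, q):
    """For rational slope a > 0, return rational b with a*x + b <= exp(x) for all x.

    The tangent to exp with slope a is a*x + (a - a*ln a). We round its
    intercept down to a multiple of 1/q and subtract an extra 1/q as a
    guard against floating-point error, so (a, b) lies in S.
    """
    exact = float(a) - float(a) * math.log(float(a))
    return Fraction(math.floor(exact * q) - 1, q)


def sup_over_S(x, n=200, q=10**6):
    """Max of a*x + b over the finite subfamily of S with slopes a = k/n, k = 1..10n."""
    best = -math.inf
    for k in range(1, 10 * n + 1):
        a = Fraction(k, n)
        b = rational_minorant(a, q)
        best = max(best, float(a) * x + float(b))
    return best


for x in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    approx = sup_over_S(x)
    print(f"x={x:5.2f}  sup={approx:.6f}  exp(x)={math.exp(x):.6f}")
```

Since only a finite subfamily of $S$ is used, each printed value stays below $e^x$, and it gets closer as `n` and `q` grow; the full supremum over the countable set $S$ recovers $e^x$ exactly.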
(Actually, the claim $\varphi(x) = \sup\{ax+b : (a, b) \in S \}$ may fail at points where $\varphi$ is discontinuous. However, such points can only be endpoints of the domain of $\varphi$, and that case can be treated separately. If $\varphi$ is finite and convex on all of $\mathbb{R}$, it is continuous, so no such points arise.)
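For a concrete instance of this failure (our own illustration), take the convex function on $[0,1]$ given by $\varphi(0) = 1$ and $\varphi(x) = 0$ for $0 < x \leq 1$. If $ax + b \leq \varphi(x)$ for all $x \in [0,1]$, letting $x \downarrow 0$ forces $b \leq 0$, so

$$ \sup\{a \cdot 0 + b : (a,b) \in S\} \leq 0 < 1 = \varphi(0). $$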
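Consider a simpler claim: if $X_1, X_2, \dots$ and $Y$ are random variables with $X_n \leq Y$ a.s. for every $n$, then $\sup_n X_n \leq Y$ a.s.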
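This is because $$ \mathbb{P}\bigl(\sup_n X_n > Y\bigr) = \mathbb{P}\bigl(\cup_n \{X_n > Y\}\bigr) \leq \sum_n \mathbb{P}(X_n > Y) = 0. $$ The highlighted part follows for exactly the same reason: apply the claim to the countable family $aE(X|\mathcal{F}) + b$, $(a,b) \in S$, with $Y = E(\varphi(X)|\mathcal{F})$.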
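Spelled out in full, the step reads as follows. For each $(a,b) \in S$, monotonicity and linearity of conditional expectation give a null set $N_{a,b}$ such that

$$ E(\varphi(X)|\mathcal{F})(\omega) \geq aE(X|\mathcal{F})(\omega) + b \quad \text{for all } \omega \notin N_{a,b}. $$

Since $S$ is countable, $N = \bigcup_{(a,b) \in S} N_{a,b}$ is still a null set. For every $\omega \notin N$ all of these inequalities hold simultaneously, so taking the $\sup$ gives

$$ E(\varphi(X)|\mathcal{F})(\omega) \geq \sup_{(a,b) \in S} \bigl(aE(X|\mathcal{F})(\omega) + b\bigr) = \varphi\bigl(E(X|\mathcal{F})(\omega)\bigr). $$

Countability is essential here: with $U$ uniform on $[0,1]$ and $X_t = \mathbf{1}_{\{U = t\}}$ for $t \in [0,1]$, each $X_t \leq 0$ a.s., yet $\sup_{t \in [0,1]} X_t = 1$ everywhere.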