Jensen's inequality
Let $\phi : \mathbb{R} \rightarrow \mathbb{R}$ be a convex function and $X$ be a random variable. Then
$$\phi(E[X]) \leq E[\phi(X)],$$ provided $E[X]$ and $E[\phi(X)]$ exist.
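As a quick numerical sanity check (a minimal sketch, not part of the original statement), one can verify the inequality for the convex function $\phi(x) = x^2$ and a simple two-point random variable:

```python
# Numerical check of Jensen's inequality for the convex function
# phi(x) = x**2 and a discrete random variable X.

def expectation(values, probs):
    """Expected value of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))

phi = lambda x: x ** 2

# X takes the values 0 and 2, each with probability 1/2.
values, probs = [0.0, 2.0], [0.5, 0.5]

lhs = phi(expectation(values, probs))               # phi(E[X]) = 1.0
rhs = expectation([phi(v) for v in values], probs)  # E[phi(X)] = 2.0

assert lhs <= rhs  # Jensen: phi(E[X]) <= E[phi(X)]
```

Here the inequality is strict ($1 < 2$) because $X$ is not constant, which is exactly the phenomenon the exercise below asks about.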
Exercise
Let $\phi : \mathbb{R} \rightarrow \mathbb{R}$ be a strictly convex function, that is,
$$\phi(x) \geq ax + b \quad \forall x \in \mathbb{R}. \qquad (1)$$
Then if
$$\phi(E[X]) = E[\phi(X)]$$
$\Rightarrow X = c$ almost everywhere, where $c$ is a constant.
Question 1
I have found that a function $f:X \rightarrow \mathbb{R}$ is called strictly convex iff
$$f(tx_1 + (1-t)x_2) < tf(x_1) + (1-t)f(x_2) \quad \forall t \in (0,1),\ \forall x_1,x_2 \in X$$
Why does the exercise mention $(1)$? Is it an equivalent definition?
Question 2
How should I approach this exercise?
Best Answer
Equation $(1)$ requires a little more context. A function $\phi:\mathbb R\to\mathbb R$ is convex if and only if, for all $x_0\in\mathbb R$, $\phi$ has a subderivative at $x_0$, i.e. there exists a number $c\in\mathbb R$ such that
$$ \phi(x) - \phi(x_0) \ge c(x-x_0) \qquad\qquad\qquad(*)$$
for all $x\in\mathbb R$. Moreover, $\phi$ is strictly convex if and only if equation $(*)$ is a strict inequality for $x\neq x_0$.
Translating this into the language of equation $(1)$: given $x_0\in\mathbb R$, there exist $a,b\in\mathbb R$ such that
$$\phi(x) \ge ax + b \quad \forall x\in\mathbb R,$$
with equality if and only if $x = x_0$. (Specifically, one can take $a=c$ and $b=\phi(x_0)-cx_0$.) This can be taken as an equivalent definition of strict convexity.
To use this to solve your exercise, let $x_0=E[X]$. Then we have
$$\phi(X) \ge aX + b, \qquad\qquad\qquad(\dagger)$$
and so taking expectation, we find
$$E[\phi(X)] \ge aE[X] + b = ax_0 + b = \phi(x_0) = \phi(E[X]).$$
This proves Jensen's inequality. Moreover, since $\phi$ is strictly convex, we know that either $X=x_0$ almost surely, or the inequality in $(\dagger)$ is strict with positive probability. In the latter case, this of course implies that the inequality above is also strict, completing the proof.
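To see the equality case concretely, here is an illustrative Python sketch (using $\phi(x) = x^2$, chosen for this example because then $E[\phi(X)] - \phi(E[X]) = \operatorname{Var}(X)$): the Jensen gap is strictly positive for a non-degenerate $X$ and vanishes exactly when $X$ is constant almost surely.

```python
# For phi(x) = x**2, the Jensen gap E[phi(X)] - phi(E[X]) equals Var(X),
# so equality in Jensen's inequality forces X to be (a.s.) constant.

def expectation(values, probs):
    """Expected value of a discrete random variable."""
    return sum(v * p for v, p in zip(values, probs))

def jensen_gap(values, probs):
    """E[phi(X)] - phi(E[X]) for phi(x) = x**2."""
    phi = lambda x: x ** 2
    return (expectation([phi(v) for v in values], probs)
            - phi(expectation(values, probs)))

# Non-degenerate X: strict inequality (gap = Var(X) = 1 > 0).
assert jensen_gap([0.0, 2.0], [0.5, 0.5]) > 0

# Degenerate X = 3 almost surely: equality (gap = 0).
assert jensen_gap([3.0, 3.0], [0.5, 0.5]) == 0.0
```

This mirrors the proof: a non-constant $X$ makes the inequality in $(\dagger)$ strict on a set of positive probability, which strictly separates $E[\phi(X)]$ from $\phi(E[X])$.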