Equality in Jensen’s inequality for strictly convex functions

probability theory

Jensen's inequality

Let $\phi : \mathbb{R} \rightarrow \mathbb{R}$ be a convex function and $X$ be a random variable. Then
$$\phi(E[X]) \leq E[\phi(X)],$$ if $E[X]$ and $E[\phi(X)]$ exist.

Exercise
Let $\phi : \mathbb{R} \rightarrow \mathbb{R}$ be a strictly convex function, that is,
$$\phi(x) \geq ax + b \quad \forall x \in \mathbb{R} \qquad (1)$$

Then if
$$\phi(E[X]) = E[\phi(X)]$$

$\Rightarrow X = c$ almost everywhere, where $c$ is a constant.

Question 1

I have found that a function $f:X \rightarrow \mathbb{R}$ on a convex set $X$ is called strictly convex iff
$$f(tx_1 + (1-t)x_2) < tf(x_1) + (1-t)f(x_2) \quad \forall t \in (0,1),\ \forall x_1,x_2 \in X \text{ with } x_1 \neq x_2$$

Why does the exercise mention $(1)$? Is it an equivalent definition?

Question 2

How should I approach this exercise?

Best Answer

Equation $(1)$ requires a little more context. A function $\phi:\mathbb R\to\mathbb R$ is convex if and only if, for all $x_0\in\mathbb R$, $\phi$ has a subderivative at $x_0$, i.e. there exists a number $c\in\mathbb R$ such that

$$ \phi(x) - \phi(x_0) \ge c(x-x_0) \qquad\qquad\qquad(*)$$

for all $x\in\mathbb R$. Moreover, $\phi$ is strictly convex if and only if equation $(*)$ is a strict inequality for $x\neq x_0$.

Translating this into the language of equation $(1)$: if $\phi$ is strictly convex then, given $x_0\in\mathbb R$, there exist $a,b\in\mathbb R$ such that:

  • $\phi(x_0) = ax_0+b$, and
  • $\phi(x) > ax + b$ for all $x\neq x_0$.

(Specifically, one can take $a=c$ and $b=\phi(x_0)-cx_0$.) This can be taken as an equivalent definition of strict convexity.
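As a quick illustration (my own choice of example, not part of the exercise), take $\phi(x)=x^2$. Its derivative at $x_0$ is $2x_0$, so one can take $a=2x_0$ and $b=\phi(x_0)-2x_0\cdot x_0=-x_0^2$. Then

$$\phi(x) - (ax+b) = x^2 - 2x_0x + x_0^2 = (x-x_0)^2,$$

which is $0$ at $x=x_0$ and strictly positive for every $x\neq x_0$, exactly as the two bullet points require.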

To use this to solve your exercise, let $x_0=E[X]$ and choose $a,b$ as above for this $x_0$. Then we have

$$\phi(X) \ge aX + b, \qquad\qquad\qquad(\dagger)$$

and so taking expectation, we find

$$E[\phi(X)] \ge aE[X] + b = ax_0 + b = \phi(x_0) = \phi(E[X]).$$

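This proves Jensen's inequality. Moreover, since $\phi$ is strictly convex, the inequality in $(\dagger)$ is strict whenever $X\neq x_0$, so either $X=x_0$ almost surely, or $(\dagger)$ is strict with positive probability. In the latter case, $\phi(X)-aX-b$ is a nonnegative random variable that is positive with positive probability, so its expectation is strictly positive, i.e. $E[\phi(X)] > \phi(E[X])$. Hence equality in Jensen's inequality forces $X=E[X]$ almost surely, completing the proof.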
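If it helps to see the dichotomy numerically, here is a minimal sketch that computes the Jensen gap $E[\phi(X)]-\phi(E[X])$ exactly for a few finite discrete distributions. The helper `jensen_gap` and the specific distributions are my own illustrative choices, not part of the exercise.

```python
from fractions import Fraction

def jensen_gap(phi, values, probs):
    """Return E[phi(X)] - phi(E[X]) for a finite discrete X (hypothetical helper)."""
    mean = sum(p * x for x, p in zip(values, probs))
    mean_phi = sum(p * phi(x) for x, p in zip(values, probs))
    return mean_phi - phi(mean)

def square(x):
    # Strictly convex on the real line.
    return x * x

half = Fraction(1, 2)

# Strictly convex phi, non-constant X uniform on {0, 2}: the gap is positive.
gap_spread = jensen_gap(square, [Fraction(0), Fraction(2)], [half, half])

# Strictly convex phi, X constant equal to 1: the gap vanishes.
gap_const = jensen_gap(square, [Fraction(1), Fraction(1)], [half, half])

# Convex but NOT strictly convex phi (absolute value), X uniform on {1, 3}:
# the gap vanishes even though X is not constant.
gap_abs = jensen_gap(abs, [Fraction(1), Fraction(3)], [half, half])

print(gap_spread, gap_const, gap_abs)
```

The last case shows why strictness is needed: $|x|$ is linear on $[1,3]$, so equality in Jensen's inequality holds without $X$ being constant.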