Jensen’s inequality for composition of functions

convex-analysis, functions, jensen-inequality

I want to prove (or find a counterexample for) the following variant of Jensen's inequality.

Let $f$ and $g$ be convex functions, and assume that the compositions $f(g(x))$ and $g(f(x))$ are convex as well (this holds, for example, when $f$ and $g$ are also nondecreasing). From the standard Jensen's inequality, we have

$$
\mathbb{E}_{\sim i}[f(g(x_i))] \geq f(g(\mathbb{E}_{\sim i}[x_i]))
$$

or alternatively
$$
\mathbb{E}_{\sim i}[f(g(x_i))] \geq f(\mathbb{E}_{\sim i}[g(x_i)])
$$

where in the second case Jensen's inequality is applied only to the outer function $f$; the first inequality is also available because the composition $f \circ g$ is convex.
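As a quick numerical sanity check of the two standard inequalities above, here is a minimal sketch. The choices $f(x) = e^x$ and $g(x) = x^2$ and the sample values are illustrative assumptions, not part of the question. Since this $f$ is nondecreasing, the second bound should also be at least as large as the first.

```python
import math

# Illustrative choices (assumptions, not from the question):
def f(x):
    return math.exp(x)   # convex and nondecreasing

def g(x):
    return x * x         # convex

def mean(values):
    return sum(values) / len(values)

def jensen_bounds(sample):
    """Return E[f(g(x))], f(E[g(x)]) and f(g(E[x])) for an empirical sample."""
    lhs = mean([f(g(x)) for x in sample])          # E[f(g(x))]
    outer_only = f(mean([g(x) for x in sample]))   # f(E[g(x)])
    both = f(g(mean(sample)))                      # f(g(E[x]))
    return lhs, outer_only, both

lhs, outer_only, both = jensen_bounds([-1.0, 0.0, 2.0])
print(lhs >= outer_only >= both)
```

For this particular pair the three quantities are ordered as $\mathbb{E}[f(g(x))] \geq f(\mathbb{E}[g(x)]) \geq f(g(\mathbb{E}[x]))$; the last step relies on $f$ being nondecreasing.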

I would like to know what assumptions on $f$ and $g$ are needed for the following to hold:

$$
\mathbb{E}_{\sim i}[f(g(x_i))] \geq g(\mathbb{E}_{\sim i}[f(x_i)])
$$

A sufficient condition, of course, is that $f\circ g \geq g \circ f$ pointwise:
$$
\mathbb{E}_{\sim i}[f(g(x_i))] \geq \mathbb{E}_{\sim i}[g(f(x_i))] \geq g(\mathbb{E}_{\sim i}[f(x_i)])
$$
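The chain above can be checked numerically. The sketch below assumes the illustrative pair $f(x) = e^x$ (convex, monotonic) and $g(x) = |x|$ (convex, sublinear), for which $f(g(x)) = e^{|x|} \geq e^{x} = g(f(x))$ at every point; these functions and the random sample are hypothetical choices, not taken from the question.

```python
import math
import random

# Illustrative choices (assumptions, not from the question):
def f(x):
    return math.exp(x)   # convex and nondecreasing

def g(x):
    return abs(x)        # convex and sublinear

def mean(values):
    return sum(values) / len(values)

def check_chain(sample):
    """Return E[f(g(x))], E[g(f(x))] and g(E[f(x)]) for an empirical sample."""
    lhs = mean([f(g(x)) for x in sample])   # E[f(g(x))]
    mid = mean([g(f(x)) for x in sample])   # E[g(f(x))]
    rhs = g(mean([f(x) for x in sample]))   # g(E[f(x)])
    return lhs, mid, rhs

random.seed(0)
sample = [random.uniform(-3.0, 3.0) for _ in range(1000)]
lhs, mid, rhs = check_chain(sample)
print(lhs >= mid >= rhs)
```

With this pair the second inequality is actually an equality, because $e^x$ is positive and so $g$ acts as the identity on the values of $f$.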

but this is not very interesting and I was hoping for something more general.

Edit: Some additional constraints of interest to consider: $f$ is monotonic, $g$ is sublinear.

Best Answer

Unless I'm misunderstanding your question, your sufficient condition for the inequality
$$ \mathbb{E}_{\sim\mathbb{i}}\left[f\big(g(x_i)\big)\right]\ge g\left(\mathbb{E}_{\sim\mathbb{i}}\left[f\big(x_i\big)\right]\right) $$
to hold is also necessary. I'm presuming $\ f\ $ and $\ g\ $ are defined (and therefore finite and continuous) on the whole of some Euclidean space $\ \mathbb{R}^m\ $, and that you require the inequality to hold for all distributions $\ \mathbb{i}\ $.

If you choose the distribution $\ \mathbb{i}\ $ to have its entire weight concentrated at the single point $\ x\ $, then
\begin{align} \mathbb{E}_{\sim\mathbb{i}}\left[f\big(g(x_i)\big)\right]&=f\big(g(x)\big)\ \ \text{and}\\ g\left(\mathbb{E}_{\sim\mathbb{i}}\left[f\big(x_i\big)\right]\right)&=g\big(f(x)\big)\ , \end{align}
so the inequality implies that $\ f\big(g(x)\big)\ge g\big(f(x)\big)\ $.

Even if you restrict the inequality to holding only for distributions $\ \mathbb{i}\ $ with strictly positive variance, you can still get the same result by taking a sequence $\ \mathbb{i}_n\ $ of distributions with mean $\ x\ $ and variances which tend to $\ 0\ $ as $\ n\rightarrow\infty\ $ (for example, distributions supported on shrinking neighbourhoods of $\ x\ $). By the continuity of $\ f\ $ and $\ g\ $, both sides of the inequality then converge to $\ f\big(g(x)\big)\ $ and $\ g\big(f(x)\big)\ $ respectively.
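To illustrate the necessity argument, here is a minimal sketch with the hypothetical pair $f(x) = e^x$ and $g(x) = x^2$. Both are convex, and so are both compositions, yet $f(g(1)) = e < e^2 = g(f(1))$. The helper `two_point_gap` is an assumed name introduced for this example; it evaluates both sides for a distribution placing equal weight on $x - \varepsilon$ and $x + \varepsilon$, which has mean $x$ and variance $\varepsilon^2$.

```python
import math

# Illustrative choices (assumptions, not from the answer):
def f(x):
    return math.exp(x)   # convex and nondecreasing

def g(x):
    return x * x         # convex

def two_point_gap(x, eps):
    """E[f(g(X))] - g(E[f(X)]) for X uniform on {x - eps, x + eps}."""
    points = [x - eps, x + eps]
    lhs = sum(f(g(p)) for p in points) / 2   # E[f(g(X))]
    rhs = g(sum(f(p) for p in points) / 2)   # g(E[f(X)])
    return lhs - rhs

# eps = 0 is the point mass at x; small eps gives small positive variance.
for eps in [0.0, 0.1, 0.01, 0.001]:
    print(eps, two_point_gap(1.0, eps))
```

A negative gap means the desired inequality fails. Here it stays negative as $\varepsilon$ shrinks, approaching $e - e^2$, which matches the limiting argument in the answer.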
