Strict convexity of log-sum-exp function

calculus, convex-analysis

The Wikipedia page mentions that the log-sum-exp function $f(x)=\log \sum_{i=1}^n\exp(x_i)$ becomes strictly convex when we add $x_0=0$, i.e., $g(x)=\log \bigl(1+\sum_{i=1}^n\exp(x_i)\bigr)$. I want to verify this claim, but I'm not sure how to do so.

I can show that $f$ is convex by showing its Hessian is positive semi-definite, but a positive definite Hessian is sufficient, not necessary, for strict convexity, so this test alone doesn't settle the question. Indeed, the Hessian of $f$ is not positive definite: taking $b^\top H b$, where $H$ is the Hessian and $b$ has the same value in every entry, gives $b^\top H b = 0$.
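As a quick numerical sanity check (a sketch using the standard formula $\nabla^2 f = \operatorname{diag}(p) - pp^\top$ with $p = \operatorname{softmax}(x)$; `logsumexp_hessian` is just an illustrative helper name):

```python
import numpy as np

def logsumexp_hessian(x):
    """Hessian of f(x) = log(sum_i exp(x_i)): diag(p) - p p^T, where p = softmax(x)."""
    p = np.exp(x - x.max())   # numerically stabilized softmax
    p /= p.sum()
    return np.diag(p) - np.outer(p, p)

rng = np.random.default_rng(0)
H = logsumexp_hessian(rng.normal(size=4))

print(np.linalg.eigvalsh(H))  # eigenvalues are >= 0 (up to rounding): PSD
b = np.ones(4)
print(b @ H @ b)              # ~ 0: the all-ones direction kills positive definiteness
```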

I understand that $f$ is affine along the "45-degree line", as explained in this post. But I'm not sure how to prove that $g$ is indeed strictly convex once we add the fixed coordinate $0$, which should eliminate the possibility that $x_i = x$ for all $i$ (except the trivial case $x = 0$).

Best Answer

For this answer, I'll be thinking of $f \colon \mathbf{R}^{n+1} \rightarrow \mathbf{R}$ and $g \colon \mathbf{R}^{n} \rightarrow \mathbf{R}$, so that we can consider $g$ to be $f$ with the first argument fixed at $0$, i.e., $g(x) = f(0, x)$.

So the thing to note here is that the statement "$f$ is convex" is essentially the Hölder inequality for sequence spaces. In particular, if $\lambda \in (0, 1)$, then we can take $p = 1/\lambda$, $q = 1/(1 - \lambda)$ and see that, for $x, y \in \mathbf{R}^{n+1}$, \begin{align*} f(\lambda x + (1 - \lambda) y) &= \log\Bigl[\sum_{j=0}^n e^{\lambda x_j} \cdot e^{(1 - \lambda)y_j}\Bigr] \\ &\leq \log\Bigl[\Bigl(\sum_{j=0}^n e^{\lambda p x_j}\Bigr)^{1/p} \cdot \Bigl(\sum_{j=0}^n e^{(1 - \lambda)q y_j}\Bigr)^{1/q}\Bigr] \\ &= \log\Bigl[\Bigl(\sum_{j=0}^n e^{x_j}\Bigr)^{\lambda} \cdot \Bigl(\sum_{j=0}^n e^{y_j}\Bigr)^{1 - \lambda}\Bigr] \\ &= \lambda f(x) + (1 - \lambda) f(y). \end{align*}
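The displayed inequality is easy to spot-check numerically; here is a minimal sketch using `scipy.special.logsumexp` (the choice of $\lambda$ and the random points are arbitrary):

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(1)
x, y = rng.normal(size=6), rng.normal(size=6)
lam = 0.3

lhs = logsumexp(lam * x + (1 - lam) * y)
rhs = lam * logsumexp(x) + (1 - lam) * logsumexp(y)
print(lhs <= rhs)  # True: the convexity inequality f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y)
```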

Moreover, we know from the equality case of the Hölder inequality that equality holds if and only if $(e^{x_i})_i$ and $(e^{y_i})_i$ are linearly dependent. That is, there is an $A > 0$ such that, for all $i$, \begin{align*} e^{x_i} &= A e^{y_i}, \\ x_i - y_i &= \log A. \end{align*}
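To make the equality case concrete, here is a hedged numerical illustration: shifting every coordinate of $y$ by the same constant $\log A$ produces equality in the convexity inequality for $f$, exactly as the condition above predicts.

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(2)
y = rng.normal(size=5)
A = 3.0
x = y + np.log(A)   # x_i - y_i = log A for every i
lam = 0.3

lhs = logsumexp(lam * x + (1 - lam) * y)
rhs = lam * logsumexp(x) + (1 - lam) * logsumexp(y)
print(abs(lhs - rhs))  # ~ 0: equality holds along this direction
```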

This condition is exactly the "45-degree line" along which $f$ is affine. But note that in the case of $g$, $x_0 = y_0 = 0$, which forces $A = 1$ and hence $x = y$. So for any distinct $x', y' \in \mathbf{R}^n$, equality never occurs, and we must have $$ g(\lambda x' + (1 - \lambda) y') < \lambda g(x') + (1 - \lambda)g(y'), $$ rendering $g$ strictly convex.
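To close the loop numerically (again a sketch; `g` below implements $g(x) = f(0, x)$ by prepending the fixed $0$ coordinate), the same shifted pair that produced equality for $f$ yields a strict inequality for $g$:

```python
import numpy as np
from scipy.special import logsumexp

def g(x):
    """g(x) = log(1 + sum_i exp(x_i)), computed as f(0, x)."""
    return logsumexp(np.concatenate(([0.0], x)))

rng = np.random.default_rng(3)
y = rng.normal(size=5)
x = y + np.log(3.0)   # the direction that made f affine
lam = 0.3

lhs = g(lam * x + (1 - lam) * y)
rhs = lam * g(x) + (1 - lam) * g(y)
print(rhs - lhs)      # strictly positive: the fixed 0 coordinate breaks the equality case
```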