Convexity of Matrix Soft-Max (Log Trace of Matrix Exponential)

convex-optimization, convex-analysis, matrix-calculus

In convex optimization it is often convenient to use the following smooth approximation to $\max\{x_1, \ldots, x_n\}$:

$$
f_\lambda(x_1, \ldots, x_n) = \frac{1}{\lambda}\log \sum_{i = 1}^n{e^{\lambda x_i}}.
$$

Above $\lambda > 0$ is a parameter. It's easy to see that

$$
\max\{x_1, \ldots, x_n\} \leq f_\lambda(x_1, \ldots, x_n) \leq \max\{x_1, \ldots, x_n\} + \frac{1}{\lambda}\log n.
$$
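
For what it's worth, this sandwich inequality is easy to spot-check numerically. Here is a minimal NumPy sketch (the helper name `soft_max` and the test values are just illustrative):

```python
import numpy as np

def soft_max(x, lam):
    """f_lambda(x) = (1/lambda) * log(sum_i exp(lambda * x_i)), computed stably."""
    m = np.max(x)
    return m + np.log(np.sum(np.exp(lam * (x - m)))) / lam

rng = np.random.default_rng(0)
x, lam = rng.normal(size=6), 10.0
lower, upper = np.max(x), np.max(x) + np.log(x.size) / lam
print(lower <= soft_max(x, lam) <= upper)  # True
```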

It is also easy to verify that $f_\lambda(x_1, \ldots, x_n)$ is convex. For example, the second derivative at $x = (x_1, \ldots, x_n)$ in the direction of $y = (y_1, \ldots, y_n)$ is

$$
\left(\frac{d^2}{dt^2}f_\lambda(x + ty)\right)_{t = 0} = \lambda \left(\frac{\sum_i{e^{\lambda x_i}y_i^2}}{\sum_i{e^{\lambda x_i}}} - \frac{\left(\sum_i{e^{\lambda x_i}y_i}\right)^2}{\left(\sum_i{e^{\lambda x_i}}\right)^2} \right) \ge 0,
$$
for $\lambda > 0$, by the Cauchy-Schwarz inequality (note that the expression in parentheses is the variance of a random variable that takes value $y_i$ with probability proportional to $e^{\lambda x_i}$).
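
Equivalently, the full Hessian at $x$ is $\lambda(\mathrm{diag}(p) - pp^T)$ with $p_i \propto e^{\lambda x_i}$, which is positive semidefinite. A quick numerical check of this (the helper name `softmax_weights` is my own):

```python
import numpy as np

def softmax_weights(x, lam):
    """p_i proportional to exp(lambda * x_i), normalized to sum to 1."""
    w = np.exp(lam * (x - np.max(x)))   # shift for numerical stability
    return w / w.sum()

rng = np.random.default_rng(1)
x, lam = rng.normal(size=5), 3.0
p = softmax_weights(x, lam)
H = lam * (np.diag(p) - np.outer(p, p))          # Hessian of f_lambda at x
print(np.linalg.eigvalsh(H).min() >= -1e-12)     # True: H is positive semidefinite
```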

There is a natural matrix analogue of the soft-max, which approximates the largest eigenvalue of a real symmetric matrix $X$:

$$
g_\lambda(X) = \frac{1}{\lambda}\log \mathrm{tr\ } e^{\lambda X}.
$$

Let's assume that $X$ is a real symmetric $n\times n$ matrix, with eigenvalues $\mu_1 \geq \mu_2 \geq \ldots \geq \mu_n$. The matrix exponential is, as usual, $e^M = \sum_{i = 0}^\infty{\frac{1}{i!}M^i}$. Then, $\mu_1 \leq g_\lambda(X) \leq \mu_1 + \frac{1}{\lambda}\log n$.
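
Assuming SciPy's `expm` is available, this two-sided bound can also be spot-checked numerically; a minimal sketch (the test matrix and parameters are arbitrary):

```python
import numpy as np
from scipy.linalg import expm

def g(X, lam):
    """g_lambda(X) = (1/lambda) * log tr exp(lambda * X)."""
    return np.log(np.trace(expm(lam * X))) / lam

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
X = (A + A.T) / 2                       # random real symmetric test matrix
lam, n = 5.0, X.shape[0]
mu1 = np.linalg.eigvalsh(X)[-1]         # largest eigenvalue
print(mu1 <= g(X, lam) <= mu1 + np.log(n) / lam)  # True
```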

Is $g_\lambda(X)$ convex over real symmetric $X$, for $\lambda > 0$? What's an easy proof?

The proof of convexity for $f_\lambda$ does not adapt easily, because matrices do not commute in general.

Best Answer

$ \DeclareMathOperator{\tr}{tr}$ You can get some information using the concepts of Schur-convexity and majorization:

Let $X$ be an $n \times n$ real symmetric matrix, with spectral decomposition $ X = U \Lambda U^T$, where $U$ is an orthogonal matrix and $\Lambda$ is diagonal with the eigenvalues $\lambda_1, \ldots, \lambda_n$ of $X$. Since $e^{\lambda X} = U e^{\lambda \Lambda} U^T$ and the trace is invariant under conjugation, your function is $$ g_\lambda(X)= \frac{1}{\lambda} \log \tr e^{\lambda X} = \frac{1}{\lambda} \log\sum_{k=1}^n e^{\lambda \lambda_k}. $$
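
As a sanity check of this identity, one can compare $\tr e^{\lambda X}$ computed via `scipy.linalg.expm` against the sum over eigenvalues (a small sketch; the random test matrix is arbitrary):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
X = (A + A.T) / 2
lam = 3.0
lhs = np.trace(expm(lam * X))                     # tr e^{lambda X}
rhs = np.exp(lam * np.linalg.eigvalsh(X)).sum()   # sum_k e^{lambda * lambda_k}
print(np.isclose(lhs, rhs))  # True
```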

Now, $g_\lambda$, viewed as a function of the eigenvalues, is Schur-convex (the proof is easy; the proof method given in the Wikipedia article works). And if $X$ and $Y$ are two real symmetric matrices as above, with eigenvalues $\lambda_i$ and $\omega_i$ respectively (both sorted in decreasing order), we have the majorization result that the vector of eigenvalues of $\mu X + (1-\mu) Y$ $(0 \le \mu \le 1)$ is majorized by the vector $\mu \lambda + (1-\mu) \omega$. This gives the result that $$ g_\lambda(\mu X + (1-\mu) Y) \le \frac{1}{\lambda} \log \sum_{k=1}^n e^{\lambda(\mu \lambda_k + (1-\mu) \omega_k)}, $$ which might be of some help in further analysis.
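
Here is a small numerical illustration of this majorization bound on random symmetric matrices (the helper names are mine; `eigvalsh` returns both spectra in the same, ascending, order, which gives the same pairing the bound uses, up to a permutation that the symmetric soft-max does not see):

```python
import numpy as np

def scalar_soft_max(v, lam):
    """(1/lambda) * log sum_k exp(lambda * v_k), computed stably."""
    m = v.max()
    return m + np.log(np.exp(lam * (v - m)).sum()) / lam

def rand_sym(rng, n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

rng = np.random.default_rng(4)
X, Y, lam, mu = rand_sym(rng, 5), rand_sym(rng, 5), 4.0, 0.3
eig = np.linalg.eigvalsh
lhs = scalar_soft_max(eig(mu * X + (1 - mu) * Y), lam)       # g_lambda(mu X + (1-mu) Y)
rhs = scalar_soft_max(mu * eig(X) + (1 - mu) * eig(Y), lam)  # majorization bound
print(lhs <= rhs + 1e-12)  # True
```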

EDIT The following paper seems to give an answer: http://people.orie.cornell.edu/aslewis/publications/96-convex.pdf

Your function is convex. The result proved there is that a spectral matrix function (of Hermitian or real symmetric argument), i.e. one that depends only on the eigenvalues of its matrix argument, is convex if and only if it is a symmetric convex function of the eigenvalues; $g_\lambda$ is spectral in this sense. The corresponding function of the eigenvalues is $f_\lambda$, which is clearly symmetric, and you proved yourself that it is convex. That is all that is needed (after reading the paper above).
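
Putting it together, the conclusion is $g_\lambda(\mu X + (1-\mu) Y) \le \mu\, g_\lambda(X) + (1-\mu)\, g_\lambda(Y)$ for all real symmetric $X, Y$ and $0 \le \mu \le 1$. A quick numerical spot-check of this (again assuming SciPy, with arbitrary random test matrices):

```python
import numpy as np
from scipy.linalg import expm

def g(X, lam):
    """g_lambda(X) = (1/lambda) * log tr exp(lambda * X)."""
    return np.log(np.trace(expm(lam * X))) / lam

def rand_sym(rng, n):
    A = rng.normal(size=(n, n))
    return (A + A.T) / 2

rng = np.random.default_rng(5)
X, Y, lam = rand_sym(rng, 4), rand_sym(rng, 4), 2.0
ok = all(
    g(t * X + (1 - t) * Y, lam) <= t * g(X, lam) + (1 - t) * g(Y, lam) + 1e-10
    for t in np.linspace(0, 1, 11)
)
print(ok)  # True
```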