Hessian of the log sum exp with affine input

derivatives, exponential function, hessian-matrix, linear algebra, partial derivative

Say I have the function $f(x) = \log\left(\sum_{i=1}^{m} \exp(K(a_i^Tx + b_i))\right)$, where $f: \mathbb{R}^d \rightarrow \mathbb{R}$, $x \in \mathbb{R}^d$, $A \in \mathbb{R}^{d \times m}$, $a_i$ is the $i$-th column of $A$, and $K$ is a scalar constant. I need to find the Hessian of $f$.
For the first partial derivative I get (writing $a_{ij}$ for the $j$-th component of $a_i$): $$\frac{\partial f(x)}{\partial x_j} = K\,\frac{\sum_{i=1}^{m}a_{ij} \, e^{K(a_i^Tx + b_i)}}{\sum_{i=1}^{m} e^{K(a_i^Tx + b_i)}} $$

And for the second partial derivative I get: $$\frac{\partial^2 f(x)}{\partial x_j \partial x_k} = K^2\, \frac{\left(\sum_{i=1}^{m} a_{ij} a_{ik} e^{K(a_i^Tx + b_i)}\right) \left(\sum_{i=1}^{m} e^{K(a_i^Tx + b_i)}\right) - \left(\sum_{i=1}^{m} a_{ij} e^{K(a_i^Tx + b_i)}\right) \left(\sum_{i=1}^{m} a_{ik} e^{K(a_i^Tx + b_i)}\right) }{\left(\sum_{i=1}^{m} e^{K(a_i^Tx + b_i)}\right)^2} $$

My question is: how can I pull the $a_{ij}$ and $a_{ik}$ factors out of the sums so that the Hessian ends up expressed in terms of the matrix $A$? I have no idea how to proceed, so I just need a hint, either in the form of an equality or a $\le$ inequality.

Best Answer

$\def\r#1{\color{red}{#1}}\def\o{{\tt1}}\def\p#1#2{\frac{\partial #1}{\partial #2}}$For ease of typing, define the auxiliary variables $$\eqalign{ w &= \kappa(A^Tx+b) &\implies\quad dw = \kappa A^Tdx \\ e &= \exp(w) &\implies\quad e = E^T\o \\ E &= {\rm Diag}(e) &\implies\quad de = E\,dw \\ \phi &= \o:e \\ d\phi &= \o:E\,dw \\&= E^T\o:dw \\&= e:dw \;=\; e^Tdw \\ }$$ where a colon denotes the trace product, i.e. $$A:B = {\rm Tr}(A^TB) \;=\; \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij}$$
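As a reminder, the rearrangements of colon products used in this answer follow from the cyclic property of the trace. A short statement of the two rules, for generic conformable matrices $X$, $Y$, $Z$ (placeholder names, not part of the problem): $$\eqalign{ X:Y &= Y:X \\ X:(YZ) &= (Y^TX):Z \;=\; (XZ^T):Y \\ }$$ For instance, ${\tt1}:E\,dw = E^T{\tt1}:dw$ is the second rule with $X={\tt1}$, $Y=E$, $Z=dw$.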

Write your function in terms of these new variables. Then calculate the differential and gradient. $$\eqalign{ f &= \log(\phi) \\ df &= \phi^{-1}d\phi \\ &= \phi^{-1}e:dw \\ &= \phi^{-1}e:\kappa A^Tdx \\ &= \kappa\phi^{-1}Ae:dx \\ g\doteq\p{f}{x} &= \kappa\phi^{-1}Ae \\ }$$ Now calculate the differential of $g$ and thence the Hessian. $$\eqalign{ dg &= \kappa\phi^{-1}\,A\;\r{de} + \kappa Ae\,\r{d\phi^{-1}} \\ &= \kappa\phi^{-1}A\r{E\,dw} + \kappa Ae\,\r{(-\phi^{-2}d\phi)} \\ &= \kappa\phi^{-1}AE\,dw - \kappa\phi^{-2}Aee^Tdw \\ &= \kappa\phi^{-2}A\left(\phi E - ee^T \right)dw \\ &= \kappa\phi^{-2}A\left(\phi E - ee^T \right)(\kappa A^Tdx) \\ H\doteq\p{g}{x} &= \kappa^2\phi^{-2}A\left(\phi E - ee^T \right)A^T \\ }$$
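The closed forms above can be sanity-checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: the function names, the random test data, and the step size `h` are all illustrative assumptions. It uses the normalized weights $p = e/\phi$, so that $H = \kappa^2 A\left({\rm Diag}(p) - pp^T\right)A^T$, which is the same matrix as the last line of the derivation, and it compares the gradient and Hessian against central finite differences.

```python
import numpy as np

def softmax_weights(x, A, b, kappa):
    """p = e / phi with e = exp(kappa * (A^T x + b)); shifted for stability."""
    w = kappa * (A.T @ x + b)
    e = np.exp(w - w.max())  # the shift cancels in the ratio e / phi
    return e / e.sum()

def logsumexp_affine(x, A, b, kappa):
    """f(x) = log(sum_i exp(kappa * (a_i^T x + b_i))), a_i = i-th column of A."""
    w = kappa * (A.T @ x + b)
    return w.max() + np.log(np.exp(w - w.max()).sum())

def gradient(x, A, b, kappa):
    """g = kappa * phi^{-1} * A e = kappa * A p."""
    return kappa * A @ softmax_weights(x, A, b, kappa)

def hessian(x, A, b, kappa):
    """H = kappa^2 phi^{-2} A (phi E - e e^T) A^T = kappa^2 A (Diag(p) - p p^T) A^T."""
    p = softmax_weights(x, A, b, kappa)
    return kappa**2 * A @ (np.diag(p) - np.outer(p, p)) @ A.T

def central_difference(fun, x, h=1e-5):
    """Gradient (scalar fun) or Jacobian (vector fun) of fun at x."""
    cols = []
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = h
        cols.append((fun(x + step) - fun(x - step)) / (2.0 * h))
    return np.stack(cols, axis=-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)       # arbitrary seed for the demo
    d, m, kappa = 4, 6, 1.7              # arbitrary sizes and constant
    A = rng.standard_normal((d, m))
    b = rng.standard_normal(m)
    x = rng.standard_normal(d)

    g_num = central_difference(lambda z: logsumexp_affine(z, A, b, kappa), x)
    H_num = central_difference(lambda z: gradient(z, A, b, kappa), x)

    print("gradient matches:", np.allclose(gradient(x, A, b, kappa), g_num, atol=1e-6))
    print("Hessian matches: ", np.allclose(hessian(x, A, b, kappa), H_num, atol=1e-6))
```

Since ${\rm Diag}(p) - pp^T$ is positive semidefinite for any probability vector $p$, the Hessian computed this way should also come out symmetric positive semidefinite, which is consistent with log-sum-exp of an affine map being convex.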
