Vector calculus and undefined operations

calculus, linear-algebra, multivariable-calculus, vector-analysis

This is perhaps a somewhat philosophical question. I'm working with multivariate equations and need to compute their Jacobians/Hessians. To do so, I currently resort to writing the equations out in scalar form.

Example

For $\vec{y}, \vec{x} \in \mathbb{R}^2$ and $X \in \mathbb{R}^{2 \times 2}$ we have
\begin{equation} f(\vec{x}) = \vec{y}^T \left( \frac{1}{1 + e^{-\vec{x}^T X}} \right) \end{equation}

In the above, $\frac{1}{1 + e^{-\vec{x}^T X}} = \vec{u}$ and $\vec{u} \in \mathbb{R}^2$.

Now, in order to compute the gradient using the vector-matrix notation, I end up having to expand the derivative as follows ($\hat{e}_1$, $\hat{e}_2$ are the standard basis vectors of $\mathbb{R}^2$):

\begin{equation} \nabla f(\vec{x}) = \sum_{i=1}^2 \frac{\partial}{\partial x_i} \left( y_1 \left(\frac{1}{1 + e^{-X_{11} x_1 - X_{21} x_2}}\right) + y_2 \left(\frac{1}{1 + e^{-X_{12} x_1 - X_{22} x_2}}\right) \right) \hat{e}_i \end{equation}

because, if I were to attempt to derive the gradient directly in vector notation, I would end up with:
\begin{equation} \nabla f(\vec{x}) = \vec{y}^T \frac{X e^{-\vec{x}^T X}}{(1 + e^{-\vec{x}^T X})^2} \end{equation}

which does not make sense, because there is no defined operation for squaring a vector, taking its reciprocal, and so on.
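For concreteness, here is a small numerical sketch (an illustrative addition, not part of the original post) of what the expanded, component-wise gradient computes; it checks the scalar expression against central finite differences. The names `f` and `grad_scalar`, and the use of NumPy, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=2)
X = rng.normal(size=(2, 2))

def f(x):
    # f(x) = y^T (1 / (1 + exp(-x^T X))), reciprocal taken elementwise
    z = X.T @ x                       # components of x^T X as a column vector
    return y @ (1.0 / (1.0 + np.exp(-z)))

def grad_scalar(x):
    # the expanded, component-wise gradient: df/dx_i = sum_j y_j e^{-z_j}/(1+e^{-z_j})^2 X_{ij}
    z = X.T @ x
    q = np.exp(-z)
    return X @ (y * q / (1.0 + q) ** 2)

x0 = rng.normal(size=2)
eps = 1e-6
fd = np.array([(f(x0 + eps * e) - f(x0 - eps * e)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(grad_scalar(x0), fd, atol=1e-6))   # expect True
```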

Question: Which convention/formulation would I have to choose to reduce the work involved in calculating the gradient vector of such a function in multivariate/vector calculus?

Best Answer

Let's use a colon to denote the trace/Frobenius product, i.e. $$A:B = {\rm Tr}(A^TB) = {\rm Tr}((A^TB)^T) = {\rm Tr}(B^TA) = B:A$$
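As a quick aside (an illustrative addition, not part of the original answer), these identities are easy to confirm numerically: the snippet below checks that $A:B = {\rm Tr}(A^TB) = \sum_{ij}A_{ij}B_{ij} = B:A$ on random matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))

frob = lambda A, B: np.trace(A.T @ B)         # A:B
print(np.isclose(frob(A, B), np.sum(A * B)))  # Tr(A^T B) = sum_ij A_ij B_ij
print(np.isclose(frob(A, B), frob(B, A)))     # A:B = B:A
```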

To avoid confusion with the vector $x$, rename the matrix $X$ to $A$.
Also define the column vectors
$$\eqalign{ p &= -A^Tx \quad&\implies\quad &dp = -A^Tdx \\ q &= \exp(p) \quad&\implies\quad &dq = q\odot dp \qquad\big({\rm Hadamard\,product}\big) \\ r &= {\tt1}+q \quad&\implies\quad &dr = dq \\ s &= \frac{1}{r} \quad&\implies\quad &ds = \frac{(-1)\,dr}{r^{\odot 2}} = -s\odot s\odot dr \\ }$$

Write the function of interest $(\phi)$ in terms of these new vectors, then calculate its gradient.
$$\eqalign{ \phi &= y:s \\ d\phi &= y:ds \\ &= -y:(s\odot s\odot dr) \\ &= -(s\odot s\odot y):dq \\ &= +(s\odot s\odot y):(q\odot A^Tdx) \\ &= A(q\odot s\odot s\odot y):dx \\ \frac{\partial\phi}{\partial x} &= A(q\odot s\odot s\odot y) \;=\; g \qquad\big({\rm the\,gradient}\big) \\ }$$
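To sanity-check this result (the snippet below is an addition for illustration, not part of the original answer), one can compare $g = A(q\odot s\odot s\odot y)$ against a finite-difference gradient of $\phi$; the helper names `phi` and the random test data are assumptions of the example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2
y, x = rng.normal(size=n), rng.normal(size=n)
A = rng.normal(size=(n, n))

def phi(x):
    s = 1.0 / (1.0 + np.exp(-A.T @ x))    # s = 1/r with r = 1 + exp(-A^T x)
    return y @ s

q = np.exp(-A.T @ x)                       # q = exp(p), p = -A^T x
s = 1.0 / (1.0 + q)
g = A @ (q * s * s * y)                    # g = A (q ⊙ s ⊙ s ⊙ y)

eps = 1e-6
fd = np.array([(phi(x + eps * e) - phi(x - eps * e)) / (2 * eps)
               for e in np.eye(n)])
print(np.allclose(g, fd, atol=1e-6))       # expect True
```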
By forming diagonal matrices from the vectors, e.g.
$$Q = {\rm Diag}(q),\quad R = {\rm Diag}(r),\quad {\rm etc.}$$
the Hadamard products can be replaced, and the gradient can be written as
$$g = AQS^2 y \;=\; AQ(I+Q)^{-2} y$$

The Hessian is simply the gradient of the gradient, therefore
$$\eqalign{ dg &= A\,dQ\,(I+Q)^{-2}y + AQ\,d(I+Q)^{-2}\,y \\ &= A(I+Q)^{-2}\,dQ\,y -2AQ(I+Q)^{-3}\,dQ\,y \\ &= A(I+Q)^{-2}Y\,dq -2AQ(I+Q)^{-3}Y\,dq \\ &= A\Big((I+Q)-2Q\Big)(I+Q)^{-3}Y\,dq \\ &= A(I-Q)(I+Q)^{-3}YQ\,dp \\ &= A(Q-I)(I+Q)^{-3}YQA^Tdx \\ &= A(Q^2-Q)(I+Q)^{-3}YA^Tdx \\ \frac{\partial g}{\partial x} &= A(Q^2-Q)(I+Q)^{-3}YA^T \;=\; H \qquad\big({\rm the\,Hessian}\big) \\ }$$

NB: some steps use the fact that diagonal matrices commute with one another, and that a diagonal matrix acting on a vector can be rewritten as, e.g.,
$$Qy = q\odot y = y\odot q = Yq$$
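Similarly, the closed forms above can be verified numerically (again an illustrative addition, not part of the original answer, with hypothetical helper names): the Diag-matrix form of the gradient is compared against the Hadamard form, and the Hessian against finite differences of the gradient.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2
y, x = rng.normal(size=n), rng.normal(size=n)
A = rng.normal(size=(n, n))
I = np.eye(n)

def grad(x):
    q = np.exp(-A.T @ x)
    s = 1.0 / (1.0 + q)
    return A @ (q * s * s * y)             # g = A (q ⊙ s ⊙ s ⊙ y)

q = np.exp(-A.T @ x)
Q, Y = np.diag(q), np.diag(y)
Qinv = np.linalg.inv(I + Q)                # (I+Q)^{-1}

g_diag = A @ Q @ Qinv @ Qinv @ y           # g = A Q (I+Q)^{-2} y
print(np.allclose(grad(x), g_diag))        # matches the Hadamard form

H = A @ (Q @ Q - Q) @ Qinv @ Qinv @ Qinv @ Y @ A.T   # H = A (Q^2-Q)(I+Q)^{-3} Y A^T
eps = 1e-6
H_fd = np.column_stack([(grad(x + eps * e) - grad(x - eps * e)) / (2 * eps)
                        for e in np.eye(n)])
print(np.allclose(H, H_fd, atol=1e-5))     # expect True
```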
