[Math] Gradient of a scalar function with respect to a matrix

calculus, derivatives, matrices, partial-derivative

I need to calculate $\dfrac{\partial}{\partial K}f(K)$, with:
$$
f(K)=-\frac{1}{2}(u-Kx)^T\Sigma^{-1}(u-Kx)$$
Here $K$ and $\Sigma$ are $n\times n$ matrices with $\Sigma$ symmetric, and $u$ and $x$ are column vectors of size $n$.

The result should be a matrix like:
$$
\begin{bmatrix}
\frac{\partial f(K)}{\partial k_{11}} & \frac{\partial f(K)}{\partial k_{12}} & \ldots
\\
\frac{\partial f(K)}{\partial k_{21}} & \ldots & \ldots
\\
\ldots & \ldots & \ldots
\end{bmatrix}
$$
Am I right?

Following Petersen's Matrix Cookbook, I obtain the following matrix:
$$
\Sigma^{-1}(u-Kx)x^T
$$
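As a sanity check on that formula, here is a sketch of the derivation via differentials. It assumes the convention $df = \mathrm{Tr}\left[\left(\partial f/\partial K\right)^T dK\right]$ and treats all entries of $K$ as independent; the symmetry of $\Sigma$ is used in the second step:
$$
\begin{aligned}
df &= -\tfrac{1}{2}\left[(-dK\,x)^T\Sigma^{-1}(u-Kx) + (u-Kx)^T\Sigma^{-1}(-dK\,x)\right] \\
   &= (u-Kx)^T\Sigma^{-1}\,dK\,x \\
   &= \mathrm{Tr}\left[x\,(u-Kx)^T\Sigma^{-1}\,dK\right]
   \quad\Longrightarrow\quad
   \frac{\partial f}{\partial K} = \Sigma^{-1}(u-Kx)\,x^T .
\end{aligned}
$$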
My problem is that when I take both $K$ and $\Sigma$ to be $2 \times 2$ diagonal matrices, $K=\mathrm{diag}(k_1,k_2)$ and $\Sigma=\mathrm{diag}(\sigma_1^2,\sigma_2^2)$, I get two different results:

  • if I work it out step by step, that is, writing out the scalar $f(K)$ and then differentiating with respect to each $k_{ij}$, I obtain this matrix:
    $$
    \begin{pmatrix}
    \frac{(u_1-k_1x_1)x_1}{\sigma_1^2} & 0
    \\
    0 & \frac{(u_2-k_2x_2)x_2}{\sigma_2^2}
    \end{pmatrix}
    $$
  • following Petersen's formula:
    $$
    \begin{pmatrix}
    \frac{(u_1-k_1x_1)x_1}{\sigma_1^2} & \frac{(u_1-k_1x_1)x_2}{\sigma_1^2}
    \\
    \frac{(u_2-k_2x_2)x_1}{\sigma_2^2} & \frac{(u_2-k_2x_2)x_2}{\sigma_2^2}
    \end{pmatrix}
    $$
    What am I doing wrong?

Best Answer

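The formula from the Matrix Cookbook does not assume any special structure on $K$, such as being diagonal; it treats all $n^2$ entries of $K$ as independent variables. If you want to impose this structure when taking the derivative, you should use eq. (122):
$$
\dfrac{\partial f}{\partial K_{ij}} = \mathrm{Tr}\left[\left[\dfrac{\partial f}{\partial K}\right]^T\dfrac{\partial K}{\partial K_{ij}}\right].
$$
For a diagonal $K$, the structure matrix $\dfrac{\partial K}{\partial K_{ii}}$ has a single 1 in position $(i,i)$ and zeros elsewhere, while $\dfrac{\partial K}{\partial K_{ij}} = 0$ for $i \neq j$, because the off-diagonal entries are fixed at zero and are not free variables. The trace therefore picks out the diagonal entries of $\Sigma^{-1}(u-Kx)x^T$ and gives zero everywhere else, which is exactly your step-by-step result. This connects the two formulas that you have derived.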
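If it helps, the relationship can be checked numerically. The following is only a minimal sketch using NumPy; the function names (`f`, `grad_full`, `grad_numeric`) and the random test values are made up for illustration and are not part of the question. It compares the unstructured gradient $\Sigma^{-1}(u-Kx)x^T$ against central finite differences, and then applies the trace formula with the diagonal structure matrices.

```python
import numpy as np

def f(K, u, x, Sigma_inv):
    """f(K) = -1/2 (u - Kx)^T Sigma^{-1} (u - Kx)."""
    r = u - K @ x
    return -0.5 * r @ Sigma_inv @ r

def grad_full(K, u, x, Sigma_inv):
    """Unstructured gradient: Sigma^{-1} (u - Kx) x^T (Sigma symmetric)."""
    return Sigma_inv @ np.outer(u - K @ x, x)

def grad_numeric(K, u, x, Sigma_inv, eps=1e-6):
    """Central finite differences, perturbing every entry of K independently."""
    G = np.zeros_like(K)
    for i in range(K.shape[0]):
        for j in range(K.shape[1]):
            E = np.zeros_like(K)
            E[i, j] = 1.0
            G[i, j] = (f(K + eps * E, u, x, Sigma_inv)
                       - f(K - eps * E, u, x, Sigma_inv)) / (2 * eps)
    return G

# Illustrative (assumed) test data: diagonal K and diagonal Sigma, n = 2.
rng = np.random.default_rng(0)
n = 2
k = rng.normal(size=n)                      # diagonal entries k_1, k_2
K = np.diag(k)
u = rng.normal(size=n)
x = rng.normal(size=n)
sigma2 = rng.uniform(0.5, 2.0, size=n)      # sigma_1^2, sigma_2^2
Sigma_inv = np.diag(1.0 / sigma2)

G = grad_full(K, u, x, Sigma_inv)           # full matrix, off-diagonals nonzero
G_num = grad_numeric(K, u, x, Sigma_inv)    # should agree with G

# Structured derivative: dK/dk_i is the single-entry matrix E_ii,
# so df/dk_i = Tr[G^T E_ii] = G_ii.
g_diag = np.zeros(n)
for i in range(n):
    E_ii = np.zeros((n, n))
    E_ii[i, i] = 1.0
    g_diag[i] = np.trace(G.T @ E_ii)

# Step-by-step result from the question: (u_i - k_i x_i) x_i / sigma_i^2.
g_step = (u - k * x) * x / sigma2

print(np.allclose(G, G_num, atol=1e-6))
print(np.allclose(g_diag, g_step))
```

Both comparisons should agree: the full matrix matches finite differences when every entry of $K$ is allowed to move, and its diagonal matches the step-by-step derivative when only $k_1, k_2$ are free.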