Matrix Calculus: $\nabla_{\mathbf{c}}(-2\mathbf{x}^T \mathbf{D} \mathbf{c} + \mathbf{c}^T \mathbf{c}) = -2\mathbf{D}^T \mathbf{x} + 2\mathbf{c}$

In an example of Principle Component Analysis, my textbook uses vector calculus to do the following:

$\nabla_{\mathbf{c}}(-2\mathbf{x}^T \mathbf{D} \mathbf{c} + \mathbf{c}^T \mathbf{c}) = \mathbf{0}$

$\rightarrow -2\mathbf{D}^T \mathbf{x} + 2\mathbf{c} = \mathbf{0}$

$\rightarrow c = \mathbf{D}^T \mathbf{x}$

Where $\nabla_{\mathbf{c}}$ is the gradient with respect to $\mathbf{c}$, $\mathbf{D} \in \mathbb{R}^{n \times l}$, $\mathbf{c} \in \mathbb{R}^l$, and the columns of $\mathbf{D}$ are orthogonal to each other.

I have the following questions:

How did the authors get from $\mathbf{x}^T \mathbf{D} \mathbf{c}$ to $\mathbf{D}^T \mathbf{x}$? I haven't studied matrix calculus, but I'm assuming that $\dfrac{ \partial }{\partial{\mathbf{c}}} \mathbf{c} = I$? What about $\mathbf{x}^T \mathbf{D}$ to $\mathbf{D}^T \mathbf{x}$?
How did the authors go from $\mathbf{c}^T \mathbf{c}$ to $2\mathbf{c}$? I found the following in Matrix Differentiation by Randal J. Barnes:

If we assume that $\mathbf{A} = I$, then isn't this what we're looking for? But wouldn't that leave us with $2\mathbf{c}^T$ instead of just $2\mathbf{c}$?

I would greatly appreciate it if people could please take the time to clarify this.

Best Answer

There are (unfortunately) different conventions in matrix calculus. The most common is $$\nabla_{c} \equiv \frac{\partial}{\partial c}$$ However, in you example the convention is $$\nabla_{c} \equiv \frac{\partial}{\partial c^{T}}$$ With this in mind, we perform the calculation \begin{align*} \frac{\partial}{\partial c^{T}}\left(-2x^{T}Dc + c^{T}c\right) = 0 \tag{1}\\ \frac{\partial}{\partial c^{T}}\left(-2c^{T}D^{T}x + c^{T}c\right) = 0 \tag{2}\\ -2\frac{\partial c^{T}}{\partial c^{T}}D^{T}x + \frac{\partial}{\partial c^{T}}(c^{T}c) = 0 \tag{3}\\ -2D^{T}x + \frac{\partial c^{T}}{\partial c^{T}}c + c^{T}\frac{\partial c}{\partial c^{T}}= 0 \tag{4}\\ -2D^{T}x + c + c^{T}\frac{\partial c}{\partial c^{T}}= 0 \tag{5}\\ -2D^{T}x + 2c= 0 \tag{6} \end{align*}

Now, to get from from $(5)$ to $(6)$ we need to show that $\displaystyle c^{T}\frac{\partial c}{\partial c^{T}} = c$.

We can see this using index notation. Here, $c$ has an upper index, and $c^{T}$ has a lower index. They are related by $c^{i} = \delta^{ij}(c^{T})_{j}$ and $(c^{T})_{i} = \delta_{ij}c^{j}$ (where $\delta$ is the kronecker delta).

So we have $$\left(c^{T}\frac{\partial c}{\partial c^{T}}\right)^j = (c^{T})_{i}\frac{\partial c^{i}}{\partial (c^{T})_j}\\ = (c^{T})_{i}\frac{\partial (c^{T})_{k}\delta^{ki}}{\partial (c^{T})_j}\\ = (c^{T})_{i}\frac{\partial (c^{T})_{k}}{\partial (c^{T})_j}\delta^{ki}\\ = (c^{T})_{i}\delta^{j}_{k}\delta^{ki} = (c^{T})_{i}\delta^{ji} = c^{j} $$

And we are done.

Best Answer

Related Solutions

Gradient Descent: Minimising the Directional Derivative in Direction $\mathbf{u}$

Taking the gradient of $f(\mathbf{x}) = \frac{1}{2}\|\mathbf{A} \mathbf{x} – \mathbf{b}\|_2^2$

Related Question