How to take the derivative of $(A+B \cdot C)^{T}(A+B \cdot C)$ with respect to matrix C

linear algebramatricesmatrix-calculusmultivariable-calculus

Where T is the transpose operator

A is a matrix of shape (10, 1)

B is a matrix of shape (10, 3)

C is a matrix of shape (3, 1)

I am trying to find the derivative of this expression with respect to matrix C using vector calculus. I would like to know how to compute this without reducing it to element by element operations.

I am trying to follow the rules in the matrix cookbook: https://www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf

but I cannot seem to come up with the right answer.

Overall, I use the product rule:

The partial of the first term becomes $B^{T}$.
The partial of the second term becomes $B$.

Then we apply the product rule giving us:
$B^{T}(A+B\cdot C)+(A+B\cdot C)^{T}B$

But this expression is adding a 3 by 1 matrix to a 1 by 3 matrix.

What am I doing wrong?

Thanks!!

Best Answer

The easiest way to calculate the derivatives of matrix valued functions is to go back to the definition of derivative in the usual sense of limits, so if $$f(A,B,C) = (A + BC)^{T}(A+BC),$$ then for some small $t > 0$, and some $E \in M_{3\times 1}(\mathbb{R})$, we calculate the directional derivative in the "direction" of $E$: $$\frac{\partial f(A,B,C)}{\partial C}(E) = \lim_{t\rightarrow 0}\frac{f(A,B,C+tE) - f(A,B,C)}{t}\\ = \lim_{t\rightarrow 0}\frac{1}{t}\bigg[ \big(A + B(C+tE)\big)^{T}\big(A+B(C+tE)\big) -(A + BC)^{T}(A+BC)\bigg]\\ \lim_{t\rightarrow 0}\frac{1}{t}\bigg[ (A + BC)^{T}(A+BC) + t(BE)^{T}(A+BC) + t(A+BC)^{T}(BE) + t^{2}(BE)^{T}(BE)\\ - (A+BC)^{T}(A+BC)\bigg]\\ =\lim_{t\rightarrow 0}\bigg[(BE)^{T}(A+BC) + (A+BC)^{T}(BE) + t(BE)^{T}(BE)\bigg]\\ \\ =(BE)^{T}(A+BC) + (A+BC)^{T}(BE), $$ i.e. $$\frac{\partial f(A,B,C)}{\partial C}(E) = (BE)^{T}(A+BC) + (A+BC)^{T}(BE) $$ which is similar to your answer just now the (3,1) matrix $E \in M_{3 \times 1}(\mathbb{R})$ sorts out the dimension problem you were having.

Best Answer

Related Solutions

Change from differentiation wrt to matrix to wrt to inverse of matrix for symmetric matrices

Derivative of matrix inverse with respect to a vector

Related Question