Compute derivative of log-determinant of matrix with respect to a vector

derivatives, linear-algebra, matrices, real-analysis

I have a problem to solve:

First, I have a total of $N$ observations $W_i\in\mathbb{R}^{D\times D}$, of which $N_1$ belong to the positive class and $N_0$ to the negative class.

Then, for an arbitrary vector $b\in\mathbb{R}^{D}$, I compute $W_i b\in\mathbb{R}^{D}$, obtaining $N$ vectors in total. Their mean is $$\frac{1}{N}\sum_{i=1}^N W_i b = \bar W b.$$ Over all samples, the negative class, and the positive class, the mean vectors are $\bar W b$, $\bar W_0 b$, and $\bar W_1 b$, respectively.

Next, we also compute three empirical covariance matrices $S$, $S_0$, $S_1$, following the formula below (shown for $S$; the class covariances are analogous): $$S = \frac{1}{N}\sum_{i=1}^N \left(W_i b - \bar W b\right)\left(W_i b - \bar W b\right)^T = \frac{1}{N}\sum_{i=1}^N \left(W_i - \bar W\right) b\,b^T \left(W_i - \bar W\right)^T.$$
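The two forms of $S$ above can be checked against each other numerically. A minimal NumPy sketch, with made-up random data (the array names are mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 3, 10
W = rng.standard_normal((N, D, D))   # made-up observations W_i
b = rng.standard_normal(D)           # an arbitrary vector b

x = W @ b                            # x_i = W_i b, stacked as rows, shape (N, D)
x_bar = x.mean(axis=0)               # = W_bar b

# first form: mean outer product of the centered x_i
xc = x - x_bar
S = xc.T @ xc / N

# second form: (W_i - W_bar) b b^T (W_i - W_bar)^T averaged over i
Wc = W - W.mean(axis=0)
S2 = sum(Wi @ np.outer(b, b) @ Wi.T for Wi in Wc) / N
```

Both forms agree because $W_i b - \bar W b = (W_i - \bar W)b$, so each summand is the same rank-one matrix.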

Finally, we compute $$M = \log\frac{\det(S)}{\prod\limits_{c\,\in\,\{0,1\}} \det(S_c)^{N_c/N}}.$$

My question is: how does one compute $\frac{\partial M}{\partial b}$?

I have already done some tests. It is easy to show that $M$ does not change when $b$ is multiplied by a constant. So I sampled points $b$ on the unit sphere (in 3 dimensions, for easier visualization), calculated $M$, and plotted the result.

It seems that $M$ is constant with respect to arbitrary $b$, assuming my code is right.
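The experiment can be sketched as follows. This is an assumed reconstruction with random $W_i$, not the original code; the helper names `cov` and `M` are mine:

```python
import numpy as np

def cov(x):
    """Empirical covariance of row-stacked samples x (divides by N, as in the post)."""
    xc = x - x.mean(axis=0)
    return xc.T @ xc / len(x)

def M(W0, W1, b):
    x0, x1 = W0 @ b, W1 @ b                 # projected samples per class
    x = np.concatenate([x0, x1])
    N0, N1 = len(x0), len(x1)
    N = N0 + N1
    ld = lambda S: np.linalg.slogdet(S)[1]  # log det, assuming S is nonsingular
    return ld(cov(x)) - (N0 / N) * ld(cov(x0)) - (N1 / N) * ld(cov(x1))

rng = np.random.default_rng(1)
D = 3
W0 = rng.standard_normal((6, D, D))         # made-up negative-class observations
W1 = rng.standard_normal((6, D, D))         # made-up positive-class observations

# sample directions b on the unit sphere and evaluate M
bs = rng.standard_normal((20, D))
bs /= np.linalg.norm(bs, axis=1, keepdims=True)
values = np.array([M(W0, W1, b) for b in bs])
```

The scale invariance is exact: scaling $b$ by $c$ scales every covariance by $c^2$, and the $\det$ exponents in numerator and denominator both sum to $2D$, so the ratio is unchanged.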

I would like to know why this happens.

This problem may take some patience to solve, thanks in advance.

Best Answer

Define $$\eqalign{ x_k &= W_kb \quad &X = \left[\matrix{x_1&x_2 \ldots x_n}\right] \\ e_k &\in{\mathbb R}^{n\times 1} \quad &\big({\rm standard\, basis\, vectors}\big) \\ J &= \sum_{i=1}^n\sum_{k=1}^ne_ie_k^T \quad &\big({\rm all\, ones\, matrix}\big) \\ C &= I-\tfrac{1}{n}J \quad &\big({\rm centering\, matrix}\big) \\ S &= \tfrac{1}{n}XCX^T \quad &\big({\rm covariance\, matrix}\big) \\ \lambda &= n\log\det S \\ }$$ Assume that the observations are arranged such that the two classes are simple contiguous block-partitions $(n=n_1+n_2)$ of the above matrices. $$\eqalign{ X_1 &= X\left[\matrix{I_1\\0}\right] = XP_1 &= \left[\matrix{x_1\ldots x_{n_1}}\right] \\ X_2 &= X\left[\matrix{0\\I_2}\right] = XP_2 &= \left[\matrix{x_{n_1+1}\ldots x_{n}}\right] \\ X &= X\left[\matrix{I_1&0\\0&I_2}\right] &=\left[\matrix{x_1\ldots x_{n_1}&x_{n_1+1}\ldots x_n}\right]\\ C_1 &= I_1-\tfrac{1}{n_1}J_1,\qquad &C_2 = I_2-\tfrac{1}{n_2}J_2 \\ S_1 &= \tfrac{1}{n_1}X_1C_1X_1^T,\qquad &S_2 = \tfrac{1}{n_2}X_2C_2X_2^T \\ \lambda_1 &= n_1\log\det S_1,\qquad &\lambda_2 = n_2\log\det S_2 \\ }$$ Write the function of interest in terms of these new variables.
Then calculate the differential and gradient. $$\eqalign{ M &= \frac{1}{n}(\lambda - \lambda_1 - \lambda_2) \\ n\,dM &= d\lambda - d\lambda_1 - d\lambda_2 \\ &= (nS^{-1}:dS) - (n_1S_1^{-1}:dS_1) - (n_2S_2^{-1}:dS_2) \\ &= S^{-1}:2\operatorname{Sym}(dX\,CX^T) - (\ldots) \\ &= 2S^{-1}XC:dX - (\ldots) \\ dM &= \frac{2}{n}\left( S^{-1}XC:dX - S_1^{-1}X_1C_1:dX_1 - S_2^{-1}X_2C_2:dX_2 \right)\\ &= \frac{2}{n}\left( S^{-1}XC:dX - S_1^{-1}XP_1C_1:dX\,P_1 - S_2^{-1}XP_2C_2:dX\,P_2 \right)\\ &= \frac{2}{n}\left( S^{-1}XC - S_1^{-1}XP_1C_1P_1^T - S_2^{-1}XP_2C_2P_2^T \right):dX\\ &= A:dX \\ }$$ At this point, notice that $X$ can be expanded as a sum, and substitute: $$\eqalign{ X &= \sum_{k=1}^nx_ke_k^T = \sum_{k=1}^nW_kbe_k^T \\ dX &= \sum_{k=1}^nW_k\,db\,e_k^T \\ dM &= A:\sum_{k=1}^nW_k\,db\,e_k^T \\ &= \left(\sum_{k=1}^nW_k^TAe_k\right):db \\ \frac{\partial M}{\partial b} &= \sum_{k=1}^nW_k^TAe_k \\ }$$ Given this expression for the gradient, it does not appear to be identically zero.
Perhaps if the sum is expanded out and simplified, everything will cancel?
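The gradient formula can be checked against central finite differences. A NumPy sketch under assumed random data (the helper names `centering` and `M_of` are mine, and the classes occupy contiguous column blocks as in the derivation):

```python
import numpy as np

rng = np.random.default_rng(2)
D, n1, n2 = 3, 6, 6
n = n1 + n2
W = rng.standard_normal((n, D, D))   # class 1 = first n1 samples, class 2 = rest
b = rng.standard_normal(D)

def centering(m):
    """C = I - J/m, the m x m centering matrix."""
    return np.eye(m) - np.ones((m, m)) / m

def M_of(b):
    X = (W @ b).T                           # D x n, column k is x_k = W_k b
    X1, X2 = X[:, :n1], X[:, n1:]
    S  = X  @ centering(n)  @ X.T  / n
    S1 = X1 @ centering(n1) @ X1.T / n1
    S2 = X2 @ centering(n2) @ X2.T / n2
    ld = lambda S: np.linalg.slogdet(S)[1]  # assumes each covariance is nonsingular
    return ld(S) - (n1 / n) * ld(S1) - (n2 / n) * ld(S2)

# analytic gradient: dM = A:dX with
#   A = (2/n)(S^{-1}XC - S1^{-1}XP1C1P1^T - S2^{-1}XP2C2P2^T),
# then dM/db = sum_k W_k^T A e_k  (the k-th column of A hit by W_k^T)
X = (W @ b).T
X1, X2 = X[:, :n1], X[:, n1:]
S  = X  @ centering(n)  @ X.T  / n
S1 = X1 @ centering(n1) @ X1.T / n1
S2 = X2 @ centering(n2) @ X2.T / n2
A = np.linalg.solve(S, X @ centering(n))
A[:, :n1] -= np.linalg.solve(S1, X1 @ centering(n1))
A[:, n1:] -= np.linalg.solve(S2, X2 @ centering(n2))
A *= 2.0 / n
grad = sum(W[k].T @ A[:, k] for k in range(n))

# central finite differences for comparison
eps = 1e-5
grad_fd = np.array([(M_of(b + eps * e) - M_of(b - eps * e)) / (2 * eps)
                    for e in np.eye(D)])
```

Whether or not $M$ turns out to be constant for a particular data set, the analytic and finite-difference gradients should agree, which isolates any discrepancy to the data rather than the formula.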

Note that a colon is used in some of the steps above as a product notation for the trace, i.e. $$\eqalign{ A:B = \operatorname{Tr}(A^TB) }$$ The Sym operation was also used at one point $$\operatorname{Sym}(A) = \frac{A+A^T}{2}$$
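Both conventions are easy to sanity-check numerically; the colon product is just the elementwise (Frobenius) inner product:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
B = rng.standard_normal((4, 5))

# A:B = Tr(A^T B) equals the sum of elementwise products
frob = np.trace(A.T @ B)
elementwise = np.sum(A * B)

# Sym(C) = (C + C^T)/2 is the symmetric part of a square matrix
C = rng.standard_normal((4, 4))
SymC = (C + C.T) / 2
```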
