I was experimenting with PCA and LDA and got stuck at one point. I have a feeling it is so simple that I just can't see it.
Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as:
$$
S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i)(x_t^i - \mu_i)^T
$$
$$
S_B = \sum_{i=1}^CN(\mu_i-\mu)(\mu_i-\mu)^T
$$
Total scatter matrix $S_T$ is given as:
$$
S_T = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T = S_W + S_B
$$
where $C$ is the number of classes, $N$ is the number of samples per class, $x_t^i$ is the $t$-th sample of class $i$, $\mu_i$ is the mean of class $i$, and $\mu$ is the overall mean.
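As a sanity check on these definitions, here is a minimal NumPy sketch that builds the three matrices from random data and compares $S_T$ with $S_W + S_B$. The sizes `C`, `N`, `d`, the random data, and the helper `scatter` are all made up for illustration, and every class is assumed to have the same number of samples $N$, as in the formulas above.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N, d = 3, 50, 4  # classes, samples per class, feature dimension (arbitrary choices)

# X[i, t] plays the role of x_t^i; each class is shifted by its own random offset
X = rng.normal(size=(C, N, d)) + rng.normal(scale=3.0, size=(C, 1, d))

mu_i = X.mean(axis=1)                # class means, shape (C, d)
mu = X.reshape(-1, d).mean(axis=0)   # overall mean, shape (d,)


def scatter(D):
    """Sum of outer products of the rows of D, i.e. sum_k D[k] D[k]^T."""
    return D.T @ D


S_W = sum(scatter(X[i] - mu_i[i]) for i in range(C))  # within-class scatter
S_B = N * scatter(mu_i - mu)                          # between-class scatter
S_T = scatter(X.reshape(-1, d) - mu)                  # total scatter

# Compare up to floating-point tolerance
identity_holds = np.allclose(S_T, S_W + S_B)
print(identity_holds)
```

This only confirms numerically that the identity holds for one random data set; it does not explain why.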
While trying to derive $S_T$, I reached a point where I had:
$$
(x-\mu_i)(\mu_i-\mu)^T + (\mu_i-\mu)(x-\mu_i)^T
$$
as a term. Summed over all samples and classes, this needs to be zero, but why?
Indeed:
\begin{align}
S_T &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T \\
&= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i + \mu_i - \mu)(x_t^i - \mu_i + \mu_i - \mu)^T \\
&= S_W + S_B + \sum_{i=1}^C\sum_{t=1}^N\big[(x_t^i - \mu_i)(\mu_i - \mu)^T + (\mu_i - \mu)(x_t^i - \mu_i)^T\big]
\end{align}
Best Answer
If you assume
$$\frac{1}{N}\sum_{t=1}^Nx_t^{i}=\mu_i$$
then
$$\sum_{i=1}^C\sum_{t=1}^N(x_t^i-\mu_i)(\mu_i-\mu)^T=\sum_{i=1}^C\left(\sum_{t=1}^N(x_t^i-\mu_i)\right)(\mu_i-\mu)^T=0$$
because the inner sum is $\sum_{t=1}^N x_t^i - N\mu_i = N\mu_i - N\mu_i = 0$, so the formula holds. The second term is handled in the same way.
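For completeness, here is how the second term could be written out, using the same notation and the same assumption that $\mu_i$ is the average of the $N$ samples in class $i$. It is just the transpose of the first term, so the factor that does not depend on $t$ is now pulled out on the left:

$$\sum_{i=1}^C\sum_{t=1}^N(\mu_i-\mu)(x_t^i-\mu_i)^T=\sum_{i=1}^C(\mu_i-\mu)\left(\sum_{t=1}^N(x_t^i-\mu_i)\right)^T=0$$

With both cross terms gone, only $S_W + S_B$ remains in the expansion of $S_T$.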