Discriminant Analysis – Deriving Total (Within Class + Between Class) Scatter Matrix

I was experimenting with PCA and LDA and got stuck at one point. I have a feeling the answer is so simple that I can't see it.

Within-class ($S_W$) and between-class ($S_B$) scatter matrices are defined as:

$$
S_W = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i)(x_t^i - \mu_i)^T
$$

$$
S_B = \sum_{i=1}^CN(\mu_i-\mu)(\mu_i-\mu)^T
$$

The total scatter matrix $S_T$ is given as:

$$
S_T = \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T = S_W + S_B
$$

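where $C$ is the number of classes, $N$ is the number of samples per class (taken to be the same for every class), $x_t^i$ is the $t$-th sample of class $i$, $\mu_i$ is the mean of class $i$, and $\mu$ is the overall mean.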

While trying to derive $S_T$, I reached a point where I had:

$$
(x-\mu_i)(\mu_i-\mu)^T + (\mu_i-\mu)(x-\mu_i)^T
$$

as a term. Summed over all samples and classes, this needs to be zero, but why?


Indeed:

\begin{align}
S_T &= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu)(x_t^i - \mu)^T \\
&= \sum_{i=1}^C\sum_{t=1}^N(x_t^i - \mu_i + \mu_i - \mu)(x_t^i - \mu_i + \mu_i - \mu)^T \\
&= S_W + S_B + \sum_{i=1}^C\sum_{t=1}^N\big[(x_t^i - \mu_i)(\mu_i - \mu)^T + (\mu_i - \mu)(x_t^i - \mu_i)^T\big]
\end{align}

Best Answer

If you assume

$$\frac{1}{N}\sum_{t=1}^Nx_t^{i}=\mu_i$$

Then

$$\sum_{i=1}^C\sum_{t=1}^N(x_t^i-\mu_i)(\mu_i-\mu)^T=\sum_{i=1}^C\left(\sum_{t=1}^N(x_t^i-\mu_i)\right)(\mu_i-\mu)^T=0$$

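because the inner sum is $\sum_{t=1}^N x_t^i - N\mu_i = N\mu_i - N\mu_i = 0$ for every class $i$, and so the formula holds. The second term is the transpose of the first, so it vanishes in the same way.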
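If it helps to see this numerically, here is a minimal NumPy sketch that checks the identity on random data. The sizes `C`, `N`, `d`, the random seed, and all variable names are arbitrary choices for illustration, not part of the question; it assumes every class has the same number of samples $N$, as in the definitions above.

```python
import numpy as np

rng = np.random.default_rng(0)
C, N, d = 3, 50, 4  # classes, samples per class, feature dimension (arbitrary)

# X[i, t] is sample t of class i; add a random offset per class so the means differ
X = rng.normal(size=(C, N, d)) + rng.normal(size=(C, 1, d))

mu_i = X.mean(axis=1)                # class means, shape (C, d)
mu = X.reshape(-1, d).mean(axis=0)   # overall mean, shape (d,)

Dw = X - mu_i[:, None, :]            # x_t^i - mu_i, shape (C, N, d)
Db = mu_i - mu                       # mu_i - mu,    shape (C, d)
Dt = X - mu                          # x_t^i - mu,   shape (C, N, d)

# Sums of outer products over classes (c) and samples (t)
S_W = np.einsum('ctj,ctk->jk', Dw, Dw)
S_B = N * np.einsum('cj,ck->jk', Db, Db)
S_T = np.einsum('ctj,ctk->jk', Dt, Dt)

# Cross term: sum_i sum_t (x_t^i - mu_i)(mu_i - mu)^T
cross = np.einsum('ctj,ck->jk', Dw, Db)

print(np.allclose(cross, 0))         # cross term vanishes up to round-off
print(np.allclose(S_T, S_W + S_B))   # total scatter decomposes as claimed
```

Both checks should come out true up to floating-point round-off. The cross term vanishes only because `mu_i` is computed as the actual sample mean of each class, which is exactly the assumption stated in the answer.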