Solved – Why is the number of components in Linear Discriminant Analysis bounded by the number of classes

dimensionality reduction, discriminant analysis, linear model, pca

Resources about LDA usually say that the number of components is bounded by the number of classes minus one. E.g., in the binary case, only one component can be found.

In LDA, the first discriminant direction $\Phi_1$ is calculated as the argmax of $\frac{\Phi_1^T S_b \Phi_1}{\Phi_1^T S_w \Phi_1}$, where $S_b$ and $S_w$ are the between-class and within-class covariance matrices, respectively. Why can't we continue this way and compute the $i$-th direction $\Phi_i$ as the argmax of $\frac{\Phi_i^T S_b \Phi_i}{\Phi_i^T S_w \Phi_i}$ under the constraint of orthogonality to $\Phi_1, \Phi_2, \dots, \Phi_{i-1}$, as is done in PCA?

Ostensibly, in the binary case, where each $\Phi_i$ is a vector, one could do this $n$ times if the inputs are $n$-dimensional vectors.
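To make the setup concrete, here is a minimal numpy sketch; the toy two-class Gaussian data, sample sizes, and dimensionality are all illustrative assumptions, not part of the original question. It builds $S_b$ and $S_w$ and solves the equivalent eigenproblem for $S_w^{-1} S_b$, whose leading eigenvector is the argmax above:

```python
# Minimal sketch: binary LDA via the eigenproblem for S_w^{-1} S_b.
# The toy data (two Gaussian classes, n = 100 samples each, d = 5 dims)
# is an assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
X1 = rng.normal(loc=0.0, size=(n, d))   # class 1
X2 = rng.normal(loc=1.0, size=(n, d))   # class 2

mu = np.vstack([X1, X2]).mean(axis=0)   # grand mean
mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)

# Within-class scatter: per-class scatter around each class mean
S_w = (X1 - mu1).T @ (X1 - mu1) + (X2 - mu2).T @ (X2 - mu2)
# Between-class scatter: weighted outer products of mean deviations
S_b = n * np.outer(mu1 - mu, mu1 - mu) + n * np.outer(mu2 - mu, mu2 - mu)

# Maximizing phi^T S_b phi / phi^T S_w phi over phi is equivalent to
# taking the leading eigenvector of S_w^{-1} S_b.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
order = np.argsort(eigvals.real)[::-1]
print(np.round(eigvals.real[order], 6))   # one non-zero eigenvalue, rest ~ 0
phi1 = eigvecs.real[:, order[0]]          # the single discriminant direction
```

Running this prints exactly one non-zero eigenvalue, which previews the claim that the binary case yields a single component.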

Best Answer

The rank of the between-class scatter matrix $S_B$ for the whole data set is at most $c-1$, where $c$ is the number of classes. The between-class scatter matrix $S_{Bi}$ contributed by a single class $i$ is (up to a weight) the outer product $(\mu_i - \mu)(\mu_i - \mu)^T$, where $\mu_i$ is the class mean and $\mu$ the grand mean, so its rank is at most $1$. $S_B$ is the weighted sum of these $c$ rank-one terms, which alone would only bound its rank by $c$; but the mean deviations are linearly dependent, since $\sum_i n_i(\mu_i - \mu) = 0$ by definition of the grand mean, so the bound drops to $c-1$.

Since $\operatorname{rank}(AB) \le \min(\operatorname{rank}(A), \operatorname{rank}(B))$, you have $\operatorname{rank}(S_W^{-1}S_B) \le \operatorname{rank}(S_B) \le c-1$. The discriminant directions are the eigenvectors of $S_W^{-1}S_B$ with non-zero eigenvalues, and a matrix of rank at most $c-1$ has at most $c-1$ of those, so LDA can produce at most $c-1$ components.
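As a numeric check of this rank argument, the following sketch (the three-class, six-dimensional toy data and all sizes are assumptions for illustration) confirms that $S_B$ has rank $c-1$ and that $S_W^{-1}S_B$ has exactly $c-1$ non-zero eigenvalues:

```python
# Numeric check of the rank bound: c = 3 classes in d = 6 dimensions.
# Toy Gaussian data; all shapes and sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
c, d, n = 3, 6, 50
classes = [rng.normal(loc=float(k), size=(n, d)) for k in range(c)]
mu = np.vstack(classes).mean(axis=0)         # grand mean

S_B = np.zeros((d, d))
S_W = np.zeros((d, d))
for Xi in classes:
    mui = Xi.mean(axis=0)
    S_B += n * np.outer(mui - mu, mui - mu)  # one rank-1 term per class
    S_W += (Xi - mui).T @ (Xi - mui)

print(np.linalg.matrix_rank(S_B))            # prints 2, i.e. c - 1
eigvals = np.linalg.eigvals(np.linalg.inv(S_W) @ S_B)
print(np.round(np.sort(eigvals.real)[::-1], 6))  # 2 non-zero, 4 ~ 0
```

Even though the data live in six dimensions, only two discriminant directions carry non-zero between-class separation, exactly as the rank inequality predicts.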
