Solved – How to calculate Fisher criterion weights

classification, discriminant analysis, machine learning, self-study

I am studying pattern recognition and machine learning, and I ran into the following question.

Consider a two-class classification problem with equal prior class probabilities $$P(D_1)=P(D_2)= \frac{1}{2}$$

and the distribution of instances in each class given by

$$ p(x|D_1)= {\cal N} \left( \begin{bmatrix} 0 \\0 \end{bmatrix},
\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \right),$$

$$ p(x|D_2)= {\cal N} \left( \begin{bmatrix} 4 \\ 4 \end{bmatrix},
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right).$$

How do I calculate the Fisher criterion weights?

Update 2: The weight vector given in my book is:
$W=\begin{bmatrix} \frac{-4}{3} \\ \frac{-2}{9} \end{bmatrix}$.

Update 3: As @xeon hinted, I understand that I should determine the projection line for Fisher's discriminant.

Update 4: Let $W$ be the direction of the projection line. Fisher's linear discriminant method then takes the best $W$ to be the one that maximizes the criterion function. The remaining challenge is how to compute the vector $W$ numerically.

Best Answer

Following the paper you linked to (Mika et al., 1999), we have to find the $\mathbf{w}$ that maximizes the so-called generalized Rayleigh quotient,

$$\frac{\mathbf{w}^\top \mathbf{S}_B \mathbf{w}}{\mathbf{w}^\top \mathbf{S}_W \mathbf{w}},$$

where for means $\mathbf{m}_1, \mathbf{m}_2$ and covariances $\mathbf{C}_1, \mathbf{C}_2$,

\begin{align} \mathbf{S}_B &= (\mathbf{m}_1 - \mathbf{m}_2)(\mathbf{m}_1 - \mathbf{m}_2)^\top, & \mathbf{S}_W &= \mathbf{C}_1 + \mathbf{C}_2. \end{align}
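Plugging in the means and covariances from your question, these are

\begin{align} \mathbf{S}_B &= \begin{pmatrix} -4 \\ -4 \end{pmatrix} \begin{pmatrix} -4 & -4 \end{pmatrix} = \begin{pmatrix} 16 & 16 \\ 16 & 16 \end{pmatrix}, & \mathbf{S}_W &= \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 3 & 0 \\ 0 & 2 \end{pmatrix}. \end{align}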

The solution can be found by solving the generalized eigenvalue problem \begin{align} \mathbf{S}_B\mathbf{w} = \lambda \mathbf{S}_W\mathbf{w}: \end{align} first compute the eigenvalues $\lambda$ from \begin{align} \det(\mathbf{S}_B - \lambda \mathbf{S}_W) = 0, \end{align} then solve for the corresponding eigenvector $\mathbf{w}$. In your case, $$\mathbf{S}_B - \lambda \mathbf{S}_W = \begin{pmatrix}16 - 3\lambda & 16 \\ 16 & 16 - 2\lambda\end{pmatrix}.$$ The determinant of this $2 \times 2$ matrix can be computed by hand.
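Working it out:

$$\det(\mathbf{S}_B - \lambda \mathbf{S}_W) = (16 - 3\lambda)(16 - 2\lambda) - 16^2 = 6\lambda^2 - 80\lambda = 0,$$

so $\lambda = 0$ or $\lambda = \frac{40}{3}$. Substituting $\lambda = \frac{40}{3}$ into $(\mathbf{S}_B - \lambda \mathbf{S}_W)\mathbf{w} = \mathbf{0}$, the first row reads $-24\,w_1 + 16\,w_2 = 0$, so $\mathbf{w} \propto (2, 3)^\top$.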

The eigenvector with the largest eigenvalue maximizes the Rayleigh quotient. Instead of doing the calculations by hand, I solved the generalized eigenvalue problem in Python using scipy.linalg.eig and got $$w_1 \approx 0.5547, \quad w_2 \approx 0.8321,$$ which is different from the solution in your book. Below I plotted the optimal hyperplane for the weight vector I found (black) and the hyperplane for the weight vector from your book (red).
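For completeness, here is a minimal sketch of that computation (the variable names are my own, not from any reference implementation):

```python
import numpy as np
from scipy.linalg import eig

# Class means and covariances from the problem statement
m1 = np.array([0.0, 0.0])
m2 = np.array([4.0, 4.0])
C1 = np.array([[2.0, 0.0],
               [0.0, 1.0]])
C2 = np.array([[1.0, 0.0],
               [0.0, 1.0]])

# Between-class and within-class scatter matrices
d = (m1 - m2).reshape(-1, 1)
S_B = d @ d.T      # [[16, 16], [16, 16]]
S_W = C1 + C2      # [[ 3,  0], [ 0,  2]]

# Generalized eigenvalue problem S_B w = lambda S_W w
eigvals, eigvecs = eig(S_B, S_W)

# Eigenvector of the largest eigenvalue, normalized to unit length
w = eigvecs[:, np.argmax(eigvals.real)]
w /= np.linalg.norm(w)
print(w)  # ~ [0.5547, 0.8321], up to an arbitrary sign
```

In the two-class case you can also skip the eigendecomposition entirely: the standard closed-form solution is $\mathbf{w} \propto \mathbf{S}_W^{-1}(\mathbf{m}_1 - \mathbf{m}_2) = \left(-\frac{4}{3}, -2\right)^\top$, which is again proportional to $(2, 3)^\top$.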

[Figure: the optimal hyperplane (black) and the book's hyperplane (red)]