Solved – How does Fisher LDA work

classification, discriminant-analysis, intuition

Intuitively, how does Fisher LDA work? From the post Linear discriminant analysis and Bayes rule: classification I completely understood the Bayesian approach, but I am not able to relate it to Fisher's approach described there.
What is the relation between the linear combination of the predictors and the posterior probability in the Bayesian approach? Why do the coefficient estimates have that form?

I know this might be a trivial question, but I am studying this on my own and my mathematical background is weak.

Best Answer

If you have two classes, each with data distributed according to densities $f_1$ and $f_2$, then the Bayes rule that minimizes the expected classification error (with equal loss for each type of error) selects class 1 for an observation vector $x$ if $f_1(x)/f_2(x) > 1$ and selects class 2 otherwise. LDA becomes this Bayes rule under special conditions on the multivariate distributions $f_1$ and $f_2$: both must be multivariate normal with the same covariance matrix and (presumably) different mean vectors.
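To make the connection concrete, here is a minimal sketch in Python (the means, covariance, and test point are invented for illustration) showing that, under the equal-covariance normal assumption, the density-ratio Bayes rule and a linear rule give the same classification. The vector $w = \Sigma^{-1}(\mu_1 - \mu_2)$ is the familiar form of the LDA coefficients the question asks about:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical class parameters: same covariance, different mean vectors.
mu1, mu2 = np.array([2.0, 0.0]), np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])

f1 = multivariate_normal(mean=mu1, cov=Sigma)
f2 = multivariate_normal(mean=mu2, cov=Sigma)

x = np.array([1.5, 0.2])  # made-up observation vector

# Bayes rule with equal priors and equal losses: class 1 if f1(x)/f2(x) > 1.
bayes_class = 1 if f1.pdf(x) / f2.pdf(x) > 1 else 2

# With equal covariances the log ratio log f1(x)/f2(x) is linear in x:
# it equals w.x - c with the w and c below.
Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)                       # LDA coefficient vector
c = 0.5 * (mu1 + mu2) @ Sigma_inv @ (mu1 - mu2)   # threshold constant
lda_class = 1 if w @ x > c else 2

print(bayes_class, lda_class)  # the two rules agree
```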

The details can be found in any of the following three sources:

  1. Duda, Hart, and Stork, Pattern Classification

  2. My book, Bootstrap Methods, Chapter 2

  3. McLachlan, Discriminant Analysis and Statistical Pattern Recognition

I have previously given an explanation like this on the post you referenced. Is this clear to you now? Why was it not clear the first time?

I am adding to the answer because it is now clear that the OP is asking how to compute the LDA and not how it works in theory.

LDA creates a separating hyperplane. The hyperplane is defined as a linear combination of the variables set equal to a constant; $f_1(x)/f_2(x) = 1$ defines the hyperplane. So you multiply the variables by their coefficients and sum. This sum is then compared to the constant to determine on which side of the hyperplane the vector $x$ lies (i.e., whether $f_1(x) > f_2(x)$ or not).
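As a rough illustration of the computation (the training samples and their generating parameters below are made up for the example), here is how one might estimate the coefficients and the constant from data and then classify a new point by which side of the hyperplane it falls on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training samples for each class (rows are observation vectors).
X1 = rng.multivariate_normal([2.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=100)
X2 = rng.multivariate_normal([0.0, 1.0], [[1.0, 0.3], [0.3, 1.0]], size=100)

# Estimate the mean vectors and the pooled (common) covariance matrix.
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
n1, n2 = len(X1), len(X2)
S = ((n1 - 1) * np.cov(X1, rowvar=False) +
     (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)

# Hyperplane: w.x = c, where w holds the estimated LDA coefficients.
S_inv = np.linalg.inv(S)
w = S_inv @ (m1 - m2)
c = 0.5 * (m1 + m2) @ S_inv @ (m1 - m2)

def classify(x):
    """Multiply the variables by their coefficients, sum, compare to c."""
    return 1 if w @ x > c else 2

print(classify(np.array([1.5, 0.2])))  # side of the hyperplane -> class label
```

The only difference from the theoretical rule above is that the population means and covariance are replaced by their sample estimates.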
