Discriminant Analysis – Differences and How to Use Each Version

Tags: classification, discriminant-analysis

Can anybody explain the differences between these three analyses and give specific examples of how to use each?

  • LDA – Linear Discriminant Analysis
  • FDA – Fisher's Discriminant Analysis
  • QDA – Quadratic Discriminant Analysis

I have searched everywhere, but couldn't find real examples with real values showing how these analyses are used and how the numbers are calculated — only lots of formulas, which are hard to understand without concrete examples. As I tried to work through them, it was hard to tell which equations/formulas belonged to LDA and which to FDA.

For example, let's say we have the following data:

x1 x2 class
1  2  a
1  3  a
2  3  a
3  3  a
1  0  b
2  1  b
2  2  b

And let's say we have some test data:

x1 x2
2  4
3  5
3  6

So how would one use such data with each of these three approaches? It would be best to see how to calculate everything by hand, rather than using some math package that calculates everything behind the scenes.

P.S. I only found this tutorial: http://people.revoledu.com/kardi/tutorial/LDA/LDA.html#LDA.
It shows how to use LDA.

Best Answer

"Fisher's Discriminant Analysis" is simply LDA in a situation of 2 classes. When there is only 2 classes computations by hand are feasible and the analysis is directly related to Multiple Regression. LDA is the direct extension of Fisher's idea on situation of any number of classes and uses matrix algebra devices (such as eigendecomposition) to compute it. So, the term "Fisher's Discriminant Analysis" can be seen as obsolete today. "Linear Discriminant analysis" should be used instead. See also. Discriminant analysis with 2+ classes (multi-class) is canonical by its algorithm (extracts dicriminants as canonical variates); rare term "Canonical Discriminant Analysis" usually stands simply for (multiclass) LDA therefore (or for LDA + QDA, omnibusly).

Fisher used what were then called "Fisher classification functions" to classify objects after the discriminant function had been computed. Nowadays, the more general Bayes approach is used within the LDA procedure to classify objects.
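As a sketch of that Bayes approach (under the LDA assumptions of Gaussian classes sharing one covariance matrix $\Sigma$, with class means $\mu_k$ and prior probabilities $\pi_k$; the symbols are mine), the rule reduces to assigning $x$ to the class with the largest score

$$\delta_k(x) = x^\top\Sigma^{-1}\mu_k - \tfrac{1}{2}\,\mu_k^\top\Sigma^{-1}\mu_k + \ln\pi_k,$$

which is linear in $x$ precisely because $\Sigma$ is shared by all the classes.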

For your request for explanations of LDA I can refer you to these answers of mine: extraction in LDA, classification in LDA, and LDA among related procedures, as well as several other related questions and answers.

Just as ANOVA requires the assumption of equal variances, LDA requires the assumption of equal variance-covariance matrices (of the input variables) across the classes. This assumption is important for the classification stage of the analysis. If the matrices differ substantially, observations will tend to be assigned to the class with the greater variability. To overcome this problem, QDA was invented: QDA is a modification of LDA that allows for such heterogeneity of the classes' covariance matrices.
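Concretely (same Gaussian setting and notation as above, but now with per-class covariance matrices $\Sigma_k$; again a sketch, not tied to any particular package), QDA's classification score acquires a quadratic term:

$$\delta_k(x) = -\tfrac{1}{2}\ln\lvert\Sigma_k\rvert - \tfrac{1}{2}\,(x-\mu_k)^\top\Sigma_k^{-1}(x-\mu_k) + \ln\pi_k,$$

so the decision boundaries between classes become quadratic surfaces rather than hyperplanes.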

If you do have heterogeneity (as detected, for example, by Box's M test) and you don't have QDA at hand, you may still use LDA in the regime of using the individual covariance matrices (rather than the pooled matrix) of the discriminants at the classification stage. This partly solves the problem, though less effectively than QDA, because, as just pointed out, these are the covariance matrices of the discriminants and not of the original variables (whose matrices differed).
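A minimal sketch of that regime in Python, for the simplest case of a single discriminant (all names and score values here are illustrative, not computed from any particular dataset): each class gets its own mean and variance of the discriminant, instead of the pooled unit variance.

```python
import math

# Hypothetical discriminant scores of the training observations,
# grouped by class (these values are illustrative only).
scores = {
    "a": [1.60, 1.75, 1.82, 1.93],
    "b": [0.05, 0.18, 0.30],
}
priors = {"a": 4 / 7, "b": 3 / 7}  # estimated from class sizes

def normal_pdf(z, mean, var):
    """Density of N(mean, var) at z."""
    return math.exp(-(z - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Per-class mean and individual (not pooled) variance of the discriminant.
params = {}
for k, zs in scores.items():
    m = sum(zs) / len(zs)
    v = sum((z - m) ** 2 for z in zs) / (len(zs) - 1)
    params[k] = (m, v)

def classify(z):
    """Assign z to the class with the largest prior * density."""
    return max(params, key=lambda k: priors[k] * normal_pdf(z, *params[k]))

print(classify(1.5), classify(0.2))  # -> a b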

Let me leave the analysis of your example data to you.
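As a starting point, though, here is a minimal sketch in Python/numpy of the two-class (Fisher) computation on your data; it mirrors the hand calculation outlined above rather than calling a packaged routine, and it assumes equal priors (the variable names are mine).

```python
import numpy as np

# Training data from the question.
Xa = np.array([[1, 2], [1, 3], [2, 3], [3, 3]], dtype=float)  # class a
Xb = np.array([[1, 0], [2, 1], [2, 2]], dtype=float)          # class b
X_test = np.array([[2, 4], [3, 5], [3, 6]], dtype=float)

# Class mean vectors.
ma, mb = Xa.mean(axis=0), Xb.mean(axis=0)

# Pooled within-class scatter matrix S_w = S_a + S_b.
Sa = (Xa - ma).T @ (Xa - ma)
Sb = (Xb - mb).T @ (Xb - mb)
Sw = Sa + Sb

# Fisher's discriminant direction: w = S_w^{-1} (ma - mb).
w = np.linalg.solve(Sw, ma - mb)

# With equal priors, classify by the midpoint of the projected class means
# (class a's projected mean is always the larger one with this choice of w).
cutoff = (w @ ma + w @ mb) / 2
for x in X_test:
    label = "a" if w @ x > cutoff else "b"
    print(x, "->", label)
```

On this toy data all three test points land on the class-a side. Note that $w$ is defined only up to scale; any positive multiple of it yields the same classification.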


Reply to @zyxue's answer and comments

LDA is what you defined FDA to be in your answer. LDA first extracts linear constructs (called discriminants) that maximize the between-to-within separation, and then uses those to perform (Gaussian) classification. If, as you say, LDA were not tied to the task of extracting the discriminants, then LDA would appear to be just a Gaussian classifier, and no name "LDA" would be needed at all.

It is at that classification stage that LDA assumes both normality and variance-covariance homogeneity of the classes. The extraction or "dimensionality reduction" stage of LDA assumes linearity and variance-covariance homogeneity; together these two assumptions make "linear separability" feasible. (We use the single pooled $S_w$ matrix to produce the discriminants, which therefore have an identity pooled within-class covariance matrix; this gives us the right to apply the same set of discriminants to classify into all the classes. If the classes' covariance matrices are all the same, those within-class covariances of the discriminants are all the same, identity, and that right becomes absolute.)
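In symbols (notation mine, matching the $S_w$ above): the extraction stage solves the generalized eigenproblem

$$S_b\,v = \lambda\,S_w\,v,$$

where $S_b$ is the between-class and $S_w$ the pooled within-class scatter matrix, taking the eigenvectors $v$ in decreasing order of $\lambda$ as the discriminant coefficient vectors. A common convention scales each $v$ so that the corresponding discriminant has unit pooled within-class variance — which is exactly what makes the pooled within-class covariance matrix of the discriminants the identity.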

The Gaussian classifier (the second stage of LDA) uses the Bayes rule to assign observations to classes by the discriminants. The same result can be accomplished via the so-called Fisher linear classification functions, which use the original features directly. However, the Bayes approach based on discriminants is a little more general, in that it also allows the use of separate per-class discriminant covariance matrices, in addition to the default way of using one pooled matrix. It also allows classification to be based on a subset of the discriminants.
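A sketch of that Bayes rule in the discriminant space (notation mine): since the discriminants $z$ have identity pooled within-class covariance, the default (pooled) classification reduces to

$$P(k \mid z) \;\propto\; \pi_k \exp\!\left(-\tfrac{1}{2}\,\lVert z-\bar{z}_k\rVert^2\right),$$

i.e. to assigning the observation to the nearest class centroid $\bar{z}_k$ in discriminant space, after adjustment for the priors $\pi_k$. Using separate per-class covariance matrices of the discriminants instead replaces the squared Euclidean distance with class-specific Mahalanobis distances plus a $\ln\lvert\Sigma_k\rvert$ term.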

When there are only two classes, both stages of LDA can be described together in a single pass, because "latent extraction" and "observation classification" then reduce to the same task.