Solved – Conceptual understanding of linear discriminant analysis

classification, discriminant analysis

Can someone explain the concepts of linear discriminant analysis (LDA) to a newbie? I am not looking for a technical implementation; I wish to understand it conceptually. I understand logistic regression and a little bit about naive Bayes classification, but cannot make any sense of LDA.

What problem does LDA solve that Naive Bayes or logistic regression is not suited for? Handling more than two output categories?

Best Answer

1. LDA or logistic regression?

LDA and logistic regression can both be used to 'predict' the class of a subject, and both can handle the case of more than two classes.

They differ in how they solve the classification problem and therefore make different assumptions: logistic regression models the class probability directly with the well-known S-shaped (logistic) function, while LDA assumes that in each class your data are (1) multivariate normal and (2) share the same variance-covariance matrix across classes.

If the assumptions of multivariate normality and a common variance-covariance matrix are fulfilled, then, in general, LDA will perform better.
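A minimal sketch of this comparison, assuming scikit-learn and NumPy are available: it generates two Gaussian classes with a shared covariance matrix (i.e. data that satisfy LDA's assumptions) and fits both classifiers. The class means, covariance, and sample sizes are illustrative choices, not from the original answer.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
cov = np.array([[1.0, 0.5], [0.5, 1.0]])  # same var-covar matrix in each class

# Two multivariate-normal classes that differ only in their mean vector.
X0 = rng.multivariate_normal([0.0, 0.0], cov, n)
X1 = rng.multivariate_normal([2.0, 2.0], cov, n)
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
logreg = LogisticRegression().fit(X_tr, y_tr)

print("LDA accuracy:    ", lda.score(X_te, y_te))
print("LogReg accuracy: ", logreg.score(X_te, y_te))
```

On data like this, which meets LDA's assumptions exactly, the two accuracies are typically very close, with LDA often slightly ahead.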

2. Intuition behind LDA

You have 'subjects' that are characterized by features $x_1, x_2, \dots, x_n$. The goal is to decide on the class of a subject, knowing the values of its features.

As said, LDA assumes that, in each class $c$, your features follow a multivariate normal distribution with a mean that depends on the class, $\mu_c$ (note that this is a vector), and variance-covariance matrix $\Sigma$ (the same for all classes). So for each class we know the multivariate normal density $\Phi_c(\mu_c, \Sigma)$, which allows us to calculate the probabilities.

Now, given the features $x_i$ of a subject, we can compute $\Phi_c$ for each class and assign the subject to the class $c$ where this yields the highest value (i.e. where the 'probability' is highest).
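Here is a minimal sketch of that decision rule implemented by hand, assuming NumPy and SciPy are available: estimate one mean per class and a single pooled covariance $\Sigma$, then assign each subject to the class whose Gaussian density (weighted by the class prior) is highest. The names `lda_fit` and `lda_predict` are illustrative, not from any particular library.

```python
import numpy as np
from scipy.stats import multivariate_normal

def lda_fit(X, y):
    classes = np.unique(y)
    # One mean vector mu_c per class, plus the class priors.
    means = {c: X[y == c].mean(axis=0) for c in classes}
    priors = {c: np.mean(y == c) for c in classes}
    # Pooled within-class covariance: the single Sigma shared by all classes.
    pooled = sum(
        (X[y == c] - means[c]).T @ (X[y == c] - means[c]) for c in classes
    ) / (len(X) - len(classes))
    return classes, means, priors, pooled

def lda_predict(X, classes, means, priors, pooled):
    # For each class c, evaluate prior_c * Phi_c(x; mu_c, Sigma),
    # then take the class with the highest value.
    scores = np.column_stack([
        priors[c] * multivariate_normal.pdf(X, mean=means[c], cov=pooled)
        for c in classes
    ])
    return classes[np.argmax(scores, axis=1)]
```

Note that libraries work with the log of these densities for numerical stability, but the argmax over classes is the same.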

3. Why is LDA a dimension reduction technique?

LDA assumes multivariate normality in each class with the same variance-covariance matrix. Therefore the classes all look the 'same' except for their means. So you can 'feel' that the class means will carry the important information.

If you have $n$ features then, in the end, the solution will depend on the $C$ class means, $C$ being the number of classes. In fact, it can be shown mathematically that LDA 'solves' the classification problem in a subspace of the $n$-dimensional feature space, and that this subspace has dimension at most $C - 1$.

To make it more concrete, assume that you have subjects with $25$ features and you want to classify them into two classes; then you can 'solve' the problem in a one-dimensional space (thus on a line). This is why LDA is said to be a dimension reduction technique: in this case it reduces the dimension from $25$ to one.
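A minimal sketch of exactly this case, again assuming scikit-learn: $25$ features, $2$ classes, so LDA projects onto $C - 1 = 1$ dimension. The synthetic data (two shifted Gaussian clouds) is an illustrative choice.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# 25-dimensional data in two classes that differ only in their mean.
X = np.vstack([rng.normal(0.0, 1.0, (100, 25)),   # class 0
               rng.normal(0.5, 1.0, (100, 25))])  # class 1, shifted mean
y = np.array([0] * 100 + [1] * 100)

# With 2 classes, LDA can project onto at most C - 1 = 1 component.
lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
Z = lda.transform(X)
print(X.shape, "->", Z.shape)  # (200, 25) -> (200, 1)
```

The projected one-dimensional scores `Z` live on the line along which the classes are best separated, and the classification rule becomes a simple threshold on that line.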
