Solved – the difference between SVM and LDA

classification

What is the difference between Support Vector Machines and Linear Discriminant Analysis?

Best Answer

LDA: Assumes that the data within each group are normally distributed and that all groups share the same covariance matrix, so the groups differ only in their means. If the groups have different covariance matrices, LDA becomes Quadratic Discriminant Analysis (QDA). LDA is the best discriminator available in case all assumptions are actually met. QDA, by the way, is a non-linear classifier.
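To make the shared-covariance assumption concrete, here is a minimal sketch of two-class LDA computed in closed form. NumPy, the synthetic Gaussian data, and the helper name `fit_lda` are assumptions for illustration, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian groups with different means but the SAME covariance,
# which is the setting LDA assumes (synthetic data, for illustration only).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([3.0, 3.0], cov, size=200)

def fit_lda(X0, X1):
    """Closed-form two-class LDA: returns weight vector w and intercept b."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # Pooled covariance: one estimate shared by both groups.
    S0 = np.cov(X0, rowvar=False)
    S1 = np.cov(X1, rowvar=False)
    S = ((n0 - 1) * S0 + (n1 - 1) * S1) / (n0 + n1 - 2)
    # The decision rule is linear in x: predict group 1 when w @ x + b > 0.
    w = np.linalg.solve(S, mu1 - mu0)
    b = -0.5 * w @ (mu0 + mu1) + np.log(n1 / n0)
    return w, b

w, b = fit_lda(X0, X1)
X = np.vstack([X0, X1])
y = np.r_[np.zeros(len(X0)), np.ones(len(X1))]
pred = (X @ w + b > 0).astype(int)
accuracy = (pred == y).mean()
print(f"training accuracy: {accuracy:.3f}")
```

No iterative optimization is involved: the means and the pooled covariance are estimated once, and a single linear solve gives the boundary. Estimating a separate covariance matrix per group instead would give the quadratic boundary of QDA.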

SVM: Generalizes the Optimally Separating Hyperplane (OSH). OSH assumes that all groups are totally separable; SVM makes use of 'slack variables' that allow a certain amount of overlap between the groups. SVM makes no distributional assumptions about the data, which makes it a very flexible method. The flexibility, on the other hand, often makes the results from an SVM classifier more difficult to interpret than those from LDA.
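How much overlap the slack variables tolerate is usually controlled by a penalty parameter. The sketch below assumes scikit-learn's `SVC`, where that parameter is called `C`, and synthetic overlapping data; the helper `margin_and_support` is a hypothetical name used only here.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping groups, so no hyperplane separates them perfectly.
X, y = make_blobs(n_samples=200, centers=[[0, 0], [2, 2]],
                  cluster_std=1.5, random_state=0)

def margin_and_support(C):
    """Fit a linear soft-margin SVM; return (margin width, number of support vectors)."""
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    return 2.0 / np.linalg.norm(w), len(clf.support_)

# Small C: slack is cheap, so the margin is wide and tolerates more overlap.
wide_margin, n_sv_small_C = margin_and_support(0.01)
# Large C: slack is expensive, so the margin shrinks around the boundary.
narrow_margin, n_sv_large_C = margin_and_support(100.0)

print("C=0.01:", wide_margin, n_sv_small_C)
print("C=100 :", narrow_margin, n_sv_large_C)
```

In practice `C` is typically chosen by cross-validation rather than fixed by hand.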

SVM classification is an optimization problem, whereas LDA has an analytical solution. The optimization problem for the SVM has a primal and a dual formulation: the primal is optimized over one weight per variable, the dual over one coefficient per data point, so the user can solve whichever is the most computationally feasible. SVM can also make use of kernels to transform the SVM classifier from a linear classifier into a non-linear classifier. Use your favorite search engine to search for 'SVM kernel trick' to see how SVM makes use of kernels to transform the feature space.
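As a rough illustration of what a kernel buys you, the sketch below fits a linear and an RBF-kernel SVM to two concentric rings, which no straight line can separate. scikit-learn and the synthetic `make_circles` data are assumptions for this example.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Inner ring = one group, outer ring = the other; not linearly separable.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Same classifier family, different kernel.
linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear kernel training accuracy: {linear_acc:.2f}")
print(f"RBF kernel training accuracy:    {rbf_acc:.2f}")
```

The RBF model is still linear in the implicit feature space induced by the kernel; it only looks non-linear in the original two coordinates.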

LDA makes use of the entire data set to estimate covariance matrices and thus is somewhat sensitive to outliers. The SVM solution is determined by a subset of the data: those data points that lie on the separating margin or, when overlap is allowed, inside it or on the wrong side of it. These data points are called support vectors, because they determine how the SVM discriminates between groups, and thus support the classification.
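The role of the support vectors can be checked directly. This sketch assumes scikit-learn's `SVC` (which exposes the support vector indices as `support_`) and synthetic data: it fits an SVM, then refits using only the support vectors and compares the two classifiers.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=[[-2, -2], [2, 2]],
                  cluster_std=1.0, random_state=0)

# Fit on all points and count how many end up as support vectors.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
n_support = len(clf.support_)

# Refit on the support vectors alone; every other point is discarded.
sv_idx = clf.support_
clf_sv = SVC(kernel="linear", C=1.0).fit(X[sv_idx], y[sv_idx])

# Fraction of the full data set on which the two classifiers agree.
agreement = (clf.predict(X) == clf_sv.predict(X)).mean()

print(f"{n_support} of {len(X)} points are support vectors")
print(f"agreement between full fit and support-vector-only fit: {agreement:.2f}")
```

Points far from the boundary can be moved or removed without changing the fit, which is why a distant outlier on the correct side affects the SVM less than it affects LDA's mean and covariance estimates.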

As far as I know, SVM doesn't really discriminate well between more than two classes: it is a binary classifier by construction, and multi-class problems are usually handled by combining several binary SVMs (one-vs-one or one-vs-rest). An outlier-robust alternative is to use logistic regression. LDA handles several classes well, as long as the assumptions are met. I believe, though (warning: terribly unsubstantiated claim), that several old benchmarks found that LDA usually performs quite well under a lot of circumstances, and LDA/QDA are often go-to methods in the initial analysis.
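Whether the multi-class behaviour matters for a given problem is easy to check empirically. The sketch below assumes scikit-learn, whose `SVC` handles more than two classes by combining binary classifiers one-vs-one, and uses the three-class iris data purely as an example; it is one quick comparison, not a benchmark.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # three classes, four features

# 5-fold cross-validated accuracy for each method.
svm_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
lda_scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)

print(f"SVM mean accuracy: {svm_scores.mean():.3f}")
print(f"LDA mean accuracy: {lda_scores.mean():.3f}")
```

Results on one small, well-behaved data set say little about other problems, so the same comparison is worth repeating on your own data.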

LDA can be used for feature selection when $p>n$ with sparse LDA: https://web.stanford.edu/~hastie/Papers/sda_resubm_daniela-final.pdf. The standard SVM does not perform feature selection, although L1-penalized variants can produce sparse weight vectors.

In short: LDA and SVM have very little in common. Luckily, they are both tremendously useful.