Solved – Linear discriminant analysis: generative or discriminative?

discriminant-analysis, generative-models

According to this link, LDA is a generative classifier. But the name itself contains the word 'discriminant', and the goal of LDA is to model a discriminant function for classification. Why, then, is it considered a generative model?

Best Answer

Classification is the problem of assigning data samples $x \in X$ to classes $k \in G$. To solve classification, we need to model the probability of class conditional on data, $P(G=k \mid X=x)$. Discriminative models do just that, directly.

Generative models model the joint probability of data and class $P(G=k, X=x)$.

How are joints and conditionals related? By the equality $P(G=k, X=x) = P(G=k \mid X=x) P(X=x)$.

In other words, the joint probability of data and class modeled by generative models, $P(G=k, X=x)$, can be factored into the probability of class conditional on data, $P(G=k \mid X=x)$, on one hand, and a data model, $P(X=x)$, on the other. The first factor is necessary and sufficient to solve the classification problem. The second factor, the data model, can be used to generate new samples (hence the name).
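For LDA specifically, it may help to write the same joint factored the other way round, as class prior times class-conditional density, and to recover the two factors above from it via Bayes' rule. The symbols $\pi_k$ and $f_k$ are shorthand introduced here for illustration and do not appear in the answer itself:

$$
\begin{aligned}
P(G=k, X=x) &= P(X=x \mid G=k)\, P(G=k) = f_k(x)\, \pi_k, \\
P(X=x) &= \sum_{l \in G} f_l(x)\, \pi_l, \\
P(G=k \mid X=x) &= \frac{f_k(x)\, \pi_k}{\sum_{l \in G} f_l(x)\, \pi_l}.
\end{aligned}
$$

Read this way, fitting $f_k$ and $\pi_k$ yields both the classifier (third line) and the data model (second line).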

By drawing samples from the data model, one is in fact producing synthetic data, that is, new data points with similar statistics to the original data used to fit the generative model. How similar the synthetic and real data are will depend on how well the model captures the data distribution, but that is beyond the scope of the question.
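As a rough illustration of drawing synthetic data, here is a minimal NumPy sketch. It assumes an LDA-style model (one Gaussian per class with a shared covariance matrix); the toy data and the helper names `fit_gaussian_model` and `sample` are made up for this example and are not from any library.

```python
import numpy as np

def fit_gaussian_model(X, y):
    """Estimate class priors, class means and a pooled (shared) covariance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    # Center every sample on its own class mean, then pool the scatter.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, priors, means, cov

def sample(model, n, rng):
    """Draw n synthetic (x, class) pairs from the fitted joint model."""
    classes, priors, means, cov = model
    idx = rng.choice(len(classes), size=n, p=priors)   # draw a class first
    noise = rng.multivariate_normal(np.zeros(means.shape[1]), cov, size=n)
    return means[idx] + noise, classes[idx]            # then a point given the class

# Toy "real" data: two 2-D classes (values chosen arbitrarily).
rng = np.random.default_rng(0)
shared_cov = [[1.0, 0.5], [0.5, 1.0]]
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], shared_cov, size=200),
    rng.multivariate_normal([3.0, 1.0], shared_cov, size=100),
])
y = np.array([0] * 200 + [1] * 100)

model = fit_gaussian_model(X, y)
X_new, y_new = sample(model, 1000, rng)
print(X_new.shape, np.bincount(y_new) / len(y_new))
```

Discarding the sampled labels `y_new` leaves draws from the data model $P(X=x)$ alone.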

Why does LDA bother with data modeling when its aim is classification (discrimination)? Because it takes into account data structure (specifically, data covariance in addition to class means) when computing the decision rule, and does so in a way that a data model is obtained for free (since each class is modeled as a Gaussian, class means and a covariance matrix are all it takes).
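To make the "decision rule from means and covariance" point concrete, here is a small from-scratch sketch in NumPy. It assumes the textbook LDA setup (Gaussian classes with one shared covariance matrix) and uses made-up toy data; the function names `lda_fit`, `lda_scores` and `lda_posterior` are hypothetical, not a library API.

```python
import numpy as np

def lda_fit(X, y):
    """Estimate class priors, class means and one pooled covariance matrix."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / (len(X) - len(classes))
    return classes, priors, means, cov

def lda_scores(X, priors, means, cov):
    """Linear discriminants: x' S^-1 m_k - 0.5 m_k' S^-1 m_k + log(pi_k)."""
    cov_inv_means = np.linalg.solve(cov, means.T)        # shape (d, K)
    offsets = -0.5 * np.sum(means.T * cov_inv_means, axis=0) + np.log(priors)
    return X @ cov_inv_means + offsets                   # shape (n, K)

def lda_posterior(scores):
    """Softmax of the scores gives P(G=k | X=x) under the Gaussian model."""
    shifted = scores - scores.max(axis=1, keepdims=True) # numerical stability
    expd = np.exp(shifted)
    return expd / expd.sum(axis=1, keepdims=True)

# Toy data: two 2-D classes sharing a covariance (values chosen arbitrarily).
rng = np.random.default_rng(0)
shared_cov = [[1.0, 0.5], [0.5, 1.0]]
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], shared_cov, size=200),
    rng.multivariate_normal([4.0, 0.0], shared_cov, size=100),
])
y = np.array([0] * 200 + [1] * 100)

classes, priors, means, cov = lda_fit(X, y)
scores = lda_scores(X, priors, means, cov)
posterior = lda_posterior(scores)
y_hat = classes[scores.argmax(axis=1)]
accuracy = np.mean(y_hat == y)
print("training accuracy:", accuracy)
```

The quantities returned by `lda_fit` are the entire model: the same priors, means and covariance that define the decision rule also define a distribution one could sample from, which is the sense in which the data model comes "for free".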