Generative Models – Generative vs. Discriminative

Tags: generative-models, machine-learning

I know that generative means "based on $P(x,y)$" and discriminative means "based on $P(y|x)$," but I'm confused on several points:

  • Wikipedia (+ many other hits on the web) classify things like SVMs and decision trees as being discriminative. But these don't even have probabilistic interpretations. What does discriminative mean here? Has discriminative just come to mean anything that isn't generative?

  • Naive Bayes (NB) is generative because it captures $P(x|y)$ and $P(y)$, and thus you have $P(x,y)$ (as well as $P(y|x)$). Isn't it trivial to make, say, logistic regression (the poster boy of discriminative models) "generative" by simply computing $P(x)$ in a similar fashion (same independence assumption as NB, such that $P(x) = P(x_0) P(x_1) \cdots P(x_d)$, where the MLEs for the $P(x_i)$ are just frequencies)?

  • I know that discriminative models tend to outperform generative ones. What's the practical use of working with generative models? Being able to generate/simulate data is cited, but when does this come up? I personally only have experience with regression, classification, and collaborative filtering over structured data, so are the uses irrelevant to me here? The "missing data" argument ($P(x_i|y)$ for missing $x_i$) seems to only give you an edge with training data (when you actually know $y$ and don't need to marginalize over $P(y)$ to get the relatively dumb $P(x_i)$, which you could've estimated directly anyway), and even then imputation is much more flexible (it can predict based not just on $y$ but on other $x_i$'s as well).

  • What's with the completely contradictory quotes from Wikipedia? "generative models are typically more flexible than discriminative models in expressing dependencies in complex learning tasks" vs. "discriminative models can generally express more complex relationships between the observed and target variables"

Related question that got me thinking about this.

Best Answer

The fundamental difference between discriminative models and generative models is:

  • Discriminative models learn the (hard or soft) boundary between classes
  • Generative models model the distribution of individual classes
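To make the contrast concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy 1-D data, the class names `a` and `b`, the choice of one Gaussian per class for the generative side, and a midpoint threshold (the max-margin boundary for separable 1-D data with class `a` lying below class `b`) for the discriminative side.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of a 1-D sample."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy 1-D training data (made up for illustration).
data = {
    "a": [1.0, 1.5, 2.0, 2.5, 3.0],
    "b": [6.0, 6.5, 7.0, 7.5, 8.0],
}

# Generative side: one density per class plus a class prior.
n_total = sum(len(v) for v in data.values())
priors = {c: len(v) / n_total for c, v in data.items()}
params = {c: fit_gaussian(v) for c, v in data.items()}

def joint(x, c):
    """P(x, y=c) = P(x | y=c) * P(y=c)."""
    mean, var = params[c]
    return gaussian_pdf(x, mean, var) * priors[c]

def marginal(x):
    """P(x), obtained by summing the joint over classes."""
    return sum(joint(x, c) for c in data)

def posterior(x, c):
    """P(y=c | x) via Bayes' rule."""
    return joint(x, c) / marginal(x)

def generative_predict(x):
    return max(data, key=lambda c: joint(x, c))

# Discriminative side: only a boundary is kept. For separable 1-D data
# (assumed here: every "a" sample lies below every "b" sample) the
# max-margin boundary is the midpoint between the closest opposite-class pair.
threshold = (max(data["a"]) + min(data["b"])) / 2

def discriminative_predict(x):
    return "a" if x < threshold else "b"
```

Both predictors agree on points near the training data, but only the generative model can report that a point such as `x = 20.0` is very unlikely under either class: `marginal(20.0)` is many orders of magnitude smaller than `marginal(2.0)`, while `discriminative_predict(20.0)` simply returns `"b"`.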

To answer your direct questions:

  • SVMs (Support Vector Machines) and DTs (Decision Trees) are discriminative because they learn explicit boundaries between classes. An SVM is a maximum-margin classifier: given a kernel, it learns the decision boundary that maximizes the margin, i.e. the distance from the boundary to the nearest training samples of each class. The signed distance between a sample and the learned decision boundary can be used to make the SVM a "soft" classifier. DTs learn the decision boundary by recursively partitioning the feature space in a manner that maximizes the information gain (or another criterion).

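  • It is possible to make a generative form of logistic regression in this manner. Note that you are not using the full generative model to make classification decisions, though: $P(x)$ does not depend on $y$, so it cancels when comparing classes, and the prediction is still driven by $P(y|x)$ alone.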

  • There are a number of advantages generative models may offer, depending on the application. Say you are dealing with non-stationary distributions, where the online test data may be generated by different underlying distributions than the training data. It is typically more straightforward to detect distribution changes and update a generative model accordingly than to do this for a decision boundary in an SVM, especially if the online updates need to be unsupervised. Discriminative models are also generally ill-suited to outlier detection, because they do not model $P(x)$, whereas a generative model assigns each input a likelihood that can be thresholded. What's best for a specific application should, of course, be evaluated based on the application.

  • (These quotes are convoluted, but this is what I think they are trying to say.) Generative models are typically specified as probabilistic graphical models, which offer rich representations of the independence relations in the dataset. Discriminative models do not offer such clear representations of the relations between features and classes in the dataset. Instead of using resources to fully model each class, they focus on richly modeling the boundary between classes. Given the same amount of capacity (say, bits in a computer program executing the model), a discriminative model may thus yield a more complex representation of this boundary than a generative model.
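The "generative logistic regression" construction from the second bullet can be sketched as follows. This is only an illustration under stated assumptions: the binary toy data, the helper names (`fit_logistic`, `p_x`, `joint`), and the plain batch gradient descent are all made up here, and $P(x)$ uses the naive independence assumption with feature frequencies as proposed in the question.

```python
import math
from itertools import product

# Toy binary data (made up): rows of (x_0, x_1) with a 0/1 label.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (1, 1),
     (0, 0), (1, 0), (0, 1), (1, 0), (0, 1)]
y = [0, 0, 1, 1, 1, 0, 1, 0, 0, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

w, b = fit_logistic(X, y)

def p_y_given_x(label, x):
    """Discriminative part: P(y | x) from the fitted logistic regression."""
    p1 = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
    return p1 if label == 1 else 1.0 - p1

# Added part: P(x) as a product of per-feature frequencies
# (same independence assumption as Naive Bayes, but not conditioned on y).
freq = [sum(row[j] for row in X) / len(X) for j in range(len(X[0]))]

def p_x(x):
    prob = 1.0
    for j, xj in enumerate(x):
        prob *= freq[j] if xj == 1 else 1.0 - freq[j]
    return prob

def joint(x, label):
    """P(x, y) = P(y | x) * P(x)."""
    return p_y_given_x(label, x) * p_x(x)

def predict(x):
    # P(x) is the same for both labels, so this equals argmax of P(y | x).
    return max((0, 1), key=lambda label: joint(x, label))

# The joint is a proper distribution over all (x, y) pairs.
total = sum(joint(x, label)
            for x in product((0, 1), repeat=2) for label in (0, 1))
```

Here `total` comes out as 1 up to floating-point error, so the product is a valid joint distribution, yet `predict` never benefits from `p_x`: the factor is identical for both labels and drops out of the comparison, which is the caveat raised in that bullet.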
