Solved – Neural networks vs support vector machines: are the latter definitely superior?

machine-learning, neural-networks, svm

Many authors of papers I read claim that SVMs are a superior technique for their regression/classification problem, noting that they could not get comparable results with NNs. Often the comparison states that

SVMs, unlike NNs,

  • Have a strong theoretical foundation
  • Reach the global optimum thanks to quadratic programming
  • Have no issue with choosing a proper number of parameters
  • Are less prone to overfitting
  • Need less memory to store the predictive model
  • Yield more readable results and a geometrical interpretation

Is this really a broadly accepted view? Please don't quote the No Free Lunch theorem or similar statements; my question is about the practical use of these techniques.

On the other hand, what kind of problem would you definitely tackle with an NN?

Best Answer

It is a matter of trade-offs. SVMs are in right now, and NNs used to be in. You'll find a rising number of papers that claim Random Forests, Probabilistic Graphical Models, or Nonparametric Bayesian methods are in. Someone should publish a forecasting model in the Annals of Improbable Research on which models will be considered hip.

Having said that, for many famously difficult supervised problems the best-performing single models are some type of NN, some type of SVM, or a problem-specific stochastic gradient descent method implemented using signal processing techniques.


Pros of NN:

  • They are extremely flexible in the types of data they can support. NNs do a decent job of learning the important features from basically any data structure, without having to manually derive features.
  • NNs still benefit from feature engineering; e.g., you should add an area feature if you have length and width features. The model will perform better for the same computational effort (a minimal sketch follows this list).

  • Most supervised machine learning requires your data to be in an observations-by-features matrix, with the labels as a vector whose length equals the number of observations. This restriction is not necessary with NNs. There is fantastic work on structured SVMs, but they are unlikely to ever be as flexible as NNs.
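
To make the feature-engineering point concrete, here is a minimal sketch assuming hypothetical length and width measurements (the column names and data are made up for illustration): handing the model a derived area column spares it from having to learn the multiplicative interaction on its own.

```python
import numpy as np

# Hypothetical raw measurements for illustration only.
rng = np.random.default_rng(0)
length = rng.uniform(1.0, 10.0, size=100)
width = rng.uniform(1.0, 10.0, size=100)

# Raw design matrix: the network has to discover the multiplicative
# interaction (area) by itself.
X_raw = np.column_stack([length, width])

# Engineered design matrix: the interaction is handed to the model directly,
# which usually buys accuracy for the same training budget.
X_engineered = np.column_stack([length, width, length * width])
```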


Pros of SVM:

  • Fewer hyperparameters. Generally, SVMs require less grid searching to get a reasonably accurate model. An SVM with an RBF kernel usually performs quite well (see the sketch after this list).

  • Global optimum guaranteed.
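
As a rough illustration of the "fewer hyperparameters" point, here is a minimal sketch using scikit-learn on synthetic data: an RBF-kernel SVM typically only needs C and gamma tuned, so the grid stays small. The grid values are illustrative, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Scaling matters for RBF kernels, so bundle it into the pipeline.
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Only two hyperparameters to search over.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],         # regularization strength
    "svc__gamma": ["scale", 0.01, 0.1],  # RBF kernel width
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```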


Cons of NN and SVM:

  • For most purposes they are both black boxes. There is some research on interpreting SVMs, but I doubt it will ever be as intuitive as a GLM. This is a serious issue in some domains.
  • If you're going to accept a black box, then you can usually squeeze out quite a bit more accuracy by bagging/stacking/boosting many, many models with different trade-offs.

    • Random forests are attractive because they can produce out-of-bag predictions (leave-one-out-style predictions) with no extra effort, they are very interpretable, they have a good bias-variance trade-off (great for bagging models), and they are relatively robust to selection bias. They are also stupidly simple to write a parallel implementation of (see the sketch after this list).

    • Probabilistic graphical models are attractive because they can incorporate domain-specific knowledge directly into the model and are interpretable in this regard.

    • Nonparametric (or really extremely parametric) Bayesian methods are attractive because they produce confidence intervals directly. They perform very well on small sample sizes and very well on large sample sizes. They are also stupidly simple to write a linear algebra implementation of.
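
For the out-of-bag point, here is a minimal sketch using scikit-learn's random forest on synthetic data: setting oob_score=True gives a validation-style accuracy estimate with no separate hold-out set, and n_jobs=-1 exploits the fact that the trees can be grown in parallel.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

forest = RandomForestClassifier(
    n_estimators=500,
    oob_score=True,   # score each tree on the samples it never saw in its bootstrap
    n_jobs=-1,        # trees are independent, so training parallelizes trivially
    random_state=0,
)
forest.fit(X, y)
print("OOB accuracy:", forest.oob_score_)
```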