Solved – the difference between Informative (IVM) and Relevance (RVM) vector machines

bayesian, machine learning, svm

I'm trying to understand whether there is any substantive difference between Informative Vector Machines (IVMs) and Relevance Vector Machines (RVMs) other than the terminology. I haven't seen anything explicit.

When I read about vector machines, it is easy to see how IVMs/RVMs differ from Support Vector Machines (SVMs) [colloquially, for classification, the SVM finds those points (vectors) that define the DMZ (de-militarized zone 😉) between the categories, while the RVM finds those that are in the 'middle' of the crowd, along with an associated crowd 'size' (e.g. in Gaussian globs)], but I don't see any particular difference between I/R vector machines beyond a choice of terminology by their proponents.

Is there a difference?

Best Answer

The RVM places an Automatic Relevance Determination (ARD) prior on the weights in a regularized regression/logistic regression setup. (The ARD prior gives each weight a zero-mean Gaussian with its own precision, and puts a weak gamma prior on that precision.) Marginalizing out the weights and maximizing the resulting marginal likelihood of the data with respect to the precisions drives many of the precision parameters to very large values, which pushes the associated weights to zero. If the features are given by a kernel design matrix, with one basis function per training example, then this strategy selects a small set of examples (the relevance vectors) that predict the target variable well.
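To make the pruning mechanism concrete, here is a minimal NumPy sketch of RVM-style regression. It is an illustration under several assumptions rather than a reference implementation: the kernel is an RBF, the noise precision `beta` is held fixed instead of being re-estimated, the precisions are updated with the standard fixed-point rule `alpha = gamma / mu**2`, and a basis function is dropped once its precision passes a hypothetical threshold `alpha_max`. The function names and the toy data are made up for the example.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def fit_rvm_regression(X, t, beta, length_scale=1.0, n_iter=500, alpha_max=1e6):
    """Sparse Bayesian regression with one ARD precision per basis function.

    Assumption: beta (the noise precision) is known and kept fixed.
    Returns the indices of the surviving training points and their weights.
    """
    Phi_full = rbf_kernel(X, X, length_scale)  # one basis function per example
    n = len(t)
    active = np.arange(n)   # basis functions not yet pruned
    alpha = np.ones(n)      # one precision per weight

    def posterior(idx):
        # Gaussian posterior over the active weights for the current alphas.
        Phi = Phi_full[:, idx]
        Sigma = np.linalg.inv(np.diag(alpha[idx]) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t
        return mu, Sigma

    for _ in range(n_iter):
        mu, Sigma = posterior(active)
        # gamma_i in (0, 1] measures how well the data determine weight i.
        gamma = 1.0 - alpha[active] * np.diag(Sigma)
        # Fixed-point update; a weight the data say nothing about is
        # treated as prunable by sending its precision to alpha_max.
        alpha[active] = np.where(
            gamma > 1e-10, gamma / np.maximum(mu**2, 1e-12), alpha_max
        )
        keep = alpha[active] < alpha_max
        if not keep.any():
            break
        active = active[keep]  # large precision means the weight is pinned at zero

    mu, _ = posterior(active)
    return active, mu

# Toy data (an assumption for illustration): noisy samples of sin(x)/x.
rng = np.random.default_rng(0)
X = np.linspace(-10, 10, 60)[:, None]
t = np.sinc(X[:, 0] / np.pi) + 0.1 * rng.standard_normal(60)

relevant, weights = fit_rvm_regression(X, t, beta=100.0, length_scale=2.0)
pred = rbf_kernel(X, X[relevant], 2.0) @ weights
print(len(relevant), "relevance vectors out of", len(X))
```

The sparsity here comes entirely from the prior: nothing in the loop looks at margins or distances to a decision boundary, only at which precisions grow without bound.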

The IVM strategy is fundamentally different from the RVM's strategy. The IVM is a Gaussian process (GP) method that selects a small set of points from the training set using a greedy selection criterion (based on the change in entropy of the posterior GP) and combines this with standard GP regression/classification on the sparse set of points.
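The greedy selection can be sketched in a few lines for the simplest case, GP regression with Gaussian noise. This is a simplified illustration, not the full IVM: with a Gaussian likelihood, the entropy reduction from including point i is 0.5 * log(1 + var_i / noise_var), where var_i is the current posterior variance at that point, so the criterion does not depend on the targets. In the classification setting the score also depends on the labels through approximate likelihood updates, which this sketch leaves out. The function names, the RBF kernel and the toy data are assumptions made for the example.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def ivm_select(K, d, noise_var):
    """Greedily pick d points by posterior entropy reduction (regression case)."""
    n = K.shape[0]
    var = np.diag(K).copy()          # posterior variance at every training point
    M = np.zeros((d, n))             # posterior covariance = K - M.T @ M
    chosen = np.zeros(n, dtype=bool)
    order = []
    for k in range(d):
        # Entropy reduction from observing each remaining point.
        gain = 0.5 * np.log1p(np.maximum(var, 0.0) / noise_var)
        gain[chosen] = -np.inf
        i = int(np.argmax(gain))
        # Column i of the current posterior covariance, then a rank-one update.
        cov_i = K[:, i] - M[:k].T @ M[:k, i]
        M[k] = cov_i / np.sqrt(var[i] + noise_var)
        var = var - M[k] ** 2
        chosen[i] = True
        order.append(i)
    return np.array(order)

def gp_predict(X_train, y_train, X_test, noise_var, length_scale):
    """Standard GP regression mean, using only the given training points."""
    K = rbf_kernel(X_train, X_train, length_scale)
    K += noise_var * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, length_scale)
    return K_star @ np.linalg.solve(K, y_train)

# Toy data (an assumption for illustration): noisy samples of sin(x)/x.
rng = np.random.default_rng(0)
X = np.linspace(-10, 10, 60)[:, None]
y = np.sinc(X[:, 0] / np.pi) + 0.1 * rng.standard_normal(60)

K = rbf_kernel(X, X, length_scale=2.0)
informative = ivm_select(K, d=15, noise_var=0.01)
pred = gp_predict(X[informative], y[informative], X, 0.01, 2.0)
print("selected indices:", np.sort(informative))
```

Note that the number of retained points `d` is chosen up front here, whereas in the RVM the number of surviving basis functions falls out of the optimization.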

Unlike the SVM, neither the IVM nor the RVM has an obvious geometric interpretation of its relevant or informative vectors. Both algorithms find sparse solutions to regression/classification problems, but they use different approaches to do so. (The SVM and IVM are dual sparse, whereas the RVM should probably be considered primal sparse.)