Solved – Gaussian Process Kernel and Ridge Regression

gaussian process, machine learning, ridge regression

Can a dual ridge regression produce the same predictions as a Gaussian process with a polynomial kernel $K(x,x')=(x^Tx'+1)^2$ at lower time complexity than the GP, which is $O(n^3)$ when using Cholesky decomposition?

If yes, what would the complexity of the ridge regression be with a kernel that produces the same results?

If not, would an SVM model be recommended instead?

I want to achieve time complexity that is linear in the number of features.
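For context, here is a minimal sketch (my own illustration, not part of the original question) showing that the GP posterior mean with the kernel $K(x,x')=(x^Tx'+1)^2$ coincides with the dual (kernel) ridge regression prediction when the ridge parameter equals the GP noise variance. Both solves go through a Cholesky factorisation of the $n \times n$ Gram matrix, so both cost $O(n^3)$; the names and data below are purely illustrative.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(0)
n, d, noise = 200, 5, 0.1

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + noise * rng.standard_normal(n)
X_test = rng.standard_normal((10, d))

def poly_kernel(A, B):
    """Polynomial kernel K(x, x') = (x^T x' + 1)^2."""
    return (A @ B.T + 1.0) ** 2

K = poly_kernel(X, X)            # n x n Gram matrix
K_star = poly_kernel(X_test, X)  # test-train cross kernel

# GP posterior mean: k_*^T (K + sigma^2 I)^{-1} y, solved via Cholesky (O(n^3))
sigma2 = noise ** 2
L_gp = cho_factor(K + sigma2 * np.eye(n), lower=True)
gp_mean = K_star @ cho_solve(L_gp, y)

# Dual ridge regression: k_*^T (K + lambda I)^{-1} y, with lambda = sigma^2
lam = sigma2
L_rr = cho_factor(K + lam * np.eye(n), lower=True)
ridge_pred = K_star @ cho_solve(L_rr, y)

print(np.allclose(gp_mean, ridge_pred))  # True: identical predictions
```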

Best Answer

Cholesky decomposition also requires $O(n^3)$ operations, so it has the same computational scaling as Gaussian process regression. If you want to reduce the time complexity, you could investigate a sparse approximation of the least-squares support vector machine (LS-SVM), which is equivalent to dual ridge regression and which I have found useful; see e.g.

Gavin C. Cawley and Nicola L. C. Talbot, "Improved sparse least-squares support vector machines", Neurocomputing, volume 48, number 1, pages 1025-1031, 2002

and

Gavin C. Cawley and Nicola L. C. Talbot, "A greedy training algorithm for sparse least-squares support vector machines", International Conference on Artificial Neural Networks (ICANN 2002), pages 681-686, 2002

If greedy selection of representation vectors is too slow, simply picking a subset of the data at random tends to work quite well (as long as all of the training data appear in the loss function).
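To make the idea concrete, here is a minimal sketch (my illustration, not the papers' exact algorithm) of such a sparse approximation: a random subset of $m$ training points serves as the representation vectors, while the squared-error loss still runs over all $n$ training points. The dominant cost drops from $O(n^3)$ to $O(nm^2)$. The variable names, regularisation term, and jitter are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(1)
n, d, m, lam = 2000, 5, 100, 1e-3

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

def poly_kernel(A, B):
    """Polynomial kernel K(x, x') = (x^T x' + 1)^2."""
    return (A @ B.T + 1.0) ** 2

# Random subset of m points acts as the representation (basis) vectors
idx = rng.choice(n, size=m, replace=False)
X_m = X[idx]

K_nm = poly_kernel(X, X_m)    # n x m cross kernel (loss over all n points)
K_mm = poly_kernel(X_m, X_m)  # m x m kernel among the basis vectors

# Regularised normal equations for the sparse dual coefficients:
#   (K_nm^T K_nm + lam * K_mm) alpha = K_nm^T y
A = K_nm.T @ K_nm + lam * K_mm
A += 1e-8 * np.eye(m)         # small jitter for numerical stability
L = cho_factor(A, lower=True)
alpha = cho_solve(L, K_nm.T @ y)

# Prediction only needs kernel evaluations against the m basis vectors
X_test = rng.standard_normal((10, d))
y_pred = poly_kernel(X_test, X_m) @ alpha
print(y_pred.shape)  # (10,)
```

Because the expensive factorisation is only $m \times m$ and the remaining work is matrix products involving $K_{nm}$, the cost scales linearly in $n$ for fixed $m$.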

I am not very keen on support vector regression as the $\epsilon$-insensitive loss function doesn't have a straightforward probabilistic interpretation, which is often important in regression applications.
