I heard Kernel Logistic Regression is a classical combination of kernel methods and Logistic regression, but I cannot find any major reference (book, or paper) on this topic. Can you give me any suggestions? Thanks.
Solved – Kernel logistic regression
kernel-trick, logistic, machine-learning, svm
Related Solutions
There is a book, "Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach" by Green and Silverman, that is probably a good start, but my copy is in my office, so I can't get it until the new year.

Essentially, IIRC, the link between kernel methods and smoothing splines is that both use a regularisation term that penalises particular properties of the function implemented by the model, commonly (as the name suggests) its roughness, as measured by the second derivative or curvature. For kernel methods, the regularisation operator depends on the choice of kernel, but it does not depend on the particular sample of data, which is the connection with splines rather than with other non-parametric models.

Smoothing splines can be used with more or less any loss function (including the logistic loss), just as kernel methods can use more or less any (convex) loss. The squared error and hinge losses are often not the best choices, but they get most of the attention, and you can sometimes incorporate useful expert knowledge about the task via the loss, just as in GLMs. Hopefully I can give a better answer once I have a chance to refer to my books!
P.S. there is a paper on regularisation networks by Poggio and Girosi that may well be quite relevant as well.
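For concreteness, the penalised objective being discussed can be written as follows (my notation, not from the book; labels assumed to be $y_i \in \{-1,+1\}$):

```latex
% Penalised logistic loss over the RKHS \mathcal{H} induced by the kernel k:
\min_{f \in \mathcal{H}} \;
    \sum_{i=1}^{n} \log\!\left(1 + e^{-y_i f(x_i)}\right)
    + \lambda \, \lVert f \rVert_{\mathcal{H}}^{2}
% By the representer theorem the minimiser has the finite form
%   f(x) = \sum_{i=1}^{n} \alpha_i \, k(x, x_i),
% so the choice of kernel fixes the regularisation operator,
% independently of the particular sample of data.
```

The first term is the data-fit (logistic) loss and the second is the roughness-style penalty; swapping the loss for squared error recovers kernel ridge regression, which is the sense in which the framework accommodates "more or less any (convex) loss".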
KLRs and SVMs
- Classification performance is almost identical in both cases.
- KLR can provide class probabilities whereas SVM is a deterministic classifier.
- KLR has a natural extension to multi-class classification whereas in SVM, there are multiple ways to extend it to multi-class classification (and it is still an area of research whether there is a version which has provably superior qualities over the others).
- Surprisingly or unsurprisingly, KLR also has the optimal margin properties that SVMs enjoy (well, in the limit at least)!
Looking at the above, it almost feels like kernel logistic regression is what you should be using. However, SVMs enjoy certain advantages:
- KLR is computationally more expensive than SVM - $O(N^3)$ vs $O(N^2k)$ where $k$ is the number of support vectors.
- The classifier in SVM is designed such that it is defined only in terms of the support vectors, whereas in KLR the classifier is defined over all the points, not just the support vectors. This allows SVMs to enjoy some natural speed-ups (in terms of efficient code-writing) that are hard to achieve for KLR.
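A minimal numpy sketch of binary KLR may make the last point concrete (illustrative only; toy data, plain gradient descent on the penalised logistic loss, not anyone's published toolbox). The model is $f = K\alpha$, and after training essentially every dual coefficient $\alpha_i$ is non-zero, in contrast to the sparse SVM solution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-class data: two Gaussian blobs (illustrative only).
X = np.vstack([rng.normal(-1.0, 1.0, (40, 2)), rng.normal(+1.0, 1.0, (40, 2))])
y = np.concatenate([np.zeros(40), np.ones(40)])

def rbf_kernel(A, B, gamma=0.5):
    """RBF (Gaussian) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf_kernel(X, X)
lam = 1e-2                 # regularisation parameter (assumed, not tuned)
alpha = np.zeros(len(y))   # dual coefficients, one per training point

# Gradient descent on the penalised negative log-likelihood:
# the gradient w.r.t. alpha is K @ (p - y) + lam * K @ alpha.
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-K @ alpha))   # class-1 probabilities
    grad = K @ (p - y) + lam * (K @ alpha)
    alpha -= 0.1 * grad / len(y)

p = 1.0 / (1.0 + np.exp(-K @ alpha))
acc = np.mean((p > 0.5) == y)
dense = np.sum(np.abs(alpha) > 1e-6)
print(f"training accuracy: {acc:.2f}")
print(f"nonzero dual coefficients: {dense} of {len(alpha)}")
```

Note that the model outputs class probabilities `p` directly (the first KLR advantage above), but the prediction involves kernel evaluations against *all* training points, which is the speed disadvantage relative to a sparse SVM.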
Best Answer
I've written a couple ;o)
G. C. Cawley and N. L. C. Talbot, "Efficient approximate leave-one-out cross-validation for kernel logistic regression", Machine Learning, vol. 71, no. 2-3, pp. 243-264, June 2008.
Which gives a reasonable method for choosing the kernel and regularisation parameters, along with an empirical evaluation.
G. C. Cawley, G. J. Janacek and N. L. C. Talbot, "Generalised kernel machines", in Proceedings of the IEEE/INNS International Joint Conference on Neural Networks (IJCNN-2007), pp. 1732-1737, Orlando, Florida, USA, August 12-17, 2007.
Which basically documents a MATLAB toolbox for making kernel versions of generalised linear models, with kernel logistic regression as one of the examples. The library includes code for model selection (but sadly no manual yet, just some demos).
However, the earliest paper I know of that uses that particular name is "Kernel logistic regression and the import vector machine" by Zhu and Hastie, Advances in Neural Information Processing Systems (2001), available via Google Scholar.