Solved – the connection between Kernel Logistic Regression and Smoothing Splines

kernel trick · logistic · machine learning · smoothing · svm

While working on probabilistic outputs of kernel methods, I came across the formulation of the SVM as a penalized method using the binomial deviance (described, for example, in "The Elements of Statistical Learning, 2nd edition" by Hastie et al., pp. 426–430). The resulting model is called Kernel Logistic Regression (KLR). References to "the well studied KLR" lead to the literature on smoothing splines, but KLR is never mentioned there.
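For reference, the penalized formulation I mean is roughly the following (my notation; \(y_i \in \{-1, +1\}\), \(\mathcal{H}_K\) the reproducing kernel Hilbert space of the chosen kernel):

```latex
\min_{f \in \mathcal{H}_K} \;
\sum_{i=1}^{n} \log\!\left(1 + e^{-y_i f(x_i)}\right)
\;+\; \frac{\lambda}{2}\, \lVert f \rVert_{\mathcal{H}_K}^{2}
```

Swapping the binomial deviance for the hinge loss \(\max(0, 1 - y_i f(x_i))\) in the same objective recovers the SVM.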

What is the connection between KLR and Smoothing Splines?

Thank you in advance for your answer!

Best Answer

There is a book, "Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach" by Green and Silverman, that is probably a good start, but my copy is in my office, so I can't get to it until the new year.

Essentially, IIRC, the link between kernel methods and smoothing splines is that both use a regularisation term that penalises particular properties of the function implemented by the model, commonly (as the name suggests) its roughness, as measured by the second derivative or curvature. For kernel methods, the regularisation operator depends on the choice of kernel, but not on the particular sample of data; that independence from the data is the connection with splines, rather than with other non-parametric models.

Smoothing splines can be used with more or less any loss function (including the logistic loss), just as kernel methods can use more or less any convex loss. The squared error and hinge losses get most of the attention, but they are often not the best choice, and you can sometimes incorporate useful expert knowledge about the task via the loss, just as in GLMs. Hopefully I can give a better answer once I have a chance to refer to my books!
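To make the penalized-loss view concrete, here is a minimal sketch of KLR with an RBF kernel, fit by plain gradient descent on the penalised binomial deviance. Everything here (the hyperparameters `lam`, `gamma`, `lr`, and the function names) is illustrative, not taken from any particular paper; the model is the standard kernel expansion \(f(x) = \sum_i \alpha_i k(x, x_i)\), penalised by \(\lambda\, \alpha^\top K \alpha\).

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_klr(X, y, lam=0.1, gamma=1.0, lr=0.1, n_iter=500):
    # Minimise sum_i log(1 + exp(-y_i f(x_i))) + lam * alpha^T K alpha,
    # where f = K alpha.  Labels y must be in {-1, +1}.
    K = rbf_kernel(X, X, gamma)
    alpha = np.zeros(len(y))
    for _ in range(n_iter):
        f = K @ alpha
        sig = 1.0 / (1.0 + np.exp(y * f))          # per-point loss derivative factor
        grad = K @ (-y * sig) + 2.0 * lam * (K @ alpha)
        alpha -= lr * grad / len(y)
    return alpha

def predict_proba(X_train, alpha, X_new, gamma=1.0):
    # Unlike the SVM, KLR yields a probability P(y = +1 | x) directly.
    f = rbf_kernel(X_new, X_train, gamma) @ alpha
    return 1.0 / (1.0 + np.exp(-f))
```

The point of the sketch is the objective, not the optimiser: replacing the logistic loss with the hinge loss in `fit_klr` would give (a crude solver for) the SVM, while the penalty term plays exactly the role of the roughness penalty in smoothing splines.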

P.S. There is a paper on regularisation networks by Poggio and Girosi that may well be quite relevant as well.