I've written a couple ;o)
G. C. Cawley and N. L. C. Talbot, "Efficient approximate leave-one-out cross-validation for kernel logistic regression", Machine Learning, vol. 71, no. 2-3, pp. 243-264, June 2008.
Which gives a reasonable method for choosing the kernel and regularisation parameters, together with an empirical evaluation.
G. C. Cawley, G. J. Janacek and N. L. C. Talbot, "Generalised kernel machines", in Proceedings of the IEEE/INNS International Joint Conference on Neural Networks (IJCNN-2007), pp. 1732-1737, Orlando, Florida, USA, August 12-17, 2007.
Which basically documents a MATLAB toolbox for building kernel versions of generalised linear models, with kernel logistic regression as one of the examples. The library includes code for model selection (but sadly no manual yet, just some demos).
However, the earliest paper I know of that uses that particular name is
"Kernel logistic regression and the import vector machine" by Zhu and Hastie, Advances in Neural Information Processing Systems (2001) (available via google scholar)
The reference age is the age at which your function cuts the x-axis (around 67, judging by eye from your graph); let's say it is 67. The odds ratio is the odds of the event (the probability of the event divided by one minus that probability) given the person's age, divided by the odds of the event given age = 67:
\begin{equation}
\frac{\frac{P\{E \mid \mathrm{age}\}}{1-P\{E \mid \mathrm{age}\}}}{\frac{P\{E \mid 67\}}{1-P\{E \mid 67\}}} = \frac{\exp(f(\mathrm{age}))}{\exp(f(67))} = \exp(f(\mathrm{age})),
\end{equation}
where the last equality holds because $f(67) = 0$ at the reference age, so $\exp(f(67)) = 1$.
So the odds ratio for an 18-year-old relative to a 67-year-old would be $\exp(0.8) = 2.22$; that is, the 18-year-old's odds of the event are 222% of the 67-year-old's.
If you don't want the reference age to be 67, you can make it anything you like via subtraction. If you want the reference age to be 18, just subtract 0.8 from $f(\mathrm{age})$ for every value of age. Then the odds ratio for a 27-year-old, compared to (now) an 18-year-old, is $\exp(0.39 - 0.80) = 0.66$; that is, the 27-year-old's odds of the event are 66% of the 18-year-old's.
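To make the arithmetic concrete, here is a minimal Python sketch; the `f` values are readings off the curve (only $f(18) = 0.8$ and $f(27) = 0.39$ appear above, and $f(67) = 0$ by the choice of reference age):

```python
import math

# Log-odds values f(age) read off the fitted curve; f(67) = 0 because
# 67 is where the curve cuts the x-axis.
f = {18: 0.80, 27: 0.39, 67: 0.0}

# Odds ratio of an 18-year-old relative to the reference age 67
print(math.exp(f[18] - f[67]))   # 2.2255... (about 222%)

# After re-referencing to age 18 (subtracting f(18) everywhere),
# the odds ratio of a 27-year-old relative to an 18-year-old
print(math.exp(f[27] - f[18]))   # 0.6637... (about 66%)
```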
There is a book, "Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach" by Green and Silverman, that is probably a good start, but my copy is in my office, so I can't get at it until the new year.

Essentially, IIRC, the link between kernel methods and smoothing splines is that both use a regularisation term that penalises particular properties of the function implemented by the model, commonly (as the name suggests) its roughness (as measured by the second derivative, i.e. curvature). For kernel methods, the regularisation operator depends on the choice of kernel, but it does not depend on the particular sample of data, which is the connection with splines rather than with other non-parametric models; a small illustration is sketched below. Smoothing splines can be used with more or less any loss function (including the logistic loss), just as kernel methods can use more or less any (convex) loss. The squared error and hinge losses are often not the best choices, but they get most of the attention, and you can sometimes incorporate useful expert knowledge about the task via the loss, just as in GLMs. Hopefully I can give a better answer once I have a chance to refer to my books!
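As a rough illustration of the point about the penalty term (not from the book), here is a minimal NumPy sketch of kernel ridge regression: the term $\lambda\,\alpha' K \alpha$ is the RKHS norm of the fitted function and plays the role that the integrated squared second derivative plays in a smoothing spline. The RBF kernel and the `gamma` and `lam` values are arbitrary choices for the toy data:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=10.0):
    # K[i, j] = exp(-gamma * (x1_i - x2_j)^2) for 1-D inputs
    return np.exp(-gamma * (x1[:, None] - x2[None, :]) ** 2)

def fit_kernel_ridge(x, y, lam=0.01, gamma=10.0):
    """Minimise  ||y - K alpha||^2 + lam * alpha' K alpha.

    The penalty alpha' K alpha is the squared RKHS norm of the fitted
    function: it discourages rough functions, and the smoothness it
    imposes is fixed by the kernel, not by the particular sample of
    data, just as a spline's roughness penalty is fixed in advance.
    """
    K = rbf_kernel(x, x, gamma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

# Toy usage: noisy sine data; a larger lam gives a smoother fit.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=40)
alpha = fit_kernel_ridge(x, y)
y_hat = rbf_kernel(x, x) @ alpha   # fitted values on the training inputs
```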
P.S. There is a paper on regularisation networks by Poggio and Girosi that may well be quite relevant as well.