Equation 5.30 of The Elements of Statistical Learning gives the penalized log-likelihood for nonparametric logistic regression as:
\begin{align}
l(f;\lambda) &= \sum_{i=1}^{N}\big[y_i\log p(x_i) + (1-y_i)\log(1-p(x_i))\big] - \frac12\lambda \int \{f''(t)\}^2\,dt \\
&= \sum_{i=1}^{N}\big[y_i f(x_i) - \log\big(1+e^{f(x_i)}\big)\big] - \frac12\lambda \int \{f''(t)\}^2\,dt
\end{align}
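(For completeness, the second line follows from the first because $f$ is the log-odds, defined below, so $p(x) = e^{f(x)}/(1+e^{f(x)})$, which gives $\log p(x) = f(x) - \log(1+e^{f(x)})$ and $\log(1-p(x)) = -\log(1+e^{f(x)})$; substituting,
$$
y_i\log p(x_i) + (1-y_i)\log(1-p(x_i)) = y_i f(x_i) - \log\big(1+e^{f(x_i)}\big).)
$$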
I've read through "Negative binomial log-likelihood in penalized regression" and "What is penalized logistic regression", but I haven't come across this formulation in either.
I'm familiar with the LASSO and ridge, but there the penalized parameter is a vector of $\beta$ coefficients. In this nonparametric scheme, can someone explain why we penalize $\int \{f''(t)\}^2\,dt$ instead? I understand that in the nonparametric case $f(x)$ plays a role similar to that of $\beta$ in the parametric case, but it's not obvious to me why the penalty is formulated this way. Note that
$$f(x) = \log\frac{P(Y = 1|X=x)}{P(Y=0|X=x)}$$
Best Answer
Per @seanv507's link to Wikipedia (the article on smoothing splines):

This formulation is based on the class of twice-differentiable functions: $\int \{f''(t)\}^2\,dt$ measures the total curvature, i.e. the roughness, of $f$ over its domain, so the penalty shrinks the fit toward smooth functions, with $\lambda$ controlling the trade-off between fidelity to the data and smoothness of $f$. As $\lambda \to \infty$, only functions with $f'' \equiv 0$ (straight lines, i.e. the usual linear logistic model) escape the penalty; at $\lambda = 0$, $f$ is free to interpolate the data. And although the criterion is defined over an infinite-dimensional class, the minimizer turns out to be finite-dimensional: a natural cubic spline with knots at the unique values of $x_i$ (ESL Section 5.6).

This answers my question.
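To make the analogy with ridge explicit (this is how ESL sets it up in Section 5.4 for the squared-error case; the logistic case is identical in form): expanding $f$ in the natural-spline basis that the optimal solution lives in, $f(x) = \sum_{j=1}^{N} N_j(x)\theta_j$, the curvature penalty becomes a quadratic form in the coefficients,
$$
\int \{f''(t)\}^2\,dt = \theta^T\Omega\,\theta, \qquad \{\Omega\}_{jk} = \int N_j''(t)\,N_k''(t)\,dt,
$$
which is a generalized ridge penalty: instead of $\|\theta\|_2^2 = \theta^T I\,\theta$, directions of $\theta$ that produce wiggly fits are weighted more heavily by $\Omega$. So the basis coefficients play the role of $\beta$, and $\Omega$ replaces the identity matrix of ordinary ridge.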
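Here is a minimal numerical sketch of fitting this criterion (my own illustration, not code from ESL; the B-spline basis, knot placement, simulated data, and $\lambda$ below are all arbitrary choices): expand $f$ in a cubic B-spline basis, approximate $\Omega$ by numerical integration, and maximize the penalized log-likelihood with a generic optimizer.

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated data: binary y with a smooth, nonlinear true log-odds.
N = 200
x = np.sort(rng.uniform(0.0, 1.0, N))
f_true = 2.0 * np.sin(2.0 * np.pi * x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-f_true)))

# Cubic B-spline basis on [0, 1] (clamped knot vector; sizes are arbitrary).
k = 3
t = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 8), [1.0] * k]
n_basis = len(t) - k - 1

def basis_matrix(pts, deriv=0):
    """Columns are the B-spline basis functions (or a derivative) at pts."""
    cols = []
    for j in range(n_basis):
        c = np.zeros(n_basis)
        c[j] = 1.0
        spl = BSpline(t, c, k)
        if deriv:
            spl = spl.derivative(deriv)
        cols.append(spl(pts))
    return np.column_stack(cols)

B = basis_matrix(x)                        # N x n_basis design matrix
grid = np.linspace(0.0, 1.0, 2001)         # fine grid for the penalty integral
B2 = basis_matrix(grid, deriv=2)
Omega = (B2.T @ B2) * (grid[1] - grid[0])  # Omega_jk ~ int B_j'' B_k'' dt

def neg_penalized_loglik(theta, lam):
    f = B @ theta
    # sum_i [ y_i f(x_i) - log(1 + e^{f(x_i)}) ], computed stably
    loglik = np.sum(y * f - np.logaddexp(0.0, f))
    return -(loglik - 0.5 * lam * theta @ Omega @ theta)

res = minimize(neg_penalized_loglik, np.zeros(n_basis), args=(1e-3,),
               method="L-BFGS-B")
f_hat = B @ res.x  # fitted log-odds at the data points
print("penalized negative log-likelihood at optimum:", res.fun)
```

In practice one would fit this with the penalized Newton/IRLS iteration ESL describes for this criterion and choose $\lambda$ by cross-validation; the generic optimizer above just keeps the sketch short.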