Solved – Penalized Log-Likelihood – Logistic regression

Tags: logistic, nonparametric, regularization

Equation 5.30 of The Elements of Statistical Learning states that the penalized log-likelihood for a nonparametric logistic regression is:

\begin{align}
l(f;\lambda) &= \sum_{i=1}^{N}\big[y_i\log p(x_i) + (1-y_i)\log(1-p(x_i))\big] - \frac12\lambda \int \{f''(t)\}^2\,dt \\
&= \sum_{i=1}^{N}\big[y_i f(x_i) - \log\big(1+e^{f(x_i)}\big)\big] - \frac12\lambda \int \{f''(t)\}^2\,dt
\end{align}

I've read through "Negative binomial log-likelihood in penalized regression" and "What is penalized logistic regression", but I haven't come across this formulation yet.

I'm familiar with LASSO and ridge regression, but there the penalized quantity is the vector of $\beta$ coefficients. In this nonparametric setting, can someone explain why we penalize $\int \{f''(t)\}^2\,dt$ instead? I understand that in the nonparametric case $f(x)$ plays a role similar to that of $\beta$ in the parametric case, but it isn't obvious to me why the penalty takes this form. Note that

$$f(x) = \log\frac{P(Y = 1|X=x)}{P(Y=0|X=x)}$$
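(For reference, the second line of (5.30) follows from the first by writing $p(x) = e^{f(x)}/\big(1+e^{f(x)}\big)$, so that $\log p(x) = f(x) - \log\big(1+e^{f(x)}\big)$ and $\log\big(1-p(x)\big) = -\log\big(1+e^{f(x)}\big)$, which gives

$$
y_i\log p(x_i) + (1-y_i)\log\big(1-p(x_i)\big) = y_i f(x_i) - \log\big(1+e^{f(x_i)}\big).)
$$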

Best Answer

Per @seanv507's link to Wikipedia:

This formulation is based on the class of twice differentiable functions, and

The roughness penalty based on the second derivative is the most common in modern statistics literature, although the method can easily be adapted to penalties based on other derivatives.

This answers my question.
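
To make the parallel with ridge concrete: if $f$ is expanded in a (natural) spline basis, $f(x) = \sum_j N_j(x)\,\theta_j$, the roughness penalty becomes a quadratic form in the basis coefficients,

$$
\int \{f''(t)\}^2\,dt = \theta^\top \Omega\,\theta,
\qquad
\Omega_{jk} = \int N_j''(t)\,N_k''(t)\,dt,
$$

so the penalty acts on the coefficient vector $\theta$ much like ridge acts on $\beta$, except that $\Omega$ weights each coefficient by the curvature its basis function contributes.

Below is a minimal numerical sketch of (5.30), assuming a single predictor, equally spaced toy data, and a made-up smoothing parameter `lam`: it approximates $f''$ by second differences on a grid and maximizes the penalized log-likelihood directly.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: one Bernoulli draw at each equally spaced x.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
f_true = 3.0 * np.sin(2.0 * np.pi * x)              # true log-odds f(x)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-f_true)))

h = x[1] - x[0]                                      # grid spacing
lam = 1e-4                                           # smoothing parameter lambda (illustrative)
D = np.diff(np.eye(n), 2, axis=0)                    # second-difference operator, shape (n-2, n)

def objective(f):
    # Bernoulli log-likelihood written in terms of f, as in the second line of (5.30)
    loglik = np.sum(y * f - np.logaddexp(0.0, f))
    # (D @ f) / h^2 approximates f'' on the grid; the h-weighted sum of squares
    # approximates the integral of f''(t)^2
    rough = h * np.sum((D @ f / h**2) ** 2)
    return -(loglik - 0.5 * lam * rough)

def gradient(f):
    p = 1.0 / (1.0 + np.exp(-f))                     # p(x) = e^f / (1 + e^f)
    return -(y - p) + (lam / h**3) * (D.T @ (D @ f))

res = minimize(objective, np.zeros(n), jac=gradient, method="L-BFGS-B")
f_hat = res.x                                        # fitted log-odds at the grid points
p_hat = 1.0 / (1.0 + np.exp(-f_hat))                 # fitted P(Y = 1 | X = x)
```

Increasing `lam` pulls $\hat f$ toward a linear function (zero second derivative), while letting `lam` approach zero lets $\hat f$ interpolate the data; that trade-off is exactly what the $\int \{f''(t)\}^2\,dt$ term controls.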