Solved – How do shrinkage methods change the flexibility of a model

lasso, regularization, ridge regression

While working through *An Introduction to Statistical Learning*, I had trouble pinning down how flexibility relates to ridge regression and the LASSO. I recognize that both impose a penalty on the coefficient estimates, via the $l_1$ or $l_2$ norm, but how do they ultimately affect the flexibility of the model?

Best Answer

LASSO and ridge regression are typically written in Lagrangian form, with a penalty on the $l_1$ norm or the squared $l_2$ norm. But there's an equivalent form with a constraint on the norm instead of a penalty. For ridge regression:

$$\underset{w}{\min} \|y - Xw\|_2^2 \quad \text{s.t.} \quad \|w\|_2^2 \le c$$

For LASSO:

$$\underset{w}{\min} \|y - Xw\|_2^2 \quad \text{s.t.} \quad \|w\|_1 \le c$$
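
For reference, the Lagrangian forms mentioned above attach the norm as a penalty term with multiplier $\lambda$ rather than as a constraint:

$$\underset{w}{\min} \|y - Xw\|_2^2 + \lambda \|w\|_2^2 \quad \text{(ridge)} \qquad \underset{w}{\min} \|y - Xw\|_2^2 + \lambda \|w\|_1 \quad \text{(LASSO)}$$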

We can interpret these constraints geometrically by thinking of the weight vector as a point in the space of all possible choices of weights. In the case of ridge regression, the constraint $\|w\|_2^2 \le c$ restricts the weight vector $w$ to lie within a hypersphere of radius $\sqrt{c}$. Similarly, the $l_1$ constraint in LASSO restricts the weights to lie within a cross-polytope (a diamond in two dimensions) whose size scales with $c$: its vertices lie along the axes, each at distance $c$ from the origin.
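
To make the picture concrete, here is a minimal matplotlib sketch of the two constraint regions in a two-dimensional weight space (the value $c = 1$ is an arbitrary choice for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt

c = 1.0  # constraint size; arbitrary value for illustration
theta = np.linspace(0, 2 * np.pi, 200)

# Ridge: ||w||_2^2 <= c is the disk of radius sqrt(c)
plt.plot(np.sqrt(c) * np.cos(theta), np.sqrt(c) * np.sin(theta),
         label="ridge: ||w||_2^2 <= c")

# LASSO: ||w||_1 <= c is the diamond with vertices at distance c on the axes
diamond = np.array([[c, 0], [0, c], [-c, 0], [0, -c], [c, 0]])
plt.plot(diamond[:, 0], diamond[:, 1], label="LASSO: ||w||_1 <= c")

plt.gca().set_aspect("equal")
plt.xlabel("w1")
plt.ylabel("w2")
plt.legend()
plt.show()
```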

Informally, a more flexible model is able to represent a wider variety of functional forms. Each form corresponds to a particular choice of parameters. For LASSO and ridge regression, we can see that the set of allowable parameters shrinks as we decrease $c$, because the hypersphere/polytope becomes smaller. Decreasing $c$ corresponds to increasing the penalty term (often called $\lambda$) in the Lagrangian form. Therefore, tightening the constraint (or increasing the penalty) corresponds to decreasing the model's flexibility.
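
As a quick numerical check, the sketch below (with arbitrary synthetic data and `alpha` values; scikit-learn's `alpha` plays the role of $\lambda$, up to each estimator's scaling convention) shows the norm of the fitted weight vector shrinking as the penalty grows:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
true_w = np.array([3.0, -2.0, 0.0, 1.5, 0.0])  # arbitrary true weights
y = X @ true_w + rng.normal(scale=0.5, size=100)

# Larger alpha = stronger penalty = smaller constraint region,
# so the fitted weights are pulled toward the origin.
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:7.2f}  "
          f"ridge ||w||_2^2 = {np.sum(ridge.coef_ ** 2):7.3f}  "
          f"lasso ||w||_1 = {np.sum(np.abs(lasso.coef_)):7.3f}")
```

As expected, the coefficient norms decrease as `alpha` increases, and for large `alpha` the LASSO drives the coefficients exactly to zero.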