Selection of alpha in elastic net: overfitting

elastic net, machine learning, overfitting, regularization

Earlier I asked whether the fineness of the $\lambda$ grid is related to overfitting in LASSO, ridge regression and elastic net models. The answer I got was that it is not. Now I am asking the same about $\alpha$.

Question: Is grid fineness of $\alpha$ in elastic net related to overfitting?
($\alpha$ is the parameter governing the balance between $L_1$ and $L_2$ penalty.)
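
For reference, one common way to write this down is the glmnet-style parametrization. The question does not say which convention is meant, so the scaling below is an assumption:

$$
\hat\beta = \arg\min_{\beta}\; \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda\left(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right)
$$

Under this convention $\alpha = 1$ gives the LASSO, $\alpha = 0$ gives ridge regression, and $\lambda \ge 0$ sets the overall penalty strength.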

The argument in the answer to the linked question goes like this:

> we definitely want to optimize our penalized likelihood function for values $\lambda$, and it doesn't matter how many values of $\lambda$ we test, because out-of-sample performance for a fixed data set and fixed partitioning is entirely deterministic. More to the point, the out-of-sample metric is not at all altered by how many values $\lambda$ you test.

I would guess that the same applies to $\alpha$ in place of $\lambda$, and hence a finer grid can only help but not hurt. Is that right?

(Note that when doing cross-validation, I fix $\alpha$ first and then search over a grid of $\lambda$ values.)
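
To make the two-stage search concrete, here is a minimal NumPy sketch. Several things in it are assumptions for illustration and not part of the question: the glmnet-style objective stated in the docstring, the absence of an intercept, a single fixed train/validation split standing in for full K-fold cross-validation, the synthetic data, the grid ranges, and the helper names `soft_threshold`, `fit_elastic_net` and `validation_mse`.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for the (assumed) glmnet-style objective
    (1/(2n))*||y - X b||^2 + lam*(alpha*||b||_1 + (1 - alpha)/2*||b||_2^2),
    without an intercept."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()          # equals y - X @ beta for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]      # partial residual without feature j
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= X[:, j] * beta[j]      # full residual with the updated coefficient
    return beta

def validation_mse(X_tr, y_tr, X_val, y_val, lam, alpha):
    beta = fit_elastic_net(X_tr, y_tr, lam, alpha)
    return float(np.mean((y_val - X_val @ beta) ** 2))

# Synthetic data: 3 informative features out of 20 (arbitrary choice).
rng = np.random.default_rng(0)
n, p = 120, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

alphas = np.linspace(0.0, 1.0, 6)     # outer grid: L1/L2 balance
lambdas = np.logspace(-3, 0, 10)      # inner grid: penalty strength

# Stage 1: for each fixed alpha, pick the best lambda on the validation split.
results = {}
for alpha in alphas:
    scores = [validation_mse(X_tr, y_tr, X_val, y_val, lam, alpha) for lam in lambdas]
    k = int(np.argmin(scores))
    results[float(alpha)] = (float(lambdas[k]), scores[k])

# Stage 2: compare the per-alpha winners and keep the best pair.
best_alpha = min(results, key=lambda a: results[a][1])
best_lambda, best_score = results[best_alpha]
print(f"alpha={best_alpha:.1f}, lambda={best_lambda:.4f}, validation MSE={best_score:.3f}")
```

Searching $\alpha$ in an outer loop and $\lambda$ in an inner loop picks the same pair as a joint search over the full two-dimensional grid, so a finer $\alpha$ grid just adds more candidate pairs to the same selection step.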

Some related questions are this, this and this.

Best Answer

It is true that the out-of-sample performance on a fixed data set is deterministic. The question is whether that performance generalizes, and it does not if you optimize your model to perform best on that fixed test data set.

Whenever you optimize hyperparameters (here $\lambda$) by checking the out-of-sample performance on a specific data set, you can no longer use that data set to get unbiased performance estimates. This does not mean that you shouldn't optimize the parameter; it just means you can't treat the performance estimates from that data set as unbiased.

But don't take my word for it; better to run a simulation yourself.
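
One possible simulation is sketched below. It rests on assumptions that are not part of the answer. Ridge regression (the $\alpha = 0$ corner of elastic net) stands in for the full model because it has a closed-form solution. The features are i.i.d. standard normal, so the true expected squared error of any coefficient vector is known exactly. All sample sizes, grids and the seed are arbitrary. Each repetition tunes $\lambda$ on a small validation set, once over a coarse grid and once over a finer grid that contains it, and compares the winner's validation MSE with its true MSE. A $\lambda$ fixed in advance serves as an untuned baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_val, p, sigma = 30, 10, 20, 1.0
beta_true = rng.normal(0.0, 0.5, size=p)

fine_grid = np.logspace(-2, 3, 51)     # 51 candidate ridge penalties
coarse_idx = np.arange(0, 51, 10)      # every 10th point: a nested 6-point grid
k_fixed = 25                           # a lambda chosen before seeing any data

def one_repetition():
    """Draw fresh train/validation data; return validation and true MSE per lambda."""
    X_tr = rng.standard_normal((n_train, p))
    y_tr = X_tr @ beta_true + sigma * rng.standard_normal(n_train)
    X_val = rng.standard_normal((n_val, p))
    y_val = X_val @ beta_true + sigma * rng.standard_normal(n_val)

    gram, xty = X_tr.T @ X_tr, X_tr.T @ y_tr
    val_mse = np.empty(len(fine_grid))
    true_mse = np.empty(len(fine_grid))
    for k, lam in enumerate(fine_grid):
        b = np.linalg.solve(gram + lam * np.eye(p), xty)   # closed-form ridge fit
        val_mse[k] = np.mean((y_val - X_val @ b) ** 2)
        # Exact expected squared error on new data; valid only because the
        # simulated features are i.i.d. standard normal (an assumption).
        true_mse[k] = sigma ** 2 + np.sum((b - beta_true) ** 2)
    return val_mse, true_mse

def summarize(n_reps=1000):
    rows = {"fixed": [], "coarse": [], "fine": []}
    for _ in range(n_reps):
        val_mse, true_mse = one_repetition()
        k_coarse = coarse_idx[np.argmin(val_mse[coarse_idx])]   # tuned on coarse grid
        k_fine = int(np.argmin(val_mse))                        # tuned on fine grid
        for name, k in [("fixed", k_fixed), ("coarse", k_coarse), ("fine", k_fine)]:
            rows[name].append((val_mse[k], true_mse[k]))
    # Average (validation MSE, true MSE) over repetitions for each strategy.
    return {name: np.mean(r, axis=0) for name, r in rows.items()}

results = summarize()
for name, (val_avg, true_avg) in results.items():
    print(f"{name:>6}: validation MSE {val_avg:.3f}, true MSE {true_avg:.3f}, "
          f"optimism {true_avg - val_avg:.3f}")
```

The quantity to look at is the "optimism" column, the average gap between the true error and the validation error of the selected model. Comparing the `fixed` row with the two tuned rows shows the effect of tuning on the validation set, and comparing `coarse` with `fine` shows how much the grid resolution adds on top of that.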

As for the grid size, making the grid finer won't make overfitting much worse. What matters is whether you tune $\lambda$ at all.
