Solved – Cross-validation: hyper-parameter tuning or model validation?

cross-validation

I have been searching the internet for an exact definition of cross-validation. I have come across a few different ideas with different terminology, and I am not sure I have understood correctly.

Basically, my current understanding is that there are two major applications of cross-validation:

  1. Hyper-parameter tuning. Lasso has a parameter $\lambda$, and we don't know which $\lambda$ to use. So we split the data into a training set and a testing set, try different values of $\lambda$ on these 'sub-problems', and see which $\lambda$ gives the best performance.

  2. Model validation. Imagine that I have implemented both Lasso and a Gradient Boosted Regression Tree, and I want to know which one would work better in real life (predicting new, unseen data). So I split the data into training/testing parts and choose the one that yields the better out-of-sample performance in cross-validation.
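Both uses in the list can be illustrated with one k-fold loop. This is a minimal plain-Python sketch, not a reference implementation, and it rests on assumptions that are not in the question: a single-feature Lasso with no intercept (which has a closed-form soft-thresholding solution), synthetic data `y = 3x + noise`, a hypothetical grid of $\lambda$ values, and a mean-only predictor standing in for the Gradient Boosted Regression Tree, since a real GBRT would need a library. All helper names are illustrative.

```python
import random
from functools import partial

# Assumption: one feature, no intercept, so the Lasso solution is closed form.
def soft_threshold(z, t):
    # S(z, t) = sign(z) * max(|z| - t, 0)
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def fit_lasso_1d(xs, ys, lam):
    # Minimises 0.5 * sum((y - b*x)^2) + lam * |b|; returns a predictor.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = soft_threshold(sxy, lam) / sxx
    return lambda x: b * x

def fit_mean(xs, ys):
    # Hypothetical stand-in for a second model family: predict the training mean.
    m = sum(ys) / len(ys)
    return lambda x: m

def k_fold_indices(n, k):
    # Contiguous folds of near-equal size; rows are assumed to be in random order.
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cv_mse(fit, xs, ys, k=5):
    # Pooled out-of-fold mean squared error of the model produced by `fit`.
    total = 0.0
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        predict = fit(train_x, train_y)
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in fold)
    return total / len(xs)

# Synthetic data (assumption): y = 3x + Gaussian noise.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(100)]
ys = [3.0 * x + random.gauss(0, 1) for x in xs]

# 1. Hyper-parameter tuning: keep the lambda with the lowest CV error.
lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_mse(partial(fit_lasso_1d, lam=lam), xs, ys) for lam in lambdas}
best_lam = min(scores, key=scores.get)

# 2. Model comparison: tuned Lasso vs. the mean-only stand-in.
lasso_score = scores[best_lam]
baseline_score = cv_mse(fit_mean, xs, ys)
print(best_lam, lasso_score, baseline_score)
```

Because `cv_mse` only needs a function that fits a model and returns a predictor, the same loop scores candidate $\lambda$ values (use 1) and different model families (use 2). In practice, scikit-learn packages these patterns as `LassoCV`, `GridSearchCV` and `cross_val_score`.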

Is my understanding correct?

Thanks

Best Answer

I'd say mostly the first (i.e., hyper-parameter tuning).

If you have a sufficiently large hold-out test set, you can evaluate the models fairly reliably on it. When selecting hyperparameters, however, tuning against a single validation set can cause your model to overfit that particular set. CV makes this much harder, because each candidate is scored on several different validation folds.
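The workflow this answer describes can be sketched as: set a test set aside, choose $\lambda$ by k-fold CV on the remaining data only, refit on that data, and score the test set once. This is a plain-Python sketch under assumptions not in the original: a single-feature Lasso without intercept, synthetic data `y = 3x + noise`, a hypothetical $\lambda$ grid, and a 75/25 train/test split. The helper names are illustrative.

```python
import random

def fit_lasso_1d(xs, ys, lam):
    # Assumption: one feature, no intercept. b = S(sum x*y, lam) / sum x^2.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return max(abs(sxy) - lam, 0.0) * (1 if sxy >= 0 else -1) / sxx

def mse(b, xs, ys):
    # Mean squared error of the linear predictor y_hat = b * x.
    return sum((y - b * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cv_error(xs, ys, lam, k=5):
    # Mean of per-fold validation MSE; fold f holds rows with index % k == f.
    errs = []
    for f in range(k):
        tr = [i for i in range(len(xs)) if i % k != f]
        va = [i for i in range(len(xs)) if i % k == f]
        b = fit_lasso_1d([xs[i] for i in tr], [ys[i] for i in tr], lam)
        errs.append(mse(b, [xs[i] for i in va], [ys[i] for i in va]))
    return sum(errs) / k

# Synthetic data (assumption).
random.seed(1)
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [3.0 * x + random.gauss(0, 1) for x in xs]

# Hold out the last 25% as a test set that plays no part in tuning.
cut = int(0.75 * len(xs))
train_x, train_y = xs[:cut], ys[:cut]
test_x, test_y = xs[cut:], ys[cut:]

# Choose lambda by 5-fold CV on the training portion only.
lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
best_lam = min(lambdas, key=lambda lam: cv_error(train_x, train_y, lam))

# Refit on the whole training portion, then evaluate on the test set once.
b_final = fit_lasso_1d(train_x, train_y, best_lam)
test_mse = mse(b_final, test_x, test_y)
print(best_lam, b_final, test_mse)
```

Because $\lambda$ is picked from an average over five validation folds rather than one split, the choice depends less on the quirks of any single validation set, and the test score remains an honest estimate because the test rows never influenced the choice.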