Solved – When is simple train/test split better than cross-validation or train/validate/test

cross-validation, model-evaluation

So, for the purpose of my master's thesis I'm trying to predict profitability on time series data using elastic net and XGBoost. I split the data 80/20 (50k instances, 3k+ features). I do not use cross-validation (I tried it, but the system crashed every time) or a validation set. I train and tune the model on the training set and evaluate the performance on the test set.
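For concreteness, here is a minimal sketch of the workflow described above, assuming xgboost and scikit-learn; the data, target, and hyperparameter values are placeholders, not the actual thesis setup.

```python
# Minimal sketch of a single 80/20 split workflow, assuming xgboost
# and scikit-learn; X and y are stand-ins for the real 50k x 3k+ data.
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 300))            # placeholder feature matrix
y = 0.5 * X[:, 0] + rng.normal(size=5000)   # placeholder profitability target

# Single 80/20 split; no validation set or cross-validation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameters chosen by hand and fit on the training set only.
model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

# Performance reported once, on the held-out 20%.
rmse = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
print("Test RMSE:", rmse)
```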

As I have quite a lot of data, I was wondering: would my simple technique be appropriate, or should I use an extra validation set for hyperparameter tuning? I would really appreciate any helpful answers or pointers to relevant papers.

Best Answer

It is generally better practice to use cross-validation (e.g. 10-fold CV) than just a single random split of your data. It would be even better if you could use CV for hyperparameter tuning and then test your model's performance on a completely independent, held-out test set. You have enough instances to do the latter.
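For illustration, here is a minimal sketch of that workflow, assuming scikit-learn's ElasticNet and GridSearchCV; the placeholder data, parameter grid, and scoring metric are only examples, not a recommendation for your specific problem.

```python
# Sketch: 10-fold CV for hyperparameter tuning on the training portion,
# then a single evaluation on a completely held-out test set.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Placeholder data standing in for the 50k x 3k+ dataset.
X, y = make_regression(n_samples=1000, n_features=100, noise=10.0, random_state=0)

# Hold out 20% as a completely independent test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tune hyperparameters with 10-fold CV on the training data only.
param_grid = {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.1, 0.5, 0.9]}
search = GridSearchCV(
    ElasticNet(max_iter=10_000),
    param_grid,
    cv=10,
    scoring="r2",
    n_jobs=-1,  # parallelize folds; reduce this if memory is tight
)
search.fit(X_train, y_train)

# The test set is touched exactly once, after tuning is finished.
y_pred = search.best_estimator_.predict(X_test)
print("Best params:", search.best_params_)
print("Held-out test R^2:", r2_score(y_test, y_pred))
```

One caveat: since your data are time series, randomly shuffled folds can leak future information into the training folds; passing scikit-learn's TimeSeriesSplit as the cv argument preserves the temporal order instead.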

Hope this helps.