Is cross validation needed?

cross-validation, logistic

Suppose we have a training data set and a test data set, and the outcome variable is binary. Is it usually necessary to split the training data set further so that there is a cross validation data set? Or can you use the whole training data set to build a model and then use this model on the test data set? For logistic regression, for example, would cross validation really help? If so, what type would be best?

Best Answer

Cross validation has two purposes:

  • When you don't use cross validation and instead randomly select one part of the data for training and the other part for testing, you may get high accuracy on that particular split, but a different train/test split may give lower accuracy. Cross validation methods such as k-fold cross validation evaluate the model on every part of the data, which helps you find the best-fitting model for your data set: the one with the lowest error across all parts of the data.

  • In some cases cross validation helps to choose parameters of the model, such as the regularization parameter C in logistic regression. You can find documentation about this in the MATLAB Help Center or in the R documentation files.
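A minimal sketch of the first purpose in plain Python (standard library only), assuming a binary 0/1 outcome and a logistic regression fitted by simple batch gradient descent. The helper names `fit_logistic`, `accuracy`, `k_fold_indices` and `cross_validate` are hypothetical illustrations, not functions from MATLAB, R or any library, and the data are synthetic. Each of the k folds is held out once while the model is trained on the other k-1 folds, so every observation is used for testing exactly once; the spread of the per-fold accuracies shows why a single random split can be misleading.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Unregularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Residual between predicted probability and the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(model, X, y):
    """Fraction of observations classified correctly at threshold 0.5."""
    w, b = model
    hits = sum(
        (sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(X, y, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold; return one accuracy per fold."""
    scores = []
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return scores

# Synthetic data (made up for illustration): feature 0 is informative, feature 1 is noise
rng = random.Random(42)
X, y = [], []
for _ in range(200):
    label = rng.randint(0, 1)
    X.append([rng.gauss(1.0 if label else -1.0, 1.0), rng.gauss(0.0, 1.0)])
    y.append(label)

scores = cross_validate(X, y, k=5)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print("mean accuracy:", round(sum(scores) / len(scores), 2))
```

The mean of the per-fold accuracies is the cross-validated estimate; comparing this mean between candidate models is how cross validation picks the one with the lowest error across all parts of the data.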

So, as discussed, cross validation plays a critical role in finding a reliable model for your data. You should select the cross-validation technique that best fits your model structure and your sample size. 5-fold cross validation is a well-known choice, and you can increase k in k-fold cross validation if you have a larger sample.
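A second minimal sketch, again plain Python with hypothetical helper names (`fit_logistic_l2`, `cv_score`, `choose_C`) and synthetic data, shows the parameter-tuning use together with the workflow from the question: 5-fold cross validation on the training set only selects C, the model is then refit on the whole training set with that C, and the test set is used once at the end. It assumes C is the inverse strength of an L2 penalty, which is the convention scikit-learn's `LogisticRegression` uses; MATLAB and R functions may parameterize the penalty differently (for example as a lambda), and the grid of C values is an arbitrary choice for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_l2(X, y, C, lr=0.1, epochs=200):
    """L2-regularized logistic regression by batch gradient descent.

    Minimizes mean log-loss + ||w||^2 / (2*C*n), which has the same minimizer as
    0.5*||w||^2 + C*sum(log-loss): smaller C means stronger regularization.
    The intercept b is not penalized in this sketch.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Data gradient plus the gradient of the L2 penalty, w / (C*n)
        w = [wj - lr * (gj / n + wj / (C * n)) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def accuracy(model, X, y):
    """Fraction of observations classified correctly at threshold 0.5."""
    w, b = model
    hits = sum(
        (sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) == (yi == 1)
        for xi, yi in zip(X, y)
    )
    return hits / len(y)

def cv_score(X, y, C, k=5, seed=0):
    """Mean held-out accuracy of k-fold cross validation for one value of C."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_logistic_l2([X[i] for i in train], [y[i] for i in train], C)
        scores.append(accuracy(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / k

def choose_C(X, y, grid=(0.01, 0.1, 1.0, 10.0), k=5):
    """Return the C with the best mean cross-validated accuracy, plus all scores."""
    results = {C: cv_score(X, y, C, k) for C in grid}
    best = max(results, key=results.get)
    return best, results

def make_data(rng, n):
    """Synthetic data (made up for illustration): feature 0 informative, feature 1 noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(1.0 if label else -1.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y

rng = random.Random(1)
X_train, y_train = make_data(rng, 150)
X_test, y_test = make_data(rng, 100)

# 1) Tune C with 5-fold cross validation on the training set only
best_C, results = choose_C(X_train, y_train)
# 2) Refit on the whole training set with the chosen C
final_model = fit_logistic_l2(X_train, y_train, best_C)
# 3) Evaluate once on the untouched test set
test_acc = accuracy(final_model, X_test, y_test)
print("CV accuracy by C:", {C: round(s, 2) for C, s in results.items()})
print("chosen C:", best_C, "test accuracy:", round(test_acc, 2))
```

Keeping the test set out of the tuning loop is the design choice that matters here: the cross-validated scores choose C, and the test accuracy stays an honest estimate because the test data never influenced that choice. Changing `k` in `choose_C` is all it takes to try a different number of folds.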