Solved – Using constant versus changing random seeds for cross validation hyperparam optimisation

cross-validation, hyperparameter, optimization

If I understand things correctly, in a nested cross validation the inner cross validation optimises over the search space of hyperparams, and the outer loop validates the accuracy of the optimal hyperparams determined by the inner loop, i.e. (a code sketch follows this outline):

  • outer cross validation
    • hyperparameter search
      • inner cross validation
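
A minimal sketch of that structure, assuming scikit-learn purely for illustration: `GridSearchCV` performs the inner cross validation over the hyperparameter grid, and `cross_val_score` wraps it in the outer cross validation. The dataset, grid values, and fold counts are all placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)  # folds for the hyperparam search
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # folds for validating the search result

param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=inner_cv)  # inner loop: hyperparameter search

# Outer loop: each outer fold re-runs the full inner search on its training portion,
# then scores the refitted best model on the outer fold's held-out portion.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print(outer_scores.mean(), outer_scores.std())
```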

We then have the choice of either reusing the same inner cross validation splits (i.e. the same random seed) for every hyperparam vector candidate we investigate, or randomising the splits (i.e. changing the random seed) for each candidate's inner cross validation.
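
To make the two options concrete, here is a hedged sketch (scikit-learn assumed; the helper `score_candidates`, the `reuse_splits` flag, and the candidate grid are all hypothetical names introduced for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

def score_candidates(X, y, candidates, reuse_splits=True):
    """Score each hyperparameter candidate with an inner cross validation.

    reuse_splits=True : every candidate is scored on the identical folds (fixed seed).
    reuse_splits=False: each candidate gets freshly randomised folds (new seed each time).
    """
    fixed_cv = KFold(n_splits=5, shuffle=True, random_state=0)
    rng = np.random.RandomState(42)
    results = {}
    for C in candidates:
        cv = fixed_cv if reuse_splits else KFold(
            n_splits=5, shuffle=True, random_state=rng.randint(10_000))
        results[C] = cross_val_score(SVC(C=C), X, y, cv=cv).mean()
    return results
```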

On the one hand, I can see that by keeping the seed the same we are only changing one variable (namely the chosen vector of hyperparam candidates), which makes the hyperparam optimisation easier: if both the hyperparams and the data splits are changing, the optimisation has to deal with more free variables.

On the other hand, if we randomise the folds for each hyperparam vector candidate, there is less chance of ending up in a local minimum/maximum caused by an "unlucky" single choice of inner cross validation split, i.e. a split that produces a model which looks optimal for that one split but not for other possible splits.
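
One way to see the size of this "luck" effect is to hold the hyperparameters fixed and vary only the split seed; the spread of scores is the component that a single fixed split can accidentally reward. An illustrative sketch (scikit-learn assumed, toy data and fixed hyperparameter values chosen arbitrarily):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
model = SVC(C=1.0, gamma=0.1)  # one fixed hyperparameter vector

# Same model, same data, 20 different ways of splitting into folds.
scores = [
    cross_val_score(model, X, y,
                    cv=KFold(n_splits=5, shuffle=True, random_state=seed)).mean()
    for seed in range(20)
]
print(f"CV score across seeds: min={min(scores):.3f}, max={max(scores):.3f}")
```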

How does the choice of randomise versus not randomise for the inner cross validation affect the hyperparameter vector search optimisation?

I suspect that the answer is very dependent on the size of the hyperparam search space (i.e. the difficulty of optimising the hyperparams) versus the distribution of the data (i.e. the probability of choosing a really "bad" split and wrongly concluding we've found the best vector of hyperparams).

Best Answer

If you search for a zillion hyperparameter combinations, you will start to overfit whatever you're testing those against.

Therefore, I'd be tempted to take one single train/validate split and do the hyperparameter search on that. Then, and only then, evaluate the chosen hyperparameters against some other split, fold, or held-out test/validation data set.
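
A sketch of that suggestion (scikit-learn assumed; the split proportions and grid are placeholders): carve off a test set first, run the whole hyperparameter search on a single fixed train/validate split, and only afterwards score the chosen model on the untouched test set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, PredefinedSplit, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
X_work, X_test, y_work, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# One fixed train/validate split inside the working data: -1 = always train, 0 = validate.
fold = np.zeros(len(X_work), dtype=int)
fold[: int(0.75 * len(X_work))] = -1
single_split = PredefinedSplit(fold)

# Hyperparameter search on that single split only; refit the winner on all working data.
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=single_split, refit=True)
search.fit(X_work, y_work)

print("chosen params:", search.best_params_)
print("held-out test score:", search.score(X_test, y_test))
```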

Using random splits for each set of parameters sounds like something to avoid because:

  • as you say, for some hyperparameter options you'll "get lucky", just because the train/validate split happens to give a randomly high score
  • you're basically overfitting to every possible train/validate split of your data, leaving you no novel splits to validate against