Solved – Using the standard deviation in Cross Validation

boosting, cross-validation, hyperparameter, machine-learning, scikit-learn

I'm running a grid search to find the optimal parameters for xgboost via sklearn.

I can see that the grid search picks the set of parameters with the lowest mean MSE.
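
For concreteness, this is roughly what I'm looking at. A minimal sketch (synthetic data and a hypothetical two-parameter grid, not my actual setup) of pulling the per-candidate means and standard deviations out of `GridSearchCV.cv_results_`:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

# Toy data standing in for the real problem (hypothetical).
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

param_grid = {
    "max_depth": [2, 4],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    XGBRegressor(random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",  # sklearn maximizes, so MSE is negated
    cv=5,
)
search.fit(X, y)

# Mean and standard deviation of the CV score for each parameter set.
for params, mean, std in zip(
    search.cv_results_["params"],
    search.cv_results_["mean_test_score"],
    search.cv_results_["std_test_score"],
):
    print(params, f"mean={mean:.2f}", f"std={std:.2f}")
```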

The problem is that, upon inspecting the standard deviations, they are on the same order as the means, which suggests that the differences between parameter choices are not statistically significant.

Can a difference-of-means test be run on these means? How do we make sure there is a statistically significant difference between the means?
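
For instance, I had something like the following in mind: a paired t-test on the per-fold scores of two parameter settings evaluated on identical folds (sketch only, with synthetic data and two arbitrary settings). I realize the folds share training data, so the independence assumption behind the p-value is at best approximate:

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# One fixed splitter so both candidates see exactly the same folds.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

scores_a = cross_val_score(XGBRegressor(max_depth=2, random_state=0), X, y,
                           scoring="neg_mean_squared_error", cv=cv)
scores_b = cross_val_score(XGBRegressor(max_depth=4, random_state=0), X, y,
                           scoring="neg_mean_squared_error", cv=cv)

# Paired t-test on per-fold scores; overlapping training sets make
# the usual independence assumption fail, so p-values are optimistic.
t_stat, p_value = ttest_rel(scores_a, scores_b)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```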

I also find it odd that these packages simply pick the lowest mean without paying any attention to the standard deviation. Do you know of any package that addresses this issue?

Best Answer

"How do we make sure there is a statistically significant difference between the means?...these packages work picking the lowest mean without paying attention to the standard deviation..."

Searches for parameter values are looking for an optimum solution, based on specified criteria, in a particular learning/modeling task. There is no assurance that the parameter values at the optimum solution can be distinguished statistically from other sets of parameter values in the search space. One could argue that with finite data sets and smooth optimization criteria it would be impossible to find "significant" differences between the identified optimum values and sets of parameter values that are sufficiently close to that optimum.

What matters is how well the model based on the chosen parameter values performs for the purpose that you have in mind. You will be better served by focusing more on the significance of the model performance overall (via cross-validation, bootstrapping, etc.) and paying less attention to whether your optimum set of parameter values has a "statistically significant" difference from some other set of values.
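
For example, one common way to assess overall performance is nested cross-validation, which treats the entire grid search as a single estimator and reports how the whole tuning procedure generalizes. A minimal sketch (synthetic data, hypothetical grid), not a prescription:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from xgboost import XGBRegressor

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

# Inner loop: the grid search itself, wrapped as one estimator.
inner = GridSearchCV(
    XGBRegressor(random_state=0),
    {"max_depth": [2, 4], "n_estimators": [50, 100]},
    scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=1),
)

# Outer loop: estimates the generalization error of tuning + fitting,
# which is the performance figure that actually matters.
outer_scores = cross_val_score(
    inner, X, y, scoring="neg_mean_squared_error",
    cv=KFold(n_splits=5, shuffle=True, random_state=2),
)
print(f"nested-CV MSE: {-outer_scores.mean():.2f} ± {outer_scores.std():.2f}")
```

The spread of the outer-loop scores tells you how stable the tuned model's performance is, which is far more informative than whether one grid cell beats its neighbor.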
