I think you are describing nested cross-validation, and you can use it to select your best hyperparameters. R already has packages implementing this; for example, for support vector machines you could use the e1071 package and do something like the following, assuming you have two independent variables:
library(e1071)  # provides tune.svm and tune.control
svmTuning <- tune.svm(Y ~ X1 + X2, type = "nu-regression", kernel = "radial", data = dat,
                      gamma = seq(from = 0.1, to = 3, by = 0.1), cost = seq(from = 2, to = 16, by = 2),  # gamma must be positive for the radial kernel
                      tunecontrol = tune.control(sampling = "cross", cross = 1000))
If you had 1000 observations, the previous call would perform leave-one-out cross-validation, sweeping through all combinations of the selected gammas and costs (but only one kernel in this case). You can see the best parameters by doing:
svmTuning$best.parameters
The best combination is chosen according to the error estimated by the cross-validation you specified: mean squared error in the regression case, average classification error in the classification case.
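If you want to check how each combination actually did, the tune object returned by e1071 keeps the full table of cross-validated results; a minimal sketch of the usual accessors:

# Cross-validated error (MSE here) and its dispersion for every gamma/cost combination tried
svmTuning$performances
# Lowest cross-validated error found during the search
svmTuning$best.performance
# Model refitted on the full data with the best parameters
svmTuning$best.model
# Rough visual check of the error surface over gamma and cost
plot(svmTuning)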
Here's another example with kernel k-nearest neighbours, using the kknn package:
library(kknn)  # provides train.kknn
knnTuning <- train.kknn(Y ~ X1 + X2, data = dat, kmax = 40, distance = 2,
                        kernel = c("rectangular", "triangular", "epanechnikov", "gaussian", "rank", "optimal"),
                        ykernel = NULL, scale = TRUE)  # train.kknn performs leave-one-out cross-validation internally
This sweeps through all numbers of neighbours up to 40 and the different kernels, but uses only the Euclidean distance (distance = 2). You may plot all these results and again obtain the best parameters:
plot(knnTuning)
knnTuning$best.parameters
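Once the best k and kernel are identified, the fitted object can be inspected and used for prediction directly; a minimal sketch, where newdat is just a placeholder for whatever hold-out data you have:

# Cross-validated mean squared errors for every k/kernel combination (regression case)
knnTuning$MEAN.SQU
# Predictions from the model refitted with the best k and kernel (newdat is hypothetical)
predict(knnTuning, newdata = newdat)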
You could do the same for random forest:
library(randomForest)  # tune.randomForest from e1071 wraps randomForest
rfTuning <- tune.randomForest(Y ~ X1 + X2, data = dat, ntree = 1000,
                              mtry = 1:2,  # with only two predictors, mtry can only be 1 or 2; widen the range if you have more variables
                              tunecontrol = tune.control(sampling = "cross", cross = 1000))
Here you just sweep through possible values for the number of variables considered as candidates at each split (mtry). This kind of search is known to overfit if not done carefully.
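As in the SVM example, the winning mtry value and the forest refitted with it can be pulled out of the tune object:

rfTuning$best.parameters
rfTuning$best.model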
And so on and so forth. Since you appear to have a small sample size, leave-one-out may be the way to go. You could also look into the caret package, which has good capabilities for model building, and its documentation is very solid (theoretical descriptions and all).
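If you try caret, the same leave-one-out search for a random forest takes only a few lines; a minimal sketch, where the mtry grid is a placeholder to adjust to your own predictors:

library(caret)
ctrl <- trainControl(method = "LOOCV")   # leave-one-out resampling
grid <- expand.grid(mtry = 1:2)          # candidate numbers of variables per split (placeholder values)
rfCaret <- train(Y ~ X1 + X2, data = dat, method = "rf",
                 trControl = ctrl, tuneGrid = grid, ntree = 1000)
rfCaret$bestTune                         # mtry value with the best cross-validated error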
The most important downside of searching along single parameters instead of optimizing them all together is that you ignore interactions. It is quite common that, e.g., more than one parameter influences model complexity. In that case, you need to look at the interactions in order to successfully optimize the hyperparameters.
Depending on how large your data set is and how many models you compare, optimization strategies that return the maximum observed performance run into trouble (this is true for both grid search and your strategy). The reason is that searching through a large number of performance estimates for the maximum "skims" the variance of the performance estimate: you may just end up with a model and train/test split combination that happens to look good by accident. Even worse, you may get several perfect-looking combinations, and the optimization then cannot know which model to choose and thus becomes unstable.
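A tiny simulation illustrates this "skimming" effect: even if many candidate configurations have exactly the same true performance, the maximum of their noisy cross-validation estimates looks systematically better than it really is. The numbers below are arbitrary, purely for illustration:

set.seed(1)
true.acc <- 0.70   # every configuration has the same true accuracy
n.config <- 100    # number of hyperparameter configurations compared
cv.size  <- 50     # number of cases each estimate is based on
est.acc  <- rbinom(n.config, size = cv.size, prob = true.acc) / cv.size
max(est.acc)       # apparent accuracy of the "winner", typically well above 0.70
mean(est.acc)      # unbiased picture of what these configurations actually deliver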
Many dedicated optimization methods exist for hyperparameter tuning. Sequential model-based optimization (a Bayesian-inspired method) is a particularly popular research topic, for instance here. Metaheuristic approaches like genetic algorithms, particle swarm optimization and simulated annealing are also common; see for instance here.
If you want to model the effect of hyperparameters, random search is a good sampling strategy to start from.
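In R, random search is available through caret by switching the search strategy in trainControl; a minimal sketch assuming an SVM with a radial kernel (method "svmRadial" requires the kernlab package, and tuneLength controls how many random parameter sets get evaluated):

library(caret)
ctrl <- trainControl(method = "cv", number = 10, search = "random")  # 10-fold CV with randomly sampled parameter sets
svmRandom <- train(Y ~ X1 + X2, data = dat, method = "svmRadial",
                   trControl = ctrl, tuneLength = 30)
svmRandom$bestTune   # best randomly sampled combination of sigma and cost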
You can find implementations of such optimization methods in tuning libraries like Optunity, HyperOpt and Spearmint.