Now, after repeating the steps 10 times, I will have 10 different optimized models.
Yes.
Cross validation (like other resampling-based validation methods) implicitly assumes that these models are at least equivalent in their predictions, so you are allowed to average/pool all those test results.
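As an illustration, here is a minimal pure-Python sketch of pooling the fold-wise test results of a 10-fold cross validation. The data and the trivial majority-class "model" are purely hypothetical stand-ins for your SVM:

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices and split them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def majority_class(labels):
    """Trivial stand-in for SVM training: always predict the majority label."""
    return max(set(labels), key=labels.count)

y = [0] * 60 + [1] * 40          # hypothetical labels for 100 cases
fold_errors = []
for test_idx in kfold_indices(len(y), 10):
    test_set = set(test_idx)
    train_y = [y[i] for i in range(len(y)) if i not in test_set]
    pred = majority_class(train_y)            # one surrogate model per fold
    errors = sum(1 for i in test_idx if y[i] != pred)
    fold_errors.append(errors / len(test_idx))

# Pooling the 10 test results assumes the surrogate models are equivalent
cv_error = sum(fold_errors) / len(fold_errors)
print(round(cv_error, 2))
```

Each of the 10 surrogate models sees 90 of the 100 cases; the pooled error is just the average of the 10 fold-wise error rates.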
Usually there is a second, stronger assumption: that those 10 "surrogate models" are equivalent to the model built on all 100 cases:
To predict for an unknown dataset (200 points), should I use the model that gave me the minimum error, OR should I do step 2 once again on the full data (run grid.py on the full data) and use that as the model for prediction of unknowns?
Usually the latter is done (second assumption).
However, personally I would not do a grid optimization on the whole data again (though one can argue about that) but instead use the cost and γ parameters that turned out to be a good choice in the 10 optimizations you did already (see below).
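For example, one could take the hyperparameters the 10 grid optimizations agree on, either as the most frequent (cost, γ) pair or as component-wise medians. The parameter values below are made up purely for illustration:

```python
from collections import Counter
from statistics import median

# Hypothetical (cost, gamma) winners from the 10 grid optimizations
best_params = [(8, 0.5), (8, 0.5), (8, 0.25), (16, 0.5), (8, 0.5),
               (8, 0.5), (8, 0.25), (8, 0.5), (16, 0.5), (8, 0.5)]

# Option 1: the most frequent (cost, gamma) pair across surrogate models
best_pair, votes = Counter(best_params).most_common(1)[0]

# Option 2: component-wise medians (useful when values scatter on the grid)
median_cost = median(c for c, g in best_params)
median_gamma = median(g for c, g in best_params)

print(best_pair, votes, median_cost, median_gamma)
```

If the winners scatter widely instead of clustering like this, that is already a sign the optimization is unstable (see the stability check below).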
However, there are also so-called aggregated models (e.g. random forest aggregates decision trees), where all 10 models are used to obtain 10 predictions for each new sample, and then an aggregated prediction (e.g. majority vote for classification, average for regression) is used. Note that you validate those models by iterating the whole cross validation procedure with new random splits.
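A minimal sketch of such majority-vote aggregation, with hypothetical predictions from 10 surrogate models for 3 new samples:

```python
from collections import Counter

# Hypothetical class predictions of the 10 surrogate models for 3 new samples
predictions = [
    ["a", "a", "b"],   # model 1
    ["a", "b", "b"],   # model 2
    ["a", "a", "b"],
    ["b", "a", "b"],
    ["a", "a", "a"],
    ["a", "a", "b"],
    ["a", "b", "b"],
    ["a", "a", "b"],
    ["a", "a", "b"],
    ["b", "a", "b"],   # model 10
]

def majority_vote(votes):
    """Aggregate a classification by taking the most frequent label."""
    return Counter(votes).most_common(1)[0][0]

# One aggregated prediction per new sample (vote over the model column)
aggregated = [majority_vote(column) for column in zip(*predictions)]
print(aggregated)
```

For regression you would replace the vote by the mean of the 10 predicted values.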
Here's a link to a recent question about what such iterations are good for: Variance estimates in k-fold cross-validation
Also I would like to know: is the procedure the same for other machine-learning methods (like ANN, Random Forest, etc.)?
Yes, it can be applied very generally.
As you optimize each of the surrogate models, I recommend looking a bit more closely at those results:
are the optimal cost and γ parameters stable (= equal or similar for all models)?
The difference between the error reported by the grid optimization and the test error you observe for the 10% unknown data is also important: if the difference is large, the models are likely to be overfit - particularly if the optimization reports very small error rates.
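A back-of-the-envelope check of this gap could look like the following; all error rates here are invented for illustration:

```python
# Hypothetical error rates of the 10 surrogate models: what the grid
# optimization reported (inner) vs. the held-out 10 % test data (outer)
grid_errors = [0.02, 0.03, 0.02, 0.04, 0.03, 0.02, 0.03, 0.02, 0.04, 0.03]
test_errors = [0.12, 0.15, 0.10, 0.18, 0.14, 0.11, 0.13, 0.16, 0.12, 0.14]

gaps = [t - g for g, t in zip(grid_errors, test_errors)]
mean_gap = sum(gaps) / len(gaps)

# A large gap together with very small reported errors is a warning
# sign that the optimization has overfit.
print(round(mean_gap, 3))
```

In this invented example the optimization reports ~3 % error but the held-out data shows ~13 %, a gap that should make you suspicious of the optimized models.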
I think you can make such an estimate: since the different models use the same dataset, their accuracies can be compared. One important caveat, however, is that you may need to take the models' parameters into account and consider whether those parameters influence your conclusion.
Best Answer
You surely found the very similar question: Choice of K in K-fold cross-validation?
(Including the link to Ron Kohavi's work)
If your sample size is already small, I recommend avoiding any data-driven optimization. Instead, restrict yourself to models whose hyperparameters you can fix from your knowledge of the model and the application/data. This makes one of the validation/test levels unnecessary, leaving more of your few cases for training the surrogate models in the remaining cross validation.
IMHO, with that sample size you cannot afford very fancy models anyway. And you almost certainly cannot afford any meaningful model comparisons (certainly not unless you use proper scoring rules and paired analysis techniques).
This decision is far more important than the precise choice of $k$ (say, 5-fold vs. 10-fold) - with the important exception that leave-one-out is not recommended in general.
Interestingly, for these very-small-sample-size classification problems, validation is often more difficult (in terms of sample size needs) than training a decent model. If you need literature on this, see e.g. our paper on sample size planning:
Beleites, C. and Neugebauer, U. and Bocklitz, T. and Krafft, C. and Popp, J.: Sample size planning for classification models. Anal Chim Acta, 2013, 760, 25-33.
DOI: 10.1016/j.aca.2012.11.007
accepted manuscript on arXiv: 1211.1323
Another important point is to make good use of the possibility to iterate/repeat the cross validation (which is one of the reasons against LOO): this allows you to measure the stability of the predictions against perturbations (i.e. exchanging a few cases) of the training data.
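A rough sketch of such iterated cross validation, measuring how often a sample's prediction stays the same across repetitions. The data and the trivial threshold "classifier" are hypothetical stand-ins for a real model, chosen only so the predictions actually depend on the random split:

```python
import random

def kfold_indices(n, k, seed):
    """Shuffle indices with the given seed and split them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def threshold_classifier(train_y):
    # Trivial stand-in for the real model: predict class 1 if it is
    # sufficiently frequent in the training fold (0.4 is arbitrary).
    return 1 if sum(train_y) / len(train_y) > 0.4 else 0

y = [0] * 60 + [1] * 40        # hypothetical labels
n_reps, k = 5, 10
per_sample = [[] for _ in y]   # one prediction per sample and repetition

for rep in range(n_reps):      # each repetition uses a new random split
    for test_idx in kfold_indices(len(y), k, seed=rep):
        test_set = set(test_idx)
        train_y = [y[i] for i in range(len(y)) if i not in test_set]
        pred = threshold_classifier(train_y)
        for i in test_idx:
            per_sample[i].append(pred)

# Stability: fraction of samples whose prediction never changes
stable = sum(1 for p in per_sample if len(set(p)) == 1) / len(y)
print(stable)
```

If many samples flip their prediction between repetitions, the models are unstable with respect to small changes in the training data, and a single cross validation run would give you a misleadingly precise error estimate.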
Literature:
DOI: 10.1007/s00216-007-1818-6
DOI: 10.1016/j.chemolab.2009.07.016
If you decide for a single run on a hold-out test set (no iterations/repetitions),