There is a recently proposed method to speed up grid search:
"Fast Cross validation via sequential analysis"
http://www.scribd.com/doc/76134034/Fast-Cross-Validation-Via-Sequential-Analysis-Talk
Basically, they run a normal grid search but try to eliminate bad parameter configurations early in the process, so that not too much computation is wasted on them. The method is fairly new and I'm not aware of independent evaluations of it, but I'm currently implementing it and want to give it a try.
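To make the idea concrete, here is a minimal sketch of the early-elimination principle for an SVM's cost/γ grid. The simple `drop_margin` heuristic is my own stand-in, not the paper's actual sequential test:

```python
import numpy as np
from itertools import product
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score

def racing_grid_search(X, y, Cs, gammas, n_folds=10, drop_margin=0.05):
    """Evaluate the grid fold by fold, dropping configurations whose
    running mean accuracy falls clearly behind the current leader."""
    candidates = list(product(Cs, gammas))
    scores = {c: [] for c in candidates}
    folds = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in folds.split(X, y):
        for C, gamma in candidates:
            model = SVC(C=C, gamma=gamma).fit(X[train_idx], y[train_idx])
            scores[(C, gamma)].append(
                accuracy_score(y[test_idx], model.predict(X[test_idx])))
        # eliminate clearly inferior candidates before spending more folds on them
        means = {c: np.mean(s) for c, s in scores.items() if c in candidates}
        best = max(means.values())
        candidates = [c for c in candidates if means[c] >= best - drop_margin]
    return max(candidates, key=lambda c: np.mean(scores[c]))
```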
Now, after repeating the steps 10 times, I will have 10 different optimized models.
Yes.
Cross-validation (like other resampling-based validation methods) implicitly assumes that these models are at least equivalent in their predictions, so you are allowed to average/pool all those test results.
Usually there is a second, stronger assumption: that those 10 "surrogate models" are equivalent to the model built on all 100 cases:
To predict for an unknown dataset (200 points), should I use the model which gave me the minimum error, OR should I do step 2 once again on the full data (run grid.py on the full data) and use that as the model for predicting the unknowns?
Usually the latter is done (relying on the second assumption).
However, I personally would not run the grid optimization on the whole data again (though one can argue about that), but would instead use the cost and γ parameters that turned out to be a good choice across the 10 optimizations you already did (see below).
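As a sketch of that recommendation (the parameter values here are hypothetical, and `X_full`, `y_full`, `X_unknown` are placeholders for your 100 training cases and the 200 unknowns):

```python
from collections import Counter
from sklearn.svm import SVC

# (cost, gamma) pairs chosen by each of the 10 surrogate optimizations
# (hypothetical values for illustration)
chosen = [(8, 0.125), (8, 0.125), (8, 0.25), (8, 0.125), (8, 0.125),
          (8, 0.125), (16, 0.125), (8, 0.125), (8, 0.125), (8, 0.125)]

C, gamma = Counter(chosen).most_common(1)[0][0]          # most frequent pair
final_model = SVC(C=C, gamma=gamma).fit(X_full, y_full)  # all 100 cases
predictions = final_model.predict(X_unknown)             # the 200 new points
```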
However, there are also so-called aggregated models (e.g. random forest aggregates decision trees), where all 10 models are used to obtain 10 predictions for each new sample, and an aggregated prediction (e.g. majority vote for classification, average for regression) is used (see the sketch below). Note that you validate such models by iterating the whole cross-validation procedure with new random splits.
Here's a link to a recent question about what such iterations are good for: Variance estimates in k-fold cross-validation
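For illustration, a sketch of such an aggregated prediction, assuming you kept the 10 surrogate SVMs in a list `fold_models` (a hypothetical name):

```python
import numpy as np

# predictions of all 10 surrogate models for the unknown samples
all_preds = np.array([m.predict(X_unknown) for m in fold_models])

# majority vote over the 10 models (use the column mean instead for regression)
def majority_vote(column):
    values, counts = np.unique(column, return_counts=True)
    return values[np.argmax(counts)]

aggregated = np.apply_along_axis(majority_vote, 0, all_preds)
```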
Also, I would like to know: is the procedure the same for other machine-learning methods (like ANN, Random Forest, etc.)?
Yes, it can be applied very generally.
As you optimize each of the surrogate models, I recommend looking a bit more closely at those results:
Are the optimal cost and γ parameters stable (i.e. equal or similar for all models)?
The difference between the error reported by the grid optimization and the test error you observe on the held-out 10% is also important: if the difference is large, the models are likely overfit, particularly if the optimization reports very small error rates.
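A small sketch of both checks, with hypothetical per-fold results:

```python
import numpy as np

# one tuple per surrogate model (hypothetical numbers for illustration):
# (best cost, best gamma, error reported by grid search, error on held-out 10%)
results = [
    (8, 0.125, 0.02, 0.09),
    (8, 0.125, 0.03, 0.08),
    (8, 0.250, 0.02, 0.11),
    # ... one entry per fold
]

Cs     = [r[0] for r in results]
gammas = [r[1] for r in results]
gaps   = [r[3] - r[2] for r in results]

print("distinct cost values: ", sorted(set(Cs)))      # stable if only 1-2 values
print("distinct gamma values:", sorted(set(gammas)))
print("mean optimism (test error - grid error): %.3f" % np.mean(gaps))
# a large positive gap, especially together with tiny grid-search errors,
# is a warning sign that the parameter search is overfitting
```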
Best Answer
It comes down to variance and bias (as usual). CV tends to be less biased, but K-fold CV has fairly large variance. On the other hand, bootstrapping tends to drastically reduce the variance but gives more biased results (they tend to be pessimistic). Other bootstrap methods have been adapted to deal with the bootstrap bias (such as the .632 and .632+ rules).
Two other approaches are worth mentioning. "Monte Carlo CV", aka "leave-group-out CV", does many random splits of the data (sort of like many mini training/test splits); variance is very low for this method, and the bias isn't too bad if the percentage of data in the hold-out is low. Repeated CV does K-fold several times and averages the results, similar to regular K-fold. I'm most partial to this last one, since it keeps the low bias and reduces the variance.
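Both schemes are readily available in scikit-learn; a minimal sketch comparing them on a generic classifier:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, RepeatedKFold, ShuffleSplit
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)
model = SVC()

# repeated CV: 10-fold CV repeated 5 times with different random splits
rkf = RepeatedKFold(n_splits=10, n_repeats=5, random_state=1)
print(cross_val_score(model, X, y, cv=rkf).mean())

# Monte Carlo / leave-group-out CV: 50 random 90/10 splits
mc = ShuffleSplit(n_splits=50, test_size=0.1, random_state=1)
print(cross_val_score(model, X, y, cv=mc).mean())
```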
Edit
For large sample sizes, the variance issues become less important and the computational cost becomes more of an issue. I would still stick with repeated CV for both small and large sample sizes.
Some relevant research is below (especially Kim and Molinaro).
References
Bengio, Y., & Grandvalet, Y. (2005). Bias in estimating the variance of k-fold cross-validation. Statistical modeling and analysis for complex data problems, 75–95.
Braga-Neto, U. M. (2004). Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3), 374–380. doi:10.1093/bioinformatics/btg419
Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 316–331.
Efron, B., & Tibshirani, R. (1997). Improvements on cross-validation: The .632+ bootstrap method. Journal of the American Statistical Association, 548–560.
Furlanello, C., Merler, S., Chemini, C., & Rizzoli, A. (1997). An application of the bootstrap 632+ rule to ecological data. WIRN 97.
Jiang, W., & Simon, R. (2007). A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Statistics in Medicine, 26(29), 5320–5334.
Jonathan, P., Krzanowski, W., & McCarthy, W. (2000). On the use of cross-validation to assess performance in multivariate prediction. Statistics and Computing, 10(3), 209–229.
Kim, J.-H. (2009). Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap. Computational Statistics and Data Analysis, 53(11), 3735–3745. doi:10.1016/j.csda.2009.04.009
Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. International Joint Conference on Artificial Intelligence, 14, 1137–1145.
Martin, J., & Hirschberg, D. (1996). Small sample statistics for classification error rates I: Error rate measurements.
Molinaro, A. M. (2005). Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15), 3301–3307. doi:10.1093/bioinformatics/bti499
Sauerbrei, W., & Schumacher, M. (2000). Bootstrap and Cross-Validation to Assess Complexity of Data-Driven Regression Models. Medical Data Analysis, 26–28.
Tibshirani, R. J., & Tibshirani, R. (2009). A bias correction for the minimum error rate in cross-validation. arXiv preprint arXiv:0908.2904.