Solved – Appropriate non-parametric post-hoc test for baseline comparisons

bonferroni, machine-learning, post-hoc, wilcoxon-signed-rank

I want to evaluate several "classifiers" (machine-learning algorithms) with paired samples. I do not want to compare each algorithm's performance to every other algorithm's (an n × n comparison), but only to one baseline (an n × 1 comparison).

An often quoted paper in the field [1] uses the Friedman test for omnibus testing and suggests the following for post-hoc tests:

When all classifiers are compared with a control classifier, we can instead of the Nemenyi test use one of the general procedures for controlling the family-wise error in multiple hypothesis testing, such as the Bonferroni correction or similar procedures.

Can I thus use any test for comparing two groups with paired samples and apply a Bonferroni (or less conservative) correction to the p-values? Is a Wilcoxon signed-rank test appropriate here?

[1] Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.
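For concreteness, the setup I have in mind looks like the following sketch: one paired Wilcoxon signed-rank test per algorithm against the baseline, with a Bonferroni correction applied afterwards. The scores, algorithm names, and effect sizes below are made up for illustration.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical example: each entry is a paired sample (e.g. per-dataset
# accuracy), one array for the baseline and one per competing algorithm.
rng = np.random.default_rng(0)
baseline = rng.uniform(0.70, 0.90, size=20)
algorithms = {
    "algo_A": baseline + rng.normal(0.03, 0.02, size=20),
    "algo_B": baseline + rng.normal(0.01, 0.02, size=20),
    "algo_C": baseline + rng.normal(-0.01, 0.02, size=20),
}

# One paired Wilcoxon signed-rank test per algorithm vs. the baseline.
raw_p = {name: wilcoxon(scores, baseline).pvalue
         for name, scores in algorithms.items()}

# Bonferroni: multiply each p-value by the number of comparisons (cap at 1).
k = len(raw_p)
for name, p in raw_p.items():
    print(f"{name}: raw p = {p:.4f}, Bonferroni p = {min(1.0, p * k):.4f}")
```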

Best Answer

You can use the Friedman test (or its Iman-Davenport correction) to test the ranking of the methods. Then, for the post-hoc step in a 1 × n comparison, use a procedure that compares each method with the control.
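As a sketch of that first step (the score matrix below is hypothetical): the Iman-Davenport statistic follows Demšar's formula F_F = (N−1)χ²_F / (N(k−1) − χ²_F), which is F-distributed with (k−1, (k−1)(N−1)) degrees of freedom, where N is the number of datasets and k the number of algorithms.

```python
import numpy as np
from scipy.stats import friedmanchisquare, f

# Hypothetical scores: rows = N datasets, columns = k algorithms.
scores = np.array([
    [0.80, 0.85, 0.82, 0.79],
    [0.75, 0.80, 0.78, 0.74],
    [0.90, 0.92, 0.91, 0.88],
    [0.65, 0.70, 0.68, 0.66],
    [0.85, 0.88, 0.86, 0.84],
])
N, k = scores.shape

# Friedman omnibus test on the per-dataset ranks.
chi2, p_friedman = friedmanchisquare(*scores.T)

# Iman-Davenport correction: F-distributed with (k-1, (k-1)(N-1)) d.f.
ff = (N - 1) * chi2 / (N * (k - 1) - chi2)
p_iman = f.sf(ff, k - 1, (k - 1) * (N - 1))
print(f"Friedman chi2 = {chi2:.3f} (p = {p_friedman:.4f})")
print(f"Iman-Davenport F = {ff:.3f} (p = {p_iman:.4f})")
```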

Nemenyi is valid but not recommended, because it is a very conservative procedure and many obvious differences may not be detected. It is better to use more powerful procedures, such as Holm or Hochberg. The most powerful methods are Li and Finner [1].
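A sketch of the adjustment step, assuming the raw p-values from the control comparisons are already computed: Holm and Hochberg are available in statsmodels; Finner is not, so it is implemented below from the adjusted-p-value formula given by García et al. [1].

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from the pairwise tests against the control.
raw_p = np.array([0.001, 0.020, 0.045, 0.300])

# Holm (step-down) and Hochberg (step-up) adjustments; both are more
# powerful than Bonferroni while still controlling the family-wise error.
for method in ("holm", "simes-hochberg"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, np.round(adj_p, 4), reject)

def finner_adjust(p):
    """Finner adjusted p-values: APV_i = max_{j<=i} {1 - (1 - p_(j))^(k/j)}."""
    order = np.argsort(p)
    k = len(p)
    adj = 1 - (1 - p[order]) ** (k / np.arange(1, k + 1))
    adj = np.minimum(np.maximum.accumulate(adj), 1.0)  # enforce monotonicity
    out = np.empty_like(adj)
    out[order] = adj
    return out

print("finner", np.round(finner_adjust(raw_p), 4))
```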

[1] García, S., Fernández, A., Luengo, J., & Herrera, F. (2010). Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power. Information Sciences, 180, 2044–2064. doi:10.1016/j.ins.2009.12.010
