Let's step back and look at what the data would look like. From what you describe, you have 3 algorithms (i.e. groups or treatments) and 10 datasets (i.e. subjects). In this case, you have a within-subjects design (i.e. repeated measures) with one factor. One way to represent this is like this:
set.seed(123)
df <- data.frame(dataset = rep(seq(10), 3),
                 algorithm = rep(c("ML1","ML2","ML3"), each=10),
                 Accuracy = runif(30))
> df
dataset algorithm Accuracy
1 1 ML1 0.28757752
2 2 ML1 0.78830514
3 3 ML1 0.40897692
4 4 ML1 0.88301740
5 5 ML1 0.94046728
6 6 ML1 0.04555650
7 7 ML1 0.52810549
8 8 ML1 0.89241904
9 9 ML1 0.55143501
10 10 ML1 0.45661474
11 1 ML2 0.95683335
12 2 ML2 0.45333416
13 3 ML2 0.67757064
14 4 ML2 0.57263340
15 5 ML2 0.10292468
16 6 ML2 0.89982497
17 7 ML2 0.24608773
18 8 ML2 0.04205953
19 9 ML2 0.32792072
20 10 ML2 0.95450365
21 1 ML3 0.88953932
22 2 ML3 0.69280341
23 3 ML3 0.64050681
24 4 ML3 0.99426978
25 5 ML3 0.65570580
26 6 ML3 0.70853047
27 7 ML3 0.54406602
28 8 ML3 0.59414202
29 9 ML3 0.28915974
30 10 ML3 0.14711365
You will typically see examples that have 'subject' as a label. In your case, your 'subjects' are 'datasets'. If you could assume normality, you would do a repeated-measures ANOVA. However, you state that you know the accuracies are not normally distributed, so you naturally want a non-parametric method. Your data are also balanced (10 samples/group), so you can use the Friedman test (which is essentially a nonparametric repeated-measures ANOVA).
If you get a significant p-value from the test, you would do a post-hoc analysis with pairwise paired Wilcoxon signed-rank tests and some sort of correction (e.g. Bonferroni, Holm, etc.). You would not use Mann-Whitney because you have paired/repeated-measures data.
Lastly, you probably want the effect size of any significant differences. This also uses the Wilcoxon test. There is no R function for it that I can recall right now, but the equation is very simple:
$$r=\frac{Z}{\sqrt{N}}$$
where $Z$ is the Z-score and $N$ is the sample size (across the two groups being compared). You can get this Z-score using wilcoxsign_test from the coin package.
Using the above data, this can be done in R as follows. Please note, the above data were randomly generated, so there is no real significance; this is just for demonstrating some code:
# Friedman Test
friedman.test(Accuracy ~ algorithm|dataset, data=df)
# Post-hoc tests with Bonferroni correction
with(df, pairwise.wilcox.test(Accuracy, algorithm, p.adj="bonferroni", paired=TRUE))
# Get Z-score for calculating effect size
library(coin)
wilcoxsign_test(Accuracy ~ factor(algorithm) | factor(dataset),
                data = subset(df, algorithm %in% c("ML1", "ML2")))
# Calculate effect size; here Z = -0.2548, and the two groups together
# contain 20 observations
0.2548/sqrt(20)
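If you want to cross-check this workflow outside R, scipy offers the same tests. This is a sketch of mine, not part of the R analysis above, and it uses its own random data, so the numbers will differ from the R output:

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon, norm

rng = np.random.default_rng(123)
# Accuracy of 3 algorithms on the same 10 datasets (random, as in the R code)
ml1, ml2, ml3 = rng.random(10), rng.random(10), rng.random(10)

# Friedman test: one accuracy vector per algorithm, aligned by dataset
stat, p = friedmanchisquare(ml1, ml2, ml3)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")

# Post-hoc paired Wilcoxon signed-rank test for one pair (ML1 vs ML2);
# with 3 pairwise comparisons, multiply p by 3 for a Bonferroni correction
w_stat, w_p = wilcoxon(ml1, ml2)
print(f"Wilcoxon p (uncorrected) = {w_p:.3f}, Bonferroni = {min(1, 3 * w_p):.3f}")

# Effect size r = |Z| / sqrt(N): recover Z from the two-sided p-value,
# with N = 20 observations across the two groups, as in the answer above
z = norm.ppf(w_p / 2)
r = abs(z) / np.sqrt(20)
print(f"r = {r:.3f}")
```

Recovering $Z$ from the p-value is a convenience here; reading it directly off the `wilcoxsign_test` output in R, as above, is equivalent.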
The Kruskal-Wallis $H$ statistic is given by:
$$H=\frac{\frac{12\sum_{i=1}^{k}{n_{i}\left(\bar{R}_{i}-\bar{R}\right)^{2}}}{N\left(N+1\right)}}{1-\frac{\sum{T}}{N^{3}-N}}\text{, where:}$$
$k$ is the number of groups;
$N$ is the number of observations across all groups;
$n_{i}$ is the number of observations in the $i^{th}$ group;
$\bar{R}$ is the mean rank of all observations;
$\bar{R}_{i}$ is the mean rank of observations from the $i^{th}$ group (ranks are computed across observations from all groups); and
$T=t^{3}-t$ for each set of tied ranks, where $t$ is the number of ties in the set, and $\sum{T}$ is the sum of this quantity across all sets of tied ranks.
When there are no ties, every $T=0$, and the denominator of $H$ simplifies to $1$.
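The tie adjustment is easy to compute directly from the tie-group sizes. A minimal sketch in Python (the helper name `tie_correction` is my own):

```python
from collections import Counter

def tie_correction(values):
    """Denominator of H: 1 - sum(t^3 - t) / (N^3 - N),
    where each t is the size of one group of tied values."""
    n = len(values)
    groups = Counter(values)
    sum_t = sum(t**3 - t for t in groups.values())
    return 1 - sum_t / (n**3 - n)

# No ties: every T = 0, so the denominator is exactly 1
print(tie_correction([1, 2, 3, 4, 5]))   # -> 1.0

# Heavy ties shrink the denominator (which inflates H when you divide by it)
print(tie_correction([1, 1, 1, 2, 3]))   # -> 0.8
```

In the second call, the single tie group of size $t=3$ gives $\sum T = 27 - 3 = 24$ and $N^3 - N = 120$, hence $1 - 24/120 = 0.8$.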
For $N=50{,}000$ and a uniform distribution of ties across your eleven possible values (eleven tie groups of roughly $t=4545$ observations each), the denominator of $H$ is approximately:
$$1-\frac{11\left(4545^{3}-4545\right)}{50000^{3}-50000} \approx 0.9917$$
Assuming a highly skewed distribution of ties (say, all but ten observations tied on a single value, so $t=49{,}990$), the denominator of $H$ is approximately:
$$1-\frac{49990^{3}-49990}{50000^{3}-50000} \approx 0.0006$$
The most extreme case would be where all $N$ observations were tied on the same value, in which case the denominator of $H$ would simplify to $0$, and $H$ would thus be undefined.
Because $\sum{T}=\sum\left(t^{3}-t\right)$ can never exceed $N^{3}-N$ (the tie-group sizes $t$ are each at least $1$ and sum to at most $N$), the denominator cannot be negative, and it is therefore not possible to obtain a negative value of $H$.
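The two worked examples are easy to verify with exact integer arithmetic; a quick check in Python, using a tie-group size of $t=4545$ for the uniform case:

```python
N = 50_000

# Uniform ties: eleven tie groups of roughly N/11 = 4545 observations each
t = 4545
uniform = 1 - 11 * (t**3 - t) / (N**3 - N)
print(round(uniform, 4))   # -> 0.9917

# Skewed ties: all but ten observations share one value, so t = 49,990
t = 49_990
skewed = 1 - (t**3 - t) / (N**3 - N)
print(round(skewed, 4))    # -> 0.0006
```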
Conclusion:
- It is not possible to obtain a negative value of $H$ when adjusting for ties using Kruskal & Wallis's formula for $H$ (their Equation 1.2) and their adjustment for ties (their Equation 1.3).
- Cubing a large $N$ might place one's software in the position of trying to calculate beyond its available precision, and numerical inconsistencies might thus result.
Kruskal, W. H. and Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260):583–621.
For 8 input variables and 8 outcome variables, you need multivariate multiple regression or MANCOVA.
MANOVA is used in the case of one input and multiple outcomes.
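To make the multivariate multiple regression option concrete, here is a sketch with simulated data (sample size, noise level, and variable names are all made up for illustration). Fitting 8 outcomes on 8 predictors estimates a single coefficient matrix; under ordinary least squares this coincides with running 8 separate multiple regressions, one per outcome column:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                        # hypothetical sample size
X = rng.normal(size=(n, 8))    # 8 input variables
B_true = rng.normal(size=(8, 8))
Y = X @ B_true + rng.normal(scale=0.1, size=(n, 8))  # 8 outcome variables

# Multivariate multiple regression: solve for all 8 outcomes at once.
# Each column of B_hat holds the coefficients for one outcome.
X1 = np.column_stack([np.ones(n), X])   # prepend an intercept column
B_hat, *_ = np.linalg.lstsq(X1, Y, rcond=None)

print(B_hat.shape)   # (9, 8): intercept plus 8 slopes, per outcome
# With low noise, the slope estimates recover B_true up to sampling error
print(np.allclose(B_hat[1:], B_true, atol=0.1))
```

What multivariate methods such as MANCOVA add on top of this per-outcome fit is joint inference that accounts for the correlations among the outcome variables.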