First: Before doing a permutation test for a one-way ANOVA, decide on the 'metric' you are going to use to judge differences among the groups. You might pick the maximum difference in the sample means, the variance of the sample means, the standard F-statistic, and so on.
Here I will illustrate with the standard F-statistic. The motivation for doing a permutation test would be that you doubt the assumptions for the standard ANOVA model: normal distributions or equal variances. Failure of these assumptions does not necessarily mean that the F-statistic is a bad way of measuring differences among means. Rather, the problem is that the F-statistic may not have an F-distribution under the null hypothesis.
Second, for my example, I'll select the dimensions of the data. Suppose we have $g = 3$ treatment groups, with $n = 10$ replications per group.
Next, we need some data. I'll generate these in R. Since you know R, you can see from the code that the null hypothesis is true and that the assumptions of a standard ANOVA are met. That way we can compare the permutation distribution against the distribution F(2, 27) when we're done, as a reality check on the validity of the program. You can retrieve my data by using the same seed I did, and you can check that the three group means are 48.85053, 49.64549, and 48.83616.
The last line does the standard ANOVA obtaining F = 0.2778 and P-value 0.7596.
set.seed(1234)
x1 = rnorm(10, 50, 3); x2 = rnorm(10, 50, 3); x3 = rnorm(10, 50, 3)
Group = as.factor(rep(1:3, each=10)); Meas = c(x1, x2, x3);
anova(lm(Meas ~ Group))
Third, do the permutation test. Under the null hypothesis, it ought not to matter into which of the three groups each of the 30 observations falls. So the permutation test is done by randomly permuting the data vector 'Meas' and finding the F-statistic for each permutation. In what follows, I will take the lazy way out and use the R functions 'lm' and 'anova' to find each of the F-statistics. Note that the F-statistic can be retrieved as element [1,4] of the output. This runs slowly because R fits a full linear model and builds an ANOVA table for each iteration, wasting a lot of time. (It would be much more efficient to write code to find F directly for each iteration, and you should experiment with that and maybe use more iterations than I did.)
With the seed shown, my simulated permutation distribution of the F-statistic had a proportion 0.7667 of its values above 0.2778 (the F-value for our data). This is close to the P-value 0.7596 obtained from the standard ANOVA. So our permutation distribution is giving nearly the same P-value as did F(2, 27).
You can also use 'hist(f.stat, prob=TRUE)' to make a histogram of the m values of 'f.stat', and then use 'curve(df(x, 2, 27), n=1000, add=TRUE)' to show that the simulated distribution is very nearly F(2, 27). (Caution: Never use 'F' to represent the F-statistic. In R, the name 'F' is predefined as shorthand for 'FALSE' but is not protected; reassigning it can result in amazing malfunctions. Would you care to guess how I know this?)
set.seed(4321) # just so you can exactly replicate my simulation, if you like
m = 10000; f.stat = numeric(m)
for (i in 1:m) { perm.Meas = sample(Meas, 30)
f.stat[i] = anova(lm(perm.Meas ~ Group))[1,4] }
mean(f.stat > .2778) # P-val for your data; compare with .7596
Finally: To speed things up and perhaps use a larger m (for a balanced design only), you could put the 30 data values into a 3 x 10 matrix 'MAT' after permuting them. Then use 'rowMeans(MAT)' to get the 3 group means and 'apply(MAT, 1, var)' to get the 3 group variances. From there it is simple arithmetic to get 'f.stat[i]': with 10 observations per group, the numerator of F is 10 times the sample variance of the 3 group means, and the denominator is the average of the 3 group variances.
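Here is a minimal sketch of that faster approach, assuming the balanced 3 x 10 layout and the same simulated data as above. The helper name 'f.balanced' and the iteration count are my own choices for illustration, not part of the code shown earlier.

```r
# Regenerate the data exactly as above
set.seed(1234)
x1 = rnorm(10, 50, 3); x2 = rnorm(10, 50, 3); x3 = rnorm(10, 50, 3)
Meas = c(x1, x2, x3)
g = 3; n = 10                               # balanced: g groups, n per group

# F-statistic for a balanced one-way layout (hypothetical helper)
f.balanced = function(v, g, n) {
  MAT = matrix(v, nrow = g, byrow = TRUE)   # row i holds group i
  msb = n * var(rowMeans(MAT))              # between-group mean square
  msw = mean(apply(MAT, 1, var))            # within-group mean square
  msb / msw
}

f.obs = f.balanced(Meas, g, n)              # observed F; compare with anova(lm(...))

set.seed(4321)
m = 20000; f.stat = numeric(m)
for (i in 1:m) f.stat[i] = f.balanced(sample(Meas), g, n)

mean(f.stat >= f.obs)                       # simulated permutation P-value
quantile(f.stat, c(0.50, 0.90, 0.95))       # simulated permutation quantiles
qf(c(0.50, 0.90, 0.95), 2, 27)              # F(2, 27) quantiles, for comparison
```

Because this uses a different number of iterations than the 'lm'/'anova' loop, the simulated P-value will not match 0.7667 exactly, but it should land close to it and to 0.7596.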
Of course the next steps would be to simulate data for which the null hypothesis
is not true and see if you get the right noncentral F distribution, and
then try using some real data.
Note: This is a simulated permutation test. To do a real permutation test
one would have to look at all the ways to put 30 observations into 3 groups of 10 and find the F-statistic for each of them (I guess something like $5.5 \times 10^{12}$)--a hopelessly formidable task. Instead we find the F-statistic for m of the possible arrangements and trust that to give
a good idea what the true permutation distribution is like. So, usually in practice,
except for the most trivial examples with only tiny datasets, 'permutation test' is an abbreviation for 'simulated permutation test'.
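The count of arrangements mentioned above can be checked with a line or two of R. This is only a sketch of the arithmetic: choose 10 of the 30 observations for the first group, then 10 of the remaining 20 for the second. The variable name 'n.arr' is mine.

```r
# Number of ways to split 30 observations into 3 labelled groups of 10
n.arr = choose(30, 10) * choose(20, 10)
n.arr                    # roughly 5.55e12

# The F-statistic does not depend on which group is called 1, 2, or 3,
# so the number of splits giving potentially different F-values is smaller
n.arr / factorial(3)     # roughly 9.25e11
```

Either way, full enumeration is out of reach for routine work, which is why a random sample of m arrangements is used instead.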
Addendum: You have requested the data. They were randomly generated in R using the code above. They appear below, unrounded in the first three columns and rounded to one place in the last three:
x1 x2 x3 x1 x2 x3
46.37880 48.56842 50.40226 46.4 48.6 50.4
50.83229 47.00484 48.52794 50.8 47.0 48.5
53.25332 47.67124 48.67836 53.3 47.7 48.7
42.96291 50.19338 51.37877 43.0 50.2 51.4
51.28737 52.87848 47.91884 51.3 52.9 47.9
51.51817 49.66914 45.65539 51.5 49.7 45.7
48.27578 48.46697 51.72427 48.3 48.5 51.7
48.36010 47.26641 46.92903 48.4 47.3 46.9
48.30664 47.48848 49.95459 48.3 47.5 50.0
47.32989 57.24751 47.19215 47.3 57.2 47.2
Best Answer
You are talking about a 'power and sample size' computation for a balanced fixed effects one way ANOVA with $g=10$ groups and equal numbers $r$ of replications in each group. The model is
$$ Y_{ij} = \mu + \alpha_i + e_{ij}; \text{ for } i = 1,\dots,g;\; j=1,\dots,r;$$ where $\sum_{i=1}^g \alpha_i = 0$ and $e_{ij} \stackrel {iid}{\sim} \mathsf{Norm}(0, \sigma).$
In practice one usually uses software for such computations. Perhaps there is a formula in your text to find the power $\pi(\tau)$ of an F-test at the 5% level against an alternative $\tau = \sum_{i=1}^g \alpha_i^2,$ for a given number of replications $r$ in each group. (Caution: details of the notation differ among textbooks.)
The power is the probability of rejecting $H_0$ given that the actual differences among the group means are reflected by $\tau.$ The maximum difference $\delta$ to which you refer is the largest discrepancy $|\alpha_i - \alpha_{i'}|,$ for $i \ne i'.$ Specifying $\delta$ is equivalent to putting a cap on $\tau.$
Such formulas use a non-central F distribution, which is not generally tabled, and so require software. To find $r$ that will give a close approximation to the desired $\pi(\tau)$ typically requires some iteration.
Below is output from Minitab's 'power and sample size' procedure for a one-way (one-factor) ANOVA design that matches your specifications. (SAS, R and other statistical software packages have similar procedures.)
Notes: (a) This procedure requires an estimate of parameter $\sigma$ of the model. You are supposed to get this from the data shown. However, even though the fake data in your 6-level experiment match the means for the original CR data, nothing is said about matching variances. Because the data are in a picture file, I did not take the time to find the exact estimate $\hat \sigma$ (often called something like $s_e$ in computer printouts; the square root of MSE from the ANOVA table). Instead, I am using $\hat \sigma = 0.4.$ (My guess, just looking at the data.)
(b) You say that $\delta$ should be 2%; so I used $\delta = 2,$ assuming that the numbers in the fake data table are percentages.
(c) If you would like results for some other $\sigma$ and $\delta$ and do not have suitable software at hand, please leave a Comment, and I will run the procedure again.
(d) If the number of replications can vary among groups, power computations become more complicated, and simulation is commonly used.
(e) For completeness: In a random effects model, power computations use the (ordinary) F-distribution (not the non-central F). In such a model the parameters $\alpha_i$ are replaced by random variables $A_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma_A),$ and (roughly speaking) the purpose of the ANOVA is to determine whether $\sigma_A$ is significantly positive compared with $\sigma.$
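For the fixed effects case discussed above, the non-central F computation can be sketched in R roughly as follows. This is a sketch under stated assumptions, not a reproduction of Minitab's procedure: I assume the least favorable arrangement for a maximum difference $\delta$ (two group effects at $\pm\delta/2$, the rest at 0), so that $\tau = \delta^2/2$ and the noncentrality parameter is $r\tau/\sigma^2.$ The function names and the input values in the example calls are placeholders of mine, not values from your problem.

```r
# Power of the level-alpha F-test in a balanced fixed effects one-way ANOVA.
# Assumption: tau = delta^2 / 2 (two effects at +/- delta/2, others at 0).
anova.power = function(r, g, delta, sigma, alpha = 0.05) {
  df1 = g - 1; df2 = g * (r - 1)
  ncp = r * (delta^2 / 2) / sigma^2     # noncentrality: r * tau / sigma^2
  crit = qf(1 - alpha, df1, df2)        # critical value under H0
  1 - pf(crit, df1, df2, ncp = ncp)     # P(reject H0 | alternative)
}

# Smallest r (at least 2) reaching a target power, by simple iteration
min.reps = function(g, delta, sigma, target = 0.90, r.max = 1000) {
  for (r in 2:r.max) {
    if (anova.power(r, g, delta, sigma) >= target) return(r)
  }
  NA                                    # target not reached within r.max
}

# Placeholder inputs, for illustration only
anova.power(r = 5, g = 4, delta = 1, sigma = 1)
min.reps(g = 4, delta = 1, sigma = 1)
```

Under the same assumption, R's built-in 'power.anova.test' should give matching power if 'between.var' is set to the sample variance of the assumed group means, here $(\delta^2/2)/(g-1).$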