If we write $Z_i = \frac{O_i-np_i}{\sqrt{np_i}}$ as in the lecture notes, the idea is that the vector $Z$ converges in distribution to $\mathcal N(0,\Sigma)$, a multivariate normal distribution, where
$$\Sigma=\text{Cov}(Z)=\begin{bmatrix}
1-p_1 & -\sqrt{p_1 p_2} & \cdots \\
-\sqrt{p_1 p_2} & 1-p_2 & \cdots \\
\vdots & \vdots & \ddots
\end{bmatrix}.$$
If we compute $\det(\Sigma-\lambda I)=-\lambda(1-\lambda)^{n-1}$ we get that $\Sigma$ has $n-1$ eigenvalues equal to 1 and one equal to 0.
(The computation is made easy by the fact that $\Sigma=I-pp^T$ for the unit vector $p=(\sqrt{p_1},\sqrt{p_2},\dots)$ — it has unit norm because the $p_i$ sum to 1 — together with Sylvester's determinant theorem.)
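This eigenvalue fact is easy to sanity-check numerically, e.g. with numpy (the probability vector below is just an arbitrary choice):

```python
import numpy as np

p = np.array([0.2, 0.3, 0.1, 0.4])  # any probability vector (sums to 1)
s = np.sqrt(p)                       # unit vector, since sum(p) = 1
Sigma = np.eye(4) - np.outer(s, s)   # Sigma = I - p p^T

eig = np.sort(np.linalg.eigvalsh(Sigma))
print(eig)  # one eigenvalue 0, the remaining n-1 equal to 1
```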
This means the distribution is really $n-1$ dimensional embedded in $n$ dimensions, and there is an orthogonal (rotation) matrix $A$ such that
$$A\Sigma A^T=\begin{bmatrix}
0 & 0 & 0 & \cdots \\
0 & 1 & 0 & \cdots \\
0 & 0 & 1 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}.$$
Now let $X = AZ \sim N(0,A\Sigma A^T)$. Then $X$ is a vector $(0, X_1, X_2, \dots)$ whose nonzero components $X_1, X_2, \dots$ are iid $\mathcal N(0,1)$ Gaussians.
The function $f(Z) = Z_1^2 + Z_2^2 + \dots$ is the squared norm $\|Z\|_2^2$, so it is unchanged when we rotate its argument. This means $f(Z) = f(AZ) = f(X) = 0^2 + X_1^2 + \dots$,
which is Chi-square distributed with $n-1$ degrees of freedom!
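As a quick sanity check, one can simulate the Pearson statistic for multinomial samples and compare its moments with those of a $\chi^2_{n-1}$ distribution (the probabilities and sample sizes below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.1, 0.4])  # n = 4 categories
n_trials, reps = 5000, 20000

O = rng.multinomial(n_trials, p, size=reps)              # observed counts
stat = ((O - n_trials * p) ** 2 / (n_trials * p)).sum(axis=1)  # sum of Z_i^2

# chi-square with n-1 = 3 degrees of freedom has mean 3 and variance 6
print(stat.mean(), stat.var())
```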
Pretty cool stuff. Thank you for pointing me towards this result :-)
You are talking about a 'power and sample size' computation for a balanced fixed effects one way ANOVA with $g=10$ groups and equal numbers $r$ of replications
in each group. The model is
$$ Y_{ij} = \mu + \alpha_i + e_{ij}; \text{ for } i = 1,\dots,g;\; j=1,\dots,r;$$
where $\sum_{i=1}^g \alpha_i = 0$ and $e_{ij} \stackrel {iid}{\sim} \mathsf{Norm}(0, \sigma).$
In practice one usually uses software for such computations.
Perhaps there is a formula in your text to find the power $\pi(\tau)$ of an
F-test at the 5% level against an alternative $\tau = \sum_{i=1}^g \alpha_i^2,$
for a given number of replications $r$ in each group. (Caution: details of the
notation differ among textbooks.)
The power is the probability of rejecting
$H_0$ given that the actual differences among the group means are reflected
by $\tau.$ The maximum difference $\delta$ to which you refer is the
largest discrepancy $|\alpha_i - \alpha_{i'}|,$ for $i \ne i'.$ Specifying
$\delta$ is equivalent to putting a cap on $\tau.$
Such formulas use a non-central F distribution,
which is not generally tabled, and so require software. To find $r$ that will
give a close approximation to the desired $\pi(\tau)$ typically requires some
iteration.
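For what it's worth, here is a sketch of such an iteration in Python with scipy, assuming Minitab's least-favorable convention for a maximum difference $\delta$: two group means $\delta$ apart, the rest at the grand mean, giving noncentrality parameter $r\delta^2/(2\sigma^2)$:

```python
from scipy.stats import f, ncf

def anova_power(g, r, delta, sigma, alpha=0.05):
    """Power of the one-way ANOVA F-test against the least favorable
    configuration for maximum difference delta (tau = delta**2 / 2)."""
    df1, df2 = g - 1, g * (r - 1)
    nc = r * delta**2 / (2 * sigma**2)    # noncentrality parameter
    f_crit = f.ppf(1 - alpha, df1, df2)   # critical value under H0
    return 1 - ncf.cdf(f_crit, df1, df2, nc)

# iterate r upward until the target power 0.8 is reached
r = 2
while anova_power(g=10, r=r, delta=2, sigma=0.4) < 0.8:
    r += 1
print(r, anova_power(g=10, r=r, delta=2, sigma=0.4))
```

With $g = 10$, $\delta = 2$, and $\hat\sigma = 0.4$ this reproduces the sample size of 3 per group in the Minitab output below.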
Below is output from Minitab's 'power and sample size' procedure for a one-way (one-factor) ANOVA design that matches your specifications. (SAS, R and other
statistical software packages have similar procedures.)
Power and Sample Size
One-way ANOVA
α = 0.05 Assumed standard deviation = 0.4
Factors: 1 Number of levels: 10
Maximum Difference  Sample Size  Target Power  Actual Power
                 2            3           0.8      0.964007
The sample size is for each level.
Notes: (a) This procedure
requires an estimate of the parameter $\sigma$ of the model. You are supposed to
get this from the data shown. However, even though the fake data in
your 6-level experiment matches the means of the original CR data, nothing
is said about matching variances.
I did not take the time to find that exact estimate $\hat \sigma$ (often
called something like $s_e$ in computer printouts; the square root of MSE from
the ANOVA table). Instead, I am using $\hat \sigma = 0.4.$ (My guess, just
looking at the data.)
(b) You say that $\delta$ should be 2%; so I used $\delta = 2,$ assuming
that the numbers in the fake data table are percentages.
(c) If you would like results for some other $\sigma$ and $\delta$ and do not
have suitable software at hand, please leave a Comment, and I will run the
procedure again.
(d) If the number of replications can vary among groups, power computations
become more complicated, and simulation is commonly used.
(e) For completeness: In a random effects model, power computations use the (ordinary) F-distribution (not the non-central F). In such a model the parameters $\alpha_i$ are replaced by random variables $A_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma_A),$ and (roughly speaking) the purpose of the ANOVA is to determine whether $\sigma_A$ is significantly positive compared with $\sigma.$
Best Answer
This sounds like a good case for using Fisher's Exact Test. It sounds like you could arrange your data into a 2×2 contingency table. If that is the case, I would read up on Fisher's 'lady tasting tea' experiment in connection with Fisher's Test.
For a $2\times 2$ table with row totals $R_1, R_2$, column totals $C_1, C_2$, grand total $N$, and cell counts $a, b, c, d$, the probability of the observed table is
$$p = \frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\,a!\,b!\,c!\,d!} = \frac{6!\,11!\,7!\,10!}{17!\,2!\,4!\,5!\,6!},$$
and the p-value sums such probabilities over all tables at least as extreme as the observed one.
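In Python, for instance, one can check this arithmetic and get the full two-sided p-value with scipy (the cell counts $a=2, b=4, c=5, d=6$ below are the 'internals' implied by the factorials above):

```python
from math import factorial as fac
from scipy.stats import fisher_exact

table = [[2, 4], [5, 6]]  # row totals 6, 11; column totals 7, 10

# probability of this exact table, from the factorial formula
p_table = (fac(6) * fac(11) * fac(7) * fac(10)
           / (fac(17) * fac(2) * fac(4) * fac(5) * fac(6)))

# scipy sums such probabilities over all tables at least as extreme
odds_ratio, p_value = fisher_exact(table)
print(p_table, odds_ratio, p_value)
```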
One more thing: the Mantel–Haenszel (MH) test is a more specific version of Fisher's Test used with multiple confounding levels, for example 2×2×k tables. Also, I would not rule out the chi-square test, but it relies on a large-sample (normal) approximation, so you must have a fairly large data set for that approximation to hold. Fisher's Exact Test does not depend on that approximation, i.e. it is good for small data sets.