Solved – Calculating power function for ANOVA

anovafunctionstatistical-power

There are several functions in R to calculate the power of a test, for example, the pwr-functions of the pwr package.

My question is how we get this function for ANOVA (pwr.anova.test)?

I know that
\begin{align}
1-\beta &= P(H1|H1) \\
&= P(F>F_{1-\alpha,k-1,N-k}) \\
&= 1-P(F\le F_{1-\alpha,k-1,N-k}) \\
&= 1-P(MQE/MQR\le F_{1-\alpha,k-1,N-k}) \\
~ \\
&= 1-P\bigg(n k \bigg(\frac{^{\sum(\bar{x}_{i.}-\bar{x}_{..})^2}/_{(k-1)}}
{^{\sum\sum(x_{ij}-\bar{x}_{i.})^2}/_{k(n-1)}} –
f^2\!+\!f^2\bigg) \le F_{1-\alpha,k-1,N-k}\bigg) \\
~ \\
~ \\
&= 1-P\bigg(nk\bigg(\frac{^{\sum (\bar{x}_{i.}-\bar{x}_{..})^2}/_{(k-1)}}
{^{\sum\sum(x_{ij}-\bar{x}_{i.}^2)}/_{k(n-1)}} – \frac{^{\sum(µ_i-µ)^2}/_k}{\sigma^2}\bigg)\le F_{1-\alpha,k-1,N-k}-nkf^2\bigg) \\
~ \\
&= ???…??? \\
&= 1-P(F_{k-1,N-k}\le F_{1-\alpha,k-1,N-k}-knf^2) \\
&=
\end{align}

  p.body =  pf(qf(\alpha, k-1, (n-1)*k, low=F), k-1, (n-1)k, k*n*f^2, low=F)

With $f^2$ the effect;
$knf^2$ the non-centrality parameter; and
$\sigma^2$ the error variance within groups;

But what does the function which has a $F_{k-1,N-k}$ distribution look like? It has to be the (variance between groups/df) / (variance within groups/df)$-knf^2$, but how can I write these so that I see the distribution? In my calculation I did not see it. I have never found the exact way how to get to this formula. Maybe you can help.

Best Answer

Power of F-tests for Gaussian linear models

General $F$-test

Any Gaussian linear model can be written $\boxed{Y=\mu+\sigma G}$ where $G$ has the standard normal distribution on $\mathbb{R}^n$ and $\mu$ is assumed to belong to a linear subspace $W$ of $\mathbb{R}^n$. Usually, the theory of Gaussian linear models treat them with the $Y=X\beta+\sigma G$ writing, corresponding to $W=\text{Im}(X)$. There are good reasons for that but the underlying geometry is clearer with the $\boxed{Y=\mu+\sigma G}$ treatment.

The so-called "ANOVA test" is a particular test of a F-test for a null hypothesis $H_0\colon\{\mu \in U\}$ where $U\subset W$ is a linear subspace. Actually the F-test exactly coincides with the likelihood-ratio test in this situation, and it is based on the Fisher statistic $$ F = \frac{{\Vert P_Z Y\Vert}^2/(m-\ell)}{{\Vert P_W^\perp Y\Vert}^2/(n-m)}, $$ where $Z=U^\perp \cap W$ is the orthogonal complement of $U$ in $W$, and denoting $m=\dim(W)$ and $\ell=\dim(U)$.

Obviously $\boxed{P_Z Y = P_Z \mu + \sigma P_Z G}$ and $\boxed{P_W^\perp Y = \sigma P_W^\perp G}$.

When $H_0\colon\{\mu \in U\}$ is true then $P_Z \mu = 0$ and therefore $$ F = \frac{{\Vert P_Z G\Vert}^2/(m-\ell)}{{\Vert P_W^\perp G\Vert}^2/(n-m)} \sim F_{m-\ell,n-m} $$ has the Fisher $F_{m-\ell,n-m}$ distribution. Consequently, from the classical relation between the Fisher distribution and the Beta distribution, $R^2 \sim {\cal B}(m-\ell, n-m)$.

In the general situation we have to deal with $P_Z Y = P_Z \mu + \sigma P_Z G$ when $P_Z\mu \neq 0$. In this general case one has ${\Vert P_Z Y\Vert}^2 \sim \sigma^2\chi^2_{m-\ell}(\lambda)$, where $\chi^2_{m-\ell}(\lambda)$ is the noncentral $\chi^2$ distribution with $m-\ell$ degrees of freedom and noncentrality parameter $\boxed{\lambda=\frac{{\Vert P_Z \mu\Vert}^2}{\sigma^2}}$, and then $\boxed{F \sim F_{m-\ell,n-m}(\lambda)}$ noncentral Fisher distribution. To compute $P_Z\mu$, note that $P_Z = P_W - P_U$ and $P_W\mu=\mu$, hence the only thing to compute is $P_U \mu$.

Power of the $F$-test

The power of the $F$-test depends on the significance level $\alpha$ which is chosen by the user. The critical value $c$ of the test is the value for which $\Pr_0(F>c)=1-\alpha$ where $\Pr_0$ denotes the probability under the null $H_0$. The power is then $\Pr(F>c)=1-\alpha$ where $\Pr$ denotes the probability under the unknown parameters. As we have seen, $\Pr$ only depends on $\lambda$, $n$, $m$ and $l$.

R code

The effect size in this situation is $\sqrt{\frac{\lambda}{n}}$, and it is more usual to take the effect size rather than the non-centrality parameter $\lambda$ as an input of the power. Thus I have written the following R function which returns the power using the effect size as an argument rather than the noncentrality parameter.

# alpha : significance level eff : effect size ; n : sample size ; m :
# number of parameters of the model ; l : number of parameters of the
# submodel H0 ;
Power <- function(alpha, eff, n, m, l) {
    df1 <- m - l
    df2 <- n - m
    c <- qf(1 - alpha, df1, df2)
    lambda <- eff^2 * n
    pow <- pf(c, df1, df2, ncp = lambda, lower.tail = FALSE)
    return(pow)
}
Example
Power(alpha = 5/100, eff = 0.5, n = 48, m = 5, l = 4)
## [1] 0.92303

The case of ANOVA

The pwr.anova.test() function of the pwr package calculates power of ANOVA for the balanced case only. For example, when there are $3$ groups, $4$ observations per group, the power for $\alpha=5\%$ and for an effect size of $0.5$ is

library(pwr)
pwr.anova.test(3, 4, 0.5, 5/100)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 3
##               n = 4
##               f = 0.5
##       sig.level = 0.05
##           power = 0.24088
## 
## NOTE: n is number in each group

With the general notations of the $F$-test, here $U$ is a one-dimensional subspace corresponding to the assumptions that all group means are equal. The space $W$ has the same dimension as there are groups, here $3$. Thus using my function one get the power as follows:

Power(alpha = 5/100, eff = 0.5, n = 12, m = 3, l = 1)
[1] 0.24088