[Math] Determine the number of observations needed for the ANOVA test to succeed

statistical-inference, statistics

I need help with part 3 of this question. I have done the first two parts but don't know how to start part 3.

[Image: the exercise and its data table]

Here is part 3:

Consider planning a new experiment to compare 10 brands of margarine (an extension of Exercise 26). The experiment will have equal sample sizes for all 10 brands. The food scientist wants the ANOVA F-test to have at least an 80% chance of detecting a difference among the means, if the maximum difference is 2.0%. Use the estimates from the Exercise 26 data to estimate the population variance and standard deviation. How many observations should the scientist have from each brand to achieve these goals?

Best Answer

You are talking about a 'power and sample size' computation for a balanced fixed effects one way ANOVA with $g=10$ groups and equal numbers $r$ of replications in each group. The model is

$$ Y_{ij} = \mu + \alpha_i + e_{ij}; \text{ for } i = 1,\dots,g;\; j=1,\dots,r;$$ where $\sum_{i=1}^g \alpha_i = 0$ and $e_{ij} \stackrel {iid}{\sim} \mathsf{Norm}(0, \sigma).$

In practice one usually uses software for such computations. Your text may give a formula for the power $\pi(\tau)$ of an F-test at the 5% level against an alternative $\tau = \sum_{i=1}^g \alpha_i^2,$ for a given number of replications $r$ in each group. (Caution: details of the notation differ among textbooks.)

The power is the probability of rejecting $H_0$ given that the actual differences among the group means are reflected by $\tau.$ The maximum difference $\delta$ to which you refer is the largest discrepancy $|\alpha_i - \alpha_{i'}|,$ for $i \ne i'.$ Specifying $\delta$ puts a lower bound on $\tau,$ and power is usually computed for that least favorable case.
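To make the role of $\delta$ concrete, here is a sketch of the usual calculation in the notation above. Conventions differ among textbooks, so treat the exact form as an assumption about the formula in your text. For a given maximum difference $\delta,$ the smallest possible $\tau$ occurs when two effects are $\pm\delta/2$ and the remaining $g-2$ effects are zero:

$$ \tau_{\min} = \left(\frac{\delta}{2}\right)^2 + \left(-\frac{\delta}{2}\right)^2 = \frac{\delta^2}{2}, \qquad \lambda = \frac{r\,\tau_{\min}}{\sigma^2} = \frac{r\,\delta^2}{2\sigma^2}. $$

Here $\lambda$ is the noncentrality parameter of the F statistic, which has $g-1$ and $g(r-1)$ degrees of freedom. With the values used later in this answer ($\delta = 2,$ $\hat\sigma = 0.4$), this gives $\lambda = 12.5\,r.$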

Such formulas use a noncentral F distribution, which is not generally tabulated, so they require software. Finding the $r$ that gives a close approximation to the desired $\pi(\tau)$ typically requires some iteration.
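If you have Python with SciPy available, the iteration can be sketched as follows. This is a minimal sketch, not the procedure any particular package uses. It assumes the least favorable configuration (two effects at $\pm\delta/2,$ the rest at 0), so that the noncentrality parameter is $r\delta^2/(2\sigma^2),$ and it takes $\sigma = 0.4$ and $\delta = 2,$ the values used later in this answer. The function names `anova_power` and `smallest_r` are my own.

```python
from scipy import stats


def anova_power(r, g=10, delta=2.0, sigma=0.4, alpha=0.05):
    """Power of the balanced one-way fixed-effects F-test with r observations per group."""
    dfn = g - 1          # numerator degrees of freedom
    dfd = g * (r - 1)    # denominator degrees of freedom
    # Assumed least favorable case: two effects at +/- delta/2, the rest at 0,
    # so tau = delta**2 / 2 and the noncentrality is r * tau / sigma**2.
    nc = r * (delta ** 2 / 2) / sigma ** 2
    f_crit = stats.f.ppf(1 - alpha, dfn, dfd)    # critical value under H0
    return stats.ncf.sf(f_crit, dfn, dfd, nc)    # P(noncentral F > critical value)


def smallest_r(target=0.80, r_max=1000, **kwargs):
    """Smallest number of replications per group whose power reaches the target."""
    for r in range(2, r_max + 1):
        if anova_power(r, **kwargs) >= target:
            return r
    raise ValueError("target power not reached for r <= r_max")


r = smallest_r()
print(r, anova_power(r))
```

Because $r$ must be an integer, the achieved power generally overshoots the target; rerun with your own `sigma` once you have computed it from the data.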

Below is output from Minitab's 'power and sample size' procedure for a one-way (one-factor) ANOVA design that matches your specifications. (SAS, R and other statistical software packages have similar procedures.)

Power and Sample Size 

One-way ANOVA

α = 0.05  Assumed standard deviation = 0.4

Factors: 1  Number of levels: 10


   Maximum     Sample  Target
   Difference    Size   Power  Actual Power
            2       3     0.8      0.964007

   The sample size is for each level.

Notes: (a) This procedure requires an estimate of parameter $\sigma$ of the model. You are supposed to get this from the data shown. However, even though the fake data in your 6-level experiment matches means for the original CR data, nothing is said about matching variances. Because the data are in a picture file, I did not take the time to find that exact estimate $\hat \sigma$ (often called something like $s_e$ in computer printouts; the square root of MSE from the ANOVA table). Instead, I am using $\hat \sigma = 0.4.$ (My guess, just looking at the data.)

(b) You say that $\delta$ should be 2%; so I used $\delta = 2,$ assuming that the numbers in the fake data table are percentages.

(c) If you would like results for some other $\sigma$ and $\delta$ and do not have suitable software at hand, please leave a Comment, and I will run the procedure again.

(d) If the number of replications can vary among groups, power computations become more complicated, and simulation is commonly used.

(e) For completeness: In a random effects model, power computations use the (ordinary) F-distribution (not the non-central F). In such a model the parameters $\alpha_i$ are replaced by random variables $A_i \stackrel{iid}{\sim} \mathsf{Norm}(0, \sigma_A),$ and (roughly speaking) the purpose of the ANOVA is to determine whether $\sigma_A$ is significantly positive compared with $\sigma.$
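As an illustration of the simulation approach mentioned in Note (d), here is a minimal sketch in Python using NumPy and SciPy. The group sizes in `unbalanced_sizes` are hypothetical, chosen only to show the idea. The means place two groups $\delta = 2$ apart with the others in the middle, and $\sigma = 0.4$ is my guess from above. The grand mean is set to 0 because it does not affect the F-test. The function name `simulated_power` is my own.

```python
import numpy as np
from scipy import stats


def simulated_power(means, sizes, sigma, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the power of the one-way ANOVA F-test by simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        # Draw one normal sample per group with its own mean and size.
        samples = [rng.normal(m, sigma, n) for m, n in zip(means, sizes)]
        _, p_value = stats.f_oneway(*samples)
        rejections += int(p_value < alpha)
    return rejections / n_sim   # proportion of simulated experiments that reject H0


# Two groups 2 units apart, the other eight at the grand mean (taken as 0).
means = [-1.0, 1.0] + [0.0] * 8

# Hypothetical unequal group sizes, for illustration only.
unbalanced_sizes = [2, 2, 3, 3, 3, 3, 3, 3, 4, 4]
print(simulated_power(means, unbalanced_sizes, sigma=0.4))

# Balanced check: three per group should land near the Minitab figure above.
print(simulated_power(means, [3] * 10, sigma=0.4))
```

The estimate carries Monte Carlo error, so increase `n_sim` if you need more than about two decimal places.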
