Sample size recommended for a chi-square test for the variance

chi-squared-test sample-size

I have read that the chi-square test for the variance requires the sample to be large enough. However, in this answer: p-value and equivalence testing, the chi-square test for the variance is suggested for a small sample. I have looked at various websites but cannot find this information. Any insight on what the required sample size would be (in general) to run a chi-square test for the variance?

Best Answer

If you are using a chi-squared test of $H_0: \sigma^2 = 64$ against $H_a: \sigma^2 > 64,$ then the sample size required depends on how far above $64$ the variance must be before the difference matters to you. Let's say you want to have probability $0.90$ of detecting that the actual variance is $\sigma^2 = 100$ or more. That is, you want the 'power' of the test to be 90%. Is $n = 100$ observations enough?

Because $Q = \frac{(n-1)S^2}{\sigma^2} \sim \mathsf{Chisq}(\nu = n-1)$ for normal data, we reject $H_0$ at the 5% level if $\frac{(n-1)S^2}{64} > c = 123.23$, where the critical value $c = 123.23$ cuts probability $0.05$ from the upper tail of $\mathsf{Chisq}(99).$

qchisq(.95, 99)
[1] 123.2252
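The decision rule above can be wrapped in a small helper. This is only a sketch: the function name `chisq_var_test_upper` and its arguments are my own invention for illustration (not a base R function), and it assumes the data are a normal sample.

```r
# Hypothetical helper (name and arguments are illustrative only):
# upper-tailed chi-squared test of H0: sigma^2 = sigma0_sq, assuming normal data.
chisq_var_test_upper <- function(x, sigma0_sq, alpha = 0.05) {
  n    <- length(x)
  q    <- (n - 1) * var(x) / sigma0_sq           # test statistic Q
  crit <- qchisq(1 - alpha, df = n - 1)          # critical value c
  p    <- pchisq(q, df = n - 1, lower.tail = FALSE)  # upper-tail P-value
  list(statistic = q, critical = crit, p.value = p, reject = q > crit)
}

set.seed(1234)
x   <- rnorm(100, 200, sqrt(64))   # a sample for which H0 is true
res <- chisq_var_test_upper(x, sigma0_sq = 64)
res
```

Rejecting when `statistic > critical` is equivalent to rejecting when `p.value < alpha`.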

Let's try a couple of normal samples of size $n = 100$ from populations with $\sigma^2 = 64$ to see what happens. [The value of the mean is irrelevant; I used 200.] Mostly, the values of the test statistic $Q$ are below $c = 123.2252$ (as shown), but occasionally (not shown) the result exceeds $c,$ which should happen in 5% of the cases.

set.seed(1234)
x = rnorm(100, 200, sqrt(64)) 
99*var(x)/64
[1] 99.87417
x = rnorm(100, 200, sqrt(64))
99*var(x)/64
[1] 105.4757
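To check the "5% of the cases" claim rather than rely on two samples, one can repeat the experiment many times under $H_0$. This is a rough sketch; the seed and the number of replications ($10^5$) are arbitrary choices of mine.

```r
# Estimate the rejection rate when H0 is true (sigma^2 = 64).
set.seed(2022)                      # arbitrary seed (my assumption)
n         <- 100
sigma0_sq <- 64
crit      <- qchisq(0.95, df = n - 1)
q0 <- replicate(10^5, (n - 1) * var(rnorm(n, 200, sqrt(sigma0_sq))) / sigma0_sq)
rej_rate <- mean(q0 > crit)         # proportion of false rejections
rej_rate
```

The proportion should be near 0.05, up to simulation error.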

By contrast, if $\sigma^2 = 100,$ then we should get $Q > c$ in most of the cases:

set.seed(1235)
x = rnorm(100, 200, sqrt(100))
99*var(x)/64
[1] 170.5568
x = rnorm(100, 200, sqrt(100))
99*var(x)/64
[1] 185.5564
x = rnorm(100, 200, sqrt(100))
99*var(x)/64
[1] 155.7088

A simulation with a million samples of size $n=100$ shows that the power is above 90% $(0.9327 \pm 0.0005).$ So $n=100$ is enough.

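set.seed(2021)
c = qchisq(.95, 99)    # critical value; must be assigned before it is used below
q.a = replicate( 10^6, 99*var(rnorm(100,200,sqrt(100))) /64 )
mean(q.a > c)          # simulated power
[1] 0.93268
2*sd(q.a > c)/1000     # approx. 95% margin of simulation error: 2*sd/sqrt(10^6)
[1] 0.000501151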
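The simulated power can be cross-checked with a direct calculation. When $\sigma^2 = 100,$ it is $\frac{(n-1)S^2}{100}$ that has the $\mathsf{Chisq}(99)$ distribution, so $P(Q > c) = P\left(\mathsf{Chisq}(99) > \frac{64}{100}c\right).$ A minimal sketch (variable names are mine, and normal data are assumed):

```r
n        <- 100
sigma0sq <- 64     # variance under H0
sigma1sq <- 100    # variance under the alternative of interest
crit     <- qchisq(0.95, df = n - 1)
# Under Ha, (n-1)S^2/sigma1sq ~ Chisq(n-1), hence
# P(Q > crit) = P(Chisq(n-1) > crit * sigma0sq / sigma1sq)
power <- pchisq(crit * sigma0sq / sigma1sq, df = n - 1, lower.tail = FALSE)
power
```

The result should agree with the simulated value to within about the simulation margin of error.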

Notes: (1) Many statistical software programs include a 'power and sample size' procedure for such tests. There are also some online calculators (which I have not vetted). (2) There is a standard formula for a 95% confidence interval for $\sigma^2;$ you might use that to make an initial guess at what $n$ is needed. (3) Simulation is not really required to find the power. Using the distribution of $Q$ shown above, you can find a direct computation of the power in R.
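As a rough stand-in for a 'power and sample size' procedure, one can scan over $n$ for the smallest sample size giving at least 90% power against $\sigma^2 = 100.$ This is a sketch under the same normality assumption; `power_var_test` is a hypothetical helper of my own, not a function from any package, and the grid `2:200` is an arbitrary search range.

```r
# Hypothetical helper: exact power of the upper-tailed chi-squared variance test.
power_var_test <- function(n, sigma0sq = 64, sigma1sq = 100, alpha = 0.05) {
  crit <- qchisq(1 - alpha, df = n - 1)
  pchisq(crit * sigma0sq / sigma1sq, df = n - 1, lower.tail = FALSE)
}

n_grid <- 2:200                           # candidate sample sizes (arbitrary range)
pw     <- power_var_test(n_grid)          # qchisq/pchisq are vectorized over df
n_min  <- n_grid[which(pw >= 0.90)[1]]    # smallest n with power >= 0.90
n_min
power_var_test(n_min)
```

Because the power of this test increases with $n,$ every sample size above `n_min` also meets the 90% target.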
