F-Test Sample Size Effect – Understanding the Impact

f-test, hypothesis-testing, p-value, sample-size, t-test

Having worked hard to understand the t-test, I'm rapidly falling out of love with it. All that is required in the t-test to gain significance is to increase the sample size, which renders it close to pointless, IMO.

But what about the F-test, as used in ANOVA, linear regression, etc.? Variance is independent of sample size, so am I right in saying that the significance of the p-value in an F-test is unaffected by sample size?

Best Answer

All that is required in the t-test to gain significance is to increase the sample size

The property you mention is effectively consistency (or rather, it's what you'd expect to see, given consistency, under some commonly satisfied conditions). Consistency says that as $n\to\infty$, $P(\text{reject } H_0 \mid H_0 \text{ false})\to 1$.
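To see this concretely, here's a minimal simulation sketch (the function name power_at_n and the particular settings are just for illustration): a two-sample t-test against a fixed small true difference of 0.2 standard deviations, with power estimated by simulation at increasing per-group $n$.

# rough simulated power of a two-sample t-test at a fixed true
# difference of 0.2 sd, for increasing per-group sample size n
power_at_n <- function(n, delta = 0.2, nsim = 2000) {
  mean(replicate(nsim, {
    x <- rnorm(n)
    y <- rnorm(n, mean = delta)
    t.test(x, y)$p.value < 0.05
  }))
}
sapply(c(20, 100, 500, 2000), power_at_n)
# power climbs toward 1 as n grows (roughly 0.10, 0.29, 0.88, 1.00)

The true difference never changes; only the sample size does, and the rejection rate still heads to 1.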

All the tests you mention are consistent. In fact, I think most people* would regard inconsistency as a reason to reject a proposed test.

If you think consistency is a reason not to use a test, that suggests you should probably abandon hypothesis tests altogether, because you'll be hard pressed to find any other kind being regularly used, outside of some very specific situations.

* Not all, however. Some people are happy to use an inconsistent test, as long as its properties are reasonable at the sample sizes they actually work with. However, since they'd generally switch to another test once sample sizes became large enough to make that advantageous, they're not really avoiding power going to 1 as sample size goes to infinity.

--

Your question suggests you're either using hypothesis tests in situations where a different tool would be better (which is quite often the case -- hypothesis tests are vastly overused*), or perhaps that you don't really follow what's going on with significance tests.

You might find that in some situations confidence intervals, or even just the estimates themselves, do what you need. In other cases you may find that equivalence tests come closer to what you want.

* as an example, if you look at questions here related to hypothesis tests of assumptions of other tests, you'll find that where I answer those questions I almost always advise against it -- because it doesn't answer the question of interest in that case.
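As a quick example of the confidence-interval suggestion (a sketch using R's built-in sleep data, which also turns up later in this answer):

# an interval for the difference in means, rather than a bare
# reject/don't-reject decision
t.test(extra ~ group, data = sleep)$conf.int
# its location and width tell you about the plausible size of the
# effect, not merely whether it's distinguishable from zero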

--

Variance is independent of sample size,

This also tends to suggest you don't quite understand what's going on.

The population variances in a t-test don't change with sample size either. What gets smaller is the standard error of the difference in means. The analogous thing happens in regression and ANOVA: while the population variances of the observations don't change with sample size, the variances of the estimated effects decrease as $n$ grows.

The numerator in an F-test contains an estimate of variance that has two components: the variation between the population means (the thing that's zero under the null) and the variability of the sample means about their population means (which is a function of the variance of the error term and the sample sizes). The denominator only has the variance of the error term.
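In symbols, for the fixed-effects one-way layout with $k$ groups of sizes $n_i$, group means $\mu_i$, grand mean $\bar{\mu}$ and error variance $\sigma^2$ (a standard expected-mean-squares result):

$E(\text{MS}_{\text{between}})=\sigma^2+\frac{\sum_{i=1}^{k} n_i(\mu_i-\bar{\mu})^2}{k-1},\qquad E(\text{MS}_{\text{within}})=\sigma^2$

Under $H_0$ the second term in the numerator vanishes and the ratio hovers around 1; when the means differ, that term grows with the $n_i$.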

If the sample sizes in an ANOVA increase, the variation of the sample means about their population means will diminish, but the variation between the population means will not. So if the means are unequal, the F-statistic will tend to become larger and larger as the sample sizes grow.
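A quick simulation sketch of that (one draw per $n$, with the population means fixed at 0 and 0.5; the settings are illustrative only):

# with fixed unequal population means, the F statistic tends to grow
# with the per-group sample size n
set.seed(1)
for (n in c(10, 100, 1000)) {
  g <- factor(rep(1:2, each = n))
  y <- rnorm(2 * n, mean = rep(c(0, 0.5), each = n))
  cat("n per group:", n, " F:", anova(lm(y ~ g))[1, "F value"], "\n")
}
# F grows roughly in proportion to n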

Indeed, it's perfectly possible to cast the t-test you currently reject as an ANOVA F-test, and as a test in regression (either as a t-test of a coefficient or as an F-test based on the change in sums of squares).


Edit:

The equivalence of the t-test, one-way ANOVA and regression on the group indicator is discussed here, but I'll try to motivate it a little further.

The difference is that the t-statistic puts into the denominator a scaling factor that the F-test puts (the reciprocal of) into the numerator. Once you rearrange the t-statistic and square it, it's exactly the formula for the F.

Here's the t-statistic (I'll call this statistic $T$) for a two-sample t-test:

$T=\frac {\bar{x}-\bar{y}} {s_p\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}}$

Now rewrite it so the estimate of the error standard deviation is alone in the denominator:

$=\frac {(\bar{x}-\bar{y})\frac{1}{\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}}} {s_p}$

Nothing is different; it's the same statistic written a different way.

Now square it:

$T^2 =\frac {(\bar{x}-\bar{y})^2\frac{1}{\frac{1}{n_x}+\frac{1}{n_y}}} {s_p^2}$

I don't want to labor the point with a lot of algebra*, but the numerator is now the numerator of the F in a two-group ANOVA, while the denominator is the denominator of that F. In the F, the factor that in the $t$-test scales $\hat{\sigma}$ to give the standard error of the difference in means appears, squared and reciprocated, in the numerator, turning the squared difference in means into a mean square.

* (basically, you rewrite $(\bar{x}-\bar{y})^2$ in terms of a sum of squares of deviations from the overall mean and do a little manipulation to show the numerator there is the same as the treatment mean square; the denominator is more clearly the same. You might like to try doing the algebra for the equal-sample size case.)
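Here's a sketch of that equal-$n$ algebra, in case you want to check your work: with $n_x=n_y=n$ and grand mean $\bar{m}=(\bar{x}+\bar{y})/2$, each group mean sits a distance $|\bar{x}-\bar{y}|/2$ from $\bar{m}$, so the treatment sum of squares (which sits on 1 df, and hence is also the treatment mean square) is

$n(\bar{x}-\bar{m})^2+n(\bar{y}-\bar{m})^2=2n\left(\frac{\bar{x}-\bar{y}}{2}\right)^2=(\bar{x}-\bar{y})^2\frac{1}{\frac{1}{n}+\frac{1}{n}}$

which is exactly the numerator of $T^2$ above.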

At the above link my answer does an example using a t-test and regression on the sleep data in R. I'll include the data for anyone who wants to follow along; the data set is small enough you could even check everything on a calculator if you were so inclined:

> unstack(sleep[,1:2])
     X1   X2
1   0.7  1.9
2  -1.6  0.8
3  -0.2  1.1
4  -1.2  0.1
5  -0.1 -0.1
6   3.4  4.4
7   3.7  5.5
8   0.8  1.6
9   0.0  4.6
10  2.0  3.4

To expand on the equivalence some more, recall that rearranged t-statistic:

$T=\frac {(\bar{x}-\bar{y})\frac{1}{\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}}} {s_p}$

Here are the group means and SDs:

> with(sleep,tapply(extra,group,mean))
   1    2 
0.75 2.33 

> with(sleep,tapply(extra,group,sd))
       1        2 
1.789010 2.002249 

The sample sizes are both 10. So the numerator of the above $T$ is

${(0.75-2.33)\frac{1}{\sqrt{\frac{1}{10}+\frac{1}{10}}}}$

> (num=(0.75-2.33)*1/sqrt(1/10+1/10))
[1] -3.532987

The denominator, $s_p$, is

$\sqrt{\frac{(n_1-1)s_1^2+(n_2-1)s_2^2}{n_1+n_2-2}}=\sqrt{\frac{9s_1^2+9s_2^2}{18}}=\sqrt{\frac{s_1^2+s_2^2}{2}}$

> (denom=sqrt(sum(with(sleep,tapply(extra,group,sd))^2)/2))
[1] 1.898625

Is this rearranged form really the t-statistic? Let's check:

> (T=num/denom)
[1] -1.860813

(t.test gave t = -1.8608 as you can see at the linked post)
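For anyone who doesn't want to chase the link, the call was the pooled-variance two-sample test (var.equal = TRUE is what makes it match the $s_p$ above):

> t.test(extra ~ group, data = sleep, var.equal = TRUE)$statistic
        t 
-1.860813 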

So now for the equivalence to F. Let's square the numerator and denominator:

> num^2;denom^2
[1] 12.482
[1] 3.604778

Now here's the one-way ANOVA. Look at the Mean Sq column:

> summary(aov(extra~group,sleep))
            Df Sum Sq Mean Sq F value Pr(>F)  
group        1  12.48  12.482   3.463 0.0792 
Residuals   18  64.89   3.605                 

Well how about that. Also:

> T^2
[1] 3.462627
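And the regression route on the same data gives the same numbers (a sketch; note lm codes the difference in the other direction, so the coefficient's t flips sign relative to the $T$ above):

> # t for the group coefficient: same magnitude as T
> coef(summary(lm(extra ~ group, data = sleep)))["group2", "t value"]
[1] 1.860813
> # and the regression F matches T^2 and the ANOVA F value
> summary(lm(extra ~ group, data = sleep))$fstatistic["value"]
   value 
3.462627 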