Variance Analysis – How to Get F-Test P-Value

f-test, hypothesis testing, p-value, variance

Suppose we can choose from two different catalysers. 10 observations are taken from the first one and 12 from the other one. If $s_1 = 14$ and $s_2 = 28$, can we reject at $\alpha = 5\%$ the hypothesis that the variances are equal?

Here is what the teacher did:

The ratio is: $s_1/s_2 = 0.5$.

Then

$$P(F_{n=9, m=11} \le 0.5) = 0.1538$$

Then he says: the p-value is $2 \times \min(0.1539; 0.8461) = 0.3074$ and he rejects $H_0$.

How do I get the 0.1538?

I think I can check an F-table for n=9, m=11, but what do I then do to get the probability that this value is $\le 0.5$?

Best Answer

The first thing to notice is that since this is a variance test, you can have F's that are either large or small being significant, whereas often F tables assume you're doing ANOVA type calculations (where only large values of F can cause rejection).

So you need to make use of the fact that the lower tail of $F(\nu_1,\nu_2)$ corresponds to the upper tail of $F(\nu_2,\nu_1)$ at the reciprocal value: $P(F_{\nu_1,\nu_2} \le f) = P(F_{\nu_2,\nu_1} \ge 1/f)$.

There's a little more discussion of that here

How do I tell which tail I am in? -- The median of an F-distribution in the cases you'll need to worry about for a variance test will be close to 1. So if the F-statistic is less than 1, assume you need the lower tail. If it's bigger than 1, assume you need the upper tail.

In the numerical example in your question, F=0.5 -- you want a lower tail for F.

So to find that, you need to swap the degrees of freedom, and the F-values will all be the inverses of the ones you need. Since you need the area below 0.5, it's the same as finding the area above 1/0.5 = 2 on an $F_{11,9}$.
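If a computer is at hand, the tail swap can be checked numerically. The following is only a sketch: `f_pdf` and `f_cdf` are hypothetical helper names, and the lower-tail area is obtained by a plain composite Simpson's rule on the F density rather than by any particular library routine. It assumes the numerator d.f. is above 2, so the density is 0 at the origin.

```python
import math

def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom."""
    if x <= 0.0:
        return 0.0  # correct limit at 0 only when d1 > 2 (true for 9 and 11)
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, n=2000):
    """P(F <= x) by composite Simpson's rule on [0, x]; n must be even."""
    h = x / n
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, n):
        # Simpson weights: 4 at odd interior nodes, 2 at even ones
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return total * h / 3

# Lower tail of F(9, 11) at 0.5 ...
lower_tail = f_cdf(0.5, 9, 11)
# ... and upper tail of F(11, 9) at 1/0.5 = 2 (degrees of freedom swapped)
upper_tail_swapped = 1.0 - f_cdf(2.0, 11, 9)

print(round(lower_tail, 4), round(upper_tail_swapped, 4))
```

The two printed numbers should agree with each other and sit close to the 0.1538 quoted in the question.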

So you need to worry first about the highest $\alpha$ you can find (0.1 in the indicated tables).

Since the tables you linked have df1 on the columns, you need to find the 11 column and the 9 row in this case.

You don't have an 11, so let's look at 10 and 12:

    df2 \ df1   ...    10        12
     ⋮
     9                2.41632   2.37888

So how do you deal with the fact that there's no 11?

Well, first, notice that as long as df2 is at least 3 (and it will be for a variance test in an exam), the tabulated critical values decrease as either d.f. increases.

So if we're just getting a conservative bound on the p-value, look at the next lower d.f. (i.e. compare with df1=10 in this case), since its critical value is the larger of the two.

[For more accuracy see this post on interpolation, which discusses interpolation in degrees of freedom for the F toward the end. If your test is looming I doubt you have time to learn anything more than linear interpolation though. That suggests linear interpolation in the reciprocal of the degrees of freedom.]
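As a rough illustration of linear interpolation in the reciprocal of the degrees of freedom, here is a small sketch using the two 10% values from the table excerpt above. The function name `interp_reciprocal_df` is made up for this example; the only inputs are the tabulated values for df1 = 10 and df1 = 12 at df2 = 9.

```python
def interp_reciprocal_df(df, df_lo, value_lo, df_hi, value_hi):
    """Interpolate a tabulated critical value linearly in 1/df."""
    # weight is 1 when df == df_lo and 0 when df == df_hi
    weight = (1 / df - 1 / df_hi) / (1 / df_lo - 1 / df_hi)
    return value_hi + weight * (value_lo - value_hi)

# 10% critical values at df2 = 9, taken from the table excerpt
crit_11 = interp_reciprocal_df(11, 10, 2.41632, 12, 2.37888)
print(round(crit_11, 3))  # about 2.396
```

The interpolated value lies between the two tabulated ones and is still above 2, so the conclusion for the example in the question does not change.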

The value at df1 = 10, df2 = 9 is 2.41632, which is bigger than your 2 (and so is the df1 = 12 value, 2.37888). So you're nearer to 1 than the 0.1 critical value.

Which means that your lower-tailed p-value is >0.1.


What if the problem was similar to the one in the question but the F was say $0.4$ instead of $0.5$?

1/0.4 = 2.5, which means it's further into the tail than the two 0.10 values above (2.41632, 2.37888). So the lower-tail p<0.10.

Now compare with the 5% values. We see that 2.5 is less than both the 12,9 and 10,9 values (which are both just above 3). So the lower tail p>0.05. So $0.05<p<0.10$.

What if the problem was similar to the one in the question but the F was in between the values for 10 and 12?

Now let's say the F ratio was 0.323, so the value to look up is 1/0.323 ≈ 3.096.

This is between the 0.05 values for 10,9 and 12,9 d.f. -- so is p<0.05 or p>0.05?

Possibility 1: say it's approximately 0.05.

Possibility 2: say that it must be at least the next smaller tabulated level (p>0.025)

Possibility 3: use interpolation (but this time in the significance level, not the df), as described at the interpolation link I gave before. That suggests linear interpolation in $\log \alpha$.
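A minimal sketch of Possibility 3, for the F = 0.323 case. To keep it short it works in the df1 = 10 column only (rather than also interpolating to df1 = 11), and the 5% critical value of 3.14 is an assumption read from a standard F table, not a figure given above; check it against your own table. The helper name `interp_log_alpha` is made up for this example.

```python
import math

def interp_log_alpha(stat, crit_a, alpha_a, crit_b, alpha_b):
    """Interpolate a tail area linearly in log(alpha) between two table points."""
    weight = (stat - crit_a) / (crit_b - crit_a)
    log_alpha = (math.log(alpha_a)
                 + weight * (math.log(alpha_b) - math.log(alpha_a)))
    return math.exp(log_alpha)

stat = 1 / 0.323       # reciprocal of the F ratio, read against the swapped d.f.
crit_10pct = 2.41632   # 10% value for df1 = 10, df2 = 9 (from the table excerpt)
crit_5pct = 3.14       # ASSUMED 5% value for df1 = 10, df2 = 9 (standard table)

p_lower = interp_log_alpha(stat, crit_10pct, 0.10, crit_5pct, 0.05)
print(round(p_lower, 3))
```

Under these assumptions the interpolated lower-tail p-value comes out just above 0.05, which is what you would expect from a statistic sitting just below the df1 = 10 critical value.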

Personally, if I were ever possessed to do an F-test of variances in practice*, yet somehow unable to access even a calculator (with which to do a quick numerical integration), I'd choose option 3. If I couldn't do that for some reason, I'd choose option 1. However, the expectations of the person marking it might well be option 2.

* if I'd been taking powerful hallucinogens, or had suffered severe head trauma, or some other incident somehow rendering me no longer able to appreciate what a really bad idea this would likely be.


Two tailed p-values

It appears that it's intended that you just double one-tailed p-values to obtain two-tailed ones.

That's fine as far as it goes, so just stick with that, but for a discussion of some of the issues in more detail, see the discussion in the example at the end of the answer here
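For completeness, the doubling rule used in the question can be written out as below. This is only a sketch of that one convention, with a hypothetical function name and a cap at 1 added as a safety measure. The input is the rounded lower-tail area 0.1538 from the question.

```python
def two_tailed_p(lower_tail_p):
    """Double the smaller of the two one-tailed areas, capped at 1."""
    return min(1.0, 2.0 * min(lower_tail_p, 1.0 - lower_tail_p))

p_two = two_tailed_p(0.1538)
print(round(p_two, 4))  # 0.3076
```

With the rounded input this gives 0.3076, which matches the 0.3074 quoted in the question up to rounding of the one-tailed area.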

[May add some more detail later]
