Intuitive explanation of the F-statistic formula

anova, f-statistic, intuition

An Introduction to Statistical Learning defines the F-statistic as follows:
$$F = \frac{(TSS - RSS)/p}{RSS/(n - p - 1)}$$
I am trying to interpret this formula intuitively: the numerator looks like the explained sum of squares ($ESS = TSS - RSS$) per regressor, and the denominator looks like the $RSS$ per observation. This does not seem like an apples-to-apples comparison; can anyone explain why it makes sense?
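
For concreteness, here is a minimal sketch of how I am reading the formula (my own code, not from the book; the data are simulated and all names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3                      # observations and regressors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)             # pure noise: the regressors have no real effect

# Ordinary least squares with an intercept
X1 = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta

TSS = np.sum((y - y.mean()) ** 2)  # total sum of squares
RSS = np.sum((y - y_hat) ** 2)     # residual sum of squares

# The formula as I read it: "ESS per regressor" over "RSS per residual degree of freedom"
F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
print(F)  # a single draw from the null F(3, 96) distribution; values near 1 are typical
```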

They also say that the expectation of the denominator equals the variance of the irreducible error when the linear model assumptions hold (I get this: the denominator is the square of the Residual Standard Error, which is an unbiased estimator of that variance). They also say that the expectation of the numerator equals the variance of the irreducible error if the null hypothesis is true. Therefore the F-statistic will be close to 1 when the null hypothesis is true.
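
Restating those two claims symbolically (my paraphrase, with $\sigma^2$ denoting the variance of the irreducible error):

$$E\left[\frac{RSS}{n - p - 1}\right] = \sigma^2 \quad \text{(whenever the linear model assumptions hold)}$$

$$E\left[\frac{TSS - RSS}{p}\right] = \sigma^2 \quad \text{(additionally requiring } H_0: \beta_1 = \dots = \beta_p = 0\text{)}$$

so under $H_0$ the numerator and denominator of the F-statistic estimate the same quantity, and their ratio should be close to 1.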

But if you let $p = 1$, that is, apply the F-statistic to simple linear regression, it becomes:

$$\frac{TSS - RSS}{RSS/(n - 2)}$$

According to the book, if the regressor has no explanatory power, the F-statistic should be close to 1. But if you imagine a data set where the coefficient on $X$ is 0 (i.e. no explanatory power), $TSS$ will be equal to $RSS$, so the numerator, and hence the F-statistic, should be 0, not 1 as they claim. What is going on?

Furthermore, if you accept that the F-statistic is 1, as they claim, then $(TSS - RSS) = RSS/(n-2)$, which means $ESS = RSS/(n-2)$. If you think of the F-statistic as comparing explained vs. unexplained variation, this does not seem like a fair comparison, since this decomposition makes it the ratio of the TOTAL explained variation summed across ALL observations vs. the unexplained variation PER observation. Again, what am I missing?

I am just trying to make sense of this in layman's terms; apologies if I am missing something obvious.

Best Answer

Note that even if there were no population effect (i.e. the population mean of the response were identical at every combination of the regressors), there would still be some estimated effect: the regression sum of squares ($TSS - RSS$) would be nonzero, and it would tend to increase if the error variance increased or if you added more regressors.

Indeed, if there were no effects, you could estimate $\sigma^2$ from the regression sum of squares: $\hat{\sigma}^2 = (TSS - RSS)/p$ (written in the question's notation, where $RSS$ is the residual sum of squares). So under the null hypothesis we are taking the ratio of two independent estimates of $\sigma^2$, and in that case (under the assumption of i.i.d. normal errors) the ratio turns out to have an F-distribution. However, if there are any effects, then the estimator based on the regression sum of squares will tend to be "too large": there is an additional term in that variance estimate coming from the variation in the conditional population means. So when $H_0$ is false, the test statistic will tend to fall into the upper tail of the null distribution more often than when the null hypothesis is true; that is why it makes intuitive sense to use F-tests in this situation.
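
To make this concrete, here is a small simulation sketch (my illustration, not part of the original answer; all names and parameter values are arbitrary). With no real effects, the regression mean square and the residual mean square both estimate $\sigma^2$, so their ratio hovers near 1 and follows the $F_{p,\,n-p-1}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, p, sigma2 = 50, 4, 2.0   # sample size, number of regressors, true error variance
n_sims = 5000

f_stats = np.empty(n_sims)
for i in range(n_sims):
    X = rng.normal(size=(n, p))
    y = rng.normal(scale=np.sqrt(sigma2), size=n)  # H0 is true: y ignores X entirely

    # Ordinary least squares with an intercept
    X1 = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta

    TSS = np.sum((y - y.mean()) ** 2)
    RSS = np.sum(resid ** 2)

    # Two estimates of sigma^2, independent under H0 with normal errors
    ms_regression = (TSS - RSS) / p      # "explained" mean square
    ms_residual = RSS / (n - p - 1)      # residual mean square
    f_stats[i] = ms_regression / ms_residual

# The mean of an F(p, n-p-1) variable is (n-p-1)/(n-p-3) = 45/43, just above 1
print("mean F under H0:", f_stats.mean())

# The rejection rate at the theoretical 5% critical value should be about 0.05
crit = stats.f.ppf(0.95, p, n - p - 1)
print("rejection rate:", np.mean(f_stats > crit))
```

Increasing the true coefficients away from zero in this simulation inflates `ms_regression` but not `ms_residual`, pushing the statistic into the upper tail, which is exactly the behavior the test exploits.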