Regression Analysis – Understanding the Power of the Regression F Test

f distribution, hypothesis testing, non-central, regression, statistical-power

The classical F-test for subsets of variables in multilinear regression has the form
$$
F = \frac{(\mbox{SSE}(R) - \mbox{SSE}(B))/(df_R - df_B)}{\mbox{SSE}(B)/df_B},
$$
where $\mbox{SSE}(R)$ is the sum of squared errors under the 'reduced' model $R$, which nests inside the 'big' model $B$, $\mbox{SSE}(B)$ is the sum of squared errors under the big model, and $df_R$ and $df_B$ are the residual (error) degrees of freedom of the two models. Under the null hypothesis that the extra variables in the 'big' model have no linear explanatory power, the statistic follows an F distribution with $df_R - df_B$ and $df_B$ degrees of freedom.
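For concreteness, here is a minimal sketch of how this statistic could be computed with NumPy and SciPy. The helper names (`sse_and_df`, `nested_f_test`) and the simulated data are assumptions made for illustration, and $df$ is taken to be the residual degrees of freedom, $n - \mathrm{rank}(X)$.

```python
import numpy as np
from scipy import stats

def sse_and_df(X, y):
    """Least-squares SSE and residual degrees of freedom for design matrix X."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    df = X.shape[0] - np.linalg.matrix_rank(X)
    return float(resid @ resid), int(df)

def nested_f_test(X_big, X_red, y):
    """F statistic and p-value for the reduced model nested in the big model."""
    sse_b, df_b = sse_and_df(X_big, y)
    sse_r, df_r = sse_and_df(X_red, y)
    f_stat = ((sse_r - sse_b) / (df_r - df_b)) / (sse_b / df_b)
    # Under the null, f_stat follows a central F(df_r - df_b, df_b).
    p_value = stats.f.sf(f_stat, df_r - df_b, df_b)
    return f_stat, p_value, df_r - df_b, df_b

# Simulated example (hypothetical data): the last two regressors have zero
# coefficients, so the null hypothesis holds here.
rng = np.random.default_rng(0)
n = 50
X_big = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
X_red = X_big[:, :2]  # intercept plus the first regressor
y = X_big @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

f_stat, p_value, df_num, df_den = nested_f_test(X_big, X_red, y)
print(f_stat, p_value, df_num, df_den)
```

The columns of `X_red` must be a subset of (or span a subspace of) the columns of `X_big`; otherwise the models are not nested and the statistic does not have the stated distribution.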

What is the distribution, however, under the alternative? I assume it is a non-central F (I hope not doubly non-central), but I cannot find any reference on what exactly the non-centrality parameter is. I am going to guess it depends on the true regression coefficients $\beta$, and probably on the design matrix $X$, but beyond that I am not so sure.

Best Answer

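Assuming the unrestricted model is correct, the statistic under the alternative follows a (singly) noncentral F distribution with the same degrees of freedom and noncentrality parameter $\delta^{2}$. In the formula below, $P_{r}$ is the orthogonal projection onto the column space of the restricted model's design matrix $X_{r}$, $\beta$ is the vector of true parameters, $X$ is the design matrix of the unrestricted (true) model, $\sigma^{2}$ is the error variance, and $|| x ||$ is the Euclidean norm: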

$$ \delta^{2} = \frac{|| X \beta - P_{r} X \beta ||^{2}}{\sigma^{2}} $$

You can read the formula like this: $E(y | X) = X \beta$ is the vector of expected values conditional on the design matrix $X$. If you treat $X \beta$ as an empirical data vector $y$, then its projection onto the restricted model subspace is $P_{r} X \beta$, which gives you the prediction $\hat{y}$ from the restricted model for that "data". Consequently, $X \beta - P_{r} X \beta$ is analogous to $y - \hat{y}$ and gives you the error of that prediction. Hence $|| X \beta - P_{r} X \beta ||^{2}$ gives the sum of squares of that error. If the restricted model is true, then $X \beta$ already lies in the subspace spanned by the columns of the restricted design matrix $X_{r}$, so $P_{r} X \beta = X \beta$ and the noncentrality parameter is $0$.
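As a rough illustration of how the formula can be used for a power calculation, the sketch below computes $\delta^{2}$ by regressing $X\beta$ on the restricted design and then evaluates the power with `scipy.stats.ncf`. The function names, the simulated design, and the chosen values of $\beta$ and $\sigma$ are assumptions for the example only. It also relies on SciPy's `nc` parameter using the same convention as $\delta^{2}$ here (no factor of $1/2$).

```python
import numpy as np
from scipy import stats

def noncentrality(X, X_r, beta, sigma):
    """delta^2 = ||X beta - P_r X beta||^2 / sigma^2."""
    mu = X @ beta
    # Projecting mu onto col(X_r) is the same as fitting mu on X_r by least squares.
    coef, *_ = np.linalg.lstsq(X_r, mu, rcond=None)
    resid = mu - X_r @ coef
    return float(resid @ resid) / sigma**2

def f_test_power(X, X_r, beta, sigma, alpha=0.05):
    """Power of the nested F test at level alpha, assuming the model for X is true."""
    n = X.shape[0]
    rank_big = np.linalg.matrix_rank(X)
    rank_red = np.linalg.matrix_rank(X_r)
    df_num = int(rank_big - rank_red)   # equals df_R - df_B
    df_den = int(n - rank_big)          # equals df_B
    nc = noncentrality(X, X_r, beta, sigma)
    f_crit = stats.f.ppf(1 - alpha, df_num, df_den)
    # P(F > f_crit) under the noncentral F distribution
    return float(stats.ncf.sf(f_crit, df_num, df_den, nc))

# Hypothetical design: intercept plus two regressors; the restricted model
# drops the second regressor, whose true coefficient is 0.4.
rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X_r = X[:, :2]
beta = np.array([1.0, 0.5, 0.4])

delta2 = noncentrality(X, X_r, beta, sigma=1.0)
power = f_test_power(X, X_r, beta, sigma=1.0)
print(delta2, power)
```

Setting the last coefficient to zero puts $X\beta$ inside the restricted subspace, so `noncentrality` returns (numerically) zero, matching the remark above.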

You should find this in Mardia, Kent & Bibby. (1980). Multivariate Analysis.