F-Statistic Distribution – Insights on Wald Test Applications

Tags: f-test, wald-test

On the Wikipedia page for the Wald test, there is a derivation for the test statistic under multiple hypotheses. The derivation ends in an F-distribution. I am curious about the final step in that derivation, repeated below.

Suppose $\hat{\theta}_n$ is a $P\times 1$ vector and $R$ is a $Q\times P$ matrix.

The null and alternative hypotheses are: $$H_0: R\theta =r\text{ with }H_1: R\theta \ne r$$

Suppose $$\sqrt{n}(\hat{\theta}_n-\theta)\rightarrow N(0,V) $$

Then, by Slutsky's theorem and the properties of the normal distribution, multiplying by $R$ gives

$$R\sqrt{n}(\hat{\theta}_n-\theta)\rightarrow N(0,RVR') $$

Recalling that a quadratic form of a normal random vector has a chi-squared distribution:

$$\sqrt{n}(R\hat{\theta}_n-r)'[RVR']^{-1}\sqrt{n}(R\hat{\theta}_n-r)\rightarrow \chi^2_Q $$

Collecting the factors of $n$ gives

$$(R\hat{\theta}_n-r)'[R(V/n)R']^{-1}(R\hat{\theta}_n-r)\rightarrow \chi^2_Q $$
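
(As a quick numerical sanity check of this limit, here is a small simulation sketch in Python. Everything in it is invented for illustration: $\theta$ is estimated by a sample mean so that the assumed asymptotic normality holds, and $V$ is estimated by the sample covariance.)

```python
# Monte Carlo check (illustrative only): theta_hat is a sample mean, so
# sqrt(n)*(theta_hat - theta) -> N(0, V) with V the covariance of one observation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
P, Q, n, reps = 3, 2, 500, 2000
theta = np.array([1.0, -0.5, 2.0])
V = np.array([[1.0, 0.3, 0.0],
              [0.3, 2.0, 0.4],
              [0.0, 0.4, 1.5]])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])          # Q x P restriction matrix
r = R @ theta                              # H0 is true by construction
L = np.linalg.cholesky(V)

stats_w = []
for _ in range(reps):
    x = theta + rng.standard_normal((n, P)) @ L.T   # iid N(theta, V) draws
    theta_hat = x.mean(axis=0)
    V_hat = np.cov(x, rowvar=False)                 # consistent estimator of V
    diff = R @ theta_hat - r
    stats_w.append(diff @ np.linalg.solve(R @ (V_hat / n) @ R.T, diff))

# For large n the Wald statistic is approximately chi^2 with Q degrees of freedom.
print(stats.kstest(stats_w, stats.chi2(Q).cdf))
```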

The step that confuses me is next.

If the covariance matrix, $V$, is not known but needs to be estimated and we have a consistent estimator, $\hat{V}_n\sim \chi^2_{n-P}$, of $V$, then by independence of the covariance estimator and the equation above, we have,

$$(R\hat{\theta}_n-r)'[R(\hat{V}_n/n)R']^{-1}(R\hat{\theta}_n-r)\rightarrow F_{Q,n-P} $$

I am confused because $\hat{V}_n$ is a matrix. What does it mean for a matrix to be distributed as $\chi^2$ ? Also, how is independence applied to derive the final result?

I'm essentially just looking for a couple more details in this final step and greatly appreciate any help.

Best Answer

I will give it a try. Assume that the Gauss-Markov assumptions hold and we deal with the linear regression model: $$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta}+\boldsymbol{u}, \quad \boldsymbol{u} \vert \boldsymbol{X} \sim {\cal N}(\boldsymbol{0}_n, \sigma^2 \boldsymbol{I}_n) $$ You want to test $H_0: \boldsymbol{R}\boldsymbol{\beta}=\boldsymbol{r}$ where $\boldsymbol{R} \in \mathbb{R}^{r \times k}$ with full row rank, $\operatorname{rank}(\boldsymbol{R})=r$. The F-statistic is defined as $$ F= \frac{(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})^\top[\boldsymbol{R}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{R}^\top]^{-1}(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})/r}{\hat{\sigma}^2} $$ which is $F_{r,n-k}$ distributed under $H_0$. By definition, if $X_1 \sim \chi^2(d_1)$ and $X_2 \sim \chi^2(d_2)$, and $X_1$ and $X_2$ are independent, then $F=\frac{X_1/d_1}{X_2/d_2}$ is $F_{d_1,d_2}$ distributed.
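
Before going on, here is a minimal numerical sketch of this statistic, assuming a small simulated regression; the design matrix, $\boldsymbol{R}$, $\boldsymbol{r}$, and all numbers below are made up purely for illustration:

```python
# Compute the F-statistic from the formula above for one simulated data set.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, k = 200, 4
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
beta = np.array([2.0, 1.0, 0.0, 0.0])
y = X @ beta + rng.standard_normal(n)          # true sigma^2 = 1

R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])           # test beta_3 = beta_4 = 0
r_vec = np.zeros(2)
r = np.linalg.matrix_rank(R)                   # number of restrictions

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = u_hat @ u_hat / (n - k)           # unbiased variance estimate

a = R @ beta_hat - r_vec
F = (a @ np.linalg.solve(R @ XtX_inv @ R.T, a) / r) / sigma2_hat
p_value = stats.f(r, n - k).sf(F)
print(F, p_value)
```

Since the third and fourth coefficients really are zero here, $H_0$ holds and the printed p-value should typically be large.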

Note that an unbiased and consistent estimator for the variance of the disturbances is given by $\hat{\sigma}^2=\frac{\boldsymbol{\hat{u}}^\top\boldsymbol{\hat{u}}}{n-k}$. So, we can write: \begin{align} F&=\frac{(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})^\top[\boldsymbol{R}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{R}^\top]^{-1}(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})/r}{\hat{\sigma}^2}\\ &=\frac{(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})^\top[\boldsymbol{R}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{R}^\top]^{-1}(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})/r}{\frac{\boldsymbol{\hat{u}}^\top\boldsymbol{\hat{u}}}{n-k}}\\ &=\frac{d/r}{q/(n-k)} \end{align} where $d=(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})^\top[\sigma^2\boldsymbol{R}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{R}^\top]^{-1}(\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r})$ and $q=\frac{\boldsymbol{\hat{u}}^\top\boldsymbol{\hat{u}}}{\sigma^2}$, and $\sigma^2$ is now the "true" variance of the disturbances. Basically, all we have to do now is to show that:

  1. $d \vert \boldsymbol{X} \sim \chi^2(r)$
  2. $q \vert \boldsymbol{X} \sim \chi^2(n-k)$
  3. $d$ and $q$ are independent conditional on $\boldsymbol{X}$

Proof of 1

Let $\boldsymbol{a}=\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r}$. Under $H_0$, this becomes $\boldsymbol{R}\hat{\boldsymbol{\beta}}-\boldsymbol{r}=\boldsymbol{R}(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})$. Since the sampling error $\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}$ is distributed as $\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\vert \boldsymbol{X} \sim {\cal N}(\boldsymbol{0}_k,\sigma^2(\boldsymbol{X}^\top\boldsymbol{X})^{-1})$, we know that $\boldsymbol{a}$ is multivariate normal with mean vector $\boldsymbol{0}_r$, but we still need to calculate its covariance matrix: \begin{align} Var(\boldsymbol{a} \vert \boldsymbol{X})=Var(\boldsymbol{R}(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta})\vert \boldsymbol{X})=\boldsymbol{R}Var(\hat{\boldsymbol{\beta}}-\boldsymbol{\beta}\vert \boldsymbol{X})\boldsymbol{R}^\top=\sigma^2\boldsymbol{R}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{R}^\top \end{align} Thus, $d$ can be written as $\boldsymbol{a}^\top Var(\boldsymbol{a} \vert \boldsymbol{X})^{-1}\boldsymbol{a}$, which is a quadratic form.

Furthermore, if $\boldsymbol{y} \in \mathbb{R}^{m \times 1}$ is $\cal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$ distributed, the quadratic form $(\boldsymbol{y}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\boldsymbol{y}-\boldsymbol{\mu})$ is $\chi^2(m)$ distributed.

It follows that $d \vert \boldsymbol{X} \sim \chi^2(r)$.
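
If you want to convince yourself of the quadratic-form fact used here, a small simulation sketch (with an arbitrary, invented $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$) looks like this:

```python
# Illustrative check: if y ~ N(mu, Sigma) in R^m, then
# (y - mu)' Sigma^{-1} (y - mu) ~ chi^2(m).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
m, reps = 3, 5000
mu = np.array([1.0, 2.0, -1.0])
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)                # a valid (positive definite) covariance
L = np.linalg.cholesky(Sigma)

y = mu + rng.standard_normal((reps, m)) @ L.T  # reps draws from N(mu, Sigma)
diff = y - mu
qf = np.einsum('ij,ij->i', diff, np.linalg.solve(Sigma, diff.T).T)
print(stats.kstest(qf, stats.chi2(m).cdf))     # should not reject chi^2(m)
```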

Proof of 2

Remember: \begin{align*} \frac{\boldsymbol{\hat{u}}^\top\boldsymbol{\hat{u}}}{\sigma^2}&=\frac{\boldsymbol{u}^\top\boldsymbol{M}\boldsymbol{u}}{\sigma^2} \end{align*} where $\boldsymbol{M}=\boldsymbol{I}_n-\boldsymbol{X}(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top$ is the residual-maker matrix (which is symmetric and idempotent). Since $\boldsymbol{u} \vert \boldsymbol{X} \sim {\cal N}(\boldsymbol{0}_n, \sigma^2 \boldsymbol{I}_n)$ and $\operatorname{rank}(\boldsymbol{M})=\operatorname{tr}(\boldsymbol{M})=n-k$, we have ${q\vert \boldsymbol{X} \sim \chi^2(n-k)}$, where the last part follows from the fact that if $\boldsymbol{y} \sim {\cal N}(\boldsymbol{0}_n,\sigma^2 \boldsymbol{I}_n)$ and $\boldsymbol{M} \in \mathbb{R}^{n \times n}$ is a symmetric, idempotent, and non-stochastic matrix, then the distribution of the (scaled) quadratic form $\boldsymbol{y}^\top\boldsymbol{M}\boldsymbol{y}/\sigma^2$ is given by: \begin{align} \boldsymbol{y}^\top\boldsymbol{M}\boldsymbol{y}/\sigma^2 \sim \chi^2(\operatorname{tr}(\boldsymbol{M})) \end{align}

It follows that $q \vert \boldsymbol{X} \sim \chi^2(n-k)$.
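
Again, this is easy to check numerically; in the following sketch the design matrix, $\sigma^2$, and the sample size are invented for illustration:

```python
# Illustrative check of Proof 2: with M the residual-maker matrix,
# u'Mu / sigma^2 should be chi^2(n - k) for a fixed X.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, k, sigma2, reps = 50, 4, 2.0, 5000
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T   # symmetric, idempotent

q_vals = []
for _ in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)
    q_vals.append(u @ M @ u / sigma2)

print(np.trace(M))                                 # = n - k = 46 up to rounding
print(stats.kstest(q_vals, stats.chi2(n - k).cdf)) # should not reject chi^2(n - k)
```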

Proof of 3

Basically, $d$ is a function of $\hat{\boldsymbol{\beta}}$ and $q$ is a function of the residuals $\boldsymbol{\hat{u}}$. Now, note that: \begin{align} Cov(\boldsymbol{\hat{\beta}},\boldsymbol{\hat{u}}\vert \boldsymbol{X})&=E([\boldsymbol{\hat{\beta}}-E(\boldsymbol{\hat{\beta}}\vert \boldsymbol{X})][\boldsymbol{\hat{u}}-E(\boldsymbol{\hat{u}}\vert \boldsymbol{X})]^\top \vert \boldsymbol{X})\\ &=E([\boldsymbol{\hat{\beta}}-\boldsymbol{\beta}][\boldsymbol{M}\boldsymbol{u}-\boldsymbol{M}E(\boldsymbol{u}\vert \boldsymbol{X})]^\top \vert \boldsymbol{X})\\ &=E((\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{u}[\boldsymbol{M}\boldsymbol{u}]^\top \vert \boldsymbol{X})\\ &=E((\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{u}\boldsymbol{u}^\top\boldsymbol{M}^\top \vert \boldsymbol{X})\\ &=(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top E(\boldsymbol{u}\boldsymbol{u}^\top\vert \boldsymbol{X})\boldsymbol{M}^\top\\ &=(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top \sigma^2\boldsymbol{I}_n\boldsymbol{M}^\top\\ &=\sigma^2(\boldsymbol{X}^\top\boldsymbol{X})^{-1}\boldsymbol{X}^\top\boldsymbol{M}^\top\\ &=\sigma^2(\boldsymbol{X}^\top\boldsymbol{X})^{-1}(\boldsymbol{M}\boldsymbol{X})^\top\\ &=\boldsymbol{0}_{k \times n} \end{align} Here the third line uses $E(\boldsymbol{u}\vert \boldsymbol{X})=\boldsymbol{0}_n$, and the last line follows from the property of the residual-maker matrix that $\boldsymbol{M}\boldsymbol{X}=\boldsymbol{0}_{n \times k}$. Therefore, $\boldsymbol{\hat{\beta}}$ and $\boldsymbol{\hat{u}}$ are independently distributed conditional on $\boldsymbol{X}$ (jointly normal and uncorrelated random vectors are independent). Furthermore, if two random vectors $\boldsymbol{x}$ and $\boldsymbol{y}$ are independent, so are any measurable functions $f(\boldsymbol{x})$ and $g(\boldsymbol{y})$. Thus, $d$ and $q$ are independent conditional on $\boldsymbol{X}$.
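
A small numerical sketch of these two facts ($\boldsymbol{M}\boldsymbol{X}=\boldsymbol{0}$ and the vanishing cross-covariance), with an invented design matrix and coefficients:

```python
# Illustrative check of Proof 3: M X = 0, and the Monte Carlo covariance between
# beta_hat and u_hat (for a fixed X) is numerically zero.
import numpy as np

rng = np.random.default_rng(4)
n, k, reps = 40, 3, 20000
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T
print(np.max(np.abs(M @ X)))                   # ~ 1e-14: MX = 0

beta = np.array([1.0, -2.0, 0.5])
betas, resids = [], []
for _ in range(reps):
    y = X @ beta + rng.standard_normal(n)
    b = XtX_inv @ X.T @ y
    betas.append(b)
    resids.append(y - X @ b)
betas, resids = np.array(betas), np.array(resids)

# Sample cross-covariance between each coefficient and each residual: all near 0.
cross_cov = (betas - betas.mean(0)).T @ (resids - resids.mean(0)) / (reps - 1)
print(np.max(np.abs(cross_cov)))               # small, and shrinks as reps grows
```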

Finally, combining these results yields the desired result $F \vert \boldsymbol{X} \sim F_{r,{n-k}}$. However, since this distribution does not depend on $\boldsymbol{X}$, the conditional distribution is identical to the unconditional distribution and we get $F \sim F_{r,{n-k}}$.
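
To close the loop, here is an illustrative Monte Carlo sketch (all settings invented) showing that the simulated statistic is consistent with the $F_{r,n-k}$ distribution:

```python
# Simulate the F-statistic repeatedly under H0 for a fixed X and compare its
# empirical distribution with F_{r, n-k}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, k, reps = 60, 4, 4000
X = np.column_stack([np.ones(n), rng.standard_normal((n, k - 1))])
XtX_inv = np.linalg.inv(X.T @ X)
R = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
beta = np.array([1.0, 0.5, -0.3, 2.0])
r_vec = R @ beta                                # H0 holds by construction
r = R.shape[0]

F_vals = []
for _ in range(reps):
    y = X @ beta + rng.standard_normal(n)
    b = XtX_inv @ X.T @ y
    u_hat = y - X @ b
    s2 = u_hat @ u_hat / (n - k)
    a = R @ b - r_vec
    F_vals.append((a @ np.linalg.solve(R @ XtX_inv @ R.T, a) / r) / s2)

print(stats.kstest(F_vals, stats.f(r, n - k).cdf))  # should not reject F_{r, n-k}
```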

As you can see, you need a solid understanding of linear regression and of the distributions of quadratic forms to follow this explanation. If you want to refresh your knowledge of this material, I can recommend Greene's "Econometric Analysis". The appendix in particular gives a good overview of the distribution of quadratic forms and the multivariate normal distribution.

I hope that you find this answer helpful.
