Hausman Test – Why Larger Samples Make the Hausman Test Statistic More Significant

Tags: covariance-matrix, fixed-effects-model, hausman, panel-data, standard-error

Hausman test statistic formula:
$$
H=(\beta_{f}-\beta_{r})' \left[\mathrm{Cov}(\beta_{f})-\mathrm{Cov}(\beta_{r})\right]^{-1}(\beta_{f}-\beta_{r} )
$$
where $\beta_{f}$ is the coefficient vector of the fixed effects model and $\beta_{r}$ is the coefficient vector of the random effects model.

What I understand so far: the standard error decreases as the sample size increases.

What I do not understand: the relation between the standard error and the variance-covariance matrix, and why that relation makes the Hausman test statistic increase, as in the formula above.

Why does the Hausman test statistic automatically become larger as the sample size grows?

Best Answer

First, your question about the relationship between the variance-covariance matrix and the standard errors: the variance-covariance matrix is a symmetric matrix whose off-diagonal elements are the covariances between all the betas in the model, and whose main-diagonal elements are the variances of the individual betas. Taking the square root of the main-diagonal entries gives you the standard errors of the betas.
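As a small illustration of that relationship (with made-up numbers, not from any real regression):

```python
import numpy as np

# Hypothetical variance-covariance matrix for two coefficients.
# Off-diagonal entries are the covariances between the betas;
# the main diagonal holds each beta's variance.
vcov = np.array([[4.0, 1.2],
                 [1.2, 9.0]])

# Standard errors are the square roots of the diagonal entries.
se = np.sqrt(np.diag(vcov))
print(se)  # [2. 3.]
```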

Now to Hausman.
Since the random effects estimator is a matrix-weighted average of the within and between variation in your data, it is more efficient (i.e. has lower variance) than the fixed effects estimator, which exploits only the within variation. If you want to test the difference between the two models, you can write the test statistic as $$H = (\beta_{FE}-\beta_{RE})'[Var(\beta_{FE})-Var(\beta_{RE})]^{-1}(\beta_{FE}-\beta_{RE})$$

Given that RE is more efficient, the difference of the variances is positive definite, or at least it should be. If you use different variance estimators in the two regressions, $H$ may well turn out negative. Often this is a sign of model misspecification, but that is a tricky discussion, because there are other situations in which the test statistic can be negative. For simplicity, let's set those aside for the moment.
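The matrix form of the statistic is straightforward to compute directly. Here is a minimal sketch with invented coefficient vectors and covariance matrices (the numbers are purely illustrative; in practice you would take them from your fitted FE and RE models):

```python
import numpy as np

# Hypothetical estimates (illustrative numbers only)
beta_fe = np.array([1.8, -0.5])   # fixed effects coefficients
beta_re = np.array([1.5, -0.4])   # random effects coefficients

var_fe = np.array([[0.10, 0.01],
                   [0.01, 0.08]])  # Var(beta_FE)
var_re = np.array([[0.05, 0.01],
                   [0.01, 0.04]])  # Var(beta_RE): smaller, RE is more efficient

diff = beta_fe - beta_re
V = var_fe - var_re  # should be positive definite if both models use the same variance estimator

# H = diff' V^{-1} diff, asymptotically chi-squared with len(diff) degrees of freedom
H = diff @ np.linalg.solve(V, diff)
print(H)  # 2.05
```

Note the use of `np.linalg.solve` instead of explicitly inverting `V`, which is the numerically safer way to apply $V^{-1}$.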

If you now increase the sample size then, as you correctly said, your estimators become more efficient. Consequently the difference $Var(\beta_{FE})-Var(\beta_{RE})$ becomes smaller. Since this difference enters the statistic inverted, it plays the role of a denominator: as the denominator shrinks, the fraction grows.

Maybe this is more intuitive if we consider the case where you are interested in a single variable (call it $k$) only. In this case the (signed square root of the) test statistic can be written as $$H =\frac{\beta_{FE,k}-\beta_{RE,k}}{\sqrt{se(\beta_{FE,k})^{2}-se(\beta_{RE,k})^{2}}}$$

To give a numerical example, start with the small sample. Say the difference in coefficients is 100 and the standard errors in FE and RE are 10 and 5, respectively: $$H_{small} =\frac{100}{\sqrt{10^{2}-5^{2}}} = 11.547$$

Then you increase the sample size and suppose each standard error falls by half: $$H_{large} =\frac{100}{\sqrt{5^{2}-2.5^{2}}} = 23.094$$ Now you see how the test statistic becomes larger in the larger sample (the denominator shrinks thanks to the smaller standard errors). The intuition for the test statistic in matrix notation is the same.
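The two worked numbers above can be checked in a few lines (`hausman_scalar` is just an illustrative helper name):

```python
import math

def hausman_scalar(diff, se_fe, se_re):
    """Single-coefficient version: coefficient difference over the
    square root of the variance gap between FE and RE."""
    return diff / math.sqrt(se_fe**2 - se_re**2)

# Small sample: SEs of 10 and 5
H_small = hausman_scalar(100, 10, 5)
# Larger sample: both SEs halved
H_large = hausman_scalar(100, 5, 2.5)

print(round(H_small, 3), round(H_large, 3))  # 11.547 23.094
```

Halving both standard errors halves the denominator, so the statistic exactly doubles, which is what the two figures in the text show.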