Solved – Hausman test with or without covariates

fixed-effects-model, hausman, panel data, random variable

I am using panel data and I would like to determine whether I can use the Random Effects (RE) model instead of Fixed Effects (FE) to estimate one coefficient of interest.
When I use the Hausman test comparing FE and RE, I have to reject the null hypothesis (meaning that the RE model is not OK). However, the difference between the coefficient of interest estimated by FE and RE is not statistically significant.
So my question is: can I justify using the RE model based on this fact alone? After all, the null hypothesis of the Hausman test is presumably rejected only because the estimates for some other covariates (control variables) differ significantly between the RE and FE approaches. In my case these variables are not of interest, and I do not need consistent estimators for them.

Best Answer

The choice between FE and RE models depends on the focus of the statistical inference. The FE model is an appropriate specification if we are focusing on a specific set of $N$ individuals (say, $N$ firms or $N$ OECD countries, or $N$ American states) and our inference is restricted to the behavior of this set of individuals. The RE model is an appropriate specification if we are drawing $N$ individuals randomly from a large population and are trying to make inferences about that population (see Baltagi, Econometric Analysis of Panel Data, 2008, §§2.2-3). The Hausman test can't say anything about your focus.

The Hausman test is asymptotically equivalent to a standard Wald test for the omission of $\tilde{\mathbf{X}}$, a matrix of deviations from individual means (see Baltagi, 2008, §4.3). In other words, given the model $$y_{it}=\mathbf{x}_{it}'\boldsymbol{\beta}+\mu_i+u_{it}\tag{1}$$ one can split $\mathbf{x}_{it}$: $$y_{it}=(\bar{\mathbf{x}}_i+\tilde{\mathbf{x}}_{it})'\boldsymbol{\beta}+\mu_i+u_{it}\tag{2}$$ where $\bar{\mathbf{x}}_i$ is the vector of individual time-invariant means for the $i$th individual and $\tilde{\mathbf{x}}_{it}=\mathbf{x}_{it}-\bar{\mathbf{x}}_i$. Further, one can give separate parameters $\boldsymbol{\beta}_1$ to the individual means and $\boldsymbol{\beta}_2$ to the deviation variables: $$y_{it}=\bar{\mathbf{x}}_i'\boldsymbol{\beta}_1+\tilde{\mathbf{x}}_{it}'\boldsymbol{\beta}_2+\mu_i+u_{it}\tag{3}$$ $\boldsymbol{\beta}_1$ is the between regression coefficient, while $\boldsymbol{\beta}_2$ is the within (FE) regression coefficient. The Hausman test is based on $\hat{\boldsymbol{\beta}}_{RE}-\hat{\boldsymbol{\beta}}_{FE}$, but can equivalently be based on $\hat{\boldsymbol{\beta}}_1-\hat{\boldsymbol{\beta}}_2$ (see Baltagi, 2008, §4.3).
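
To make the split concrete, here is a minimal NumPy sketch on simulated data with a single regressor. Everything in it is an assumption made for illustration: the panel dimensions, the true slope, the strength of the correlation between $\mu_i$ and $\bar{x}_i$, and the simple Wald-type statistic built from the between and within estimates (it relies on the two estimators being uncorrelated under the null and is not meant to reproduce any package's Hausman routine).

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 200, 5          # hypothetical panel dimensions
beta = 1.0             # true within slope, chosen for illustration

# Individual effects mu_i that are correlated with the regressor
mu = rng.normal(size=N)
x = 0.8 * mu[:, None] + rng.normal(size=(N, T))        # shape (N, T)
y = beta * x + mu[:, None] + rng.normal(size=(N, T))

# Split x and y into individual means and deviations from those means
x_bar = x.mean(axis=1)             # individual means of x
x_til = x - x_bar[:, None]         # within-individual deviations of x
y_bar = y.mean(axis=1)
y_til = y - y_bar[:, None]

# Within (FE) estimate: OLS of y deviations on x deviations
b_within = (x_til * y_til).sum() / (x_til ** 2).sum()
res_w = y_til - b_within * x_til
s2_w = (res_w ** 2).sum() / (N * T - N - 1)
var_within = s2_w / (x_til ** 2).sum()

# Between estimate: OLS of individual means of y on individual means of x
xc = x_bar - x_bar.mean()
yc = y_bar - y_bar.mean()
b_between = (xc * yc).sum() / (xc ** 2).sum()
res_b = yc - b_between * xc
s2_b = (res_b ** 2).sum() / (N - 2)
var_between = s2_b / (xc ** 2).sum()

# Wald-type statistic on the difference (compare with chi-square, 1 df)
wald = (b_between - b_within) ** 2 / (var_between + var_within)
print(f"within={b_within:.3f} between={b_between:.3f} wald={wald:.1f}")
```

In this simulated setup the within estimate should land near the true slope, while the between estimate absorbs the correlation between $\mu_i$ and $\bar{x}_i$, so the statistic on their difference comes out large.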

As to correlation, some variables in $\mathbf{X}$ may be correlated with $\boldsymbol{\mu}$, but $\tilde{\mathbf{x}}_{it}$ is orthogonal to $\mathbf{1}\mu_i$ ($\mathbf{1}$ is a vector of ones) for all $i$. Thus:

  • under a FE framework, the time-invariant terms $\bar{\mathbf{x}}_i'\boldsymbol{\beta}_1$ and $\mu_i$ are swept out and one gets unbiased and consistent estimates for $\boldsymbol{\beta}_2$;
  • under a RE framework,
    • if one estimates models $(1)$ or $(2)$, the restriction $\boldsymbol{\beta}_1=\boldsymbol{\beta}_2$ is imposed implicitly; when it doesn't hold, the Hausman test rejects;
    • if one estimates model $(3)$, then $\hat{\boldsymbol{\beta}}_2$ is unbiased and consistent (it is exactly identical to $\hat{\boldsymbol{\beta}}_{FE}$) and one can ignore or suppress $\bar{\mathbf{x}}_i'\boldsymbol{\beta}_1$, or use $\bar{\mathbf{x}}_i'\boldsymbol{\beta}_1+\mu_i$ to model the random intercept (see, e.g., Snijders and Bosker, Multilevel Analysis, 2012, chap. 4).
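
The two RE cases above can be checked numerically. The sketch below is only an illustration under stated assumptions: a balanced simulated panel with one regressor, and a random-intercept GLS fit implemented by quasi-demeaning with a value of $\theta$ computed from the simulation's known variances (in practice $\theta$ would be estimated). The helper `re_fit` is a hypothetical name, not a library function.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 100, 4                       # hypothetical balanced panel
mu = rng.normal(size=N)             # individual effects, correlated with x
x = 0.8 * mu[:, None] + rng.normal(size=(N, T))
y = 1.0 * x + mu[:, None] + rng.normal(size=(N, T))

# Long format: the T observations of each individual are consecutive
x_flat, y_flat = x.ravel(), y.ravel()
x_bar = np.repeat(x.mean(axis=1), T)    # individual means, repeated T times
y_bar = np.repeat(y.mean(axis=1), T)
x_til = x_flat - x_bar                  # within-individual deviations

# FE (within) estimate for comparison
b_fe = (x_til * (y_flat - y_bar)).sum() / (x_til ** 2).sum()

# Assumed theta: 1 - sqrt(s2_u / (s2_u + T * s2_mu)) with both variances = 1
theta = 1.0 - np.sqrt(1.0 / (1.0 + T * 1.0))

def re_fit(columns, theta):
    """Random-intercept GLS via quasi-demeaning (hypothetical helper)."""
    Z = np.column_stack([np.ones(N * T)] + columns)
    Z_bar = Z.reshape(N, T, -1).mean(axis=1).repeat(T, axis=0)
    Zq = Z - theta * Z_bar              # quasi-demeaned regressors
    yq = y_flat - theta * y_bar         # quasi-demeaned response
    coef, *_ = np.linalg.lstsq(Zq, yq, rcond=None)
    return coef

b_re = re_fit([x_flat], theta)[1]          # model (1): single slope
b2_re = re_fit([x_bar, x_til], theta)[2]   # model (3): slope on deviations

print(f"FE={b_fe:.4f}  RE model (1)={b_re:.4f}  RE model (3) beta_2={b2_re:.4f}")
```

Because the deviations sum to zero within each individual, they are orthogonal to both the constant and the individual means, so the coefficient on the deviations in model $(3)$ coincides with the FE estimate for any $\theta$ in this balanced setup, whereas the single slope from model $(1)$ is pulled toward the between estimate.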

In brief, you can estimate an RE model which passes the Hausman test by just splitting your $\mathbf{X}$ matrix into its individual time-invariant means $\bar{\mathbf{X}}$ and the within-individual time-varying deviations $\tilde{\mathbf{X}}$ (see here for a simple example, additional details and references). I'd say that such a coherent approach would be better than a possible and questionable 'mixture' of consistent and inconsistent estimates.