Econometrics – Relationship Between Random Effects Model and Mixed Models

Tags: econometrics, lme4-nlme, mixed-model, panel-data, plm

I used to think that "random effects model" in econometrics corresponds to a "mixed model with random intercept" outside of econometrics, but now I am not sure. Does it?

Econometrics uses terms like "fixed effects" and "random effects" somewhat differently from the literature on mixed models, and this causes notorious confusion. Let us consider a simple situation where $y$ linearly depends on $x$ but with a different intercept in different groups of measurements:

$$y_{it} = \beta x_{it} + u_i + \epsilon_{it}.$$

Here each unit/group $i$ is observed at different timepoints $t$. Econometricians call it "panel data".

  • In mixed models terminology, we can treat $u_i$ as a fixed effect or as a random effect (in this case, a random intercept). Treating it as fixed means fitting $\hat \beta$ and $\hat u_i$ to minimize squared error (i.e. running OLS regression with dummy group variables). Treating it as random means that we additionally assume that $u_i\sim\mathcal N(u_0,\sigma^2_u)$ and use maximum likelihood to fit $u_0$ and $\sigma^2_u$ instead of fitting each $u_i$ on its own. This leads to the "partial pooling" effect, where the estimates $\hat u_i$ get shrunk toward their mean $\hat u_0$ (see the small simulated illustration after this list).

    R formula when treating group as fixed:    y ~ x + group
    R formula when treating group as random:   y ~ x + (1|group)
    
  • In econometrics terminology, we can treat this whole model as a fixed effects model or as a random effects model. The first option is equivalent to the fixed effect above (but econometrics has its own way of estimating $\beta$ in this case, called "within" estimator). I used to think that the second option is equivalent to the random effect above; e.g. @JiebiaoWang in his highly upvoted answer to What is a difference between random effects-, fixed effects- and marginal model? says that

    In econometrics, the random-effects model may only refer to random intercept model as in biostatistics
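
To illustrate the partial pooling mentioned in the first bullet, here is a minimal sketch on hypothetical simulated toy data (not the paneldata used below): the per-group intercepts from lmer() come out shrunk toward their common mean relative to the OLS dummy-variable estimates.

    library(lme4)
    set.seed(1)
    d <- data.frame(group = rep(1:10, each = 5), x = rnorm(50))
    d$y <- 2 * d$x + rep(rnorm(10, sd = 3), each = 5) + rnorm(50)
    
    fixed  <- lm(y ~ x + as.factor(group) - 1, data = d)   # no pooling
    random <- lmer(y ~ x + (1 | group), data = d)          # partial pooling
    
    # per-group intercepts: OLS dummies vs. shrunken estimates (BLUPs)
    cbind(ols  = coef(fixed)[-1],
          blup = fixef(random)[1] + ranef(random)$group[, 1])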

Okay — let us test if this understanding is correct. Here is some random data generated by @ChristophHanck in his answer to What is the difference between fixed effect, random effect and mixed effect models? (I put the data here on pastebin for those who do not use R):

[figure: plot of the generated data]

@ChristophHanck does two fits using econometric approaches:

fe <- plm(stackY~stackX, data = paneldata, model = "within")
re <- plm(stackY~stackX, data = paneldata, model = "random")

The first one yields an estimate of $\beta$ equal to -1.0451, the second one yields 0.77031 (yes, positive!). I tried to reproduce this with lm and lmer:

l1 = lm(stackY ~ stackX + as.factor(unit), data = paneldata)
l2 = lmer(stackY ~ stackX + (1|as.factor(unit)), data = paneldata)

The first one yields -1.045, in perfect agreement with the within estimator above. Cool. But the second yields -1.026, which is miles away from the random effects estimator. Huh? What is going on? In fact, what is plm even doing when called with model = "random"?

Whatever it is doing, can one somehow understand it via the mixed models perspective?

And what is the intuition behind whatever it is doing? I read in a couple of econometrics sources that the random effects estimator is a weighted average of the fixed effects estimator and the "between" estimator, which is more or less the regression slope we get if we do not include group identity in the model at all (in this case, that estimate is strongly positive, around 4). E.g. @Andy writes here:

The random effects estimator then uses a matrix weighted average of the within and between variation of your data. […] This makes random effects more efficient[.]
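
For reference, the "between" estimator mentioned in the quote can be computed directly with plm on the same data (model = "between" regresses the group means of y on the group means of x):

    be <- plm(stackY ~ stackX, data = paneldata, model = "between")
    coef(be)   # the slope is strongly positive here, around 4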

Why? Why would we want this weighted average? And in particular, why would we want it instead of running a mixed model?

Best Answer

Summary: the "random-effects model" in econometrics and a "random intercept mixed model" are indeed the same model, but they are estimated in different ways. The econometrics way is to use FGLS (feasible generalized least squares), and the mixed model way is to use ML. There are different algorithms for doing FGLS, and some of them (on this dataset) produce results that are very close to ML.


1. Differences between estimation methods in plm

I will answer based on my own testing of plm(..., model = "random") and lmer(), using the data generated by @ChristophHanck.

According to the plm package manual, there are four options for random.method, the method of estimating the variance components in the random effects model. @amoeba used the default, swar (Swamy and Arora, 1972).

For random effects models, four estimators of the transformation parameter are available by setting random.method to one of "swar" (Swamy and Arora (1972)) (default), "amemiya" (Amemiya (1971)), "walhus" (Wallace and Hussain (1969)), or "nerlove" (Nerlove (1971)).

I tested all four options using the same data, getting an error for amemiya with the CRAN version of plm (the R-Forge version works; see the comment in the code below), and very different coefficient estimates for the variable stackX across methods. The estimates from random.method='nerlove' and 'amemiya' are nearly equivalent to that from lmer(): -1.029 and -1.025 vs -1.026. They are also not very different from that obtained in the "fixed-effects" model, -1.045.

# "amemiya" only works using the most recent version:
# install.packages("plm", repos="http://R-Forge.R-project.org")

re0 <- plm(stackY~stackX, data = paneldata, model = "random") #random.method='swar'
re1 <- plm(stackY~stackX, data = paneldata, model = "random",  random.method='amemiya')
re2 <- plm(stackY~stackX, data = paneldata, model = "random",  random.method='walhus')
re3 <- plm(stackY~stackX, data = paneldata, model = "random",  random.method='nerlove')
l2  <- lmer(stackY~stackX+(1|as.factor(unit)), data = paneldata)

coef(re0)     #    (Intercept)   stackX    18.3458553   0.7703073 
coef(re1)     #    (Intercept)   stackX    30.217721   -1.025186 
coef(re2)     #    (Intercept)   stackX    -1.15584     3.71973 
coef(re3)     #    (Intercept)   stackX    30.243678   -1.029111 
fixef(l2)     #    (Intercept)   stackX    30.226295   -1.026482 

Unfortunately I do not have time right now, but interested readers can find the four references to check their estimation procedures. It would be very helpful to figure out why they make such a difference. I expect that in some cases the plm estimation procedure, which runs lm() on transformed data, should be equivalent to the maximum likelihood procedure used in lmer().
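
To make this connection concrete, here is a minimal sketch (not plm's internal code) of the quasi-demeaning transformation that underlies the random effects GLS estimator: with $\theta = 1 - \sqrt{\sigma^2_\epsilon/(\sigma^2_\epsilon + T\,\sigma^2_u)}$, running OLS on $y_{it} - \theta\bar y_i$ against $x_{it} - \theta\bar x_i$ gives the GLS slope. For illustration I plug in the variance components estimated by lmer(), so the result should essentially reproduce the ML estimate:

    library(lme4)
    l2 <- lmer(stackY ~ stackX + (1 | as.factor(unit)), data = paneldata)
    vc     <- as.data.frame(VarCorr(l2))
    s2_u   <- vc$vcov[1]                   # random-intercept variance
    s2_eps <- vc$vcov[2]                   # residual variance
    Tn     <- mean(table(paneldata$unit))  # panel length (balanced panel assumed)
    theta  <- 1 - sqrt(s2_eps / (s2_eps + Tn * s2_u))
    
    # quasi-demean y and x within each unit, then run plain OLS
    qd <- function(v, g) v - theta * ave(v, g)
    coef(lm(qd(stackY, unit) ~ qd(stackX, unit), data = paneldata))

An FGLS routine such as plm's does the same thing, except that it plugs in variance components estimated by one of the four moment-based methods above instead of the ML estimates.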

2. Comparison between GLS and ML

The authors of the plm package compared the two approaches in Section 7 of their paper: Yves Croissant and Giovanni Millo, 2008, Panel Data Econometrics in R: The plm package.

Econometrics deal mostly with non-experimental data. Great emphasis is put on specification procedures and misspecification testing. Model specifications tend therefore to be very simple, while great attention is put on the issues of endogeneity of the regressors, dependence structures in the errors and robustness of the estimators under deviations from normality. The preferred approach is often semi- or non-parametric, and heteroskedasticity-consistent techniques are becoming standard practice both in estimation and testing.

For all these reasons, [...] panel model estimation in econometrics is mostly accomplished in the generalized least squares framework based on Aitken’s Theorem [...]. On the contrary, longitudinal data models in nlme and lme4 are estimated by (restricted or unrestricted) maximum likelihood. [...]

The econometric GLS approach has closed-form analytical solutions computable by standard linear algebra and, although the latter can sometimes get computationally heavy on the machine, the expressions for the estimators are usually rather simple. ML estimation of longitudinal models, on the contrary, is based on numerical optimization of nonlinear functions without closed-form solutions and is thus dependent on approximations and convergence criteria.


3. Update on mixed models

I appreciate that @ChristophHanck provided a thorough introduction to the four random.method options used in plm and explained why their estimates are so different. As requested by @amoeba, I will add some thoughts on mixed models (likelihood-based) and their connection with GLS.

The likelihood-based method usually assumes a distribution for both the random effect and the error term. A normal distribution is commonly assumed, but some studies assume a non-normal distribution. I will follow @ChristophHanck's notation for a random intercept model and allow unbalanced data, i.e., let $T=n_i$.

The model is \begin{equation} y_{it}= \boldsymbol x_{it}^{'}\boldsymbol\beta + \eta_i + \epsilon_{it}\qquad i=1,\ldots,m,\quad t=1,\ldots,n_i \end{equation} with $\eta_i \sim N(0,\sigma^2_\eta), \epsilon_{it} \sim N(0,\sigma^2_\epsilon)$.

For each $i$, $$\boldsymbol y_i \sim N(\boldsymbol X_{i}\boldsymbol\beta, \boldsymbol\Sigma_i), \qquad\boldsymbol\Sigma_i = \sigma^2_\eta \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \sigma^2_\epsilon \boldsymbol I_{n_i}.$$ So the log-likelihood function is $$const -\frac{1}{2} \sum_i\mathrm{log}|\boldsymbol\Sigma_i| - \frac{1}{2} \sum_i(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta)^{'}\boldsymbol\Sigma_i^{-1}(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta).$$
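
As a concrete illustration, here is a minimal sketch that evaluates this log-likelihood (up to the constant). The helper and the list-of-groups data layout (y_list and X_list holding $\boldsymbol y_i$ and $\boldsymbol X_i$ for each unit) are hypothetical, purely for exposition:

    # evaluate the random-intercept log-likelihood (up to the constant)
    # for given beta, s2_eta, s2_eps
    loglik <- function(beta, s2_eta, s2_eps, y_list, X_list) {
      sum(mapply(function(y, X) {
        n     <- length(y)
        Sigma <- s2_eta * matrix(1, n, n) + s2_eps * diag(n)  # Sigma_i
        r     <- y - X %*% beta                               # residual vector
        as.numeric(-0.5 * (determinant(Sigma, logarithm = TRUE)$modulus +
                           t(r) %*% solve(Sigma, r)))
      }, y_list, X_list))
    }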

When all the variances are known, as shown in Laird and Ware (1982), the MLE is $$\hat{\boldsymbol\beta} = \left(\sum_i\boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol X_i \right)^{-1} \left(\sum_i \boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol y_i \right),$$ which is equivalent to the GLS estimator $\hat\beta_{RE}$ derived by @ChristophHanck (a code sketch of this closed form follows the list below). So the key difference is in the estimation of the variances. Given that there is no closed-form solution for the variance components, there are several approaches:

  • direct maximization of the log-likelihood function using optimization algorithms;
  • Expectation-Maximization (EM) algorithm: closed-form solutions exist, but the estimator for $\boldsymbol \beta$ involves empirical Bayesian estimates of the random intercept;
  • a combination of the above two, Expectation/Conditional Maximization Either (ECME) algorithm (Schafer, 1998; R package lmm). With a different parameterization, closed-form solutions for $\boldsymbol \beta$ (as above) and $\sigma^2_\epsilon$ exist. The solution for $\sigma^2_\epsilon$ can be written as $$\sigma^2_\epsilon = \frac{1}{\sum_i n_i}\sum_i(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta})^{'}(\hat\xi \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \boldsymbol I_{n_i})^{-1}(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta}),$$ where $\xi$ is defined as $\sigma^2_\eta/\sigma^2_\epsilon$ and can be estimated in an EM framework.
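
For completeness, here is a sketch of the closed-form Laird-Ware/GLS estimate of $\boldsymbol\beta$ for given variance components, using the same hypothetical y_list/X_list layout as above:

    # closed-form GLS estimate of beta for known s2_eta, s2_eps
    gls_beta <- function(s2_eta, s2_eps, y_list, X_list) {
      p <- ncol(X_list[[1]])
      A <- matrix(0, p, p)   # accumulates sum_i X_i' Sigma_i^{-1} X_i
      b <- numeric(p)        # accumulates sum_i X_i' Sigma_i^{-1} y_i
      for (i in seq_along(y_list)) {
        n     <- length(y_list[[i]])
        Sigma <- s2_eta * matrix(1, n, n) + s2_eps * diag(n)
        A <- A + t(X_list[[i]]) %*% solve(Sigma, X_list[[i]])
        b <- b + t(X_list[[i]]) %*% solve(Sigma, y_list[[i]])
      }
      solve(A, b)
    }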

In summary, MLE makes distributional assumptions and is computed by an iterative algorithm. The key difference between MLE and GLS is in the estimation of the variances.

Croissant and Millo (2008) pointed out that

While under normality, homoskedasticity and no serial correlation of the errors OLS are also the maximum likelihood estimator, in all the other cases there are important differences.

In my opinion, regarding the distributional assumption: much like the trade-off between parametric and non-parametric approaches, MLE is more efficient when the assumption holds, while GLS is more robust when it does not.