Summary: the "random-effects model" in econometrics and a "random intercept mixed model" are indeed the same model, but they are estimated in different ways. The econometrics way is to use FGLS, and the mixed-model way is to use ML. There are different algorithms for doing FGLS, and some of them (on this dataset) produce results that are very close to ML.
1. Differences between estimation methods in plm
I will answer with my own testing of `plm(..., model = "random")` and `lmer()`, using the data generated by @ChristophHanck.
According to the `plm` package manual, there are four options for `random.method`: the method of estimation for the variance components in the random effects model. @amoeba used the default one, `swar` (Swamy and Arora, 1972).
> For random effects models, four estimators of the transformation parameter are available by setting `random.method` to one of `"swar"` (Swamy and Arora (1972)) (default), `"amemiya"` (Amemiya (1971)), `"walhus"` (Wallace and Hussain (1969)), or `"nerlove"` (Nerlove (1971)).
I tested all four options on the same data, getting an error for `amemiya` (with the CRAN version at the time; see the comment in the code below) and three totally different coefficient estimates for the variable `stackX` from the remaining methods. The estimates from `random.method = "nerlove"` and `"amemiya"` are nearly equivalent to the one from `lmer()`: -1.029 and -1.025 vs. -1.026. They are also not very different from the "fixed-effects" estimate, -1.045.
# "amemiya" only works using the most recent version:
# install.packages("plm", repos="http://R-Forge.R-project.org")
re0 <- plm(stackY~stackX, data = paneldata, model = "random") #random.method='swar'
re1 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='amemiya')
re2 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='walhus')
re3 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='nerlove')
l2 <- lmer(stackY~stackX+(1|as.factor(unit)), data = paneldata)
coef(re0) # (Intercept) stackX 18.3458553 0.7703073
coef(re1) # (Intercept) stackX 30.217721 -1.025186
coef(re2) # (Intercept) stackX -1.15584 3.71973
coef(re3) # (Intercept) stackX 30.243678 -1.029111
fixef(l2) # (Intercept) stackX 30.226295 -1.026482
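To see where the four methods diverge, it can help to inspect the variance components that each of them estimates. A hedged sketch using `plm`'s `ercomp()`, which, as I understand it, is the variance-components routine behind `model = "random"`:

```r
# Sketch: inspect the variance components implied by each estimator.
# "amemiya" may error with older plm versions (see the comment above).
for (m in c("swar", "walhus", "nerlove", "amemiya"))
  print(ercomp(stackY ~ stackX, data = paneldata, method = m))
```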
Unfortunately I do not have time right now, but interested readers can check the four references for their estimation procedures. It would be very helpful to figure out why they make such a difference. I expect that in some cases the `plm` estimation procedure, which runs `lm()` on transformed data, should be equivalent to the maximum likelihood procedure used in `lmer()`; a quick check below illustrates this on the same data.
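As a minimal sketch of this claim (reusing the simulated `paneldata`, `unit`, `stackY`, and `stackX` from above): random-effects GLS is just `lm()` on quasi-demeaned data, and if we plug `lmer()`'s variance components into the transformation parameter $\theta$, the slope from `lm()` should reproduce `fixef(l2)` almost exactly.

```r
# Sketch: FGLS as lm() on quasi-demeaned data, with the variance
# components taken from lmer() instead of a moments-based estimator.
vc     <- as.data.frame(VarCorr(l2))
s2_eta <- vc$vcov[1]                        # random-intercept variance
s2_eps <- vc$vcov[2]                        # residual variance
Ti     <- unname(table(paneldata$unit)[1])  # common T (balanced panel assumed)
theta  <- 1 - sqrt(s2_eps / (s2_eps + Ti * s2_eta))

ystar <- paneldata$stackY - theta * ave(paneldata$stackY, paneldata$unit)
xstar <- paneldata$stackX - theta * ave(paneldata$stackX, paneldata$unit)
fit   <- lm(ystar ~ xstar)
coef(fit)["xstar"]          # ~ -1.026, matching fixef(l2)["stackX"]
coef(fit)[1] / (1 - theta)  # intercept rescaled back to the original scale
```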
2. Comparison between GLS and ML
The authors of the `plm` package compared the two in Section 7 of their paper: Croissant, Y. and Millo, G. (2008), "Panel Data Econometrics in R: The plm Package", Journal of Statistical Software, 27(2).
> Econometrics deal mostly with non-experimental data. Great emphasis is put on specification procedures and misspecification testing. Model specifications tend therefore to be very simple, while great attention is put on the issues of endogeneity of the regressors, dependence structures in the errors and robustness of the estimators under deviations from normality. The preferred approach is often semi- or non-parametric, and heteroskedasticity-consistent techniques are becoming standard practice both in estimation and testing.
>
> For all these reasons, [...] panel model estimation in econometrics is mostly accomplished in the generalized least squares framework based on Aitken’s Theorem [...]. On the contrary, longitudinal data models in `nlme` and `lme4` are estimated by (restricted or unrestricted) maximum likelihood. [...]
>
> The econometric GLS approach has closed-form analytical solutions computable by standard linear algebra and, although the latter can sometimes get computationally heavy on the machine, the expressions for the estimators are usually rather simple. ML estimation of longitudinal models, on the contrary, is based on numerical optimization of nonlinear functions without closed-form solutions and is thus dependent on approximations and convergence criteria.
3. Update on mixed models
I appreciate that @ChristophHanck provided a thorough introduction to the four `random.method` options used in `plm` and explained why their estimates are so different. As requested by @amoeba, here I add some thoughts on mixed models (likelihood-based) and their connection with GLS.
The likelihood-based method usually assumes a distribution for both the random effect and the error term. A normal distribution is commonly assumed, but there are also some studies assuming a non-normal distribution. I will follow @ChristophHanck's notation for a random intercept model, but allow unbalanced data, i.e., replace the common $T$ with $n_i$.
The model is
\begin{equation}
y_{it}= \boldsymbol x_{it}^{'}\boldsymbol\beta + \eta_i + \epsilon_{it}\qquad i=1,\ldots,m,\quad t=1,\ldots,n_i
\end{equation}
with $\eta_i \sim N(0,\sigma^2_\eta), \epsilon_{it} \sim N(0,\sigma^2_\epsilon)$.
For each $i$, $$\boldsymbol y_i \sim N(\boldsymbol X_{i}\boldsymbol\beta, \boldsymbol\Sigma_i), \qquad\boldsymbol\Sigma_i = \sigma^2_\eta \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \sigma^2_\epsilon \boldsymbol I_{n_i}.$$
So the log-likelihood function is $$const -\frac{1}{2} \sum_i\mathrm{log}|\boldsymbol\Sigma_i| - \frac{1}{2} \sum_i(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta)^{'}\boldsymbol\Sigma_i^{-1}(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta).$$
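To make the formula concrete, here is a minimal sketch (a hypothetical helper `panel_loglik`, reusing the simulated data from Section 1) that evaluates this log-likelihood directly; at the estimates from `lmer(..., REML = FALSE)` it should agree with `logLik(l2)`.

```r
# Sketch: evaluate the random-intercept log-likelihood at given parameters.
panel_loglik <- function(beta, s2_eta, s2_eps, y, X, unit) {
  ll <- 0
  for (i in unique(unit)) {
    idx   <- unit == i
    ni    <- sum(idx)
    Sigma <- s2_eta * matrix(1, ni, ni) + s2_eps * diag(ni)  # Sigma_i
    r     <- y[idx] - cbind(1, X[idx]) %*% beta              # y_i - X_i beta
    ll    <- ll - 0.5 * (ni * log(2 * pi) +
                         as.numeric(determinant(Sigma)$modulus) +  # log|Sigma_i|
                         t(r) %*% solve(Sigma, r))
  }
  as.numeric(ll)
}

vc <- as.data.frame(VarCorr(l2))  # refit with REML = FALSE for an exact match
panel_loglik(beta = fixef(l2), s2_eta = vc$vcov[1], s2_eps = vc$vcov[2],
             y = paneldata$stackY, X = paneldata$stackX, unit = paneldata$unit)
```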
When all the variances are known, as shown in Laird and Ware (1982), the MLE is
$$\hat{\boldsymbol\beta} = \left(\sum_i\boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol X_i \right)^{-1} \left(\sum_i \boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol y_i \right),$$
which is equivalent to the GLS estimator $\hat\beta_{RE}$ derived by @ChristophHanck. So the key difference is in the estimation of the variances. Given that there is no closed-form solution for them, there are several approaches (a small R sketch of the closed-form $\hat{\boldsymbol\beta}$ above follows this list):
- direct maximization of the log-likelihood function using optimization algorithms;
- Expectation-Maximization (EM) algorithm: closed-form solutions exist, but the estimator for $\boldsymbol \beta$ involves empirical Bayesian estimates of the random intercept;
- a combination of the above two, the Expectation/Conditional Maximization Either (ECME) algorithm (Schafer, 1998; R package `lmm`). With a different parameterization, closed-form solutions for $\boldsymbol \beta$ (as above) and $\sigma^2_\epsilon$ exist. The solution for $\sigma^2_\epsilon$ can be written as $$\hat\sigma^2_\epsilon = \frac{1}{\sum_i n_i}\sum_i(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta})^{'}(\hat\xi \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \boldsymbol I_{n_i})^{-1}(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta}),$$ where $\xi$ is defined as $\sigma^2_\eta/\sigma^2_\epsilon$ and can be estimated in an EM framework.
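For completeness, a minimal sketch of the closed-form GLS estimator above (a hypothetical helper `gls_beta`, again assuming the simulated `paneldata`): plugging in `lmer()`'s variance components should reproduce `fixef(l2)`.

```r
# Sketch: closed-form GLS/MLE for beta given known variance components.
gls_beta <- function(y, X, unit, s2_eta, s2_eps) {
  den <- 0; num <- 0
  for (i in unique(unit)) {
    idx     <- unit == i
    ni      <- sum(idx)
    Xi      <- cbind(1, X[idx])                              # intercept + regressor
    Sigma_i <- s2_eta * matrix(1, ni, ni) + s2_eps * diag(ni)
    den     <- den + t(Xi) %*% solve(Sigma_i, Xi)            # X_i' Sigma_i^-1 X_i
    num     <- num + t(Xi) %*% solve(Sigma_i, y[idx])        # X_i' Sigma_i^-1 y_i
  }
  drop(solve(den, num))
}

vc <- as.data.frame(VarCorr(l2))
gls_beta(paneldata$stackY, paneldata$stackX, paneldata$unit,
         s2_eta = vc$vcov[1], s2_eps = vc$vcov[2])  # ~ (30.23, -1.026)
```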
In summary, MLE makes distributional assumptions and is computed by an iterative algorithm. The key difference between MLE and GLS is in the estimation of the variances.
Croissant and Millo (2008) pointed out that
> While under normality, homoskedasticity and no serial correlation of the errors OLS are also the maximum likelihood estimator, in all the other cases there are important differences.
In my opinion, regarding the distributional assumption the situation is analogous to the difference between parametric and non-parametric approaches: MLE is more efficient when the assumption holds, while GLS is more robust.
Best Answer
Perhaps another way of seeing the difference is to focus on what the "fixed effect" is defined to be. In econometrics, a panel (longitudinal) model is typically specified as $$ y_{it} = X_{it} b + a_{i} + e_{it} $$ where the $X$ matrix would be called the "right hand side" variables, the "design matrix", or the "independent variables", etc. The $a_i$ is an "unobserved error component". The term "Fixed Effect" or "Random Effect" has to do ONLY with the assumptions about the unobserved component ($a_i$).
If one assumes it is a "fixed effect", then the beta-hat estimates are robust to correlation between $a_i$ and the regressors $X_{it}$. That is, the beta-hat estimates are conditional on the "fixed" unobserved component being controlled for. One can either include a dummy variable for each individual $i$ in the data as part of the $X$ matrix to calculate this (a bad idea) or it can just be partialled out (a better idea); both are shown in the sketch below.
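A minimal sketch (assuming the same simulated `paneldata` as in the answer above): the dummy-variable (LSDV) estimator and the demeaned ("partialled out") within estimator yield the identical slope, by the Frisch–Waugh–Lovell theorem.

```r
# Sketch: two ways to compute the fixed-effects (within) slope.
fe_lsdv   <- lm(stackY ~ stackX + factor(unit), data = paneldata)  # dummies (bad idea)
fe_within <- lm(I(stackY - ave(stackY, unit)) ~ 0 + I(stackX - ave(stackX, unit)),
                data = paneldata)                                  # demeaned (better idea)
coef(fe_lsdv)["stackX"]  # ~ -1.045
coef(fe_within)          # identical slope
```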
The "Random Effect" assumption about $a_i$ allows for $a_i$ to be a random (unobserved) variable, but assumptions must be made about the independence (or at least lack of correlation) between $a_i$ and $e_{it}$.
Another way to state this is that under most assumptions, treating $a_i$ as "fixed" will result in consistent estimates, whereas ONLY under the independence assumption will "random" effects be consistent. A Hausman-style test can be used to check whether the random-effects assumption is valid (a sketch follows below). In most cases for the observational data that economists use, the random-effects assumption (i.e., the assumption of non-correlation between the unobserved component $a_i$ and the regressors) is invalid, and this is why economists tend to favor the "fixed-effects model" when using longitudinal data.
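As a hedged sketch (reusing `re0` and `paneldata` from the first answer), `plm` provides such a test via `phtest()`:

```r
# Sketch: Hausman test contrasting the within (FE) and random (RE) estimators.
fe <- plm(stackY ~ stackX, data = paneldata, model = "within")
phtest(fe, re0)  # a small p-value argues against the random-effects assumption
```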
I too have seen a lot of confused jargon in the literature, mostly because people from different disciplines are talking past each other: the terms "Fixed" and "Random", when applied to "Effects", are not used to communicate but rather as inertial labels, and they cause inadvertent confusion. At this stage, most of what goes under the rubric of "Mixed Models" would simply be the "Random Effects" model from the typical econometrician's perspective (which would tend to use the label "random coefficient model" for the equivalent math). That is, all the worry economists have about the inconsistency of the random-effects assumption for panel data would (in observational data) still hold for any mixed model.