Summary: the "random-effects model" in econometrics and a "random intercept mixed model" are indeed the same model, but they are estimated in different ways. The econometrics way is to use FGLS, and the mixed-model way is to use ML. There are different algorithms for doing FGLS, and some of them (on this dataset) produce results that are very close to ML.
1. Differences between estimation methods in plm
I will answer with my own testing of `plm(..., model = "random")` and `lmer()`, using the data generated by @ChristophHanck.
According to the `plm` package manual, there are four options for `random.method`: the method of estimation for the variance components in the random effects model. @amoeba used the default one, `swar` (Swamy and Arora, 1972).
> For random effects models, four estimators of the transformation parameter are available by setting `random.method` to one of `"swar"` (Swamy and Arora (1972)) (default), `"amemiya"` (Amemiya (1971)), `"walhus"` (Wallace and Hussain (1969)), or `"nerlove"` (Nerlove (1971)).
I tested all four options on the same data, getting an error for `amemiya` (with the CRAN version at the time; see the comment in the code below) and three totally different coefficient estimates for the variable `stackX` from the remaining methods. The estimates from `random.method = "nerlove"` and `"amemiya"` are nearly equivalent to the one from `lmer()`: -1.029 and -1.025 vs. -1.026. They are also not very different from the "fixed-effects" estimate, -1.045.
# "amemiya" only works using the most recent version:
# install.packages("plm", repos="http://R-Forge.R-project.org")
re0 <- plm(stackY~stackX, data = paneldata, model = "random") #random.method='swar'
re1 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='amemiya')
re2 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='walhus')
re3 <- plm(stackY~stackX, data = paneldata, model = "random", random.method='nerlove')
l2 <- lmer(stackY~stackX+(1|as.factor(unit)), data = paneldata)
coef(re0) # (Intercept) stackX 18.3458553 0.7703073
coef(re1) # (Intercept) stackX 30.217721 -1.025186
coef(re2) # (Intercept) stackX -1.15584 3.71973
coef(re3) # (Intercept) stackX 30.243678 -1.029111
fixef(l2) # (Intercept) stackX 30.226295 -1.026482
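To see where the four methods diverge, it can help to inspect the variance components that each of them estimates. A hedged sketch using `plm`'s `ercomp()`, which, as I understand it, is the variance-components routine behind `model = "random"`:

```r
# Sketch: inspect the variance components implied by each estimator.
# "amemiya" may error with older plm versions (see the comment above).
for (m in c("swar", "walhus", "nerlove", "amemiya"))
  print(ercomp(stackY ~ stackX, data = paneldata, method = m))
```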
Unfortunately I do not have time right now, but interested readers can check the four references for their estimation procedures. It would be very helpful to figure out why they make such a difference. I expect that in some cases the `plm` estimation procedure, which runs `lm()` on transformed data, should be equivalent to the maximum likelihood procedure used in `lmer()`; a quick check below illustrates this on the same data.
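As a minimal sketch of this claim (reusing the simulated `paneldata`, `unit`, `stackY`, and `stackX` from above): random-effects GLS is just `lm()` on quasi-demeaned data, and if we plug `lmer()`'s variance components into the transformation parameter $\theta$, the slope from `lm()` should reproduce `fixef(l2)` almost exactly.

```r
# Sketch: FGLS as lm() on quasi-demeaned data, with the variance
# components taken from lmer() instead of a moments-based estimator.
vc     <- as.data.frame(VarCorr(l2))
s2_eta <- vc$vcov[1]                        # random-intercept variance
s2_eps <- vc$vcov[2]                        # residual variance
Ti     <- unname(table(paneldata$unit)[1])  # common T (balanced panel assumed)
theta  <- 1 - sqrt(s2_eps / (s2_eps + Ti * s2_eta))

ystar <- paneldata$stackY - theta * ave(paneldata$stackY, paneldata$unit)
xstar <- paneldata$stackX - theta * ave(paneldata$stackX, paneldata$unit)
fit   <- lm(ystar ~ xstar)
coef(fit)["xstar"]          # ~ -1.026, matching fixef(l2)["stackX"]
coef(fit)[1] / (1 - theta)  # intercept rescaled back to the original scale
```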
2. Comparison between GLS and ML
The authors of the `plm` package compared the two in Section 7 of their paper: Croissant, Y. and Millo, G. (2008), "Panel Data Econometrics in R: The plm Package", Journal of Statistical Software, 27(2).
> Econometrics deal mostly with non-experimental data. Great emphasis is put on specification procedures and misspecification testing. Model specifications tend therefore to be very simple, while great attention is put on the issues of endogeneity of the regressors, dependence structures in the errors and robustness of the estimators under deviations from normality. The preferred approach is often semi- or non-parametric, and heteroskedasticity-consistent techniques are becoming standard practice both in estimation and testing.
>
> For all these reasons, [...] panel model estimation in econometrics is mostly accomplished in the generalized least squares framework based on Aitken’s Theorem [...]. On the contrary, longitudinal data models in `nlme` and `lme4` are estimated by (restricted or unrestricted) maximum likelihood. [...]
>
> The econometric GLS approach has closed-form analytical solutions computable by standard linear algebra and, although the latter can sometimes get computationally heavy on the machine, the expressions for the estimators are usually rather simple. ML estimation of longitudinal models, on the contrary, is based on numerical optimization of nonlinear functions without closed-form solutions and is thus dependent on approximations and convergence criteria.
3. Update on mixed models
I appreciate that @ChristophHanck provided a thorough introduction to the four `random.method` options used in `plm` and explained why their estimates are so different. As requested by @amoeba, here I add some thoughts on mixed models (likelihood-based) and their connection with GLS.
The likelihood-based method usually assumes a distribution for both the random effect and the error term. A normal distribution is commonly assumed, but there are also some studies assuming a non-normal distribution. I will follow @ChristophHanck's notation for a random intercept model, but allow unbalanced data, i.e., replace the common $T$ with $n_i$.
The model is
\begin{equation}
y_{it}= \boldsymbol x_{it}^{'}\boldsymbol\beta + \eta_i + \epsilon_{it}\qquad i=1,\ldots,m,\quad t=1,\ldots,n_i
\end{equation}
with $\eta_i \sim N(0,\sigma^2_\eta), \epsilon_{it} \sim N(0,\sigma^2_\epsilon)$.
For each $i$, $$\boldsymbol y_i \sim N(\boldsymbol X_{i}\boldsymbol\beta, \boldsymbol\Sigma_i), \qquad\boldsymbol\Sigma_i = \sigma^2_\eta \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \sigma^2_\epsilon \boldsymbol I_{n_i}.$$
So the log-likelihood function is $$const -\frac{1}{2} \sum_i\mathrm{log}|\boldsymbol\Sigma_i| - \frac{1}{2} \sum_i(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta)^{'}\boldsymbol\Sigma_i^{-1}(\boldsymbol y_i - \boldsymbol X_{i}\boldsymbol\beta).$$
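To make the formula concrete, here is a minimal sketch (a hypothetical helper `panel_loglik`, reusing the simulated data from Section 1) that evaluates this log-likelihood directly; at the estimates from `lmer(..., REML = FALSE)` it should agree with `logLik(l2)`.

```r
# Sketch: evaluate the random-intercept log-likelihood at given parameters.
panel_loglik <- function(beta, s2_eta, s2_eps, y, X, unit) {
  ll <- 0
  for (i in unique(unit)) {
    idx   <- unit == i
    ni    <- sum(idx)
    Sigma <- s2_eta * matrix(1, ni, ni) + s2_eps * diag(ni)  # Sigma_i
    r     <- y[idx] - cbind(1, X[idx]) %*% beta              # y_i - X_i beta
    ll    <- ll - 0.5 * (ni * log(2 * pi) +
                         as.numeric(determinant(Sigma)$modulus) +  # log|Sigma_i|
                         t(r) %*% solve(Sigma, r))
  }
  as.numeric(ll)
}

vc <- as.data.frame(VarCorr(l2))  # refit with REML = FALSE for an exact match
panel_loglik(beta = fixef(l2), s2_eta = vc$vcov[1], s2_eps = vc$vcov[2],
             y = paneldata$stackY, X = paneldata$stackX, unit = paneldata$unit)
```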
When all the variances are known, as shown in Laird and Ware (1982), the MLE is
$$\hat{\boldsymbol\beta} = \left(\sum_i\boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol X_i \right)^{-1} \left(\sum_i \boldsymbol X_i^{'} \boldsymbol\Sigma_i^{-1} \boldsymbol y_i \right),$$
which is equivalent to the GLS estimator $\hat\beta_{RE}$ derived by @ChristophHanck. So the key difference is in the estimation of the variances. Given that there is no closed-form solution for them, there are several approaches (a small R sketch of the closed-form $\hat{\boldsymbol\beta}$ above follows this list):
- direct maximization of the log-likelihood function using optimization algorithms;
- Expectation-Maximization (EM) algorithm: closed-form solutions exist, but the estimator for $\boldsymbol \beta$ involves empirical Bayesian estimates of the random intercept;
- a combination of the above two, the Expectation/Conditional Maximization Either (ECME) algorithm (Schafer, 1998; R package `lmm`). With a different parameterization, closed-form solutions for $\boldsymbol \beta$ (as above) and $\sigma^2_\epsilon$ exist. The solution for $\sigma^2_\epsilon$ can be written as $$\hat\sigma^2_\epsilon = \frac{1}{\sum_i n_i}\sum_i(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta})^{'}(\hat\xi \boldsymbol 1_{n_i} \boldsymbol 1_{n_i}^{'} + \boldsymbol I_{n_i})^{-1}(\boldsymbol y_i - \boldsymbol X_{i} \hat{\boldsymbol\beta}),$$ where $\xi$ is defined as $\sigma^2_\eta/\sigma^2_\epsilon$ and can be estimated in an EM framework.
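For completeness, a minimal sketch of the closed-form GLS estimator above (a hypothetical helper `gls_beta`, again assuming the simulated `paneldata`): plugging in `lmer()`'s variance components should reproduce `fixef(l2)`.

```r
# Sketch: closed-form GLS/MLE for beta given known variance components.
gls_beta <- function(y, X, unit, s2_eta, s2_eps) {
  den <- 0; num <- 0
  for (i in unique(unit)) {
    idx     <- unit == i
    ni      <- sum(idx)
    Xi      <- cbind(1, X[idx])                              # intercept + regressor
    Sigma_i <- s2_eta * matrix(1, ni, ni) + s2_eps * diag(ni)
    den     <- den + t(Xi) %*% solve(Sigma_i, Xi)            # X_i' Sigma_i^-1 X_i
    num     <- num + t(Xi) %*% solve(Sigma_i, y[idx])        # X_i' Sigma_i^-1 y_i
  }
  drop(solve(den, num))
}

vc <- as.data.frame(VarCorr(l2))
gls_beta(paneldata$stackY, paneldata$stackX, paneldata$unit,
         s2_eta = vc$vcov[1], s2_eps = vc$vcov[2])  # ~ (30.23, -1.026)
```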
In summary, MLE makes distributional assumptions and is computed by an iterative algorithm. The key difference between MLE and GLS is in the estimation of the variances.
Croissant and Millo (2008) pointed out that
> While under normality, homoskedasticity and no serial correlation of the errors OLS are also the maximum likelihood estimator, in all the other cases there are important differences.
In my opinion, regarding the distributional assumption the situation is analogous to the difference between parametric and non-parametric approaches: MLE is more efficient when the assumption holds, while GLS is more robust.
Best Answer
Perhaps another way of seeing the difference is to focus on what the "fixed effect" is defined to be. In econometrics, a panel (longitudinal) model is typically specified as $$ y_{it} = X_{it} b + a_{i} + e_{it} $$ where the $X$ matrix would be called the "right hand side" variables, the "design matrix", or the "independent variables", etc. The $a_i$ is an "unobserved error component". The term "Fixed Effect" or "Random Effect" has to do ONLY with the assumptions about the unobserved component ($a_i$).
If one assumes it is a "fixed effect", then the beta-hat estimates are robust to correlation between $a_i$ and the regressors $X_{it}$. That is, the beta-hat estimates are conditional on the "fixed" unobserved component being controlled for. One can either include a dummy variable for each individual $i$ in the data as part of the $X$ matrix to calculate this (a bad idea) or it can just be partialled out (a better idea); both are shown in the sketch below.
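A minimal sketch (assuming the same simulated `paneldata` as in the answer above): the dummy-variable (LSDV) estimator and the demeaned ("partialled out") within estimator yield the identical slope, by the Frisch–Waugh–Lovell theorem.

```r
# Sketch: two ways to compute the fixed-effects (within) slope.
fe_lsdv   <- lm(stackY ~ stackX + factor(unit), data = paneldata)  # dummies (bad idea)
fe_within <- lm(I(stackY - ave(stackY, unit)) ~ 0 + I(stackX - ave(stackX, unit)),
                data = paneldata)                                  # demeaned (better idea)
coef(fe_lsdv)["stackX"]  # ~ -1.045
coef(fe_within)          # identical slope
```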
The "Random Effect" assumption about $a_i$ allows for $a_i$ to be a random (unobserved) variable, but assumptions must be made about the independence (or at least lack of correlation) between $a_i$ and $e_{it}$.
Another way to state this is that under most assumptions, treating $a_i$ as "fixed" will result in consistent estimates, whereas ONLY under the independence assumption will "random" effects be consistent. A Hausman-style test can be used to check whether the random-effects assumption is valid (a sketch follows below). In most cases for the observational data that economists use, the random-effects assumption (i.e., the assumption of non-correlation between the unobserved component $a_i$ and the regressors) is invalid, and this is why economists tend to favor the "fixed-effects model" when using longitudinal data.
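As a hedged sketch (reusing `re0` and `paneldata` from the first answer), `plm` provides such a test via `phtest()`:

```r
# Sketch: Hausman test contrasting the within (FE) and random (RE) estimators.
fe <- plm(stackY ~ stackX, data = paneldata, model = "within")
phtest(fe, re0)  # a small p-value argues against the random-effects assumption
```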
I too have seen a lot of confused jargon in the literature, mostly because people from different disciplines are talking past each other: the terms "Fixed" and "Random", when applied to "Effects", are not used to communicate but rather as inertial labels, and they cause inadvertent confusion. At this stage, most of what goes under the rubric of "Mixed Models" would simply be the "Random Effects" model from the typical econometrician's perspective (which would tend to use the label "random coefficient model" for the equivalent math). That is, all the worry economists have about the inconsistency of the random-effects assumption for panel data would (in observational data) still hold for any mixed model.