You can and should use a well-specified random effects model. Always.
The Hausman test is often read as favouring the fixed effects model, but it can and should be viewed "as a standard Wald test for the omission of the variables $\widetilde{\mathbf{X}}$" (Baltagi 2008, §4.3), where $\widetilde{\mathbf{X}}$ is the matrix of deviations from group means. If you do not omit $\widetilde{\mathbf{X}}$, a random effects model gives you the same population (fixed) effects as a fixed effects model, plus the individual effects.
Mundlak (1978) argues that there is a unique estimator for the model
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\alpha}+\mathbf{u}\qquad\qquad \mathbf{Z}=\mathbf{I}_{N}\otimes\mathbf{e}_T$$
where $\mathbf{I}_{N}$ is the $N\times N$ identity matrix, $\otimes$ denotes the Kronecker product, and $\mathbf{e}_T$ is a $T\times 1$ vector of ones, so $\mathbf{Z}$ is the matrix of individual dummies and $\boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_N)'$.
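As a quick base-R illustration of this construction (toy sizes $N=3$, $T=2$, not the data used below):

```r
# Z = I_N (x) e_T: stacking T ones per individual gives the matrix of
# individual dummies (toy sizes, for illustration only)
N <- 3; T <- 2
Z <- kronecker(diag(N), matrix(1, T, 1))
Z  # 6 x 3: rows 1-2 select individual 1, rows 3-4 individual 2, ...
```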
If $\alpha_i=\overline{\mathbf{X}}_{i*}\boldsymbol{\pi}+w_{i}$ with $\boldsymbol{\pi}\ne\mathbf{0}$, where $\overline{\mathbf{X}}_{i*}$ is the row vector of means of $\mathbf{X}$ over $t$ for individual $i$, the model can be written as
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\qquad\qquad
\mathbf{P}=\mathbf{I}_N\otimes\bar{\mathbf{J}}_T$$
where $\bar{\mathbf{J}}_T=\mathbf{J}_T/T$, $\mathbf{J}_T$ being a $T\times T$ matrix of ones, so $\mathbf{P}$ is a matrix which averages the observations across time for each individual (Baltagi 2008, §2.1). Under the fixed effects model, the within estimator is
$$\hat{\boldsymbol{\beta}}_{w}=(\mathbf{X'QX})^{-1}\mathbf{X'Qy}\tag{1}$$
where $\mathbf{Q}=\mathbf{I}-\mathbf{P}$ is a matrix which obtains the deviations from individual means. Mundlak argues that under the random effects model, to get the same estimates the estimator should be
$$\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\pi}}\end{bmatrix}=
\left(\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P}\end{bmatrix}\boldsymbol{\Sigma}^{-1}\begin{bmatrix}\mathbf{X}&\mathbf{PX} \end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P} \end{bmatrix}\boldsymbol{\Sigma}^{-1}\mathbf{y}\tag{2}$$
where $\boldsymbol{\Sigma}$ is the variance-covariance matrix of the error term,
while the "usual" estimator (the so-called "Balestra-Nerlove estimator") is
$$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{y}$$
which is biased when $\boldsymbol{\pi}\ne\mathbf{0}$. According to Mundlak, since $(1)$ and $(2)$ obtain the same estimates of $\boldsymbol{\beta}$, $(2)$ is the within estimator, i.e. $(1)$ is the unique estimator, and it does not depend on knowledge of the variance components.
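Mundlak's point is easy to verify numerically. A minimal base-R sketch with toy data (OLS in place of GLS, which is harmless here precisely because the result does not depend on the variance components): the coefficient on $\mathbf{X}$ in a regression that also includes $\mathbf{PX}$ reproduces the within estimator $(1)$ exactly:

```r
set.seed(42)
N <- 10; T <- 4                                  # toy dimensions
P <- kronecker(diag(N), matrix(1/T, T, T))       # averages within individuals
Q <- diag(N * T) - P                             # demeans within individuals
x <- rnorm(N * T)
alpha <- rep(rnorm(N), each = T)                 # individual effects
y <- 2 * x + 0.5 * drop(P %*% x) + alpha + rnorm(N * T)
b_within  <- drop(solve(t(x) %*% Q %*% x, t(x) %*% Q %*% y))  # estimator (1)
b_mundlak <- unname(coef(lm(y ~ x + I(drop(P %*% x))))["x"])  # x plus its means
all.equal(b_within, b_mundlak)  # TRUE
```

The equality is algebraic, not approximate: the residual of $\mathbf{x}$ after projecting on the intercept and $\mathbf{Px}$ is exactly $\mathbf{Qx}$, since $\mathbf{QP}=\mathbf{0}$.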
However, the models
$$\begin{align}
\mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\tag{FE} \\
\mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}\mathbf{X}\boldsymbol{\pi}+(\mathbf{Pw}+\mathbf{u})\tag{RE}
\end{align}$$
are formally equivalent (Hsiao 2003, §4.3), so a random effects model obtains the same estimates ... as long as you do not omit $\widetilde{\mathbf{X}}$! Let's try.
Data generation (R code):
library(plm) # for plm() below
set.seed(1234)
N <- 25 # individuals
T <- 5 # time
In <- diag(N) # identity matrix of order N
Jt <- matrix(1, T, T) # matrix of ones of order T
Jtm <- Jt / T
P <- kronecker(In, Jtm) # averages the obs across time for each individual
s2a <- 0.3 # sigma^2_\alpha
s2u <- 0.6 # sigma^2_u
w <- rep(rnorm(N, 0, sqrt(s2a)), each = T)
u <- rnorm(N*T, 0, sqrt(s2u))
b <- c(1.5, -2)
p <- c(-0.7, 0.8)
X <- cbind(runif(N*T, 2, 5), runif(N*T, 4, 8))
XPX <- cbind(X, P %*% X) # [ X PX ]
y <- XPX %*% c(b,p) + (P %*% w + u) # y = Xb + PXp + Pw + u
ds <- data.frame(id=rep(1:N, each=T), wave=rep(1:T, N), y, split(X, col(X))) # the last two columns get named X1, X2
Under a fixed effects model we get:
> fe.1 <- plm(y ~ X1 + X2, data=ds, model="within")
> summary(fe.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
X1 1.435987 0.07825464 18.35019 1.806239e-33
X2 -1.916447 0.06339342 -30.23100 1.757634e-51
while under a random effects model...
> re.1 <- plm(y ~ X1 + X2, data=ds, model="random")
> summary(re.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 1.830633 0.51687109 3.541759 5.638216e-04
X1 1.405060 0.07927271 17.724390 1.505521e-35
X2 -1.874784 0.06372731 -29.418846 3.076414e-57
bias!
But what if we do not omit $\widetilde{\mathbf{X}}=\mathbf{QX}$?
> Q <- diag(N*T) - P
> X1.mean <- P %*% ds$X1
> X1.dev <- Q %*% ds$X1
> X2.mean <- P %*% ds$X2
> X2.dev <- Q %*% ds$X2
> re.2 <- plm(y ~ X1.mean + X1.dev + X2.mean + X2.dev, data=ds, model="random")
> summary(re.2)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) -0.04123108 2.30907450 -0.01785611 9.857833e-01
X1.mean 0.81279279 0.38146339 2.13072292 3.515287e-02
X1.dev 1.43598746 0.07824535 18.35236883 1.239171e-36
X2.mean -1.23071499 0.26379329 -4.66545216 8.072196e-06
X2.dev -1.91644653 0.06338590 -30.23458903 5.809240e-58
The estimates for X1.dev and X2.dev are equal to the within estimates for X1 and X2 (no room for Hausman tests!), and you get much more: the coefficients on X1.mean and X2.mean estimate $\boldsymbol{\beta}+\boldsymbol{\pi}$ (true values $0.8$ and $-1.2$). You get what you need.
However, this is just the tip of the iceberg. I recommend that you read at least Bafumi and Gelman (2006), Snijders and Berkhof (2008), and Bell and Jones (2014).
References
Bafumi, Joseph and Andrew Gelman (2006), "Fitting Multilevel Models When Predictors and Group Effects Correlate", http://www.stat.columbia.edu/~gelman/research/unpublished/Bafumi_Gelman_Midwest06.pdf
Baltagi, Badi H. (2008), Econometric Analysis of Panel Data, John Wiley & Sons
Bell, Andrew and Kelvyn Jones (2014), "Explaining Fixed Effects: Random Effects Modelling of Time-Series Cross-Sectional and Panel Data", Political Science Research and Methods, http://dx.doi.org/10.7910/DVN/23415
Hsiao, Cheng (2003), Analysis of Panel Data, Cambridge University Press
Mundlak, Yair (1978), "On the Pooling of Time Series and Cross Section Data", Econometrica, 46(1), 69-85
Snijders, Tom A. B. and Johannes Berkhof (2008), "Diagnostic Checks for Multilevel Models", in: Jan de Leeuw and Erik Meijer (eds), Handbook of Multilevel Analysis, Springer, Chap. 3
Best Answer
The approaches you've outlined differ in terms of how you're using the team-specific data to measure the baseline happy score (happy score in absence of intervention). These approaches correspond to a "pooled" approach, an "unpooled" approach, and a "partially pooled" approach.
The partially pooled approach is the most reasonable of the three, since it balances the team-specific information against information from the grand mean across teams (see http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf). I explain why below.
Pooled model: lm(happy_score ~ treatment + education + income, data = data)
Unpooled model: lm(happy_score ~ treatment + education + income + team, data = data)
Partially pooled model: lmer(happy_score ~ treatment + education + income + (1|team), data = data)
The robust approach should yield a similar answer to the multilevel model approach in this scenario, except you won't get the "shrunken" team-specific estimates that the multilevel model would provide. If this were a Poisson or logistic multilevel model, then estimated effects would differ, since the robust approach would yield "population average" effects while the multilevel model would yield "subject specific" effects.
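As a rough sketch of that shrinkage with hypothetical numbers (variance components assumed known here, whereas lmer would estimate them), the standard precision-weighted formula pulls small teams' means toward the grand mean more strongly than large teams':

```r
# Precision-weighted ("partial pooling") shrinkage of team means
# toward the grand mean; all numbers below are hypothetical
ybar <- c(5.0, 7.0, 6.5)   # observed team means
n    <- c(4, 4, 50)        # team sizes
s2   <- 2.0                # within-team (residual) variance, assumed known
tau2 <- 0.5                # between-team variance, assumed known
grand  <- weighted.mean(ybar, n)
lambda <- (n / s2) / (n / s2 + 1 / tau2)  # weight on the team's own mean
shrunk <- lambda * ybar + (1 - lambda) * grand
round(cbind(ybar, lambda, shrunk), 3)     # the small teams move the most
```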
Also, for all the above approaches, I'd suggest changing the coding of the treatment variable to a five-category variable [before (ref), after-FF, after-FM, after-MF, after-MM]. This coding is the constrained baseline approach to the analysis of cluster RCTs (see Hooper R et al., "Analysis of cluster randomised trials with an assessment of outcome at baseline", BMJ 2018).
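A small sketch of that recoding in R (the column names period and arm are hypothetical; adapt them to your data):

```r
# Constrained-baseline coding: all baseline observations share one level,
# post-intervention observations get an arm-specific level
d <- data.frame(period = c("before", "after", "after", "after", "after"),
                arm    = c("FF",     "FF",    "FM",    "MF",    "MM"))
d$treat5 <- ifelse(d$period == "before", "before", paste0("after-", d$arm))
d$treat5 <- relevel(factor(d$treat5), ref = "before")  # "before" as reference
levels(d$treat5)
# "before" "after-FF" "after-FM" "after-MF" "after-MM"
```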