Solved – Modeling within-group correlation – random effects, fixed effects, clustered standard errors

fixed-effects-model, mixed-model, multilevel-analysis, r, random-effects-model

I know this kind of question has been asked before, but I can't find anything that clearly elucidates the issue. What is the 'right' way to model the following situation:

Let's say I pair two people up randomly (on the basis of their sex) such that there are four possible groups: MM, FF, MF, and FM. I have them play some behavioral economics game, and I am interested in their before-and-after differences on some variable, let's say their happiness.
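For concreteness, here is a hypothetical sketch of the data layout I have in mind (long format, one row per person per time point; all names and values are invented):

    set.seed(1)
    n_teams <- 100

    # 2 people x 2 time points = 4 rows per team
    data <- data.frame(
      team      = rep(seq_len(n_teams), each = 4),
      person    = rep(seq_len(2 * n_teams), each = 2),
      time      = rep(c("before", "after"), times = 2 * n_teams),
      treatment = rep(sample(c("MM", "FF", "MF", "FM"), n_teams, replace = TRUE),
                      each = 4),
      education = rep(rnorm(2 * n_teams), each = 2),  # person-level covariates
      income    = rep(rnorm(2 * n_teams), each = 2),
      happy_score = rnorm(4 * n_teams)                # placeholder outcome
    )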

In R, I can think of several potential ways to model this relationship. Assume treatment is a categorical variable containing the above four categories:

lm(happy_score ~ treatment + education + income, data = data)

However, this doesn't account for the fact that, within every 2-person team, there will be correlations unaccounted for in the above specification.

So, we can add a fixed effect for team (a unique identifier for each 2-person team):

lm(happy_score ~ treatment + education + income + factor(team), data = data)

This doesn't seem to get at what I want, as it will estimate a separate intercept for each team, when I really just want to 'control' for the correlation that is likely to be shared between two individuals on the same team.

I've also been told that, to address this issue, we can cluster standard errors at the team level:

    library(estimatr)  # provides lm_robust()

    lm_robust(
      happy_score ~ treatment + education + income,
      data = data,
      clusters = team,
      se_type = "stata"
    )

But I'm not sure what this is doing that is different from adding a fixed effect. Should I have both fixed effects and clustered standard errors?

I've also been pushed in the direction of mixed models, but I don't understand what they are substantively doing that is different from some combination of fixed effects and interactions:

    library(lme4)  # provides lmer()

    lmer(happy_score ~ treatment + education + income + (1 | team),
         data = data)

Are there intuitive ways of thinking about each of these options?

Best Answer

The approaches you've outlined differ in how they use the team-specific data to estimate the baseline happy score (the happy score in the absence of the intervention). They correspond to a "pooled" approach, an "unpooled" approach, and a "partially pooled" approach.

The partially pooled approach is the most reasonable of the three, since it balances the team-specific information against information from the grand mean across teams (see http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf). I explain why below.

Pooled model: lm(happy_score ~ treatment + education + income, data = data)

  • This approach ignores between-team variability in the baseline. That unmodeled variation in the baseline happy score ends up in the error term and is reflected in the standard error of the estimated treatment effect.

Unpooled model: lm(happy_score ~ treatment + education + income + factor(team), data = data)

  • This approach estimates a separate underlying baseline for each team. It is thought of as the unpooled approach since no information on the baseline happy score is shared across teams. The problem is that there is probably a fair bit of measurement error in the baselines, since you only have 4 observations per team (two people at two time points). Teams with extremely high or extremely low initial scores will probably regress towards the grand mean, irrespective of the intervention. So in this unpooled approach, the treatment effect is measured on top of a noisily measured baseline, and the variation due to that poorly measured baseline feeds into the estimated treatment effect, which would therefore have a relatively high standard error.

Partially pooled model: lmer(happy_score ~ treatment + education + income + (1|team), data = data)

  • The multilevel model approach provides estimates of each team's baseline that are "partially pooled" across teams. This means that team-specific means are "shrunken" towards the grand mean. The optimal degree of shrinkage/pooling is determined by the data. If there is lots of real variation between teams (variation that persists across time points), then the degree of pooling will be small. If most of the variation between teams is noise (i.e. due to measurement error or something that is not consistent across time points), then the degree of pooling will be large. With this partial pooling approach, your estimated treatment effect benefits from the more accurate measurement of the baseline and should have a relatively low standard error. (See the sketch after this list for one way to inspect the shrunken estimates.)
  • The intuition for the partial pooling approach is nicely explained in Efron's classic paper - "Stein's Paradox in Statistics" (http://statweb.stanford.edu/~ckirby/brad/other/Article1977.pdf)
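To see the shrinkage concretely, here is a minimal sketch (assuming a long-format data frame like the one sketched in the question; object names are illustrative):

    library(lme4)

    fit <- lmer(happy_score ~ treatment + education + income + (1 | team),
                data = data)

    # Partially pooled team intercepts: each team's estimated deviation from
    # the grand intercept, shrunken towards zero in proportion to how noisy
    # that team's own data are.
    head(ranef(fit)$team)

    # For comparison, the raw (unpooled) team means of the outcome:
    head(with(data, tapply(happy_score, team, mean)))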

The robust approach should yield a similar answer to the multilevel model approach in this scenario, except you won't get the "shrunken" team-specific estimates that the multilevel model would provide. If this were a Poisson or logistic multilevel model, then estimated effects would differ, since the robust approach would yield "population average" effects while the multilevel model would yield "subject specific" effects.
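In this linear case you can check that directly by fitting both and comparing the treatment coefficients and standard errors (a sketch; the fit names are mine):

    library(estimatr)
    library(lme4)

    fit_robust <- lm_robust(happy_score ~ treatment + education + income,
                            data = data, clusters = team, se_type = "stata")
    fit_mlm <- lmer(happy_score ~ treatment + education + income + (1 | team),
                    data = data)

    # The treatment coefficients and their standard errors should be broadly
    # similar across the two approaches in a linear model like this.
    summary(fit_robust)
    summary(fit_mlm)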

Also, for all the above approaches, I'd suggest changing the coding of the treatment variable to a 5-category variable [before (ref), after-FF, after-FM, after-MF, after-MM]. This coding is the constrained baseline approach to the analysis of cluster RCTs (see Hooper R et al. Analysis of cluster randomised trials with an assessment of outcome at baseline. BMJ 2018). A sketch of that recoding is below.
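Assuming a long-format data frame with a time column marking the before/after rows (a hypothetical name; adapt to your data) and treatment holding the MM/FF/MF/FM pairing:

    # Constrained baseline coding: all "before" rows share one reference
    # level, and "after" rows are split by pairing type.
    data$treatment5 <- with(data, ifelse(time == "before",
                                         "before",
                                         paste0("after-", treatment)))
    data$treatment5 <- relevel(factor(data$treatment5), ref = "before")

    library(lme4)
    lmer(happy_score ~ treatment5 + education + income + (1 | team),
         data = data)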
