You can and should use a well-specified random effects model. Always.
The Hausman test is often read as favouring the fixed effects model, but it can and should be viewed "as a standard Wald test for the omission of the variables $\widetilde{\mathbf{X}}$" (Baltagi 2008, §4.3), where $\widetilde{\mathbf{X}}$ is the matrix of deviations from group means. If you do not omit $\widetilde{\mathbf{X}}$, a random effects model gives you the same population (fixed) effects as a fixed effects model, plus the individual effects.
Mundlak (1978) argues that there is a unique estimator for the model
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\alpha}+\mathbf{u}\qquad\qquad \mathbf{Z}=\mathbf{I}_{N}\otimes\mathbf{e}_T$$
where $\mathbf{I}_{N}$ is the $N\times N$ identity matrix, $\otimes$ denotes the Kronecker product, and $\mathbf{e}_T$ is a $T\times 1$ vector of ones, so $\mathbf{Z}$ is the matrix of individual dummies and $\boldsymbol{\alpha}=(\alpha_1,\dots,\alpha_N)'$.
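As a quick base-R illustration of this construction (toy sizes $N=3$, $T=2$, not the data used below):

```r
# Z = I_N (x) e_T: stacking T ones per individual gives the matrix of
# individual dummies (toy sizes, for illustration only)
N <- 3; T <- 2
Z <- kronecker(diag(N), matrix(1, T, 1))
Z  # 6 x 3: rows 1-2 select individual 1, rows 3-4 individual 2, ...
```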
If $\alpha_i=\overline{\mathbf{X}}_{i*}\boldsymbol{\pi}+w_{i}$ with $\boldsymbol{\pi}\ne\mathbf{0}$, where $\overline{\mathbf{X}}_{i*}$ is the row vector of means of $\mathbf{X}$ over $t$ for individual $i$, the model can be written as
$$\mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\qquad\qquad
\mathbf{P}=\mathbf{I}_N\otimes\bar{\mathbf{J}}_T$$
where $\bar{\mathbf{J}}_T=\mathbf{J}_T/T$, $\mathbf{J}_T$ being a $T\times T$ matrix of ones, so $\mathbf{P}$ is a matrix which averages the observations across time for each individual (Baltagi 2008, §2.1). Under the fixed effects model, the within estimator is
$$\hat{\boldsymbol{\beta}}_{w}=(\mathbf{X'QX})^{-1}\mathbf{X'Qy}\tag{1}$$
where $\mathbf{Q}=\mathbf{I}-\mathbf{P}$ is a matrix which obtains the deviations from individual means. Mundlak argues that under the random effects model, to get the same estimates the estimator should be
$$\begin{bmatrix} \hat{\boldsymbol{\beta}} \\ \hat{\boldsymbol{\pi}}\end{bmatrix}=
\left(\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P}\end{bmatrix}\boldsymbol{\Sigma}^{-1}\begin{bmatrix}\mathbf{X}&\mathbf{PX} \end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{X}' \\ \mathbf{X'P} \end{bmatrix}\boldsymbol{\Sigma}^{-1}\mathbf{y}\tag{2}$$
where $\boldsymbol{\Sigma}$ is the variance-covariance matrix of the error term,
while the "usual" estimator (the so-called "Balestra-Nerlove estimator") is
$$\hat{\boldsymbol{\beta}}=(\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\Sigma}^{-1}\mathbf{y}$$
which is biased when $\boldsymbol{\pi}\ne\mathbf{0}$. According to Mundlak, since $(1)$ and $(2)$ obtain the same estimates of $\boldsymbol{\beta}$, $(2)$ is the within estimator, i.e. $(1)$ is the unique estimator, and it does not depend on knowledge of the variance components.
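Mundlak's point is easy to verify numerically. A minimal base-R sketch with toy data (OLS in place of GLS, which is harmless here precisely because the result does not depend on the variance components): the coefficient on $\mathbf{X}$ in a regression that also includes $\mathbf{PX}$ reproduces the within estimator $(1)$ exactly:

```r
set.seed(42)
N <- 10; T <- 4                                  # toy dimensions
P <- kronecker(diag(N), matrix(1/T, T, T))       # averages within individuals
Q <- diag(N * T) - P                             # demeans within individuals
x <- rnorm(N * T)
alpha <- rep(rnorm(N), each = T)                 # individual effects
y <- 2 * x + 0.5 * drop(P %*% x) + alpha + rnorm(N * T)
b_within  <- drop(solve(t(x) %*% Q %*% x, t(x) %*% Q %*% y))  # estimator (1)
b_mundlak <- unname(coef(lm(y ~ x + I(drop(P %*% x))))["x"])  # x plus its means
all.equal(b_within, b_mundlak)  # TRUE
```

The equality is algebraic, not approximate: the residual of $\mathbf{x}$ after projecting on the intercept and $\mathbf{Px}$ is exactly $\mathbf{Qx}$, since $\mathbf{QP}=\mathbf{0}$.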
However, the models
$$\begin{align}
\mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}(\mathbf{X}\boldsymbol{\pi}+\mathbf{w})+\mathbf{u}\tag{FE} \\
\mathbf{y}&=\mathbf{X}\boldsymbol{\beta}+\mathbf{P}\mathbf{X}\boldsymbol{\pi}+(\mathbf{Pw}+\mathbf{u})\tag{RE}
\end{align}$$
are formally equivalent (Hsiao 2003, §4.3), so a random effects model obtains the same estimates ... as long as you do not omit $\widetilde{\mathbf{X}}$! Let's try.
Data generation (R code):
library(plm) # for plm() below
set.seed(1234)
N <- 25 # individuals
T <- 5 # time
In <- diag(N) # identity matrix of order N
Jt <- matrix(1, T, T) # matrix of ones of order T
Jtm <- Jt / T
P <- kronecker(In, Jtm) # averages the obs across time for each individual
s2a <- 0.3 # sigma^2_\alpha
s2u <- 0.6 # sigma^2_u
w <- rep(rnorm(N, 0, sqrt(s2a)), each = T)
u <- rnorm(N*T, 0, sqrt(s2u))
b <- c(1.5, -2)
p <- c(-0.7, 0.8)
X <- cbind(runif(N*T, 2, 5), runif(N*T, 4, 8))
XPX <- cbind(X, P %*% X) # [ X PX ]
y <- XPX %*% c(b,p) + (P %*% w + u) # y = Xb + PXp + Pw + u
ds <- data.frame(id=rep(1:N, each=T), wave=rep(1:T, N), y, split(X, col(X))) # the last two columns get named X1, X2
Under a fixed effects model we get:
> fe.1 <- plm(y ~ X1 + X2, data=ds, model="within")
> summary(fe.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
X1 1.435987 0.07825464 18.35019 1.806239e-33
X2 -1.916447 0.06339342 -30.23100 1.757634e-51
while under a random effects model...
> re.1 <- plm(y ~ X1 + X2, data=ds, model="random")
> summary(re.1)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) 1.830633 0.51687109 3.541759 5.638216e-04
X1 1.405060 0.07927271 17.724390 1.505521e-35
X2 -1.874784 0.06372731 -29.418846 3.076414e-57
bias!
But what if we do not omit $\widetilde{\mathbf{X}}=\mathbf{QX}$?
> Q <- diag(N*T) - P
> X1.mean <- P %*% ds$X1
> X1.dev <- Q %*% ds$X1
> X2.mean <- P %*% ds$X2
> X2.dev <- Q %*% ds$X2
> re.2 <- plm(y ~ X1.mean + X1.dev + X2.mean + X2.dev, data=ds, model="random")
> summary(re.2)$coefficients
Estimate Std. Error t-value Pr(>|t|)
(Intercept) -0.04123108 2.30907450 -0.01785611 9.857833e-01
X1.mean 0.81279279 0.38146339 2.13072292 3.515287e-02
X1.dev 1.43598746 0.07824535 18.35236883 1.239171e-36
X2.mean -1.23071499 0.26379329 -4.66545216 8.072196e-06
X2.dev -1.91644653 0.06338590 -30.23458903 5.809240e-58
The estimates for X1.dev and X2.dev are equal to the within estimates for X1 and X2 (no room for Hausman tests!), and you get much more: the coefficients on X1.mean and X2.mean estimate $\boldsymbol{\beta}+\boldsymbol{\pi}$ (true values $0.8$ and $-1.2$). You get what you need.
However, this is just the tip of the iceberg. I recommend that you read at least Bafumi and Gelman (2006), Snijders and Berkhof (2008), and Bell and Jones (2014).
References
Bafumi, Joseph and Andrew Gelman (2006), "Fitting Multilevel Models When Predictors and Group Effects Correlate", http://www.stat.columbia.edu/~gelman/research/unpublished/Bafumi_Gelman_Midwest06.pdf
Baltagi, Badi H. (2008), Econometric Analysis of Panel Data, John Wiley & Sons
Bell, Andrew and Kelvyn Jones (2014), "Explaining Fixed Effects: Random Effects Modelling of Time-Series Cross-Sectional and Panel Data", Political Science Research and Methods, http://dx.doi.org/10.7910/DVN/23415
Hsiao, Cheng (2003), Analysis of Panel Data, Cambridge University Press
Mundlak, Yair (1978), "On the Pooling of Time Series and Cross Section Data", Econometrica, 46(1), 69-85
Snijders, Tom A. B. and Johannes Berkhof (2008), "Diagnostic Checks for Multilevel Models", in: Jan de Leeuw and Erik Meijer (eds), Handbook of Multilevel Analysis, Springer, Chap. 3
Best Answer
The approaches you've outlined differ in terms of how you're using the team-specific data to measure the baseline happy score (happy score in absence of intervention). These approaches correspond to a "pooled" approach, an "unpooled" approach, and a "partially pooled" approach.
The partially pooled approach is the most reasonable of the three, since it balances the team-specific information against information from the grand mean across teams (see http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf). I explain why below.
Pooled model: lm(happy_score ~ treatment + education + income, data = data)
Unpooled model: lm(happy_score ~ treatment + education + income + team, data = data)
Partially pooled model: lmer(happy_score ~ treatment + education + income + (1|team), data = data)
The robust approach should yield a similar answer to the multilevel model approach in this scenario, except you won't get the "shrunken" team-specific estimates that the multilevel model would provide. If this were a Poisson or logistic multilevel model, then estimated effects would differ, since the robust approach would yield "population average" effects while the multilevel model would yield "subject specific" effects.
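As a rough sketch of that shrinkage with hypothetical numbers (variance components assumed known here, whereas lmer would estimate them), the standard precision-weighted formula pulls small teams' means toward the grand mean more strongly than large teams':

```r
# Precision-weighted ("partial pooling") shrinkage of team means
# toward the grand mean; all numbers below are hypothetical
ybar <- c(5.0, 7.0, 6.5)   # observed team means
n    <- c(4, 4, 50)        # team sizes
s2   <- 2.0                # within-team (residual) variance, assumed known
tau2 <- 0.5                # between-team variance, assumed known
grand  <- weighted.mean(ybar, n)
lambda <- (n / s2) / (n / s2 + 1 / tau2)  # weight on the team's own mean
shrunk <- lambda * ybar + (1 - lambda) * grand
round(cbind(ybar, lambda, shrunk), 3)     # the small teams move the most
```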
Also, for all the above approaches, I'd suggest changing the coding of the treatment variable to a five-category variable [before (ref), after-FF, after-FM, after-MF, after-MM]. This coding is the constrained baseline approach to the analysis of cluster RCTs (see Hooper R et al., "Analysis of cluster randomised trials with an assessment of outcome at baseline", BMJ 2018).
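A small sketch of that recoding in R (the column names period and arm are hypothetical; adapt them to your data):

```r
# Constrained-baseline coding: all baseline observations share one level,
# post-intervention observations get an arm-specific level
d <- data.frame(period = c("before", "after", "after", "after", "after"),
                arm    = c("FF",     "FF",    "FM",    "MF",    "MM"))
d$treat5 <- ifelse(d$period == "before", "before", paste0("after-", d$arm))
d$treat5 <- relevel(factor(d$treat5), ref = "before")  # "before" as reference
levels(d$treat5)
# "before" "after-FF" "after-FM" "after-MF" "after-MM"
```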