Solved – Glmer random-effects model vs. dummy-coded fixed effects

fixed-effects-model, generalized-linear-model, lme4-nlme, random-effects-model

I'm trying to analyze the data from an experiment I conducted, and could use some guidance in relation to fixed vs. random effects.

The experiment was related to risk-seeking behavior in the context of hypothetical gambles, and implemented a 3 (Response Scale: Control vs. RI vs. ABR) x 3 (Stakes) x 5 (Endowment) factorial design. Response Scale was a between-subjects manipulation, and the levels of Stakes and Endowment were combined factorially to produce 15 different gamble scenarios, all of which were evaluated by each participant (i.e. gamble evaluation was within-subjects). The DV of interest for the particular analysis I'm working on is a binary indicator variable called "Would.Play" that describes whether a participant would choose to play the gamble if they were to encounter it in real life.

As a preliminary analysis, I'd like to be able to claim that there were no [or, as the data seem to indicate, that there were] meaningful differences in Would.Play as a result of random assignment to a particular Response Scale condition (designated by the factor variable "Response.Scale", ref="Control").

I can obviously do this with a binary logit for each of the 15 gambles (designated by the variable "Gamble.Num"), but I'd like to avoid issues with multiple testing. My preference, therefore, is to fit a single model that accounts for the heterogeneity in gambles by fitting a separate intercept for each gamble.

I've come across two ways to do this, each of which seems to give different results: Dummy "Fixed Effects" modeling in glm() and "random effects" modeling in glmer() (see output below).

It seems possible that the difference in the estimated coefficients could be the result of the Dummy "Fixed Effects" approach taking Gamble.Num==1 as a reference level, but I don't have a very deep understanding of the math underlying these two techniques. I was hoping someone would be able to give me a quick explanation of (a) why these two models appear to give different results, and (b) whether one of these approaches is better suited to answering my question of interest: is there a unique effect of Response.Scale on Would.Play, taking heterogeneity in gambles into account?

Below is a quick look at the data I'm using, and the output of the two models:

## Data ##
head(analysis.0.data)
 Local.ID Condition Response.Scale RS.Code Gambles.First Gamble.Num Endowment Stakes
1        8         4             RI       1             0          1      -150     10
2        8         4             RI       1             0          2      -150     50
3        8         4             RI       1             0          3      -150    200
4        8         4             RI       1             0          4       -25     10
5        8         4             RI       1             0          5       -25     50
6        8         4             RI       1             0          6       -25    200
  Would.Play Perc.Risk
1          0         4
2          0         6
3          0         5
4          0         3
5          0         5
6          0         7


## Dummy "Fixed Effects" Model ##
summary(glm(Would.Play ~ Response.Scale + factor(Gamble.Num), family="binomial",     
data=analysis.0.data))

Call:
glm(formula = Would.Play ~ Response.Scale + factor(Gamble.Num), 
    family = "binomial", data = analysis.0.data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7766  -0.7204  -0.4678   0.7006   2.5394  

Coefficients:
                     Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -1.14906    0.21987  -5.226 1.73e-07 ***
Response.ScaleRI     -0.06749    0.12815  -0.527  0.59844    
Response.ScaleABR    -0.91035    0.13843  -6.576 4.82e-11 ***
factor(Gamble.Num)2  -0.94090    0.35886  -2.622  0.00874 ** 
factor(Gamble.Num)3  -1.12416    0.37769  -2.976  0.00292 ** 
factor(Gamble.Num)4   0.31966    0.28379   1.126  0.25999    
factor(Gamble.Num)5  -0.63953    0.33303  -1.920  0.05482 .  
factor(Gamble.Num)6  -0.85860    0.35120  -2.445  0.01449 *  
factor(Gamble.Num)7   1.42100    0.26770   5.308 1.11e-07 ***
factor(Gamble.Num)8   0.35620    0.28268   1.260  0.20765    
factor(Gamble.Num)9  -0.51138    0.32379  -1.579  0.11425    
factor(Gamble.Num)10  2.10754    0.27298   7.720 1.16e-14 ***
factor(Gamble.Num)11  0.28248    0.28496   0.991  0.32154    
factor(Gamble.Num)12 -1.02908    0.36760  -2.799  0.00512 ** 
factor(Gamble.Num)13  2.49612    0.28133   8.873  < 2e-16 ***
factor(Gamble.Num)14  1.72839    0.26867   6.433 1.25e-10 ***
factor(Gamble.Num)15  0.08524    0.29204   0.292  0.77039    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2649.2  on 2249  degrees of freedom
Residual deviance: 2096.4  on 2233  degrees of freedom
AIC: 2130.4

Number of Fisher Scoring iterations: 5


## GLMER "Random-Effects" Model##
summary(glmer(Would.Play ~ Response.Scale + (1|Gamble.Num), family="binomial", 
data=analysis.0.data))
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) 
[glmerMod]
 Family: binomial  ( logit )
Formula: Would.Play ~ Response.Scale + (1 | Gamble.Num)
   Data: analysis.0.data

     AIC      BIC   logLik deviance df.resid 
  2169.3   2192.1  -1080.6   2161.3     2246 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9011 -0.5461 -0.3522  0.5439  4.6708 

Random effects:
 Groups     Name        Variance Std.Dev.
 Gamble.Num (Intercept) 1.291    1.136   
Number of obs: 2250, groups:  Gamble.Num, 15

Fixed effects:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -0.90254    0.30722  -2.938  0.00331 ** 
Response.ScaleRI  -0.06682    0.12707  -0.526  0.59897    
Response.ScaleABR -0.90170    0.13727  -6.569 5.07e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Correlation of Fixed Effects:
            (Intr) Rs.SRI
Rspns.SclRI -0.202       
Rspns.ScABR -0.183  0.456

Thanks!

Best Answer

I am thinking about the same problem and found your question here. I'll tell you what I know so far. Maybe we can reach a satisfying conclusion.

The random-effects model treats the between-group variation (in your case, variation across the different Gamble.Nums) as normally distributed. If that assumption doesn't hold, the random-effects estimates can be biased. To avoid relying on it, people instead code the groups with dummy variables and estimate each group's effect separately, rather than assuming a distribution. That approach looks appealing, but you have many more parameters to estimate, and you get no description of how the groups are distributed in the wider population. Under the random-effects model, the groups (Gamble.Nums) are assumed to be randomly sampled from a population, and the fit gives you estimates of the parameters of that population's distribution.
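One quick way to eyeball that normality assumption is to look at the predicted random intercepts themselves. A rough sketch, reusing the model from your output (the object name m.re is mine; with only 15 gambles the plot will necessarily be noisy):

library(lme4)

## Random-intercept model from the question
m.re <- glmer(Would.Play ~ Response.Scale + (1 | Gamble.Num),
              family = binomial, data = analysis.0.data)

## Extract the 15 predicted random intercepts and check whether they
## look roughly normal; a clear departure here is the situation where
## the dummy-coded glm might be the safer choice.
re <- ranef(m.re)$Gamble.Num[, "(Intercept)"]
qqnorm(re); qqline(re)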

Or, you can adopt a model-selection point of view and compare the two fitted models with anova(), e.g. anova(glmer.fit, glm.fit), to see which one is preferred.
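For concreteness, a minimal sketch of that comparison (m.re is the fit from the sketch above; since the two models are not nested in the usual sense, the AIC/BIC columns are the safer things to read):

## Dummy-coded "fixed effects" model from the question
m.fe <- glm(Would.Play ~ Response.Scale + factor(Gamble.Num),
            family = binomial, data = analysis.0.data)

## lme4's anova() method will accept the glm fit as a second model
anova(m.re, m.fe)

## Or compare information criteria directly
AIC(m.fe, m.re)
BIC(m.fe, m.re)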

From my own experience, the per-gamble estimates from the dummy-coded glm and the corresponding values from ranef() on the glmer fit are strongly correlated; the real difference is not that large at all.
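If you want to check that on these data, something along these lines should work (again reusing m.fe and m.re from the sketches above; the glmer values are shrunk toward the overall mean, so expect a high correlation but points sitting slightly off the 45-degree line):

## Per-gamble intercepts on a comparable scale (Control-condition log-odds):
## glm:   reference intercept plus each dummy coefficient (0 for gamble 1)
## glmer: fixed intercept plus each predicted random effect
b      <- coef(m.fe)
fe.int <- b["(Intercept)"] + c(0, b[grep("Gamble.Num", names(b))])
re.int <- fixef(m.re)["(Intercept)"] + ranef(m.re)$Gamble.Num[, "(Intercept)"]

cor(fe.int, re.int)                 # usually very close to 1
plot(fe.int, re.int); abline(0, 1)  # shrinkage pulls the glmer values inward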