Solved – Effect size calculation for fixed factors in GLMER

Tags: effect-size, lme4-nlme, odds-ratio

I'm running a GLMER with two categorical fixed effects, their interaction term, and one categorical random effect.

Can you suggest/explain a technique for calculating the effect size of the fixed factors?

The issue is that I have many observations (4,000–10,000), and I know that at this scale very small differences will produce significant p-values even when the effect is meager, so a measure of the size of the effect would be a better value to give readers for understanding the data.

I think that the odds ratio is what I'm after, but any additional information (or resources) about how it's calculated in this situation (mathematically) or what accepted practice is would be a great help.

Here's a sample output from my analysis:

     AIC      BIC   logLik deviance df.resid 
  9197.8   9233.0  -4593.9   9187.8     8533 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2606 -0.8777  0.4258  0.6301  1.6139 

Random effects:
 Groups Name        Variance Std.Dev.
 pid    (Intercept) 1.106    1.052   
Number of obs: 8538, groups:  pid, 170

Fixed effects:
                                Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      1.44524    0.13077  11.052  < 2e-16 ***
train_condswitch                -0.56156    0.18239  -3.079 0.002077 ** 
eg_typepartial                  -0.20007    0.07959  -2.514 0.011941 *  
train_condswitch:eg_typepartial  0.41786    0.10890   3.837 0.000125 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) trn_cn eg_typ
trn_cndswtc -0.712              
eg_typeprtl -0.351  0.250       
trn_cndsw:_  0.255 -0.331 -0.731
eg_type = full:
 train_cond    lsmean        SE df asymp.LCL asymp.UCL
 classify   1.4452365 0.1307725 NA 1.1889271  1.701546
 switch     0.8836727 0.1280070 NA 0.6327837  1.134562

eg_type = partial:
 train_cond    lsmean        SE df asymp.LCL asymp.UCL
 classify   1.2451647 0.1270257 NA 0.9961990  1.494130
 switch     1.1014605 0.1265104 NA 0.8535046  1.349416

Results are given on the logit scale. 
Confidence level used: 0.95 

Best Answer

As @JackeJR commented (+1), the answer is indeed to do a simulation-based power analysis. An easily accessible paper on the matter is Johnson et al., Power analysis for generalized linear mixed models in ecology and evolution. Check the website included: the authors provide relevant R code in their supplementary material, as well as a very detailed tutorial (somewhat unintuitively, it is under the name 'AppendixS1.pdf'). Notice also that the lme4 package has a simulate() function, so you can use that directly if you wish.
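
To make the simulate-and-refit idea concrete, here is a minimal sketch. It assumes the output above came from a binomial glmer with a formula like the one below; the data frame `dat` and response `correct` are hypothetical names, since only the predictors pid, train_cond, and eg_type appear in the question.

    library(lme4)

    ## Model matching the output shown above; `dat` and `correct`
    ## are assumed names for the data frame and binary response.
    fit <- glmer(correct ~ train_cond * eg_type + (1 | pid),
                 data = dat, family = binomial)

    ## Simulate responses from the fitted model, refit to each,
    ## and record the interaction p-value.
    n_sims <- 200
    sims   <- simulate(fit, nsim = n_sims)
    p_int  <- vapply(sims, function(y) {
      coef(summary(refit(fit, y)))["train_condswitch:eg_typepartial",
                                   "Pr(>|z|)"]
    }, numeric(1))

    ## Estimated power for the interaction at alpha = 0.05.
    mean(p_int < 0.05)

Note that simulating from the fitted model estimates power at the observed effect sizes; for an a priori calculation you would first set the coefficients to hypothesized values, as in the sketch further down.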

Conceptually, note that power can be affected by several issues; for example, in a binomial setting such as the one you appear to be working with, overdispersion is one of the main causes of power drain. Moreover, consider that there are two other major kinds of power analysis, post hoc and a priori, which you might want to employ as a complement to your simulations. In a post hoc analysis you take the F-values, degrees of freedom, etc. from your fitted model and make the calculation for your particular model. In an a priori power analysis you use calculations based on analytical formulas, applied to an exemplary dataset constructed for a hypothetical study.

The book by GaƂecki & Burzykowski, Linear Mixed-Effects Models Using R: A Step-by-Step Approach (Chapt. 20), touches on this matter in detail, but as the authors conclude: "In summary, the simulation approach to power calculations is attractive, due to its flexibility. It can prove especially useful if the impact of various mechanisms of missing values needs to be investigated." As a final note, I think that (for background at least) Lenth's paper Some practical guidelines for effective sample size determination and MacCallum et al.'s paper Power analysis and determination of sample size for covariance structure modeling are modern classics, so you might want to consider giving them a quick look.
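
As a sketch of the a priori flavor, the simr package (my suggestion, not named above; it wraps the same simulate-and-refit machinery) lets you set a hypothesized effect size and extend the design. The effect size 0.2 below is purely illustrative, and `fit` is the model object assumed in the earlier sketch.

    library(simr)

    ## Replace the observed interaction estimate with a hypothesized
    ## "smallest interesting" effect (0.2 is an illustrative value).
    fixef(fit)["train_condswitch:eg_typepartial"] <- 0.2

    ## Power for the interaction z-test at the current sample size ...
    powerSim(fit, test = fixed("train_condswitch:eg_typepartial", "z"),
             nsim = 100)

    ## ... and across larger participant counts, by extending the
    ## design along the random-effect grouping factor.
    fit_ext <- extend(fit, along = "pid", n = 300)
    powerCurve(fit_ext,
               test = fixed("train_condswitch:eg_typepartial", "z"),
               along = "pid", nsim = 100)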

As mentioned by @mark999 (+1), the above, while true, does not answer the question in a fully straightforward manner. The alternative to the power-analysis approach is to focus on some kind of $R^2$ for GLMMs. In that sense, something along the lines of Nakagawa & Schielzeth's A general and simple method for obtaining R2 from generalized linear mixed-effects models is perfectly fine. To that end you might also want to see Feingold (2013), A Regression Framework for Effect Size Assessments in Longitudinal Modeling of Group Differences, which seems to be concerned with roughly the same issues.
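
A minimal sketch of this route, assuming the same `fit` object as above: the Nakagawa & Schielzeth $R^2$ via one widely used implementation, MuMIn's r.squaredGLMM(), plus the odds ratios you asked about, obtained by exponentiating the logit-scale fixed effects.

    library(MuMIn)

    ## Nakagawa & Schielzeth R^2 for the fitted GLMM:
    ## R2m (marginal)    = variance explained by fixed effects alone;
    ## R2c (conditional) = variance explained by fixed + random effects.
    r.squaredGLMM(fit)

    ## Odds ratios with Wald 95% CIs: exponentiate the logit-scale
    ## estimates, e.g. exp(-0.56156) ~ 0.57 for train_condswitch.
    exp(cbind(OR = fixef(fit),
              confint(fit, parm = "beta_", method = "Wald")))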
