Mixed Model – How to Calculate Linear Mixed Model R2 Using Statsmodels in Python

mixed model, python, statsmodels

I've run a linear mixed model using statsmodels and obtained the following result:

          Mixed Linear Model Regression Results
=========================================================
Model:            MixedLM Dependent Variable: voxel_value
No. Observations: 76      Method:             REML       
No. Groups:       6       Scale:              2.5874     
Min. group size:  7       Log-Likelihood:     -154.0848  
Max. group size:  16      Converged:          Yes        
Mean group size:  12.7                                   
---------------------------------------------------------
              Coef.  Std.Err.    z    P>|z| [0.025 0.975]
---------------------------------------------------------
Intercept     37.823    1.560  24.247 0.000 34.765 40.880
mb            -2.844    0.168 -16.934 0.000 -3.173 -2.515
sense         -5.888    0.333 -17.682 0.000 -6.541 -5.235
Group Var     10.747    4.454                            
=========================================================

I'm trying to implement the method for calculating marginal and conditional $R^2$ suggested by Nakagawa & Schielzeth (2012).

For example, to calculate the marginal $R^2$:

$$\frac{var(f)}{var(f) + var(r) + var(e)}$$

With $var(f)$ being the variance explained by the fixed effects, $var(r)$ the variance explained by the random effects, and $var(e)$ the residual variance.
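As a sketch, the formula translates directly into a couple of helper functions. The `Group Var` and `Scale` values below come from the summary table above; the fixed-effect variance is a made-up placeholder, since it isn't reported in the table:

```python
def marginal_r2(var_f, var_r, var_e):
    """Nakagawa & Schielzeth marginal R^2: fixed-effect variance over total variance."""
    return var_f / (var_f + var_r + var_e)

def conditional_r2(var_f, var_r, var_e):
    """Conditional R^2: fixed plus random effect variance over total variance."""
    return (var_f + var_r) / (var_f + var_r + var_e)

var_r = 10.747   # Group Var from the summary table
var_e = 2.5874   # Scale from the summary table
var_f = 20.0     # hypothetical fixed-effect variance, for illustration only

m = marginal_r2(var_f, var_r, var_e)
c = conditional_r2(var_f, var_r, var_e)
```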

I believe I can find $var(e)$ by calling np.var() on the array returned by the .resid attribute of my statsmodels results object. However, I'm not sure how to get $var(f)$ and $var(r)$.

Any help would be greatly appreciated!

Edit:
Side question: I'm also looking to find the standardised coefficients of each variable. However, using:

zscored_df = current_df.select_dtypes(include=[np.number]).apply(zscore, ddof=1)

to standardise the data produces different z-values when running the mixed model.

Edit 2:
After reading through the statsmodels docs some more, I found that fittedvalues reflects values fitted using both the fixed and random effects. So to get var(f) you should instead use result.predict(current_df).var(), as the predict method uses only the fixed effects.

So the final code for calculating both marginal and conditional $R^2$ is:

var_resid = result.scale                             # residual variance
var_random_effect = float(result.cov_re.iloc[0, 0])  # random-intercept variance
var_fixed_effect = result.predict(current_df).var()  # variance of fixed-effect predictions

total_var = var_fixed_effect + var_random_effect + var_resid
marginal_r2 = var_fixed_effect / total_var
conditional_r2 = (var_fixed_effect + var_random_effect) / total_var
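For completeness, here is a self-contained sketch of the whole pipeline on simulated data (the group count, effect sizes, and noise scales are arbitrary choices, not taken from the question's dataset), using statsmodels' formula interface:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate a random-intercept dataset: 6 groups, 12 observations each
rng = np.random.default_rng(0)
n_groups, n_per = 6, 12
df = pd.DataFrame({
    "group": np.repeat(np.arange(n_groups), n_per),
    "mb": rng.normal(size=n_groups * n_per),
})
group_effect = rng.normal(scale=3.0, size=n_groups)
df["voxel_value"] = (37.8 - 2.8 * df["mb"]
                     + group_effect[df["group"]]
                     + rng.normal(scale=1.6, size=len(df)))

# Fit the mixed model (REML by default)
result = smf.mixedlm("voxel_value ~ mb", df, groups=df["group"]).fit()

# Variance components for Nakagawa & Schielzeth R^2
var_resid = result.scale                             # residual variance
var_random = float(result.cov_re.iloc[0, 0])         # random-intercept variance
var_fixed = result.predict(df).var()                 # fixed-effects-only predictions

total = var_fixed + var_random + var_resid
marginal_r2 = var_fixed / total
conditional_r2 = (var_fixed + var_random) / total
```

Since the random-intercept variance is always non-negative, the conditional $R^2$ is at least as large as the marginal $R^2$.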

Best Answer

I think the best way to get var(e) is to use the scale parameter (2.5874, or r.scale, where r is the fitted model). This should be very close to r.resid.var(), as you suggested. To get var(f), use r.fittedvalues.var(). To get var(r), use r.cov_re, or just copy the number out of the summary table above (10.747).

Regarding the z-scores, I'm not sure where the zscore function you use is defined, but I suspect it subtracts the mean first and then divides by the standard deviation. In that case, it is expected that the significance levels of the coefficients will change. If you only divide by the standard deviation, however, the z-values will not change.
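Assuming the zscore in question is scipy.stats.zscore (which does centre as well as scale), the distinction can be illustrated on a toy array:

```python
import numpy as np
from scipy.stats import zscore

x = np.array([1.0, 2.0, 3.0, 4.0])

# scipy.stats.zscore subtracts the mean AND divides by the standard deviation
z = zscore(x, ddof=1)

# Scaling only: divide by the standard deviation without centring
scaled_only = x / x.std(ddof=1)
```

Both versions have unit standard deviation, but only the full z-score has zero mean; the scale-only transform preserves the location of the data.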