Solved – Understand Summary from Statsmodels’ MixedLM function

mixed-model, python, regression, statsmodels

I am using MixedLM to fit a repeated-measures model to this data, in an effort to determine whether any of the treatment time points is significantly different from the others.
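For reference, the fit looks roughly like the sketch below (the exact call is an assumption reconstructed from the summary output; the columns are assumed to be named t, session and subject):

import pandas as pd
import statsmodels.formula.api as smf

# Load the repeated-measures data (assumed columns: subject, session, t)
da = pd.read_csv("MixedLM_data.csv")

# Fixed effect of session (treatment coding), random intercept per subject, REML fit
md = smf.mixedlm("t ~ session", da, groups=da["subject"])
mdf = md.fit(reml=True)
print(mdf.summary())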

I get the following summary, and I have also plotted the data, for ease of overview (the error bars report the .95 confidence interval for the mean).

          Mixed Linear Model Regression Results
==========================================================
Model:              MixedLM  Dependent Variable:  t       
No. Observations:   25       Method:              REML    
No. Groups:         5        Scale:               0.9429  
Min. group size:    5        Likelihood:          -31.8145
Max. group size:    5        Converged:           Yes     
Mean group size:    5.0                                   
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        1.192    0.434  2.746 0.006  0.341  2.043
session[T.post] -0.141    0.614 -0.230 0.818 -1.345  1.062
session[T.t1]   -0.617    0.614 -1.004 0.315 -1.820  0.587
session[T.t2]   -1.171    0.614 -1.906 0.057 -2.374  0.033
session[T.t3]   -0.352    0.614 -0.574 0.566 -1.556  0.851
Intercept RE     0.000    0.197                           
==========================================================

[Figure: plot of the data by session, with error bars showing the 95% confidence interval for the mean.]

There are a number of things I am unsure about:

  • Why does the table report z-values? I do not specify a population variance (and the function has no attribute to do so) – so shouldn't it be reporting t-statistics?
  • Are the p-values corrected for multiple comparisons, to account for the complexity of the model?
  • Why is the standard error reported as equal for all terms except the intercept? Judging from the figure, it clearly is not.
  • I take it that the first condition (the intercept) is tested against zero, while the others are tested against the first one? So, based on the table (and a significance level of $p=0.05$), I could say that (a) my first condition has values significantly different from zero, but (b) no condition has values significantly different from the first (though t2 comes close)? Again, is any of this corrected for multiple comparisons?

Last but not least, I notice that if I choose e.g. the last session as the reference level for the intercept, I get a table from which the conclusions detailed above are no longer as clear:

          Mixed Linear Model Regression Results
=========================================================
Model:             MixedLM  Dependent Variable:  t       
No. Observations:  25       Method:              REML    
No. Groups:        5        Scale:               0.9429  
Min. group size:   5        Likelihood:          -31.8145
Max. group size:   5        Converged:           Yes     
Mean group size:   5.0                                   
---------------------------------------------------------
               Coef.  Std.Err.   z    P>|z| [0.025 0.975]
---------------------------------------------------------
Intercept       1.051    0.434  2.420 0.015  0.200  1.902
session[T.pre]  0.141    0.614  0.230 0.818 -1.062  1.345
session[T.t1]  -0.475    0.614 -0.774 0.439 -1.679  0.728
session[T.t2]  -1.029    0.614 -1.676 0.094 -2.233  0.174
session[T.t3]  -0.211    0.614 -0.344 0.731 -1.415  0.992
Intercept RE    0.000    0.197                           
=========================================================
  • Shouldn't this be a problem? I mean, one would expect a statistical analysis of categorical data to give the same results regardless of how the categories are ordered on input.

Best Answer

In a balanced design like this, the standard errors of the fixed-effects estimates for the session contrasts will always be equal to each other. The values under "z" in the summary table are the parameter estimates divided by their standard errors, and the p-values are calculated with respect to a standard normal distribution.
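As a quick check of that arithmetic (a sketch using scipy, with the numbers copied from the intercept row of your first table):

from scipy.stats import norm

coef, se = 1.192, 0.434
z = coef / se              # 1.192 / 0.434 = 2.746, as reported under "z"
p = 2 * norm.sf(abs(z))    # two-sided p-value from the standard normal, about 0.006
print(z, p)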

None of the inferential results are corrected for multiple comparisons. Under statsmodels.stats.multicomp and statsmodels.stats.multitest there are some tools for doing that.
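For example, the four session p-values could be adjusted roughly like this (a sketch using statsmodels.stats.multitest; the p-values are copied from your first summary table):

from statsmodels.stats.multitest import multipletests

# p-values of the four session contrasts (post, t1, t2, t3) from the first table
pvals = [0.818, 0.315, 0.057, 0.566]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(p_adj, reject)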

In case it helps, below is the equivalent R code, and below that I have included the fitted model summary output from R. You will see that everything agrees with what you got from statsmodels.MixedLM.

library(lme4)

# Load the data and treat session as a factor with "pre" as the reference level
da = read.csv("MixedLM_data.csv")

da$session = as.factor(da$session)
da$session = relevel(da$session, ref="pre")

# Fixed effect of session, random intercept per subject (REML is the lmer default)
rslt = lmer(t ~ session + (1 | subject), da)

Here is the output:

> summary(rslt)
Linear mixed model fit by REML ['lmerMod']
Formula: t ~ session + (1 | subject)
   Data: da

REML criterion at convergence: 63.6

Scaled residuals:
     Min       1Q   Median       3Q      Max 
-2.00334 -0.48625 -0.02747  0.59021  1.63447

Random effects:
 Groups   Name        Variance  Std.Dev.
 subject  (Intercept) 3.564e-20 1.888e-10
 Residual             9.429e-01 9.710e-01
Number of obs: 25, groups:  subject, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   1.1923     0.4343   2.746
sessionpost  -0.1412     0.6141  -0.230
sessiont1    -0.6167     0.6141  -1.004
sessiont2    -1.1706     0.6141  -1.906
sessiont3    -0.3525     0.6141  -0.574

Correlation of Fixed Effects:
            (Intr) sssnps sssnt1 sssnt2
sessionpost -0.707
sessiont1   -0.707  0.500
sessiont2   -0.707  0.500  0.500
sessiont3   -0.707  0.500  0.500  0.500
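On the statsmodels side, the analogue of R's relevel is a Treatment contrast with an explicit reference level. Here is a sketch (assuming the same column names as above); choosing "post" as the reference appears to correspond to your second table:

import pandas as pd
import statsmodels.formula.api as smf

da = pd.read_csv("MixedLM_data.csv")

# Analogous to relevel(da$session, ref="post") in R
md = smf.mixedlm("t ~ C(session, Treatment(reference='post'))", da,
                 groups=da["subject"])
mdf = md.fit(reml=True)
print(mdf.summary())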