Solved – Understand Summary from Statsmodels’ MixedLM function

mixed-model, python, regression, statsmodels

I am using MixedLM to fit a repeated-measures model to this data, in an effort to determine whether any of the treatment time points is significantly different from the others.
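For reference, the fit looks roughly like the sketch below (the exact call is an assumption reconstructed from the summary output; the columns are assumed to be named t, session and subject):

import pandas as pd
import statsmodels.formula.api as smf

# Load the repeated-measures data (assumed columns: subject, session, t)
da = pd.read_csv("MixedLM_data.csv")

# Fixed effect of session (treatment coding), random intercept per subject, REML fit
md = smf.mixedlm("t ~ session", da, groups=da["subject"])
mdf = md.fit(reml=True)
print(mdf.summary())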

I get the following summary, and I have also plotted the data, for ease of overview (the error bars report the .95 confidence interval for the mean).

          Mixed Linear Model Regression Results
==========================================================
Model:              MixedLM  Dependent Variable:  t       
No. Observations:   25       Method:              REML    
No. Groups:         5        Scale:               0.9429  
Min. group size:    5        Likelihood:          -31.8145
Max. group size:    5        Converged:           Yes     
Mean group size:    5.0                                   
----------------------------------------------------------
                Coef.  Std.Err.   z    P>|z| [0.025 0.975]
----------------------------------------------------------
Intercept        1.192    0.434  2.746 0.006  0.341  2.043
session[T.post] -0.141    0.614 -0.230 0.818 -1.345  1.062
session[T.t1]   -0.617    0.614 -1.004 0.315 -1.820  0.587
session[T.t2]   -1.171    0.614 -1.906 0.057 -2.374  0.033
session[T.t3]   -0.352    0.614 -0.574 0.566 -1.556  0.851
Intercept RE     0.000    0.197                           
==========================================================

[Figure: plot of the data by session, with error bars showing the 95% confidence interval for the mean.]

There are a number of things I am unsure about:

  • Why does the table report z-values? I do not specify a population variance (and the function has no attribute to do so) – so shouldn't it be reporting t-statistics?
  • Are the p-values corrected for multiple comparisons, to account for the complexity of the model?
  • Why is the standard error reported as equal for all terms except the intercept? Judging from the figure, it clearly is not.
  • I take it that the first condition (the intercept) is tested against zero, while the others are tested against the first one? So, based on the table (and a significance level of $p=0.05$), I could say that (a) my first condition has values significantly different from zero, but (b) no condition has values significantly different from the first (though t2 comes close)? Again, is any of this corrected for multiple comparisons?

Last but not least, I notice that if I choose e.g. the last session as the reference level for the intercept, I get a table from which the conclusions detailed above are no longer as clear:

          Mixed Linear Model Regression Results
=========================================================
Model:             MixedLM  Dependent Variable:  t       
No. Observations:  25       Method:              REML    
No. Groups:        5        Scale:               0.9429  
Min. group size:   5        Likelihood:          -31.8145
Max. group size:   5        Converged:           Yes     
Mean group size:   5.0                                   
---------------------------------------------------------
               Coef.  Std.Err.   z    P>|z| [0.025 0.975]
---------------------------------------------------------
Intercept       1.051    0.434  2.420 0.015  0.200  1.902
session[T.pre]  0.141    0.614  0.230 0.818 -1.062  1.345
session[T.t1]  -0.475    0.614 -0.774 0.439 -1.679  0.728
session[T.t2]  -1.029    0.614 -1.676 0.094 -2.233  0.174
session[T.t3]  -0.211    0.614 -0.344 0.731 -1.415  0.992
Intercept RE    0.000    0.197                           
=========================================================
  • Shouldn't this be a problem? I mean, one would expect a statistical analysis of categorical data to give the same results regardless of how the categories are ordered on input.

Best Answer

In a balanced design like this, the standard errors of the fixed-effects estimates for the session contrasts will always be equal to each other. The values under "z" in the summary table are the parameter estimates divided by their standard errors, and the p-values are calculated with respect to a standard normal distribution.
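As a quick check of that arithmetic (a sketch using scipy, with the numbers copied from the intercept row of your first table):

from scipy.stats import norm

coef, se = 1.192, 0.434
z = coef / se              # 1.192 / 0.434 = 2.746, as reported under "z"
p = 2 * norm.sf(abs(z))    # two-sided p-value from the standard normal, about 0.006
print(z, p)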

None of the inferential results are corrected for multiple comparisons. Under statsmodels.stats.multicomp and statsmodels.stats.multitest there are some tools for doing that.
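For example, the four session p-values could be adjusted roughly like this (a sketch using statsmodels.stats.multitest; the p-values are copied from your first summary table):

from statsmodels.stats.multitest import multipletests

# p-values of the four session contrasts (post, t1, t2, t3) from the first table
pvals = [0.818, 0.315, 0.057, 0.566]
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print(p_adj, reject)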

In case it helps, below is the equivalent R code, and below that I have included the fitted model summary output from R. You will see that everything agrees with what you got from statsmodels.MixedLM.

library(lme4)

# Load the data and treat session as a factor with "pre" as the reference level
da = read.csv("MixedLM_data.csv")

da$session = as.factor(da$session)
da$session = relevel(da$session, ref="pre")

# Fixed effect of session, random intercept per subject (REML is the lmer default)
rslt = lmer(t ~ session + (1 | subject), da)

Here is the output:

> summary(rslt)
Linear mixed model fit by REML ['lmerMod']
Formula: t ~ session + (1 | subject)
   Data: da

REML criterion at convergence: 63.6

Scaled residuals:
     Min       1Q   Median       3Q      Max 
-2.00334 -0.48625 -0.02747  0.59021  1.63447

Random effects:
 Groups   Name        Variance  Std.Dev.
 subject  (Intercept) 3.564e-20 1.888e-10
 Residual             9.429e-01 9.710e-01
Number of obs: 25, groups:  subject, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   1.1923     0.4343   2.746
sessionpost  -0.1412     0.6141  -0.230
sessiont1    -0.6167     0.6141  -1.004
sessiont2    -1.1706     0.6141  -1.906
sessiont3    -0.3525     0.6141  -0.574

Correlation of Fixed Effects:
            (Intr) sssnps sssnt1 sssnt2
sessionpost -0.707
sessiont1   -0.707  0.500
sessiont2   -0.707  0.500  0.500
sessiont3   -0.707  0.500  0.500  0.500
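On the statsmodels side, the analogue of R's relevel is a Treatment contrast with an explicit reference level. Here is a sketch (assuming the same column names as above); choosing "post" as the reference appears to correspond to your second table:

import pandas as pd
import statsmodels.formula.api as smf

da = pd.read_csv("MixedLM_data.csv")

# Analogous to relevel(da$session, ref="post") in R
md = smf.mixedlm("t ~ C(session, Treatment(reference='post'))", da,
                 groups=da["subject"])
mdf = md.fit(reml=True)
print(mdf.summary())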