Solved – Lsmeans output for clmm models (R)

Tags: confidence-interval, logistic, lsmeans, ordinal-data, r

I've fitted a cumulative logit model, where the IV is categorical (different animation models being compared), the DV is ordinal (points on a 1-9 scale), and there are some random effects (subject, sentence). The animation models are rated across a bunch of sentences by every subject. I've largely been following this tutorial: http://rcompanion.org/handbook/G_07.html

First I turn the response into an ordered factor and the predictor into an unordered one:

library(ordinal)        # clmm()
library(RVAideMemoire)  # supplies an Anova() method for clmm objects

d$score.f = factor(d$score, ordered = TRUE)
d$model.f = factor(d$model, ordered = FALSE)

m = clmm(score.f ~ model.f + (1 + model.f | sentence) + (1 + model.f | subject), data = d)

Anova(m, type = "II")

I got a significant p-value, so I then ran the post-hoc test with lsmeans and a Tukey adjustment:

lsmeans(m, pairwise ~ model.f, adjust = "tukey")

The results are:

$lsmeans
 model.f    lsmean        SE df  asymp.LCL asymp.UCL
 0       0.2503735 0.4297950 NA -0.5920091  1.092756
 1       0.8620336 0.3462029 NA  0.1834883  1.540579
 2       1.4808482 0.4786811 NA  0.5426504  2.419046

Confidence level used: 0.95 

$contrasts
 contrast   estimate        SE df z.ratio p.value
 0 - 1    -0.6116601 0.2591413 NA  -2.360  0.0479
 0 - 2    -1.2304747 0.2807095 NA  -4.383  <.0001
 1 - 2    -0.6188146 0.3124234 NA  -1.981  0.1170

I found through StackExchange that I could also do this to get confidence intervals:

lsmeans(m, pairwise ~ model.f, adjust = "tukey", mode = "mean.class")

It gives the following results:

$lsmeans
 model.f mean.class        SE df asymp.LCL asymp.UCL
 0         5.465029 0.3372352 NA  4.804061  6.125998
 1         5.939538 0.2476488 NA  5.454156  6.424921
 2         6.387112 0.3248218 NA  5.750473  7.023751

Confidence level used: 0.95 

$contrasts
 contrast   estimate        SE df z.ratio p.value
 0 - 1    -0.4745091 0.2065156 NA  -2.298  0.0561
 0 - 2    -0.9220829 0.2018481 NA  -4.568  <.0001
 1 - 2    -0.4475738 0.2158877 NA  -2.073  0.0954

P value adjustment: tukey method for comparing a family of 3 estimates 

Now, to the actual question: the tutorial I followed was very clear that you should ignore the lsmean, SE, LCL, and UCL values in the lsmeans output. It would be nice to have some way of presenting the results beyond just stating that there was a significant difference between 0 and 2 (and between 0 and 1 in the first lsmeans output – why did the values change?), for example a plot showing the least-squares means with their confidence intervals. One page on the tutorial website (http://rcompanion.org/handbook/G_06.html) does exactly that, but only for a non-clm model. Can such a plot be made for a clmm model, for example using the output of the second lsmeans call above with the mean.class estimates and their SEs?

If not, what other options do I have? I know how to calculate the probabilities of the different rating grades for each model using the coefficients from the model (summary(m)), and I could present those as one bar plot per model. The coefficients have standard errors associated with them, but would simply subtracting and adding those standard errors give the lower and upper ends of the confidence intervals? Or should I instead calculate the expected value from the probabilities?
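For reference, this is roughly how I compute those probabilities now (a sketch; it reads the thresholds from m$alpha and the fixed effects from m$beta as stored by ordinal's clmm, and conditions on random effects of zero, i.e. a "typical" subject and sentence):

theta <- m$alpha                  # cut points theta_1 ... theta_8
eta   <- c(0, m$beta)             # linear predictor per model (0 = reference level)
probs <- sapply(eta, function(e) {
  cum <- c(plogis(theta - e), 1)  # P(score <= j) = logit^-1(theta_j - eta)
  diff(c(0, cum))                 # P(score = j)
})
barplot(probs, beside = TRUE)     # one group of bars per model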

By the way, the data looks like this (so ten sentences, around 12 subjects, three models and a 1-9 rating scale):

subject,sentence,model,score
0,0,0,5
0,0,1,6
0,0,2,7
0,1,0,7
0,1,1,5
0,1,2,8
0,2,0,5
0,2,1,6
0,2,2,7

Best Answer

First, you should switch to the emmeans package, which is the continuation of lsmeans; the latter will be archived on CRAN soon.

Regarding the instruction to ignore certain results: I invite you to ask yourself what kind of statistical software would output results that should be ignored. The more likely possibility is that the writer of that advice does not understand the output and so doesn't want to try to explain it, or disagrees with that method of analysis for an unstated reason (but why then show the method at all?).

The default results of lsmeans() (or emmeans::emmeans()) are on a latent-variable scale. That latent model asserts that there is a continuous but unobservable response having a logistic distribution with a mean that depends on the predictors, and that there is also a set of cut points defining a set of intervals on the latent scale; the observed ordinal response records which interval each latent response falls in. This latent scale is not completely identifiable – any linear transformation of it is equally valid – so lsmeans() picks one particular version, and its results are meaningful relative to that choice of scale.
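To see this concretely, here is a minimal sketch with emmeans, assuming the model m fitted in the question (for clmm objects the latent scale is the default mode, so this reproduces the first set of results above):

library(emmeans)
emmeans(m, pairwise ~ model.f, adjust = "tukey", mode = "latent")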

Moreover, the results with mode = "mean.class" are completely interpretable on an absolute scale. Regarding the ordinal values as 1, 2, ..., 9, the mean.class predictions are the mean of that ordinal variable, computed from the predicted probabilities of each class.
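That means those estimates and their confidence limits can be plotted directly, which answers the plotting question. A minimal sketch, assuming the model m from the question (plot() here is emmeans's built-in plot method for emmGrid objects; the ggplot2 version builds the same display by hand):

library(emmeans)
library(ggplot2)

emm <- emmeans(m, ~ model.f, mode = "mean.class")
plot(emm)                 # built-in plot of estimates with confidence intervals

# Or construct the same display manually from the summary
s <- as.data.frame(emm)   # columns: model.f, mean.class, SE, df, asymp.LCL, asymp.UCL
ggplot(s, aes(model.f, mean.class)) +
  geom_point() +
  geom_errorbar(aes(ymin = asymp.LCL, ymax = asymp.UCL), width = 0.15) +
  labs(y = "Estimated mean rating class (95% CI)")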

For an example detailing the analysis of a clm model, see the relevant emmeans vignette: https://cran.r-project.org/web/packages/emmeans/vignettes/sophisticated.html#ordinal. Documentation for the different options available for ordinal models is provided in https://cran.r-project.org/web/packages/emmeans/vignettes/models.html#O. There is also an index of vignette topics where you may find yet more information.
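Finally, regarding the per-category probabilities mentioned in the question: mode = "prob" estimates those directly (with standard errors), so there is no need to assemble them by hand from the coefficients. A sketch, again assuming the model m:

library(emmeans)
# Estimated probability of each rating category, separately for each model
emmeans(m, ~ score.f | model.f, mode = "prob")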
