My view is that the $F$ test of statistical significance of the interaction effect is less important than the subjective nature of the interaction, as evidenced by the plot. The plot tells me that it is reasonably sensible to compare the overall averages of Depression and Top, but it would be silly to compare those averages with the overall average of Slope -- whether or not these comparisons are statistically significant. Basically, I'd say to avoid doing comparisons that don't make sense -- so my advice is: do not ignore the warning note in this case. If the curve for Top were fairly parallel with the other two, that's when you could ignore it.
In general, I suggest looking at enough plots that you can tell what's going on, and then restrict your post-hoc testing to things that are sensible.
Since P is continuous, you're really fitting straight lines (they look curved only because you chose unequally spaced points). You can compare the slopes of these lines:
R> lstrends(Dens.LMER, pairwise ~ Contour, var = "P")
$lstrends
 Contour        P.trend          SE    df    lower.CL     upper.CL
 Depression -0.00681143 0.004901195 39.68 -0.01671957  0.003096714
 Slope      -0.03376293 0.010533875 41.88 -0.05502295 -0.012502911
 Top        -0.01306992 0.010499548 41.97 -0.03425936  0.008119525

Confidence level used: 0.95

$contrasts
 contrast               estimate         SE    df t.ratio p.value
 Depression - Slope  0.026951501 0.01161827 42.00   2.320  0.0639
 Depression - Top    0.006258486 0.01158716 41.81   0.540  0.8520
 Slope - Top        -0.020693015 0.01487290 41.99  -1.391  0.3545

P value adjustment: tukey method for a family of 3 tests
The comparison between the shallowest and steepest trends (Depression vs. Slope) has an adjusted $P$ value of about $.06$.
I'm glad you love the package; but it reminds me of a letter I read once in the newspaper column Hints from Heloise, in which the writer said that she had a new dishwasher and loved it, but it didn't get her dishes clean. It made me wonder what exactly she loved about it, and whether she felt the same way about her husband.
As the developer of lsmeans, I assure you that a lot more effort has gone into documentation than coding. Documentation is much harder and less fun to write, especially within the format required for R packages. And I agree that R help pages can be hard to read. In writing those, my main objective (which I believe is in line with their intent) is to document in detail how each argument of each function works; and as the package has become more complex, those details get messier and messier.
But R also provides for vignettes, which can be more informal and expository. And I put a lot of effort into those as well. I wonder if you are not aware of them. In particular, if you load the package and do
vignette("using-lsmeans")
you'll get a PDF document that has a lot of information and exposition. I believe the answer to your question (1) (what is an LS mean) is very clearly answered in the first couple of sections, and I would like to ask you to read that rather than try to summarize it here. But let me know if it's confusing.
Now for the other questions. First, estimate is just a generic term, quite common in statistics. You ask for certain contrasts, and the program prints estimates of those contrasts. In your question, you ask for contrasts that compare each treatment with a control; the true values of those comparisons are unknown, and what we can obtain from the data are estimates of those parameters. In your example, the LS means are on the log scale, and the comparisons are thus differences of two logarithms.
In dealing with transformations and link functions, I tried to put generic labels in the output that might help with interpretation when results are back-transformed. The labels "rate" and "rate ratio" are a result of that. In a Poisson model, the mean is often referred to as a rate, because the probability model is concerned with the number of events per unit of time or space, and those events occur at a particular mean rate. In any case, that label rate was supposed to emphasize the fact that the LS means were back-transformed to the original scale of counts, in which we are estimating (that word again) the rate of occurrence.
When numbers are on a log scale, there is a math result that says that $\log a - \log b = \log(a/b)$; i.e., the log of the ratio of $a$ and $b$. In your example, the contrasts you computed are differences of logs, so when they are back-transformed, you obtain estimates (!) of the ratios of two rates. Hence the label "rate.ratio".
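A quick numerical check of this identity (the values $a = 12$ and $b = 4$ here are arbitrary, not from your data):

```r
a <- 12
b <- 4
log(a) - log(b)        # difference on the log scale
log(a / b)             # log of the ratio -- the same number
exp(log(a) - log(b))   # back-transforming the difference recovers the ratio a/b = 3
```

This is exactly what happens when a difference of log-scale LS means is back-transformed: you get the ratio of the two rates.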
Also, I must emphasize that the above ratios are not odds ratios. Odds are quantities of the form $p/(1-p)$, and log-odds are termed "logits". If you have a model with binomial data, estimation is often on the logit scale, and the lsmeans package in fact labels back-transformed differences from a logit model as "odds ratios". They are a different animal, and "logit" is not a synonym for "log".
I hope these answers are helpful. I also hope you'll read more of that vignette.
The default output from lsmeans is on the latent-variable scale -- a bit hard to explain, but one way to think of it is that the common model involves a linear predictor for the logit of the cumulative probabilities, and the latent value is the average of that linear predictor, across the cut points, at each grid value. If you want the predicted average class number on the 1-9 scale, it's easy to get:
For more details, see ? models with lsmeans loaded.