Solved – Interpreting lsmeans values from a mixed model with offset

Tags: lme4-nlme, lsmeans, offset, r

I am using a generalized linear mixed-effects model to look at the effects of different treatments on trichome density.

The model is:

library(lme4)  # glmer()
fitPoisson <- glmer(Count_trichomes ~ Treatment1 * Treatment2 * Treatment3 +
                      (1 | Block/Code) + offset(log(Length)),
                    family = poisson(), data = dataset)

Treatments 1 and 2 each have two levels (0 and 1) and Treatment 3 has three levels (0, 1, 2). Block accounts for the replicates and Code identifies each individual. Length is in cm.
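A minimal sketch of what the offset does, using simulated data and an ordinary Poisson glm() instead of the mixed model above (the names `grp`, `len`, `y` and the "true" rates of 100 and 130 per cm are made up for illustration). With offset(log(Length)), the log of the expected count is log(Length) plus the linear predictor, so exponentiated coefficients are counts per unit of Length. In a model with a single factor, the fitted rate for each group works out to that group's total count divided by its total length.

```r
# Hypothetical simulated data: counts observed on segments of varying length (cm)
set.seed(1)
n   <- 40
grp <- factor(rep(c("0", "1"), each = n / 2))
len <- runif(n, min = 0.5, max = 3)          # segment length in cm
true_rate <- ifelse(grp == "0", 100, 130)     # assumed trichomes per cm
y   <- rpois(n, lambda = true_rate * len)     # expected count = length * rate

# Poisson GLM with log(length) as an offset: log E[y] = log(len) + log(rate)
fit <- glm(y ~ 0 + grp + offset(log(len)), family = poisson())

# Exponentiated coefficients are estimated rates per cm of length
rate_hat <- exp(coef(fit))

# For this single-factor model the MLE equals total count / total length per group
ratio_of_sums <- tapply(y, grp, sum) / tapply(len, grp, sum)

print(rate_hat)
print(ratio_of_sums)
```

Note that this is a ratio of totals, which is not in general the same as averaging the individual densities y/len.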

anova(fitPoisson) told me that Treatments 1 and 3 are significant and that there are no significant interactions. What I want now is to know the density for each level of the treatments.

So I used lsmeans to look at the differences:

    > lsmeans(fitPoisson, ~ Treatment1)

     Treatment1   lsmean         SE df asymp.LCL asymp.UCL
     0           5.309106 0.06113705 NA  5.189280  5.428933
     1           5.471452 0.06114033 NA  5.351619  5.591285

     Results are averaged over the levels of: Treatment2, Treatment3
     Results are given on the log (not the response) scale. 
     Confidence level used: 0.95

I can see that the density at level 0 is lower than at level 1, but I don't understand what units are used. They don't seem to be trichomes/cm, since the mean for level 0 is 107 trichomes/cm and the mean for level 1 is 131 trichomes/cm (calculated in Excel).

When I back-transform from the log scale, I get:

    > summary(lsmeans(fitPoisson, ~ Treatment1), type = "response")

     Treatment1   rate       SE df asymp.LCL asymp.UCL
     0           202.1694 12.36004 NA  179.3393  227.9058
     1           237.8053 14.53949 NA  210.9496  268.0799

    Results are averaged over the levels of: Treatment2, Treatment3 
    Confidence level used: 0.95 
    Intervals are back-transformed from the log scale

These are still far from the means I calculated in Excel.

Maybe I just don't understand the information lsmeans is giving me, or I am not using the right function.

Best Answer

If you do

lsmeans(fitPoisson, ~ Treatment1 * Treatment2 * Treatment3)

you will see the predictions (on the log scale) made by the model for each combination of the three factors. The results you show in your question are the averages of these predictions, averaged (with equal weights) over the levels of Treatment2 and Treatment3. The results with type = "response" are the antilogs of these results, and the standard errors are obtained using the delta method. Note that the averaging is still done on the log scale, and the confidence intervals are computed on the log scale, then back-transformed.
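The back-transformation described here can be checked by hand. This base-R sketch only re-uses the numbers printed in the question's two outputs: it applies exp() to the log-scale LS means and confidence limits, and uses the delta method for the standard errors, which for g(x) = exp(x) reduces to SE x exp(estimate).

```r
# LS means, SEs and confidence limits on the log scale, copied from the question
lsmean <- c(`0` = 5.309106, `1` = 5.471452)
se_log <- c(0.06113705, 0.06114033)
lcl    <- c(5.189280, 5.351619)
ucl    <- c(5.428933, 5.591285)

# Back-transform the estimates: rate = exp(lsmean)
rate <- exp(lsmean)

# Delta method for g(x) = exp(x): SE[g(x)] ~= |g'(x)| * SE = exp(x) * SE
se_rate <- rate * se_log

# Confidence limits were computed on the log scale; just back-transform them
ci <- cbind(lower = exp(lcl), upper = exp(ucl))

print(round(cbind(rate, SE = se_rate, ci), 4))
```

The result reproduces the rate, SE and asymp.LCL/asymp.UCL columns of the type = "response" summary, up to rounding of the printed inputs.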

If you want the averaging to be done on the raw count scale, that is possible too. Do:

rg = regrid(ref.grid(fitPoisson), transform = TRUE)
lsmeans(rg, ~ Treatment1)

(The regrid function creates a new reference grid for the model based on the back-transformed predictions. You can use summary(rg) to see the individual predictions.)
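Why the two kinds of averaging give different numbers can be seen with a toy calculation. The six log-scale predictions below are made up (they stand in for one level of Treatment1 across the 2 x 3 combinations of Treatment2 and Treatment3); averaging on the log scale and then back-transforming gives a geometric mean of the predicted rates, whereas back-transforming first and then averaging gives an arithmetic mean, which is always at least as large.

```r
# Hypothetical log-scale predictions for one level of Treatment1, one for each
# of the 2 x 3 combinations of Treatment2 and Treatment3 (made-up numbers)
eta <- c(5.0, 5.2, 5.4, 5.3, 5.5, 5.5)

# Averaging on the log scale, then back-transforming:
# the geometric mean of the six predicted rates
avg_then_exp <- exp(mean(eta))

# Back-transforming each prediction first, then averaging (as with regrid):
# the arithmetic mean of the six predicted rates
exp_then_avg <- mean(exp(eta))

print(c(log_scale = avg_then_exp, response_scale = exp_then_avg))
```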

Either way, these LS means can differ markedly from the raw averages you computed from the data if the data are unbalanced, because the raw averages then give the levels of Treatment2 and Treatment3 very unequal weights.
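A toy calculation with made-up cell means and cell sizes shows how imbalance alone can pull a raw average away from an equal-weight (LS-mean-style) average, even when both are computed on the same scale:

```r
# Hypothetical cell means (counts) for one level of Treatment1 across the
# 2 x 3 combinations of Treatment2 and Treatment3, with unequal cell sizes
cell_mean <- c(90, 110, 130, 100, 120, 140)
cell_n    <- c(20,   2,   2,  20,   2,   2)

# LS-mean style: every combination gets the same weight
equal_weight <- mean(cell_mean)

# Raw average of all observations: each cell is weighted by its sample size
raw_average <- weighted.mean(cell_mean, w = cell_n)

print(c(equal_weight = equal_weight, raw_average = raw_average))
```

Here the two large cells with low means dominate the raw average, while the equal-weight average treats all six combinations alike.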

Related Solutions

Mixed Model – How to Calculate Estimated Proportions and Their Confidence Intervals

First a note: you can't calculate a meaningful standard error directly on the probability scale. You have to work on the logit scale, construct the confidence interval there, and back-transform its limits. Intervals for probabilities are rarely symmetric, and certainly not when using a mixed model.

You can easily plot the effects using the effects package.

With the function Effect() you can specify the effects you want to plot and plot immediately, or extract the information you want.

Some random fake data:

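library(lme4)     # glmer()
library(effects)  # Effect(), allEffects()

ndata <- data.frame(
  DV = sample(0:1, 200, TRUE),
  Treatment1 = rep(rep(c('A1', 'B1'), 25), 4),
  Treatment2 = rep(rep(c('A2', 'B2'), each = 25), 4),
  Block = rep(c("Block1", "Block2"), each = 100),
  stringsAsFactors = TRUE  # keep treatments as factors (default changed in R 4.0)
)

m <- glmer(DV ~ Treatment1 * Treatment2 + (1 | Block/Treatment1),
           data = ndata, family = binomial)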

To get a plot of the effects, you can simply do:

plot(allEffects(m))


You get the same plot with plot(Effect(c("Treatment1", "Treatment2"), m)).

If you want the actual numbers, save the result of a call to Effect() in an object and extract the components you need:

est <- Effect(c("Treatment1","Treatment2"),m)
cbind(est$x,est$fit,est$se,est$lower,est$upper)

to get:

  Treatment1 Treatment2       est$fit    est$se  est$lower est$upper
1         A1         A2 -7.696104e-02 0.2775554 -0.6209597 0.4670376
2         B1         A2 -1.670541e-01 0.2896820 -0.7348204 0.4007122
3         A1         B2 -3.364722e-01 0.2927591 -0.9102694 0.2373250
4         B1         B2  1.110223e-16 0.2773501 -0.5435962 0.5435962

Note that these values are on the linear-predictor (logit) scale. To get the estimated probabilities and their confidence limits, transform them back to the probability scale with the inverse logit, e.g. plogis():

> cbind(est$x,plogis(est$fit),plogis(est$lower),plogis(est$upper))
  Treatment1 Treatment2 plogis(est$fit) plogis(est$lower) plogis(est$upper)
1         A1         A2       0.4807692         0.3495632         0.6146824
2         B1         A2       0.4583333         0.3241378         0.5988588
3         A1         B2       0.4166667         0.2869447         0.5590543
4         B1         B2       0.5000000         0.3673514         0.6326486

PS: this is not the cleanest code; it's just for illustrative purposes.
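The opening point about asymmetric intervals can be checked with the numbers already printed above. This base-R sketch takes row 3 of the Effect() output (Treatment1 = A1, Treatment2 = B2): the interval is symmetric around the fit on the logit scale, but after plogis() the upper half is wider than the lower half. (Row 4 would not show this, because its fit is 0 on the logit scale, i.e. p = 0.5, where the back-transformed interval stays symmetric.)

```r
# Row 3 of the Effect() output above, on the logit scale
fit   <- -0.3364722
lower <- -0.9102694
upper <-  0.2373250

# The interval is symmetric on the logit scale ...
logit_halfwidths <- c(below = fit - lower, above = upper - fit)

# ... but not after back-transforming with the inverse logit
p_fit   <- plogis(fit)
p_lower <- plogis(lower)
p_upper <- plogis(upper)
prob_halfwidths <- c(below = p_fit - p_lower, above = p_upper - p_fit)

print(logit_halfwidths)
print(prob_halfwidths)
```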

Solved – lsmeans output “rate”, “estimate”,

I'm glad you love the package; but it reminds me of a letter I read once in the newspaper column Hints for Heloise, in which the writer said that she had a new dishwasher and loves it, but it doesn't get her dishes clean. It made me wonder what exactly she loves about it, and whether she feels the same way about her husband.

As the developer of lsmeans, I assure you that a lot more effort has gone into documentation than coding. Documentation is much harder and less fun to write, especially within the format required for R packages. And I agree that R help pages can be hard to read. In writing those, my main objective (which I believe is in line with their intent) is to document in detail how each argument of each function works; and as the package has become more complex, those details get messier and messier.

But R also provides for vignettes, which can be more informal and expository, and I put a lot of effort into those as well. I wonder whether you are aware of them. In particular, if you load the package and do

vignette("using-lsmeans")

you'll get a PDF document that has a lot of information and exposition. I believe your question (1) (what is an LS mean?) is answered very clearly in the first couple of sections, and I would ask you to read those rather than have me try to summarize them here. But let me know if anything is confusing.

Now for the other questions. First, "estimate" is just a generic term, quite common in statistics. You ask for certain contrasts, and the program prints estimates of those contrasts. In your question, you ask for contrasts that compare each treatment with a control; the true values of those comparisons are unknown, and what we can obtain from the data are estimates of them. In your example, the LS means are on the log scale, so the comparisons are differences of two logarithms.

In dealing with transformations and link functions, I tried to put generic labels in the output that might help with interpretation when results are back-transformed. The labels "rate" and "rate ratio" are a result of that. In a Poisson model, the mean is often referred to as a rate, because the probability model is concerned with the number of events per unit of time or space, and those events occur at a particular mean rate. In any case, the label "rate" is meant to emphasize that the LS means have been back-transformed to the original scale of counts, on which we are estimating (that word again) the rate of occurrence.

When numbers are on a log scale, there is a math result that says that $\log a - \log b = \log(a/b)$; i.e., the log of the ratio of $a$ and $b$. In your example, the contrasts you computed are differences of logs, so when they are back-transformed, you obtain estimates (!) of the ratios of two rates. Hence the label "rate.ratio".
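A quick numerical check of this identity, using (purely for illustration) the two log-scale LS means from the trichome question at the top of this page: back-transforming their difference gives exactly the ratio of the two back-transformed rates.

```r
# Log-scale LS means from the trichome example at the top of this page
log_a <- 5.471452   # Treatment1 = 1
log_b <- 5.309106   # Treatment1 = 0

# A contrast on the log scale is a difference of logs ...
est <- log_a - log_b

# ... so back-transforming it gives the ratio of the two rates
rate_ratio <- exp(est)
direct     <- exp(log_a) / exp(log_b)

print(c(estimate = est, rate.ratio = rate_ratio, check = direct))
```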

Also, I must emphasize that the above ratios are not odds ratios. Odds are quantities of the form $p/(1-p)$, and log-odds are termed "logits". If you have a model with binomial data, estimation is often on the logit scale, and the lsmeans package in fact labels back-transformed differences from a logit model as "odds ratios". They are a different animal, and "logit" is not a synonym for "log."
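To see that the two really differ, here is a small base-R sketch with two illustrative probabilities, 0.5 and 5/12 (the same values as rows 4 and 3 of the plogis() table earlier on this page). Back-transforming a difference of logits gives an odds ratio, which is not the ratio of the probabilities.

```r
# Two illustrative probabilities
p1 <- 0.5
p2 <- 5 / 12   # 0.4166667

# Difference on the logit (log-odds) scale, back-transformed: an odds ratio
odds_ratio <- exp(qlogis(p1) - qlogis(p2))

# The same quantity written out with odds p / (1 - p)
odds_ratio_direct <- (p1 / (1 - p1)) / (p2 / (1 - p2))

# A ratio of probabilities is a different quantity
prob_ratio <- p1 / p2

print(c(odds_ratio = odds_ratio, check = odds_ratio_direct, prob_ratio = prob_ratio))
```

Here the odds ratio is 1.4 while the ratio of probabilities is 1.2.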

I hope these answers are helpful. I also hope you'll read more of that vignette.
