Solved – lsmeans output “rate”, “estimate”,


My question is about the output of the lsmeans() and contrast() functions.
Can somebody please explain to me the meaning of "lsmeans", "rate", "rate.ratio", and especially "estimate"?

  1. As far as I know "lsmeans" is a mean estimated from a linear model
    taking covariates into account. Hence, not simply the group average!
    However, with only one factor, it is possibly simply the mean of
    each group-level.
  2. "rate" is the back-transformed "lsmean" in Poisson models, and
    should in this case be average counts of species for each group
    level.
  3. "rate.ratio" appears only when pairwise comparisons are made and is
    more difficult to understand. These are odds ratios. I think odds
    ratios are very difficult to understand. It should be differences on
    the logit scale (Poisson model) and then back-transformed. Probably
    this means it is also in units of counts, and hence the differences
    in the count values.
  4. Last but not least we have "estimates". These always appear when
    using different contrast methods, i.e. "effects", "trt.vs.ctrl",….
    What do they mean? And what does it mean when they are negative? Is
    the LHS then less than the RHS, or the other way round?

I'm not a statistician, and what I have written here is almost certainly wrong, but where can I find a source for all this information about modern statistics? I hope you are the source!
I love this package, and I'm very thankful that it is available!!

Examples of different outputs (I shortened them a bit […]):

> lsm <- lsmeans::lsmeans(poisonGLM_1, ~ Factor1);lsm

  Factor1    lsmean         SE df asymp.LCL asymp.UCL
  A_Cm       3.765069 0.06213535 NA  3.643286  3.886852
  B_Sp_young 3.295837 0.11111111 NA  3.078063  3.513611
  [....]
  Confidence level used: 0.95 

> summary(lsm, type="response")

  Factor1      rate       SE df asymp.LCL asymp.UCL
  A_Cm       43.16667 2.682176 NA  38.21719  48.75714
  B_Sp_young 27.00000 3.000000 NA  21.71630  33.56926
  [....]   
  Confidence level used: 0.95 

> summary(pairs(lsm), type="response")

  contrast               rate.ratio         SE df    z.ratio p.value
  A_Cm - B_Sp_young       1.5987654 0.20353032 NA  3.6858954  0.0021
  [....]
  P value adjustment: tukey method for comparing a family of 5 estimates 
  Tests are performed on the linear-predictor scale 

> summary(regrid(pairs(lsm)), type="response")

  contrast               rate.ratio         SE df  z.ratio p.value
  A_Cm - B_Sp_young       1.5987654 0.20353032 NA 7.855171  <.0001
  [....]
  P value adjustment: tukey method for comparing a family of 5 estimates 

> lsm2 <- contrast(lsm, "trt.vs.ctrl");lsm2

  contrast             estimate        SE df    z.ratio p.value
  B_Sp_young - A_Cm -0.46923173 0.1273047 NA -3.6858954  0.0009
  C_Sp_int1 - A_Cm   0.10613242 0.1039483 NA  1.0210117  0.6703
  D_Sp_int2 - A_Cm   0.07796154 0.1048983 NA  0.7432105  0.8305
  E_Sp_old - A_Cm   -0.60100101 0.1339601 NA -4.4864179  <.0001
  P value adjustment: dunnettx method for 4 tests 

I'm probably the only one, but for me as a non-statistician it would be very helpful in general to have detailed explanations of the "output" of functions included in the help files. The "details" section is so often not very helpful. If anybody knows of a source of such explanations, please let me know.

Best Answer

I'm glad you love the package; but it reminds me of a letter I read once in the newspaper column Hints for Heloise, in which the writer said that she had a new dishwasher and loves it, but it doesn't get her dishes clean. It made me wonder what exactly she loves about it, and whether she feels the same way about her husband.

As the developer of lsmeans, I assure you that a lot more effort has gone into documentation than coding. Documentation is much harder and less fun to write, especially within the format required for R packages. And I agree that R help pages can be hard to read. In writing those, my main objective (which I believe is in line with their intent) is to document in detail how each argument of each function works; and as the package has become more complex, those details get messier and messier.

But R also provides for vignettes, which can be more informal and expository. And I put a lot of effort into those as well. I wonder if you are not aware of them. In particular, if you load the package and do

vignette("using-lsmeans")

you'll get a PDF document that has a lot of information and exposition. I believe your question (1) (what is an LS mean) is answered very clearly in the first couple of sections, and I would like to ask you to read that rather than have me try to summarize it here. But let me know if it's confusing.

Now for the other questions. First, estimate is just a generic term, quite common in statistics. You ask for certain contrasts, and the program prints estimates of those contrasts. In your question, you ask for contrasts that compare each treatment with a control; the true values of those comparisons are unknown, and what we can obtain from the data are estimates of those parameters. In your example, the LS means are on the log scale, and the comparisons are thus differences of two logarithms.
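As a worked check, using the values printed in your output (rounded as shown there), the first "trt.vs.ctrl" estimate is the LS mean of B_Sp_young minus the LS mean of A_Cm, both on the log scale:

$$\text{estimate}_{\text{B\_Sp\_young} - \text{A\_Cm}} = 3.295837 - 3.765069 = -0.469232$$

This agrees with the printed value of -0.46923173 up to rounding. So a negative estimate means that the first-named level (the left-hand side of the contrast label) has the smaller LS mean.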

In dealing with transformations and link functions, I tried to put generic labels in the output that might help with interpretation when results are back-transformed. The labels "rate" and "rate.ratio" are a result of that. In a Poisson model, the mean is often referred to as a rate, because the probability model is concerned with the number of events per unit of time or space, and those events occur at a particular mean rate. In any case, the label "rate" is meant to emphasize the fact that the LS means were back-transformed to the original scale of counts, in which we are estimating (that word again) the rate of occurrence.

When numbers are on a log scale, there is a math result that says that $\log a - \log b = \log(a/b)$; i.e., the log of the ratio of $a$ and $b$. In your example, the contrasts you computed are differences of logs, so when they are back-transformed, you obtain estimates (!) of the ratios of two rates. Hence the label "rate.ratio".
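You can convince yourself of this by redoing the arithmetic in base R with the numbers printed in your output. This is only a sketch: the values are copied by hand (and rounded) from the summaries in the question, and the variable names are made up for illustration.

```r
# LS means on the log scale, copied from the lsmeans() output in the question
lsmean_A <- 3.765069   # A_Cm
lsmean_B <- 3.295837   # B_Sp_young

# Back-transforming each LS mean gives the "rate" (expected count)
rate_A <- exp(lsmean_A)   # about 43.17
rate_B <- exp(lsmean_B)   # about 27.00

# A difference of logs back-transforms to a ratio of rates
log_diff   <- lsmean_A - lsmean_B   # about 0.4692
rate_ratio <- exp(log_diff)         # about 1.599, the "rate.ratio"

# The same number, computed directly as a ratio of the two rates
all.equal(rate_ratio, rate_A / rate_B)
```

The result matches the 1.5987654 shown by pairs() on the response scale, so the "rate.ratio" says that the estimated count for A_Cm is about 1.6 times that for B_Sp_young.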

Also, I must emphasize that the above ratios are not odds ratios. Odds are quantities of the form $p/(1-p)$, and log-odds are termed "logits". If you have a model with binomial data, estimation is often on the logit scale, and the lsmeans package in fact labels back-transformed differences from a logit model as "odds ratios". They are a different animal, and "logit" is not a synonym for "log."
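To make the distinction concrete, here is a small base-R sketch. The two probabilities are hypothetical and have nothing to do with your data; qlogis() is base R's logit function.

```r
# Two hypothetical probabilities (illustrative only, not from the data above)
p1 <- 0.2
p2 <- 0.5

# Odds are p / (1 - p)
odds1 <- p1 / (1 - p1)   # 0.25
odds2 <- p2 / (1 - p2)   # 1

# The logit is the log of the odds; qlogis() computes it directly
logit1 <- qlogis(p1)     # same as log(odds1)
logit2 <- qlogis(p2)     # same as log(odds2), i.e. 0

# A difference of logits back-transforms to an odds ratio ...
odds_ratio <- exp(logit1 - logit2)   # 0.25

# ... which is not the same thing as the plain ratio p1 / p2
plain_ratio <- p1 / p2               # 0.4
```

With a log link, as in your Poisson model, the back-transformed difference is the plain ratio of the two means; only with a logit link does it become a ratio of odds.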

I hope these answers are helpful. I also hope you'll read more of that vignette.
