My question is about the output of the lsmeans() and contrast() functions.
Can somebody please explain what "lsmeans", "rate", "rate.ratio", and especially "estimate" mean?
- As far as I know, "lsmeans" is a mean estimated from a linear model, taking covariates into account. Hence, it is not simply the group average! However, with only one factor, it is possibly simply the mean of each group level.
- "rate" is the back-transformed "lsmean" in Poisson models, and in this case should be the average count of species for each group level.
- "rate.ratio" appears only when pairwise comparisons are made and is more difficult to understand. These are odds ratios. I think odds ratios are very difficult to understand. It should be differences on the logit scale (Poisson model) and then back-transformed. Probably this means it is also in units of counts, and hence the differences in the count values.
- Last but not least, we have "estimates". These always appear when using different contrast methods, e.g. "effects", "trt.vs.ctrl", …. What do they mean? And what does it mean when they are negative? Is the LHS then less than the RHS, or the other way round?
I'm not a statistician, and what I have written here is almost certainly wrong, but where can I find all this information about modern statistics? I hope you are the source!
I love this package, and I'm very thankful that it is available!!
Examples of different outputs (I shortened them a bit where marked […]):
> lsm <- lsmeans::lsmeans(poisonGLM_1, ~ Factor1);lsm
Factor1 lsmean SE df asymp.LCL asymp.UCL
A_Cm 3.765069 0.06213535 NA 3.643286 3.886852
B_Sp_young 3.295837 0.11111111 NA 3.078063 3.513611
[....]
Confidence level used: 0.95
> summary(lsm, type="response")
Factor1 rate SE df asymp.LCL asymp.UCL
A_Cm 43.16667 2.682176 NA 38.21719 48.75714
B_Sp_young 27.00000 3.000000 NA 21.71630 33.56926
[....]
Confidence level used: 0.95
> summary(pairs(lsm), type="response")
contrast rate.ratio SE df z.ratio p.value
A_Cm - B_Sp_young 1.5987654 0.20353032 NA 3.6858954 0.0021
[....]
P value adjustment: tukey method for comparing a family of 5 estimates
Tests are performed on the linear-predictor scale
> summary(regrid(pairs(lsm)), type="response")
contrast rate.ratio SE df z.ratio p.value
A_Cm - B_Sp_young 1.5987654 0.20353032 NA 7.855171 <.0001
[....]
P value adjustment: tukey method for comparing a family of 5 estimates
> lsm2 <- contrast(lsm, "trt.vs.ctrl");lsm2
contrast estimate SE df z.ratio p.value
B_Sp_young - A_Cm -0.46923173 0.1273047 NA -3.6858954 0.0009
C_Sp_int1 - A_Cm 0.10613242 0.1039483 NA 1.0210117 0.6703
D_Sp_int2 - A_Cm 0.07796154 0.1048983 NA 0.7432105 0.8305
E_Sp_old - A_Cm -0.60100101 0.1339601 NA -4.4864179 <.0001
P value adjustment: dunnettx method for 4 tests
I'm probably the only one, but as a non-statistician I would find it very helpful in general to have detailed explanations of the "output" of functions included in the help files. The "Details" section is so often not very helpful. If anybody knows of a source of such explanations, please let me know.
Best Answer
I'm glad you love the package; but it reminds me of a letter I once read in the newspaper column Hints from Heloise, in which the writer said that she had a new dishwasher and loves it, but it doesn't get her dishes clean. It made me wonder what exactly she loves about it, and whether she feels the same way about her husband.
As the developer of lsmeans, I assure you that a lot more effort has gone into documentation than coding. Documentation is much harder and less fun to write, especially within the format required for R packages. And I agree that R help pages can be hard to read. In writing those, my main objective (which I believe is in line with their intent) is to document in detail how each argument of each function works; and as the package has become more complex, those details get messier and messier.
But R also provides for vignettes, which can be more informal and expository. And I put a lot of effort into those as well. I wonder if you are not aware of them. In particular, if you load the package and do
you'll get a PDF document that has a lot of information and exposition. I believe your question (1) (what is an LS mean) is very clearly answered in the first couple of sections, and I would ask you to read that rather than have me try to summarize it here. But let me know if it's confusing.
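The question guesses that with only one factor the LS mean is simply the group mean. For a Poisson GLM with the default log link, that guess holds on the link scale: with one factor and no covariates the reference grid is just the factor levels, so each LS mean is the model's prediction at that level, which equals the log of the raw mean count. The base-R sketch below checks this with glm() and predict() on a small made-up data set (the counts and the level names A and B are hypothetical); it does not call lsmeans() itself.

```r
# Hypothetical counts for two groups (A and B); not the data behind the question
counts <- data.frame(
  y = c(40, 45, 44, 25, 29, 27),
  g = factor(rep(c("A", "B"), each = 3))
)

# One-factor Poisson GLM with the default log link
fit <- glm(y ~ g, family = poisson, data = counts)

# Predict at each factor level on the link (log) scale; with one factor and
# no covariates, these predictions play the role of the LS means
grid <- data.frame(g = levels(counts$g))
link_means <- as.vector(predict(fit, newdata = grid, type = "link"))

# Raw mean count in each group
group_means <- as.vector(tapply(counts$y, counts$g, mean))

# Compare the link-scale predictions with the logs of the raw group means
print(data.frame(level = grid$g, link_mean = link_means,
                 log_group_mean = log(group_means)))
```

With covariates or several factors the two no longer coincide in general, which is the point the question's first bullet makes.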
Now for the other questions. First, "estimate" is just a generic term, quite common in statistics. You ask for certain contrasts, and the program prints estimates of those contrasts. In your question, you ask for contrasts that compare each treatment with a control; the true values of those comparisons are unknown, and what we can obtain from the data are estimates of those parameters. In your example, the LS means are on the log scale, and the comparisons are thus differences of two logarithms.
In dealing with transformations and link functions, I tried to put generic labels in the output that might help with interpretation when results are back-transformed. The labels "rate" and "rate ratio" are a result of that. In a Poisson model, the mean is often referred to as a rate, because the probability model is concerned with the number of events per unit of time or space, and those events occur at a particular mean rate. In any case, the label "rate" was supposed to emphasize the fact that the LS means were back-transformed to the original scale of counts, in which we are estimating (that word again) the rate of occurrence.
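Plugging the numbers printed in the question into base R shows both points. This is a sketch, not lsmeans code, and the vector name lsm_log is made up here. Exponentiating the log-scale LS means reproduces the "rate" column, and a trt.vs.ctrl estimate such as "B_Sp_young - A_Cm" is the LS mean of the level left of the minus sign minus that of the level on the right. So a negative estimate means the left-hand level has the smaller LS mean (on the log scale, and hence also the smaller rate).

```r
# LS means on the log (linear-predictor) scale, copied from the lsmeans() output
lsm_log <- c(A_Cm = 3.765069, B_Sp_young = 3.295837)

# Back-transforming with exp() reproduces the "rate" column (mean count per level)
rates <- exp(lsm_log)
print(round(rates, 4))

# A contrast labelled "B_Sp_young - A_Cm" is the left-hand LS mean minus the right-hand one
est <- unname(lsm_log["B_Sp_young"] - lsm_log["A_Cm"])
print(est)  # negative: B_Sp_young has the smaller LS mean
```

The printed difference matches the -0.46923173 reported by contrast(lsm, "trt.vs.ctrl") up to the rounding of the copied LS means.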
When numbers are on a log scale, there is a math result that says that $\log a - \log b = \log(a/b)$; i.e., the log of the ratio of $a$ and $b$. In your example, the contrasts you computed are differences of logs, so when they are back-transformed, you obtain estimates (!) of the ratios of two rates. Hence the label "rate.ratio".
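With the question's numbers, a short base-R check (again only a sketch; the variable names are illustrative) confirms that exponentiating the difference of two log-scale LS means gives the same value as dividing the two rates, namely the 1.5987654 printed under "rate.ratio". It also shows that the trt.vs.ctrl estimate for "B_Sp_young - A_Cm", which is the same comparison in the opposite direction, back-transforms to the reciprocal of that ratio.

```r
# Log-scale LS means from the question's output
lsm_log <- c(A_Cm = 3.765069, B_Sp_young = 3.295837)
rates <- exp(lsm_log)

# Pairwise contrast "A_Cm - B_Sp_young" is a difference of logs
log_diff <- unname(lsm_log["A_Cm"] - lsm_log["B_Sp_young"])

# Since log a - log b = log(a / b), exp() of that difference is a ratio of rates
rate_ratio <- exp(log_diff)
direct_ratio <- unname(rates["A_Cm"] / rates["B_Sp_young"])
print(c(rate_ratio = rate_ratio, direct_ratio = direct_ratio))

# The trt.vs.ctrl estimate "B_Sp_young - A_Cm" back-transforms to the reciprocal ratio
recip_ratio <- exp(-0.46923173)
print(recip_ratio)
```

Because the ratio is a ratio of mean counts, it has no units: a value of about 1.6 says A_Cm's estimated rate is about 1.6 times that of B_Sp_young, not that it is 1.6 counts larger.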
Also, I must emphasize that the above ratios are not odds ratios. Odds are quantities of the form $p/(1-p)$, and log-odds are termed "logits". If you have a model with binomial data, estimation is often on the logit scale, and the lsmeans package in fact labels back-transformed differences from a logit model as "odds ratios". They are a different animal, and "logit" is not a synonym for "log."
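A small base-R comparison makes the distinction concrete. The probabilities p1 and p2 below are invented for illustration, and qlogis() is base R's logit function. Back-transforming a difference of logs gives a plain ratio of the two probabilities, while back-transforming a difference of logits gives an odds ratio; the two numbers differ, which is why a rate ratio from a Poisson model should not be read as an odds ratio.

```r
# Hypothetical success probabilities in two groups (made up for illustration)
p1 <- 0.5
p2 <- 0.2

# Back-transformed difference on the log scale: a ratio of the probabilities
prob_ratio <- exp(log(p1) - log(p2))

# qlogis(p) is the logit, log(p / (1 - p)); a back-transformed difference of
# logits is a ratio of odds, p1 / (1 - p1) divided by p2 / (1 - p2)
odds_ratio <- exp(qlogis(p1) - qlogis(p2))

print(c(prob_ratio = prob_ratio, odds_ratio = odds_ratio))
```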
I hope these answers are helpful. I also hope you'll read more of that vignette.