Solved – Comparing lsmeans of levels from different factors

lme4-nlmelsmeansr

I have a glmer model that I am performing contrasts on. My model has four factors (emotion, gender, filter, ageGroup). No interactions. I have been using lsmeans(x, spec, contr='pairwise')) (and) lsm within glht) to get lsmeans and to do pairwise contrasts within-factors. I am also interested in comparing the means from one factor against a level from a different factor.

> summary(m7.only_within)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
 Family: binomial  ( logit )

 Formula: FaceStimulus.RESP.binary ~ emotion + emotion.Sample + gender +  
    gender.Sample + ageGroup + ageGroup.Sample + filter + filter.Sample + (1 | Subject)

Fixed effects:
                             Estimate Std. Error z value Pr(>|z|)    
(Intercept)                 -0.823451   0.264442  -3.114  0.00185 ** 
emotiondisgust               1.504896   0.092532  16.264  < 2e-16 ***
emotionfear                  1.555492   0.091832  16.938  < 2e-16 ***
emotion.Sample              -0.106948   0.044364  -2.411  0.01592 *  
gendermale                   0.012718   0.074198   0.171  0.86391    
gender.Sample                0.007021   0.010046   0.699  0.48462    
ageGroupold                  0.364239   0.076379   4.769 1.85e-06 ***
ageGroup.Sample             -0.013519   0.009577  -1.412  0.15808    
filtercropAdaptiveThreshold  0.419095   0.075906   5.521 3.37e-08 ***
filter.Sample                0.070147   0.029658   2.365  0.01802 *  
---

I understand that lsmeans calculates standard errors using variable variances and covariances from vcov() to do pairwise comparisons. Can I just compare one model variable against another? Like so:

summary(glht(m3.only_within, linfct = c("filtercropAdaptiveThreshold - ageGroupold = 0")))

Linear Hypotheses:
                                               Estimate Std. Error z value Pr(>|z|)    
filtercropAdaptiveThreshold - ageGroupold == 0  0.05486    0.10798   0.508        1

That contrast is the test of the difference between the betas in the model, not between the lsmeans. Naturally, all my other contrasts compare the lsmeans since I'm using lsm(). Can I just manually take the difference between the lsmeans, and then use the variances and covariance SE's from the model vcov for the two model effects to get the SE for the lsmeans difference? Naturally, those SEs are much lower than the SEs reported from the lsmeans output.

$lsmeans
 ageGroup     lsmean        SE df  asymp.LCL asymp.UCL
 young    0.01043529 0.2466025 NA -0.4728967 0.4937673
 old      0.37467465 0.2474967 NA -0.1104101 0.8597594

$lsmeans
 filter                     lsmean        SE df   asymp.LCL asymp.UCL
 cropDesat37           -0.01699274 0.2462214 NA -0.49957782 0.4655923
 cropAdaptiveThreshold  0.40210268 0.2478032 NA -0.08358268 0.8877880

So it would be:
$$
\displaystyle \begin{array}{l}
\bar{X_1}-\bar{X_2}= lsmean\_of\_cropAdaptiveThreshold – lsmean\_of\_old \\
= 0.40210268 – 0.37467465 = 0.0274\\ \\
SE_{\bar{X_1}-\bar{X_2}}= \sqrt{SE_1^2 + SE_2^2 + 2*cov}\\
= \sqrt{0.075906^2 + 0.076379^2 + 2*-3.225794e-05}\\
= 0.10798
\end{array}
$$

…which is the same SE as the SE in the beta-contrast.

Is it legal to compare cross-factor level lsmeans like this?

Edit 2021-07-26: I think what I probably conceptually wanted to do at the time I wrote this was compare the relative magnitude of the estimates. I'm pretty sure I could have done that by doing t-tests with the standards and their standard errors, doing some hand-waiving due to the repeated measures nature of the data. I certainly did not write this question that way though.

Best Answer

Let $\hat y_{ij}$ denote the prediction from your model at the $i$th level of ageGroup $(i=1,2)$ and the $j$th level of filter $(j=1,2)$. Then the LS means you seem to want to compare are $\hat y_{2\cdot} = (\hat y_{21}+\hat y_{22})/2$ and $\hat y_{\cdot 2} = (\hat y_{12}+\hat y_{22})/2$. Consequently, $$ \hat y_{\cdot 2} - \hat y_{2\cdot} = [(\hat y_{12}+\hat y_{22})/2] - [(\hat y_{21}+\hat y_{22})/2] = (\hat y_{12} - \hat y_{21})/2 $$ So what you think is a comparison of two marginal LS means is really one half of a diagonal comparison of two cell means. You can get it from lsmeans using code like this:

library(lsmeans)
m3.lsm = lsmeans(m3.only_within, c("ageGroup", "filter"))
contrast(m3.lsm, list(diag.comp = c(0, -1/2, 1/2, 0)))

I still, however, think you are going to confuse other people, if not yourself, by calling this a comparison of marginal LS means of different factors.

Related Solutions

Solved – Interpreting lsmeans values from a mixed model with offset

If you do

lsmeans(fitPoisson, ~ Treatment1 * Treatment2 * Treatment3)

you will see the predictions (on the log scale) made by the model for each combination of the three factors. The results you show in your question are the averages of these predictions, averaged (with equal weights) over the levels of Treatment2 and Treatment3. The results with type = "response" are the antilogs of these results, and the standard errors are obtained using the delta method. Note that the averaging is still done on the log scale, and the confidence intervals are computed on the log scale, then back-transformed.

If you want the averaging to be done on the raw count scale, that is possible too. Do:

rg = regrid(ref.grid(fitPoisson), transform = TRUE)
lsmeans(rg, ~ Treatment1)

(The regrid function creates a new reference grid for the model based on the back-transformed predictions. You can use summary(rg) to see the individual predictions.)

These results can differ markedly from the raw averages you obtained from the data when there is imbalance in the data, so that the raw averages give far from equal weights to the levels of those two factors.

Solved – Are LSMeans calculated from Predicted Values

I haven't heard the term LSmeans before, but from looking at the documentation of the lsmeans package, it looks like it simply computes predicted values from a regression.

Suppose you estimate the linear model $Y = \beta_1 X_1 + \beta_2 X_2 + \epsilon$ using OLS, to produce coefficient estimates $\hat{\beta}_1$ and $\hat{\beta}_2$. The LSmeans procedure simply computes the fitted value at some pre-specified values $\tilde{X}_1$ and $\tilde{X}_2$, as follows: $\tilde{Y} = \hat{\beta}_1 \tilde{X}_1 + \hat{\beta}_2 \tilde{X}_2$.

Using R's mtcars dataset, we can regress MPG on weight and an indicator for American-made. Then, we'll use the lsmeans package to get predicted values for both levels of American, holding weight at its mean.

> library(lsmeans)
> data(mtcars)
> mod = lm(mpg ~ wt + factor(am), mtcars)
> lsmeans.result = ref.grid(mod)
> summary(lsmeans.result)
     wt am prediction        SE df
3.21725  0   20.10022 0.8331837 29
3.21725  1   20.07660 1.0687077 29

We can reproduce the same prediction manually using predict:

> newdat = data.frame(wt = mean(mtcars$wt),
                      am = c(0, 1))
> manual.result = predict(mod, newdata = newdat)
> manual.result
       1        2 
20.10022 20.07660

Note that this matches the "prediction" column from the LSmeans result exactly.

Best Answer

Related Solutions

Solved – Interpreting lsmeans values from a mixed model with offset

Solved – Are LSMeans calculated from Predicted Values

Related Question