The problem you haveg is that are abusing (misusing) the formula notation. You never use a formula like this:
Periosna$Periostitis ~ Periosna$Cemetery
because, as you've found out it, it totally breaks the logic that you want to then use for subsequent operations.
You asked mcp()
to set up Tukey contrasts for the covariate Cemetery
, yet you fitted a model with covariate name Periosna$Cemetery
, hence it quite rightly told you that a variable of the stated name was not involved in the model fit.
Instead, the call should have been
glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)
noting that we use the variable names directly, as they exist in Periosna
, and we tell R where to locate the variables in the formula via the data
argument.
Now when you call glht()
it should be able to find the Cemetery
variable.
Both of the lsmeans
statements you show generate lists of lsmobj
s, and each element of those lists is handled separately. If you want to incorporate an overall adjustment for two or more lists combined, it is technical and it takes a bit of work.
First, save the list:
lsmlist = lsmeans(warp.lm, list(pairwise ~ wool|tension,
pairwise ~ tension|wool))
This creates a list of 4 lsmobj
s (originally two lists of two)
> names(lsmlist)
[1] "lsmeans of wool | tension"
[2] "pairwise differences of contrast, tension | tension"
[3] "lsmeans of tension | wool"
[4] "pairwise differences of contrast, wool | wool"
The user wants to combine the three comparisons in lsmlist[[2]]
with the six in lsmlist[[4]]
and have an overall multiplicity adjustment for those 9 comparisons.
To start, create a new lsmobj
from one of the results, and fix it up.
mydiffs = lsmlist[[4]]
First, bind together the linear functions for the two sets of comparisons:
mydiffs@linfct = rbind(lsmlist[[2]]@linfct, lsmlist[[4]]@linfct)
We also need to define the grid
slot, which defines the factors associated with each linear function. To make it simple, I just define a factor named contrast
with 9 levels for the 9 contrasts (something fancier could be done here).
mydiffs@grid = data.frame(contrast = 1:9)
Finally, fix up the auxiliary info that does the bookkeeping. We now use our new contrast
factor as the only variable, with no "by" variables. For the combined family of 9 contrasts, the Tukey adjustment makes no sense so we use the multivariate $t$ ("mvt"
) method:
mydiffs = update(mydiffs, pri.vars = "contrast", by.vars = NULL,
adjust="mvt")
Now, we can look at the resulting summary:
> mydiffs
contrast estimate SE df t.ratio p.value
1 16.3333333 5.157299 48 3.167 0.0205
2 -4.7777778 5.157299 48 -0.926 0.9119
3 5.7777778 5.157299 48 1.120 0.8258
4 20.5555556 5.157299 48 3.986 0.0019
5 20.0000000 5.157299 48 3.878 0.0026
6 -0.5555556 5.157299 48 -0.108 1.0000
7 -0.5555556 5.157299 48 -0.108 1.0000
8 9.4444444 5.157299 48 1.831 0.3796
9 10.0000000 5.157299 48 1.939 0.3190
P value adjustment: mvt method for 9 tests
I could consider adding a feature for combining pieces of lsm.list
objects, but it is not straightforward -- primarily because of complications in obtaining meaningful labels for the results (the grid
part). It is also a problem that different users expect different defaults.
Best Answer
OK, let us dissect this model. First, the model itself:
This model has an underlying assumption that the error SD is homogeneous, and its estimated value is 10.94. With the default contrast coding (
"contr.treatment"
),We can observe these results in the output from
emmeans()
and its relatives. For (1), note that the first result below matches the intercept, in both the estimate and the standard error:For (2), the first, second, and fourth results below match the model summary in both estimate (with signs reversed) and standard error:
For (3), the first two results below match the model summary in both estimate and standard error.
Notice that within each set of results above, the standard errors stay constant within a table. That is a consequence of the fact that the design is balanced (equal numers of observations in each cell) and the model assumption of a constant error SD. If you fit a different model (using, say,
nlme::gls()
that allows for nonhomogeneous error variances, then you will get unequal SEs in both the model summary and the emmeans results.