The question seems to rely on a mistaken notion$^\dagger$.
A generalized linear model (GLM) does not in general assume constant variance.
Instead, there's an assumed variance function, $v(\mu)$, that relates the variance to the mean, $\text{Var}(Y_i)= \phi\,v(\mu_i)$, based on the particular distribution family in the exponential-family class of distributions.
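In R this is easy to see directly: each `family` object carries its variance function $v(\mu)$ as a component. A quick base-R illustration (these are the standard family constructors, nothing specific to the question):

```r
# Each family object in R exposes its variance function v(mu)
poisson()$variance(2)       # v(mu) = mu                -> 2
binomial()$variance(0.25)   # v(mu) = mu * (1 - mu)     -> 0.1875
Gamma()$variance(3)         # v(mu) = mu^2              -> 9
gaussian()$variance(5)      # v(mu) = 1 (constant variance is the special case)
```

Only the Gaussian family gives constant variance; for the others the variance changes with the mean by design.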
So in the generalized linear model, interest focuses not on constant variance, but on a correctly specified variance function*.
* As in any model, George Box's famous aphorism ("all models are wrong, but some are useful") applies; we don't generally believe a variance function to be exactly correct, just a close enough description that the resulting inferences will be good enough for our particular purposes.
As a result, a formal test of a correctly specified variance function doesn't really make sense: it answers a question we already know the answer to (no, it's not exactly correct), and any sufficiently large sample would tell us so.

Further, and more practically, even when the variance may be misspecified to an extent where the effect is substantial, choosing your procedures on the basis of a formal test of assumptions may be less advisable than simply not making an assumption you're not comfortable with. In the case of normal models at least, a number of papers indicate that it's better to simply use a procedure that doesn't assume constant variance.
$\dagger$(or, just possibly, a difference from the usual terminology, in which case your intent should be made more explicit)
The problem you have is that you are abusing (misusing) the formula notation. You never use a formula like

`Periosna$Periostitis ~ Periosna$Cemetery`

because, as you've found out, it totally breaks the logic that you want to use for subsequent operations.
You asked `mcp()` to set up Tukey contrasts for the covariate `Cemetery`, yet you fitted a model with a covariate named `Periosna$Cemetery`, hence it quite rightly told you that a variable of the stated name was not involved in the model fit.
Instead, the call should have been

`glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)`

noting that we use the variable names directly, as they exist in `Periosna`, and we tell R where to locate the variables in the formula via the `data` argument. Now when you call `glht()` it should be able to find the `Cemetery` variable.
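Putting the two steps together, here is a self-contained sketch. Since the question's `Periosna` data isn't available, a small simulated stand-in is used; the names `Periostitis` and `Cemetery` come from the question, and `glht()`/`mcp()` come from the multcomp package:

```r
library(multcomp)  # provides glht() and mcp()

## Simulated stand-in for the question's Periosna data frame
set.seed(1)
Periosna <- data.frame(
  Cemetery    = factor(rep(c("A", "B", "C"), each = 30)),
  Periostitis = rbinom(90, size = 1, prob = 0.4)
)

## Variable names used directly; data = tells R where to find them
fit <- glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)

## glht() can now locate Cemetery, because it is a term in the model
summary(glht(fit, linfct = mcp(Cemetery = "Tukey")))
```

With the `Periosna$`-style formula, `Cemetery` never appears as a term in the fitted model, which is exactly why `mcp()` complained.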
From what I see you are trying to use `glm.nb()`, which is a modification of the system function `glm()`. This function additionally estimates the extra parameter, theta, for a negative binomial generalized linear model. Are you sure that it is necessary? I do not know the exact details, but please consider using the "classical" `glm()`.

I think you should:
construct a full model including interactions (one dependent variable and many explanatory variables)
try to simplify the model into a MAM (minimal adequate model) by removing non-significant interactions (or main effects). This link might help you in this procedure: How to get only desirable comparisons from post-hoc
When you have the MAM, the next step is to look inside the model to see which group(s) differ from each other. You may either make all possible comparisons (this should work well if you have an orthogonal design in your data), or look only at the comparisons you are interested in by setting contrasts (see again the link given above).
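The simplification step can be sketched in base R. Everything here is a hypothetical placeholder (response `y`, factors `a` and `b`, a Poisson family), not anything from the question:

```r
## Hypothetical data: one response and two factors (placeholders only)
set.seed(42)
d <- data.frame(a = factor(rep(c("a1", "a2"), 50)),
                b = factor(rep(c("b1", "b2"), each = 50)))
d$y <- rpois(100, lambda = 3)

## Full model including the interaction
full <- glm(y ~ a * b, data = d, family = poisson)

## drop1() tests each highest-order term; here, the a:b interaction
drop1(full, test = "Chisq")

## If a:b is non-significant, remove it and compare the fits directly
reduced <- update(full, . ~ . - a:b)
anova(reduced, full, test = "Chisq")
```

Repeating this until nothing non-significant remains gives the minimal adequate model.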
The R output in your question suggests that state2, state3 and state4 are all different from state1, so your interpretation is correct. This is (most probably) due to the default "treatment" contrasts in your model, which compare each group only with the first (to change such contrasts, see again the link given above).
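A small base-R illustration of why the comparisons all run against state1 under the default treatment contrasts; the `state` factor here is reconstructed from the labels in your output:

```r
state <- factor(c("state1", "state2", "state3", "state4"))

## Treatment (dummy) coding: each column compares one level to state1,
## the reference level, which carries a 0 in every column
contrasts(state)

## To compare against a different reference group, relevel and refit
state_ref2 <- relevel(state, ref = "state2")
contrasts(state_ref2)   # now the comparisons run against state2
```

So the rows in your summary are "state2 vs state1", "state3 vs state1" and "state4 vs state1", not all pairwise comparisons.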
If you need any additional help, please do not hesitate to ask.