The question seems to rely on a mistaken notion$^\dagger$.
A generalized linear model (GLM) does not in general assume constant variance.
Instead, there's an assumed variance function, $v(\mu)$, that relates the variance to the mean, $\text{Var}(Y_i)= \phi\,v(\mu_i)$, based on the particular distribution family in the exponential-family class of distributions.
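In R this is easy to see directly: each `family` object carries its variance function $v(\mu)$ as a component. A quick base-R illustration (these are the standard family constructors, nothing specific to the question):

```r
# Each family object in R exposes its variance function v(mu)
poisson()$variance(2)       # v(mu) = mu                -> 2
binomial()$variance(0.25)   # v(mu) = mu * (1 - mu)     -> 0.1875
Gamma()$variance(3)         # v(mu) = mu^2              -> 9
gaussian()$variance(5)      # v(mu) = 1 (constant variance is the special case)
```

Only the Gaussian family gives constant variance; for the others the variance changes with the mean by design.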
So in the generalized linear model, interest focuses not on constant variance, but on a correctly specified variance function*.
* As in any model, George Box's famous aphorism ("all models are wrong, but some are useful") applies; we don't generally believe a variance function to be exactly correct, just a close enough description that the resulting inferences will be good enough for our particular purposes.
As a result, a formal test of a correctly specified variance function doesn't really make sense: it answers a question we already know the answer to (no, it's not exactly correct), and any sufficiently large sample would tell us so.

Further, and more practically, even when the variance may be misspecified to an extent where the effect is substantial, choosing your procedures on the basis of a formal test of assumptions may be less advisable than simply not making an assumption you're not comfortable with. In the case of normal models at least, a number of papers indicate that it's better to simply use a procedure that doesn't assume constant variance.
$\dagger$(or, just possibly, a difference from the usual terminology, in which case your intent should be made more explicit)
The problem you have is that you are abusing (misusing) the formula notation. You never use a formula like

`Periosna$Periostitis ~ Periosna$Cemetery`

because, as you've found out, it totally breaks the logic that you want to use for subsequent operations.
You asked `mcp()` to set up Tukey contrasts for the covariate `Cemetery`, yet you fitted a model with a covariate named `Periosna$Cemetery`, hence it quite rightly told you that a variable of the stated name was not involved in the model fit.
Instead, the call should have been

`glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)`

noting that we use the variable names directly, as they exist in `Periosna`, and we tell R where to locate the variables in the formula via the `data` argument. Now when you call `glht()` it should be able to find the `Cemetery` variable.
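Putting the two steps together, here is a self-contained sketch. Since the question's `Periosna` data isn't available, a small simulated stand-in is used; the names `Periostitis` and `Cemetery` come from the question, and `glht()`/`mcp()` come from the multcomp package:

```r
library(multcomp)  # provides glht() and mcp()

## Simulated stand-in for the question's Periosna data frame
set.seed(1)
Periosna <- data.frame(
  Cemetery    = factor(rep(c("A", "B", "C"), each = 30)),
  Periostitis = rbinom(90, size = 1, prob = 0.4)
)

## Variable names used directly; data = tells R where to find them
fit <- glm(Periostitis ~ Cemetery, data = Periosna, family = binomial)

## glht() can now locate Cemetery, because it is a term in the model
summary(glht(fit, linfct = mcp(Cemetery = "Tukey")))
```

With the `Periosna$`-style formula, `Cemetery` never appears as a term in the fitted model, which is exactly why `mcp()` complained.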
From what I see you are trying to use `glm.nb()`, which is a modification of the system function `glm()`. This function additionally estimates the extra parameter, theta, for a negative binomial generalized linear model. Are you sure that it is necessary? I do not know the exact details, but please consider using the "classical" `glm()`.

I think you should:
construct a full model including interactions (one dependent variable and many explanatory variables)
try to simplify the model into a MAM (minimal adequate model) by removing non-significant interactions (or main effects). This link might help you in this procedure: How to get only desirable comparisons from post-hoc
When you have the MAM, the next step is to look inside the model to see which group(s) differ from each other. You may either make all possible comparisons (this should work well if you have an orthogonal design in your data), or look only at the comparisons you are interested in by setting contrasts (see again the link given above).
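The simplification step can be sketched in base R. Everything here is a hypothetical placeholder (response `y`, factors `a` and `b`, a Poisson family), not anything from the question:

```r
## Hypothetical data: one response and two factors (placeholders only)
set.seed(42)
d <- data.frame(a = factor(rep(c("a1", "a2"), 50)),
                b = factor(rep(c("b1", "b2"), each = 50)))
d$y <- rpois(100, lambda = 3)

## Full model including the interaction
full <- glm(y ~ a * b, data = d, family = poisson)

## drop1() tests each highest-order term; here, the a:b interaction
drop1(full, test = "Chisq")

## If a:b is non-significant, remove it and compare the fits directly
reduced <- update(full, . ~ . - a:b)
anova(reduced, full, test = "Chisq")
```

Repeating this until nothing non-significant remains gives the minimal adequate model.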
The R output in your question suggests that state2, state3 and state4 are all different from state1, so your interpretation is correct. This is (most probably) due to the default "treatment" contrasts in your model, which compare each group only with the first (to change such contrasts, see again the link given above).
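A small base-R illustration of why the comparisons all run against state1 under the default treatment contrasts; the `state` factor here is reconstructed from the labels in your output:

```r
state <- factor(c("state1", "state2", "state3", "state4"))

## Treatment (dummy) coding: each column compares one level to state1,
## the reference level, which carries a 0 in every column
contrasts(state)

## To compare against a different reference group, relevel and refit
state_ref2 <- relevel(state, ref = "state2")
contrasts(state_ref2)   # now the comparisons run against state2
```

So the rows in your summary are "state2 vs state1", "state3 vs state1" and "state4 vs state1", not all pairwise comparisons.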
If you need any additional help, please do not hesitate to ask.