Solved – Coefficients in Anova

anova, generalized linear model, linear model

When treating ANOVA as a linear model whose predictors are categorical, I've heard that the coefficient for a variable is the mean of the response in that variable's group. Is this always true, even in GLMs? If so, does it mean that adding new variables should not change the values of the coefficients?

Thanks!

Best Answer

Not exactly. If the categorical predictor, coded as dummy variables, is the only predictor in the model, the intercept is the mean of the reference group (the level whose dummy is omitted), and each of the other coefficients is the difference between that group's mean and the reference group's mean.
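To make this concrete, here is a minimal sketch using NumPy least squares on made-up data. The helper `fit_dummy_ols`, the group labels, and the response values are all hypothetical and chosen only so the group means are easy to read off (2, 5 and 10).

```python
import numpy as np

def fit_dummy_ols(y, groups, reference):
    """Fit y ~ intercept + dummies for every level except `reference`."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    # One dummy column per non-reference level.
    levels = [g for g in sorted(set(groups.tolist())) if g != reference]
    dummies = [(groups == g).astype(float) for g in levels]
    X = np.column_stack([np.ones(len(y))] + dummies)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["Intercept"] + levels, beta))

# Hypothetical data: group means are a = 2, b = 5, c = 10.
y = [1.0, 3.0, 4.0, 6.0, 9.0, 11.0]
groups = ["a", "a", "b", "b", "c", "c"]

coefs = fit_dummy_ols(y, groups, reference="a")
for name, value in coefs.items():
    print(f"{name}: {value:.3f}")
```

The intercept comes out as the mean of the reference group `a`, and the coefficients for `b` and `c` are the differences 5 - 2 and 10 - 2, not the group means themselves.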

In GLMs the same holds, but for models with a non-identity link function (logistic, Poisson, ...) the coefficients are on the scale of the link, not on the scale of the original response variable. To see that this is still a model for group means, you need to calculate the linear predictor for each group and transform it using the mean function (the inverse of the link). You cannot transform the individual coefficients directly.
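A small sketch of the same idea for a Poisson model with a log link, again on made-up counts. The fitting routine `fit_poisson_irls` is a bare-bones iteratively reweighted least squares loop written for illustration (it assumes the first column of `X` is the intercept and does no convergence checking); in practice you would use a GLM routine from a statistics package.

```python
import numpy as np

def fit_poisson_irls(X, y, n_iter=25):
    """Poisson regression with log link via plain IRLS (illustrative only)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the overall mean
    for _ in range(n_iter):
        eta = X @ beta                  # linear predictor
        mu = np.exp(eta)                # inverse link (mean function)
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # working weights equal mu for Poisson/log
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

# Hypothetical counts: group means are a = 2, b = 5, c = 10 (a is the reference).
y = np.array([1.0, 3.0, 4.0, 6.0, 9.0, 11.0])
is_b = np.array([0, 0, 1, 1, 0, 0], dtype=float)
is_c = np.array([0, 0, 0, 0, 1, 1], dtype=float)
X = np.column_stack([np.ones(len(y)), is_b, is_c])

beta = fit_poisson_irls(X, y)

# Group means are recovered by transforming each group's linear predictor.
mean_a = np.exp(beta[0])
mean_b = np.exp(beta[0] + beta[1])
mean_c = np.exp(beta[0] + beta[2])
print(mean_a, mean_b, mean_c)

# Transforming a single coefficient gives a ratio of means, not a group mean.
print(np.exp(beta[1]))
```

Applying `exp` to each group's full linear predictor returns the group means, while `exp(beta[1])` on its own is the ratio of the `b` mean to the `a` mean (5 / 2 here).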

Once you add other variables to the model, the coefficients represent adjusted differences between groups, that is, group mean differences adjusted for the other variables in the model. If your categorical variable is unrelated to these other variables, the coefficients should be relatively stable when using GLMs with an identity link (as in linear regression) or a log link (as in Poisson regression); the citation below documents this. If your categorical variable is related to the other variables, and those variables are related to the outcome, the coefficients can change substantially.

By "related to other variables", I mean: is there a relationship between the group an individual belongs to and the other variables in the model? For example, if the categorical predictor is age group and another variable in the model is wealth, then age group will be related to wealth, and the coefficients for the age groups should change when wealth is included in the model. This assumes wealth is related to whatever the outcome variable is.
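The following sketch illustrates the point for the identity-link case on noise-free, made-up data. The names `group`, `x_unrelated` and `x_related`, and the true coefficients (a group effect of 2 and a covariate effect of 3), are assumptions picked for illustration; think of `group` as a two-level age indicator and `x` as wealth.

```python
import numpy as np

def ols(columns, y):
    """Least-squares fit of y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y))] + [np.asarray(c, dtype=float) for c in columns])
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta

group = np.array([0, 0, 0, 1, 1, 1], dtype=float)

# Covariate with the same distribution in both groups (unrelated to group).
x_unrelated = np.array([0, 1, 2, 0, 1, 2], dtype=float)
# Covariate that is systematically higher in group 1 (related to group).
x_related = np.array([0, 1, 2, 2, 3, 4], dtype=float)

def group_coefs(x):
    # Hypothetical outcome: true group effect 2, true covariate effect 3.
    y = 1.0 + 2.0 * group + 3.0 * x
    unadjusted = ols([group], y)[1]    # model without the covariate
    adjusted = ols([group, x], y)[1]   # model with the covariate
    return unadjusted, adjusted

unrel = group_coefs(x_unrelated)
rel = group_coefs(x_related)
print("covariate unrelated to group:", unrel)
print("covariate related to group:  ", rel)
```

With the unrelated covariate the group coefficient is 2 in both models. With the related covariate it is 8 before adjustment and 2 after, because the unadjusted difference in group means also absorbs the difference in the covariate between groups.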


Gail, M. H., Wieand, S., & Piantadosi, S. (1984). Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates. Biometrika, 71(3), 431–444. https://doi.org/10.1093/biomet/71.3.431