How do I check in R whether the coefficients of different levels of a categorical variable are statistically the same? The model that I have is:
Y = Intercept + X1 + X2 + X3 + X4
Both X1 and X2 are categorical variables, with 3 and 4 levels respectively. X3 and X4 are continuous variables. I learned that in R we can check whether the coefficients of two continuous variables are the same using the following procedure. For instance, if I want to check whether X3 and X4 have the same coefficient, I could do the following:
Model1 <- lm(y ~ X1 + X2 + I(X3 + X4))
Model2 <- lm(y ~ X1 + X2 + X3 + X4)
Then I can run an F-test with anova as follows:
anova(Model1,Model2)
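A minimal runnable sketch of this procedure, assuming simulated data (the seed, sample size, level labels and true coefficients below are illustrative choices, not taken from the question): the restricted model forces one common slope on X3 and X4, and anova() compares it with the unrestricted model.

```r
# Simulated data (illustrative assumption): X1 has 3 levels, X2 has 4,
# and X3, X4 share the same true slope of 0.5.
set.seed(42)
n  <- 200
X1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
X2 <- factor(sample(c("p", "q", "r", "s"), n, replace = TRUE))
X3 <- rnorm(n)
X4 <- rnorm(n)
y  <- 1 + 0.5 * X3 + 0.5 * X4 + rnorm(n)

# Restricted model: one common slope for X3 and X4
Model1 <- lm(y ~ X1 + X2 + I(X3 + X4))
# Unrestricted model: separate slopes for X3 and X4
Model2 <- lm(y ~ X1 + X2 + X3 + X4)

# F-test of H0: slope of X3 == slope of X4 (a single restriction, Df = 1)
res <- anova(Model1, Model2)
print(res)

# Equivalent reparametrization: the X4 coefficient here estimates
# (slope of X4 - slope of X3), and its squared t-value equals the F above
Model3 <- lm(y ~ X1 + X2 + I(X3 + X4) + X4)
coef(summary(Model3))["X4", ]
```

Because the comparison imposes a single restriction, the F-statistic equals the square of the t-statistic on X4 in the reparametrized fit, so either output gives the same p-value.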
Now, how do I check whether the coefficients for the different levels of X1 are the same? The regression output gives 2 coefficient estimates (or should I say intercepts?) for X1, as it has three levels. How do I check whether these estimates are statistically different from each other?
Best Answer
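There is nothing too special about categorical variables when we use lm. If X1 has three levels, what happens is that we represent X1 in terms of three binary variables whose sum is always one (i.e., exactly one of them equals one in any observation). So we want to test whether all the levels have the same coefficient. Let x1, x2, and x3 be the binary indicators of the three levels of a variable x (x1 equals one when x is at level 1, and so on). Hence, we want to test H0 that x1, x2, and x3 have the same coefficients. Because x1 + x2 + x3 is always one, a common coefficient on all three is just an intercept, so this amounts to comparing the model with all three indicators (and no separate intercept) against the intercept-only model with anova, in the same way as the I(X3 + X4) comparison in the question. As expected, we cannot reject the null in this example.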
Then there's another, somewhat simpler way to see this. Now keep the intercept (which then represents level 1) and include only x2 and x3, coded so that their coefficients have an "additive" interpretation: e.g., when the level of x is 2, how much higher is y than when the level is 1? So, in this case, if the effect of all three levels is the same, x2 and x3 will have zero coefficients in this specification. Thus, jointly testing that both coefficients are zero, i.e., comparing this model with the intercept-only model using anova, gives, as expected, the same p-value.
On the other hand, if all the levels have the same effect, then x is nothing but a constant variable, like the intercept. So the first testing option above can be seen as testing that x is as useful as the intercept, while the second one, equivalently, tests that x doesn't add anything useful over the intercept.
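Applied to the model in the question, the same idea means that testing whether all three levels of X1 have the same effect is a comparison between the full model and a model without X1, since the common effect is absorbed by the intercept. A sketch on simulated data follows; the data construction and coefficients are assumptions for illustration only.

```r
# Simulated data (illustrative assumption); variable names follow the question
set.seed(42)
n  <- 200
X1 <- factor(sample(c("a", "b", "c"), n, replace = TRUE))
X2 <- factor(sample(c("p", "q", "r", "s"), n, replace = TRUE))
X3 <- rnorm(n)
X4 <- rnorm(n)
y  <- 1 + 0.5 * X3 - 0.3 * X4 + rnorm(n)

# Full model: X1 enters as 2 treatment-coded dummies (3 levels)
full <- lm(y ~ X1 + X2 + X3 + X4)
# Restricted model: all levels of X1 share one effect, absorbed by the intercept
reduced <- lm(y ~ X2 + X3 + X4)

# F-test of H0: the three levels of X1 have the same effect (Df = 2)
res <- anova(reduced, full)
print(res)

# drop1() runs the same F-test for every term in the full model
drop1(full, test = "F")
```

The X1 row of the drop1() output should match the anova() comparison. For X2 the analogous test has 3 numerator degrees of freedom, one fewer than its four levels.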