Test in R whether coefficient estimates of categorical variables are different in linear regression

Tags: r, regression

How do I check in R whether the coefficients of different levels of a categorical variable are statistically the same? The model I have is:

Y = Intercept + X1 + X2 + X3 + X4

X1 and X2 are categorical variables with 3 and 4 levels, respectively; X3 and X4 are continuous. I have learned that in R one can test whether two continuous variables have the same coefficient using the following procedure. For instance, to test whether X3 and X4 have the same coefficient, I could do the following:

Model1 <- lm(y ~ X1 + X2 + I(X3 + X4))
Model2 <- lm(y ~ X1 + X2 + X3 + X4)

Then I can compare the two models with an F-test:

anova(Model1,Model2)
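To make the idea concrete, here is a self-contained sketch with simulated data (the data-generating values are made up for illustration; the categorical predictors are dropped for brevity):

```r
# Simulated illustration: X3 and X4 truly share the same coefficient
set.seed(42)
n  <- 100
X3 <- rnorm(n)
X4 <- rnorm(n)
y  <- 1 + 0.5 * X3 + 0.5 * X4 + rnorm(n)

# Restricted model: forces the coefficients of X3 and X4 to be equal
m_restricted <- lm(y ~ I(X3 + X4))
# Unrestricted model: separate coefficients
m_full <- lm(y ~ X3 + X4)

# F-test of the restriction; a large p-value means the
# equal-coefficient restriction is not rejected
anova(m_restricted, m_full)
```

Since the data were generated with equal coefficients, the p-value here should typically be large.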

Now, how do I check whether the coefficients for the different levels of X1 are the same? The regression output gives 2 coefficient estimates (or should I say intercepts?) for X1, since it has three levels. How can I check whether these estimates are statistically different from each other?

Best Answer

There is nothing too special about categorical variables when we use lm. If X1 has three levels, it is represented by three binary indicator variables whose sum is always one (i.e., exactly one of them equals one for any observation). We then want to test whether all the levels have the same coefficient. Let

set.seed(1)
df <- data.frame(y = rnorm(10), x = factor(sample(1:3, 10, replace = TRUE)))
(mod <- lm(y ~ x - 1, data = df))
#
# Call:
# lm(formula = y ~ x - 1, data = df)
#
# Coefficients:
#       x1        x2        x3  
#  0.64897  -0.30579  -0.02534
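The indicator coding can be inspected directly with model.matrix. A short sketch (regenerating the same df so it is self-contained):

```r
# Reconstruct the example data (same seed as above)
set.seed(1)
df <- data.frame(y = rnorm(10), x = factor(sample(1:3, 10, replace = TRUE)))

# Without an intercept, the design matrix has one indicator column per level
X <- model.matrix(y ~ x - 1, data = df)
head(X)

# Each row contains exactly one 1, so the indicators sum to one
all(rowSums(X) == 1)  # TRUE
```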

Hence, we want to test the null hypothesis H0 that x1, x2, and x3 have equal coefficients.

library(car)
linearHypothesis(mod, c("x1 = x2", "x2 = x3"))
# Linear hypothesis test
#
# Hypothesis:
# x1 - x2 = 0
# x2 - x3 = 0
#
# Model 1: restricted model
# Model 2: y ~ x - 1
#
#   Res.Df    RSS Df Sum of Sq      F Pr(>F)
# 1      9 5.4838                           
# 2      7 3.5987  2    1.8852 1.8335 0.2289

As expected, we cannot reject the null in this example.
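The same F-test can also be reproduced without the car package: under H0 all levels share one coefficient, so the restricted model is simply an intercept-only model, and anova compares the two fits directly (a sketch, regenerating the same data):

```r
set.seed(1)
df <- data.frame(y = rnorm(10), x = factor(sample(1:3, 10, replace = TRUE)))

mod        <- lm(y ~ x - 1, data = df)  # one coefficient per level
restricted <- lm(y ~ 1, data = df)      # all levels forced to be equal

# Same F statistic (1.8335) and p-value (0.2289) as linearHypothesis() above
anova(restricted, mod)
```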


There is also another, somewhat simpler way to see this. Now let

(mod <- lm(y ~ x, data = df))
#
# Call:
# lm(formula = y ~ x, data = df)
#
# Coefficients:
# (Intercept)           x2           x3  
#      0.6490      -0.9548      -0.6743  

so that now the interpretation of the coefficients of x2 and x3 is additive: e.g., when the level of x is 2, how much higher is y than when the level is 1? Hence, if all three levels have the same effect, x2 and x3 will have zero coefficients in this specification. Thus,

linearHypothesis(mod, c("x2 = 0", "x3 = 0"))
# Linear hypothesis test
#
# Hypothesis:
# x2 = 0
# x3 = 0
#
# Model 1: restricted model
# Model 2: y ~ x
#
#   Res.Df    RSS Df Sum of Sq      F Pr(>F)
# 1      9 5.4838                           
# 2      7 3.5987  2    1.8852 1.8335 0.2289

gives, as expected, the same p-value.


On the other hand, if all the levels have the same effect, then x carries no more information than a constant, i.e., the intercept. The first testing option above can therefore be seen as testing whether x is merely as useful as an intercept, and the second, equivalently, as testing whether x adds anything useful over the intercept.
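This equivalence can be checked directly: comparing the intercept-only model against the second specification with anova reproduces the same F-test (a sketch, regenerating the same data):

```r
set.seed(1)
df <- data.frame(y = rnorm(10), x = factor(sample(1:3, 10, replace = TRUE)))

# Intercept-only model vs. the default (with-intercept) factor coding;
# yields the same F statistic and p-value as both tests above
anova(lm(y ~ 1, data = df), lm(y ~ x, data = df))
```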
