Regression – Literal Interpretation of Factor-by-Factor Interaction Term

Tags: categorical-data, interaction, interpretation, r, regression

Following the explanations in "What is the baseline level in a factor-by-factor interaction?", my understanding is that a factor-by-factor interaction term has no literal interpretation, or at least no clear, straightforward one.

Consider this example from Fox (2003). In the regression below, the following two variables are categorical: year ∈ {1997, ..., 2002} and colour ∈ {Black, White}.

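library(effects)  # attaches carData, which provides the Arrests data
library(lmtest)   # provides coeftest()
Arrests$year <- as.factor(Arrests$year)  # treat year as categorical; 1997 becomes the baseline
arrests.mod <- glm(released ~ employed + citizen + checks
                   + colour*year + colour*age,
                   family = binomial, data = Arrests)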

Which yields:

> coeftest(arrests.mod)

z test of coefficients:

                       Estimate Std. Error  z value  Pr(>|z|)    
(Intercept)           0.3444334  0.3100749   1.1108 0.2666514    
employedYes           0.7350645  0.0847701   8.6713 < 2.2e-16 ***
citizenYes            0.5859841  0.1137717   5.1505 2.598e-07 ***
checks               -0.3666425  0.0260322 -14.0842 < 2.2e-16 ***
colourWhite           1.2125167  0.3497751   3.4666 0.0005272 ***
year1998             -0.4311794  0.2603589  -1.6561 0.0977023 .  
year1999             -0.0944343  0.2615447  -0.3611 0.7180519    
year2000             -0.0108975  0.2592073  -0.0420 0.9664655    
year2001              0.2430630  0.2630151   0.9241 0.3554129    
year2002              0.2129549  0.3532786   0.6028 0.5466444    
age                   0.0287279  0.0086191   3.3330 0.0008590 ***
colourWhite:year1998  0.6519565  0.3134898   2.0797 0.0375555 *  
colourWhite:year1999  0.1559504  0.3070430   0.5079 0.6115161    
colourWhite:year2000  0.2957537  0.3062034   0.9659 0.3341076    
colourWhite:year2001 -0.3805413  0.3040538  -1.2516 0.2107305    
colourWhite:year2002 -0.6173178  0.4192551  -1.4724 0.1409086    
colourWhite:age      -0.0373729  0.0102003  -3.6639 0.0002484 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In the table above, how would one interpret the coefficient for, e.g., colourWhite:year1998 (significant at the 5% level)?

Since the baseline level is colourBlack:year1997 (i.e., the Intercept), the log-odds for Blacks in 1998, with the other predictors at zero or at their reference levels, would be computed as follows:

Intercept + year1998

whereas the log-odds for Whites in 1998 would be:

Intercept + year1998 + colourWhite + colourWhite:year1998
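The dummy columns behind these two sums can be inspected with base R's model.matrix(). The sketch below uses a hypothetical four-row toy data frame (not the Arrests data) that only reuses the factor names. Under R's default treatment contrasts, Black and 1997 are the baseline levels and get no column of their own, and the White/1998 row is the only one with a 1 in colourWhite:year1998.

```r
# Hypothetical toy data: one row per colour/year cell (not the Arrests data)
toy <- data.frame(
  colour = factor(c("Black", "White", "Black", "White"), levels = c("Black", "White")),
  year   = factor(c("1997", "1997", "1998", "1998"))
)

# Default treatment coding: the first level of each factor is the baseline,
# so colour * year expands to colourWhite, year1998 and colourWhite:year1998
X <- model.matrix(~ colour * year, data = toy)
rownames(X) <- paste(toy$colour, toy$year)
print(X)
```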

Thus it seems to me that the coefficient for colourWhite:year1998 doesn't stand for much. At least, it doesn't seem to have any intuitive, straightforward interpretation. Does it?

Best Answer

colourWhite:year1998 is called an interaction effect. As you say, the log-odds for Blacks in 1998 would be Intercept + year1998. Likewise, for Whites in 1997 they would be Intercept + colourWhite. If colour and year had simply additive effects, Whites in 1998 would be Intercept + colourWhite + year1998. But sometimes two explanatory variables "interact" with each other: combined, they give a larger or smaller effect than the sum of their individual effects. colourWhite:year1998 is that extra amount for Whites in 1998.
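Written out cell by cell, the interaction coefficient is a difference of differences. Writing $\eta(\text{colour}, \text{year})$ (my own shorthand) for the log-odds in a cell, with all other covariates, including age, held fixed, the colourWhite and colourWhite:age terms cancel inside the White difference, and what remains is:

\begin{align} &\bigl[\eta(\text{White}, 1998) - \eta(\text{White}, 1997)\bigr] - \bigl[\eta(\text{Black}, 1998) - \eta(\text{Black}, 1997)\bigr] \\ &\qquad = (\text{year1998} + \text{colourWhite:year1998}) - \text{year1998} \\ &\qquad = \text{colourWhite:year1998} = 0.6519565 \end{align}

So the positive estimate says that the 1997-to-1998 change in the log-odds of release was larger for Whites than for Blacks.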

Without interaction terms, year1998 would have a single effect on the logit, the same for both colours (and its estimate would differ from the one here). With the interaction, and all other variables held fixed, year1998 shifts the logit by -0.4311794 for Blacks and by -0.4311794 + 0.6519565 = 0.2207771 for Whites.
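The same arithmetic extends to every year. Below is a minimal base-R sketch with the coefficients typed in by hand from the coeftest() output (the object names are my own): the year effect for Blacks is the main effect alone, and for Whites it is the main effect plus the matching interaction.

```r
# Coefficients typed in from the coeftest() output above (logit scale)
year_main   <- c(-0.4311794, -0.0944343, -0.0108975,  0.2430630,  0.2129549) # year1998..year2002
colour_year <- c( 0.6519565,  0.1559504,  0.2957537, -0.3805413, -0.6173178) # colourWhite:year1998..2002

# Effect of each year relative to 1997, separately by colour
year_effects <- data.frame(
  year  = 1998:2002,
  Black = year_main,               # baseline colour: main effect only
  White = year_main + colour_year  # main effect plus the interaction
)

# The White-minus-Black gap in each year's effect is the interaction coefficient
year_effects$gap <- year_effects$White - year_effects$Black
print(year_effects)
```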

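For your data, it looks like all the variables are factors except age and checks. If so, consider a White arrestee in 1998 who is at the reference levels of the other factors (employed = No, citizen = No): every other dummy variable is zero and drops out of the equation, and colourWhite:age reduces to age. With p the probability of being released, the model becomes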

\begin{align} \log\frac{p}{1-p} &= \text{Intercept} - 0.3666425 \cdot \text{checks} + 0.0287279 \cdot \text{age} \\ &\qquad + \text{colourWhite} + \text{year1998} + \text{colourWhite:year1998} \\ &\qquad - 0.0373729 \cdot \text{age} \\ &= 0.3444334 - 0.3666425 \cdot \text{checks} + 0.0287279 \cdot \text{age} \\ &\qquad + 1.2125167 - 0.4311794 + 0.6519565 \\ &\qquad - 0.0373729 \cdot \text{age} \end{align}
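To turn the equation into a number, here is a small base-R sketch. It assumes a hypothetical arrestee with checks = 2 and age = 25; these values, and the names b and logit_white_1998, are illustrative only. It also collapses the constant terms and the two age slopes, and uses base R's plogis() as the inverse logit.

```r
# Coefficients from the coeftest() output above. Profile is hypothetical:
# White, 1998, employed = "No", citizen = "No"
b <- c(intercept = 0.3444334, checks = -0.3666425, age = 0.0287279,
       colourWhite = 1.2125167, year1998 = -0.4311794,
       white_1998 = 0.6519565, white_age = -0.0373729)

# Log-odds of release for this profile, term by term as in the equation above
logit_white_1998 <- function(checks, age) {
  b[["intercept"]] + b[["checks"]] * checks + b[["age"]] * age +
    b[["colourWhite"]] + b[["year1998"]] + b[["white_1998"]] +
    b[["white_age"]] * age
}

# Collapsed form: one constant and one net age slope for Whites
const     <- b[["intercept"]] + b[["colourWhite"]] + b[["year1998"]] + b[["white_1998"]]
age_slope <- b[["age"]] + b[["white_age"]]

# Hypothetical example values
eta <- logit_white_1998(checks = 2, age = 25)
p   <- plogis(eta)  # inverse logit: probability of release
print(c(const = const, age_slope = age_slope, eta = eta, p = p))
```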
