Solved – Contrast to test significant interaction – why not include main effect

anova, contrasts, linear model, multiple regression, regression

Given a data set with a continuous outcome variable $Y$ and two nominal IVs $X_1$ and $X_2$, with 4 levels (A, B, C, D) and 3 levels (E, F, G) respectively, we can take the first level of each as the reference level and create dummy (indicator) variables:

$X_1$: $X_{1B}$, $X_{1C}$, $X_{1D}$

$X_2$: $X_{2F}$, $X_{2G}$
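
To make the setup concrete, here is a rough sketch in Python of that dummy coding; the data frame and its values are made up purely for illustration and are not part of the question itself:

```python
import pandas as pd

# Made-up example data; only the structure (one outcome, two nominal IVs) matters.
df = pd.DataFrame({
    "Y":  [3.1, 4.0, 2.7, 5.2, 4.4, 3.9],
    "X1": ["A", "B", "C", "D", "B", "C"],
    "X2": ["E", "F", "G", "E", "F", "G"],
})

# drop_first=True treats the first level of each factor (A and E) as the
# reference level, producing the indicators X1_B, X1_C, X1_D, X2_F, X2_G.
dummies = pd.get_dummies(df[["X1", "X2"]], drop_first=True)
print(dummies.columns.tolist())  # ['X1_B', 'X1_C', 'X1_D', 'X2_F', 'X2_G']
```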

An ordinary linear regression of $Y$ on these variables, including the interaction terms, can be written as:

$$Y = \beta_0 + \beta_1 X_{1B} + \beta_2 X_{1C} + \beta_3 X_{1D} + \beta_4 X_{2F} + \beta_5 X_{2G} + \beta_6 X_{1B}X_{2F} + \beta_7 X_{1B}X_{2G} + \beta_8 X_{1C}X_{2F} + \beta_9 X_{1C}X_{2G} + \beta_{10} X_{1D}X_{2F} + \beta_{11} X_{1D}X_{2G} + e$$

where $e \sim N(0,\sigma^2)$, i.e. the error is normally distributed with constant variance.
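
In practice I would fit this with a formula interface rather than building the columns by hand; a minimal sketch, on simulated stand-in data (all names and generating values below are my own illustration, not part of the question):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in data: a balanced 4 x 3 layout with 5 replicates per cell.
rng = np.random.default_rng(0)
cells = [(a, b) for a in "ABCD" for b in "EFG" for _ in range(5)]
df = pd.DataFrame(cells, columns=["X1", "X2"])
df["Y"] = 1.0 + rng.normal(size=len(df))  # no true effects in this toy example

# C(X1)*C(X2) expands to the 5 main-effect dummies plus the 6 interaction
# dummies, using the first level of each factor (A, E) as the reference.
model = smf.ols("Y ~ C(X1) * C(X2)", data=df).fit()
print(model.params)  # 12 coefficients: intercept, beta_1..beta_5, beta_6..beta_11
```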

If we wished to test the significance of the $X_1 \times X_2$ interaction (to determine, say, whether the effect of $X_1$ on $Y$ differs across the levels of $X_2$), one method is to construct an appropriate contrast matrix $L$ and test the null hypothesis $L\beta = 0$ on the vector of regression coefficients. See, for example, here.
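
Continuing the sketch above (re-creating the same toy data so it runs on its own), this is roughly how I picture the contrast: build a $6 \times 12$ matrix $L$ whose rows pick out the six interaction coefficients and hand it to an F-test of $L\beta = 0$. The interaction columns are identified by the ':' in their design-matrix names rather than by hand-counting positions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same illustrative simulated data as in the previous sketch.
rng = np.random.default_rng(0)
cells = [(a, b) for a in "ABCD" for b in "EFG" for _ in range(5)]
df = pd.DataFrame(cells, columns=["X1", "X2"])
df["Y"] = 1.0 + rng.normal(size=len(df))

fit = smf.ols("Y ~ C(X1) * C(X2)", data=df).fit()

# One row of L per interaction coefficient: a 1 in that coefficient's column
# and 0 elsewhere, so that L @ beta stacks beta_6 ... beta_11.
names = fit.model.exog_names                          # design-matrix column names
inter = [i for i, n in enumerate(names) if ":" in n]  # the interaction columns
L = np.zeros((len(inter), len(names)))
for row, col in enumerate(inter):
    L[row, col] = 1.0

# Joint F-test of H0: L beta = 0, i.e. all six interaction coefficients are zero.
print(fit.f_test(L))
```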

My understanding is that, with the example above, we would use a null hypothesis involving only the interaction terms:

$$H_0\!: \beta_6 = \beta_7 = \beta_8 = \beta_9 = \beta_{10} = \beta_{11} = 0$$
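
Written out against the coefficient ordering in the model above, $\beta = (\beta_0, \beta_1, \ldots, \beta_{11})^T$, I take one valid choice of contrast matrix to be the simple selector

$$ L = \begin{pmatrix} 0_{6\times 6} & I_6 \end{pmatrix}, \qquad L\beta = (\beta_6, \beta_7, \beta_8, \beta_9, \beta_{10}, \beta_{11})^T = 0, $$

where $0_{6\times 6}$ is a block of zeros sitting over $\beta_0$ through $\beta_5$ and $I_6$ is the $6 \times 6$ identity sitting over the six interaction coefficients.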

But why wouldn't tests of $\beta_{1}=0$, $\beta_{2}=0$, and $\beta_{3}=0$ also be included, since these coefficients are jointly interpreted as the effect of $X_1$ on $Y$ when $X_2$ is at level E (the reference level for $X_2$)?

ADDED after ahfoss's response:

If the interpretation of $\beta_6$ through $\beta_{11}$ = 0 is the following:

The effect of $X_1$ = B on Y is not different when $X_2$ = F

The effect of $X_1$ = B on Y is not different when $X_2$ = G

The effect of $X_1$ = C on Y is not different when $X_2$ = F

The effect of $X_1$ = C on Y is not different when $X_2$ = G

The effect of $X_1$ = D on Y is not different when $X_2$ = F

The effect of $X_1$ = D on Y is not different when $X_2$ = G

Isn't it possible that "the effect of $X_1$ = A on Y IS different when $X_2$ = E" and that this would suggest an interaction? I'm sure I am missing the obvious, but it has not quite clicked for me: I do not see how the six statements above for $\beta_6$ through $\beta_{11} = 0$ are equivalent to your equation 1.

Best Answer

This is an interesting question, and actually quite subtle. It gets at the core of the definition of interactions, as well as the assumptions underlying dummy coding of categorical variables.

The OP's question is: when we test for interactions between the levels as given above, why do we (seemingly!) not take into account interactions between one variable and the other variable's reference level?

The short answer is that we actually are taking these into account, as I'll explain here. First, to simplify subsequent notation, let $Y_{AE}$ denote the random variable $Y|\{X_1=A\} \cap \{X_2=E\}$, that is, $Y$ conditional on observing levels $A$ and $E$ in variables $X_1$ and $X_2$, respectively.

Now, recall that if we claim that there is no interaction between $X_1$ and $X_2$, this is equivalent to asserting that

$$ E[Y_{BE}]-E[Y_{AE}] = E[Y_{BF}]-E[Y_{AF}] = E[Y_{BG}]-E[Y_{AG}] \qquad (Eq. 1) $$

that is, the expected difference between $Y|X_1=B$ and $Y|X_1=A$ is the same no matter which level $X_2$ is held constant at. Similarly, no interaction means that

$$ E[Y_{AE}]-E[Y_{AF}] = \cdots = E[Y_{DE}]-E[Y_{DF}]. $$
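
In sample terms these conditional expectations are just cell means, so you can eyeball what Eq. 1 and the display above claim by tabulating them; a quick sketch on simulated data (names and values are illustrative assumptions only):

```python
import numpy as np
import pandas as pd

# Illustrative data only; the 4 x 3 table of cell means estimates E[Y_AE], ..., E[Y_DG].
rng = np.random.default_rng(2)
cells = [(a, b) for a in "ABCD" for b in "EFG" for _ in range(5)]
df = pd.DataFrame(cells, columns=["X1", "X2"])
df["Y"] = rng.normal(size=len(df))

cell_means = df.groupby(["X1", "X2"])["Y"].mean().unstack()  # rows A-D, columns E-G
print(cell_means)

# Sample analogue of E[Y_.E] - E[Y_.F] for each of A-D; with no interaction these
# four differences estimate the same quantity (up to sampling noise).
print(cell_means["E"] - cell_means["F"])
```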

Now, OP correctly states that given our choice of reference levels $A$ and $E$, the definition of $\beta_1$ is the expected difference between $Y$ when $X_1=B$ versus $X_1=A$, given that $X_2$ is held constant at $E$. In other words,

$$ \beta_1 = E[Y_{BE}] - E[Y_{AE}], $$

but if there is no interaction between $X_1$ and $X_2$, it follows immediately from equation 1 above that $$ \beta_1 = E[Y_{BE}]-E[Y_{AE}] = E[Y_{BF}]-E[Y_{AF}] = E[Y_{BG}]-E[Y_{AG}]. $$

In other words, if there is no interaction, then $\beta_1$ alone captures the difference in $Y$ between $X_1 = B$ and $X_1 = A$ at every level of $X_2$. This is why the test for an interaction leaves $\beta_1$ in the model and tests whether the interaction terms are zero.
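
In practice that joint test is exactly what a nested-model comparison gives you: keep the main effects (including $\beta_1$) in both fits and ask whether adding the six interaction terms improves the model. A sketch on simulated stand-in data (names and effect sizes are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simulated stand-in data with a main effect of X1 and no interaction.
rng = np.random.default_rng(1)
cells = [(a, b) for a in "ABCD" for b in "EFG" for _ in range(5)]
df = pd.DataFrame(cells, columns=["X1", "X2"])
df["Y"] = df["X1"].map({"A": 0.0, "B": 1.0, "C": 2.0, "D": 3.0}) + rng.normal(size=len(df))

# Reduced model keeps the main effects (beta_1 ... beta_5); the full model
# adds the six interaction terms (beta_6 ... beta_11).
reduced = smf.ols("Y ~ C(X1) + C(X2)", data=df).fit()
full = smf.ols("Y ~ C(X1) * C(X2)", data=df).fit()

# F-test of H0: beta_6 = ... = beta_11 = 0 -- the same hypothesis as L*beta = 0.
print(anova_lm(reduced, full))
```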

The full answer to your question involves a simultaneous derivation for $\beta_1$ through $\beta_5$, but from this example you hopefully get the idea of what's going on.
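
For completeness, here is a sketch of those remaining definitions in the same cell-mean notation (they follow directly from the dummy coding above):

$$
\begin{aligned}
\beta_1 &= E[Y_{BE}] - E[Y_{AE}], &\quad \beta_4 &= E[Y_{AF}] - E[Y_{AE}],\\
\beta_2 &= E[Y_{CE}] - E[Y_{AE}], &\quad \beta_5 &= E[Y_{AG}] - E[Y_{AE}],\\
\beta_3 &= E[Y_{DE}] - E[Y_{AE}], & &
\end{aligned}
$$

and, for example, $\beta_6 = \big(E[Y_{BF}] - E[Y_{AF}]\big) - \big(E[Y_{BE}] - E[Y_{AE}]\big)$, which is exactly the discrepancy that Eq. 1 requires to vanish when there is no interaction.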