Regression Statistical-Significance Interaction – How to Interpret Significant Interaction of Two Non-Significant Predictors

binary-data, interaction, regression, statistical-significance

I have two non-significant binary main effects in my binary logistic regression model, but their interaction is significant. The variables are centered and there is no multicollinearity (all VIFs are about 1.0). I want to interpret this significant interaction between two non-significant estimates.

It would read something like this: the effect of variable A is less visible at level 1 of variable B (B1) and more visible at level 2 of B (B2). Or I can say the same thing about the effect of B being less visible at A1 and more visible at A2…

However, the problem is that neither the effect of A nor the effect of B is significant! So the above interpretation, although seemingly correct, sounds inconsistent or strange. (How is a non-significant effect supposed to be boosted by the other variable?)…

On second thought, it seems that it is actually possible. For example, if I exclude the B2 cases from my sample, the effect of A would then show up as significant (it is the B2 cases in the sample that prevent A from appearing significant)… This is getting clearer in my head now, but do you have anything other than the above interpretation in mind?

Best Answer

You seem to have the right intuition in your last paragraph. It is possible for variables x and z in a regression to appear non-significant even though they do have some effect on the dependent variable y. The following small reproducible example illustrates that fact.

set.seed(890)

# two independent predictors
x <- rnorm(1000, mean=10, sd=3)
z <- rnorm(1000, mean=25, sd=6)

# y depends on x only when z > 30, plus a lot of noise
y <- ifelse(z>30, sqrt(x), 0) + rnorm(1000, mean=12, sd=10)

m1 <- lm(y ~ x + z)   # main effects only
m2 <- lm(y ~ x*z)     # main effects plus their interaction

summary(m1)
summary(m2)

This produces the following output (truncated for readability):

        Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.61151    1.79312   5.918 4.48e-09 ***
x           -0.00765    0.11085  -0.069    0.945    
z            0.08651    0.05514   1.569    0.117    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.34 on 997 degrees of freedom
Multiple R-squared:  0.002464,  Adjusted R-squared:  0.000463 
F-statistic: 1.231 on 2 and 997 DF,  p-value: 0.2923

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 18.59305    5.11233   3.637  0.00029 ***
x           -0.79087    0.48273  -1.638  0.10167    
z           -0.22747    0.19625  -1.159  0.24669    
x:z          0.03077    0.01846   1.667  0.09584 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 10.33 on 996 degrees of freedom
Multiple R-squared:  0.005239,  Adjusted R-squared:  0.002243 
F-statistic: 1.749 on 3 and 996 DF,  p-value: 0.1554

As you can see, y depends on x for some levels of z (this is your significant interaction). However, in m1, where only the main effects are included, neither x nor z appears to have a significant effect on y. In m2, the interaction does become significant (albeit barely, and only at the 10% level). Note that neither m1 nor m2 is a very good model for these data.
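One quick way to see that conditional dependence in the simulated data (this check is not part of the original model comparison; the split at z = 30 simply mirrors the data-generating code above) is to refit the simple regression of y on x separately in the two regions of z. By construction the slope of x is positive above the threshold and zero below it, although with a residual standard deviation of 10 the subgroup estimates will be noisy:

d <- data.frame(x = x, z = z, y = y)
summary(lm(y ~ x, data = d, subset = z > 30))    # region where x truly enters the model
summary(lm(y ~ x, data = d, subset = z <= 30))   # region where x has no effect by construction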

In terms of interpretation, you would probably say that x has a significant effect on y for some values of z. There are several ways of testing for this. The one you mention in your last paragraph, excluding part of the sample based on observations' values on a certain variable, is usually referred to as a "split-sample" analysis in the social sciences. Other approaches involve calculating the marginal effect of one of the interacted variables as a function of the other.
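As a rough sketch of that second approach (not part of the original answer), the marginal effect of x at a given value of z can be computed from the coefficients of m2 as b_x + b_x:z * z, with a delta-method standard error; the z values below are arbitrary illustration points:

b <- coef(m2)       # coefficients of the interaction model
V <- vcov(m2)       # their variance-covariance matrix
z_vals <- c(20, 25, 30, 35)   # arbitrary values of z at which to evaluate the effect

# marginal effect of x at each z, with a delta-method standard error
me <- b["x"] + b["x:z"] * z_vals
se <- sqrt(V["x", "x"] + z_vals^2 * V["x:z", "x:z"] + 2 * z_vals * V["x", "x:z"])

data.frame(z = z_vals, marginal_effect = me, std_error = se, t_value = me / se)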