Solved – Main effects are not significant anymore after adding interaction terms in the linear regression

interaction, linear-regression

I have two models:

$levelOfCreativity = \alpha Extravert + \beta Woman + \gamma hasACollegeDegree + \zeta isOlderThan25$

$levelOfCreativity = \alpha Extravert + \beta Woman + \gamma hasACollegeDegree + \zeta isOlderThan25 + \mu Extravert * hasACollegeDegree + \delta Woman * isOlderThan25$

The dependent variable can vary between 0 and 10. The independent variables are 0 or 1. E.g. if a person is a woman, the variable will be equal to 0.

My real model consists of 16 main variables besides hasACollegeDegree and isOlderThan25 (control groups) and 32 interactions (16 with hasACollegeDegree and 16 with isOlderThan25), but I simplified the model here for practical reasons.

When I run a plain linear regression (no interactions), hasACollegeDegree has a significant effect with a coefficient of 0.13, and isOlderThan25 is not significant with a coefficient of -0.08. When I add the interaction terms, this changes: hasACollegeDegree is no longer significant (coefficient 0.03) and isOlderThan25 becomes significant with a coefficient of -0.47. The differences between 0.13 and 0.03, or between -0.08 and -0.47, are quite large.
The coefficients of the interaction terms are also not significant.

I understand that it is normal for my coefficients to change, as these are two different linear models. I also understand that the meaning of the main effects is totally different in the two models because of the interaction terms.

Taking all this into account, I have one main question:
what do these findings mean if the coefficients of hasACollegeDegree and isOlderThan25 change so drastically, leading to two totally different conclusions?

My first model (fitted on all the data, without subgroups) says that having a college degree has a significant influence on your level of creativity and that being older than 25 has no significant influence. When I add the interaction terms, the model says that having a college degree has no significant influence while being older than 25 does. Because of this strange behavior, I don't know what to conclude: does having a college degree have an impact on creativity or not, and does being older than 25 have an impact on creativity or not?

The problem is that both models make sense and that, based on my research, it also makes sense to look at both models. Advice like "you need to think about which model is best and which variables/interactions you really need to include" doesn't really apply to my situation, which leads to the question above.

Thank you!

Best Answer

What you are terming 'main effects' are not really that in the usual sense. You cannot interpret lower-order effects (e.g., "main" effects) without simultaneously taking into account any higher-order (e.g. interaction) effects that are constructed of terms from lower-order effects. My Regression Modeling Strategies course notes have a detailed example of interpretation for a simple example where age has a linear effect and interacts with sex. It shows you how to do the composite "chunk" tests that are meaningful and are independent of how the variables are coded. For example, the age effect is the combined age and age x sex effects, which tests whether age is associated with Y for either sex. The sex effect is the chunk test for sex + age x sex interaction and tests whether there is a difference between the sexes at any age.
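
Applied to the second model in the question, the same reasoning can be written out as a short derivation. This is only a sketch that takes the model exactly as written and holds all other variables fixed:

$$
E[levelOfCreativity \mid hasACollegeDegree = 1] - E[levelOfCreativity \mid hasACollegeDegree = 0] = \gamma + \mu \cdot Extravert
$$

$$
E[levelOfCreativity \mid isOlderThan25 = 1] - E[levelOfCreativity \mid isOlderThan25 = 0] = \zeta + \delta \cdot Woman
$$

So $\gamma$ by itself is the college-degree effect only for people with $Extravert = 0$; for $Extravert = 1$ the effect is $\gamma + \mu$. The chunk test for hasACollegeDegree is the joint test of $H_0: \gamma = \mu = 0$, and the one for isOlderThan25 is $H_0: \zeta = \delta = 0$. In the full model with 16 interactions per control variable, the same logic suggests that the hasACollegeDegree coefficient refers only to a person for whom all 16 interacting variables equal 0, which is one plausible reason it differs so much from the coefficient in the model without interactions.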

To get specific meaningful estimates, you form contrasts, e.g. difference between male and female at the median age.
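
As a rough illustration of why contrasts are preferable to reading a single coefficient, here is a minimal Python sketch. Everything in it is hypothetical: the cell means are made-up numbers, the model is reduced to just hasACollegeDegree, Extravert and their interaction, and the helper names `saturated_coefs` and `degree_contrast` are invented for this example. A saturated 2x2 model reproduces the cell means exactly, so its coefficients can be read off directly.

```python
# Hypothetical cell means of creativity, keyed by (hasACollegeDegree, Extravert).
# These numbers are invented for illustration; they are not from the question.
mean = {(0, 0): 5.0, (1, 0): 5.25, (0, 1): 5.5, (1, 1): 6.25}

def saturated_coefs(m):
    """Coefficients of y = b0 + g*degree + a*x + mu*degree*x fitted to 2x2 cell means."""
    b0 = m[(0, 0)]
    g = m[(1, 0)] - m[(0, 0)]          # degree effect when x == 0
    a = m[(0, 1)] - m[(0, 0)]          # x effect when degree == 0
    mu = m[(1, 1)] - m[(1, 0)] - m[(0, 1)] + m[(0, 0)]  # interaction
    return b0, g, a, mu

def degree_contrast(coefs, x):
    """Estimated degree effect at a chosen level x of the other variable."""
    _, g, _, mu = coefs
    return g + mu * x

coefs = saturated_coefs(mean)

# Same data, but with Extravert recoded as Introvert = 1 - Extravert.
recoded = {(d, 1 - x): v for (d, x), v in mean.items()}
coefs_re = saturated_coefs(recoded)

# The "main effect" coefficient depends on the coding ...
print("degree coefficient, Extravert coding:", coefs[1])
print("degree coefficient, Introvert coding:", coefs_re[1])

# ... but the contrast for the same group of people does not.
print("degree effect among extraverts:", degree_contrast(coefs, 1),
      "=", degree_contrast(coefs_re, 0))
print("degree effect among non-extraverts:", degree_contrast(coefs, 0),
      "=", degree_contrast(coefs_re, 1))
```

The coefficient on the degree variable changes with the arbitrary 0/1 coding of the other variable, while the contrast "degree effect among extraverts" stays the same under either coding.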

Don't use statistical significance to choose a model. Stick with pre-specification, with an exception being this: if you do a chunk test of all interaction effects combined and the multiple degree of freedom test for all interactions yields p=0.4, you can fairly safely drop all the interaction terms. Some statisticians use AIC in making this decision.
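
For readers who want to see what such a chunk test computes, here is a minimal sketch in plain Python. It is not the rms implementation; it assumes simulated data with no true interactions, an intercept (which the question's formulas do not show), and hypothetical helper names `rss`, `chunk_f` and `design`. It compares a full model with a reduced model that drops a whole chunk of terms, using the usual F statistic for nested linear models.

```python
import random

def rss(X, y):
    """Residual sum of squares of an OLS fit, solved via the normal equations."""
    p = len(X[0])
    # Augmented system [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    beta = [A[i][p] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def chunk_f(X_full, X_reduced, y):
    """F statistic and degrees of freedom for dropping a chunk of terms."""
    df1 = len(X_full[0]) - len(X_reduced[0])
    df2 = len(y) - len(X_full[0])
    rss_full, rss_red = rss(X_full, y), rss(X_reduced, y)
    return ((rss_red - rss_full) / df1) / (rss_full / df2), df1, df2

# Simulated data (hypothetical): four binary predictors, no true interactions.
random.seed(1)
rows, y = [], []
for _ in range(400):
    e, w, d, o = (random.randint(0, 1) for _ in range(4))
    rows.append((e, w, d, o))
    y.append(5 + 0.3 * e + 0.1 * w + 0.13 * d - 0.08 * o + random.gauss(0, 1))

def design(terms):
    """Design matrix with intercept; each term is a tuple of column indices to multiply."""
    X = []
    for r in rows:
        line = [1.0]
        for term in terms:
            v = 1.0
            for idx in term:
                v *= r[idx]
            line.append(v)
        X.append(line)
    return X

E, W, D, O = 0, 1, 2, 3          # Extravert, Woman, hasACollegeDegree, isOlderThan25
main = [(E,), (W,), (D,), (O,)]
inter = [(E, D), (W, O)]
full = design(main + inter)

# Chunk 1: all interaction terms together.
f_int, df1, df2 = chunk_f(full, design(main), y)
# Chunk 2: overall degree effect = degree main term + Extravert x degree interaction.
f_deg, _, _ = chunk_f(full, design([(E,), (W,), (O,), (W, O)]), y)

print(f"all interactions: F = {f_int:.2f} on {df1} and {df2} df")
print(f"degree chunk:     F = {f_deg:.2f} on {df1} and {df2} df")
```

With 2 numerator degrees of freedom and several hundred denominator degrees of freedom, the 5% critical value of F is roughly 3.0, so an F statistic well below that corresponds to a large p-value. In practice a statistics package would report the p-value directly.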

The R rms package anova and summary functions do all this automatically, as shown in the detailed case studies in my notes. For more resources see http://fharrell.com/links.
