Regression – Interpretation of Interaction Term and Model in Multiple Linear Regression

interaction regression

I want to compare the effects of two continuous indices (Ind_1 and Ind_2) and a dichotomous variable (E) on my dependent variable (DV, ordinal, 0–30), adjusting for one continuous covariate (CV) for proper interpretation. Ind_1 and Ind_2 are highly correlated and have to be investigated separately. My sample size is around 100. So I have the following two models:

model 1: DV ~ Ind_1 + E + CV
model 2: DV ~ Ind_2 + E + CV

Ind_1 is significant, Ind_2 is not (p = 0.06), and E is significant in both models. CV is not significant in either model.

However, given the design of the study I expect an interaction between E and CV, so I added an interaction term to both models:

model 1_I : DV ~ Ind_1 + E + CV + CV*E
model 2_I : DV ~ Ind_2 + E + CV + CV*E

Now Ind_1 and Ind_2, along with E and the interaction term, are all significant in their respective models. The p-values for Ind_1 and Ind_2 have been halved, and both models fit better, with a higher adjusted R-squared.
Overall, I don't know exactly how to interpret these results. Specifically, I have a few questions:

  1. Should I report all 4 models or just the models with interaction?

  2. Do these results mean both indices are significant predictors of the DV, despite their lack of significance in model 1 and model 2?

  3. How can I interpret the E variable's association with the DV overall? In model 1 and model 2, the effect size of E is in line with other findings; once the interaction term is added, the E coefficient refers to the case CV = 0, which is impossible (CV has a mean of around 200). This complicates the interpretation. Can I just rely on the effect size of E from model 1 and model 2 (which are practically the same) and report that as the independent effect? Does the significance of the E and CV*E coefficients even mean anything now?

  4. Centering my variables results in E no longer being significant in the interaction models, although the interaction remains significant. What does this mean? Does it mean that E does not contribute to the DV? Do I have to center my variables?

  5. I have also built a binary logistic model with the same variables (by introducing a cut-off on the DV), but the interaction effect is not significant there. Should I add the interaction effect to that model anyway, considering it is significant in the linear model?

  6. Is this even worth the added complexity in the models and their interpretation? Should I just report the second index as non-significant?

  7. Can I just conclude, after reporting all 4 models, that both indices and the E variable are significantly associated with the DV, and that the E variable and its possible interactions may be necessary to observe this association (i.e., E also contributes significantly to the DV)?

Best Answer

Questions 1 and 2:

You should not be fitting separate models. Write a single model that includes both Ind_1 and Ind_2, along with the other predictors and the interaction:

DV ~ Ind_1 + Ind_2 + E + CV + CV:E

That allows you to evaluate all the coefficients of interest at once while accounting for all of the predictors together.
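As a minimal sketch of fitting that combined model, here is an ordinary-least-squares fit on simulated data. The variable names follow the question, but the data and coefficient values are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
ind1 = rng.normal(size=n)
ind2 = 0.9 * ind1 + 0.1 * rng.normal(size=n)  # highly correlated with Ind_1
e = rng.integers(0, 2, size=n)                # dichotomous predictor
cv = rng.normal(200, 20, size=n)              # continuous covariate
dv = 2 * ind1 + 3 * e + 0.01 * cv * e + rng.normal(size=n)

# Design matrix for DV ~ Ind_1 + Ind_2 + E + CV + CV:E
X = np.column_stack([np.ones(n), ind1, ind2, e, cv, cv * e])
beta, *_ = np.linalg.lstsq(X, dv, rcond=None)
print(dict(zip(["intercept", "Ind_1", "Ind_2", "E", "CV", "CV:E"],
               np.round(beta, 3))))
```

In a real analysis you would of course use a regression routine that also returns standard errors and diagnostics (e.g. `statsmodels.formula.api.ols` in Python or `lm` in R); the point here is only that all coefficients come from one fit.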

In response to comment: Even if Ind_1 and Ind_2 are correlated, I still recommend a single model. If one predictor is associated with outcome and with a second predictor, then the second predictor is also going to be associated with outcome; the question is how much. Attempts to distinguish between the two predictors because one has a p-value of 0.06 and the other managed to pass the arbitrary 0.05 threshold will tend not to extend well to new data sets. See this page among many others on this site for why you shouldn't confuse such "statistical significance" with importance.

Admittedly, with highly correlated predictors it's possible that in the combined model neither predictor individually will pass the "significance" threshold. A joint test on the two together probably would. Even better, with correlated predictors you can combine them into a single predictor in a way consistent with your understanding of the subject matter. See the sections of Chapter 4 of Frank Harrell's book or class notes on data reduction.
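A joint test of the two correlated indices can be done as a standard nested-model F-test: compare the residual sum of squares of the model with and without both indices. A sketch on simulated data (the `rss` helper and all numbers are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
ind1 = rng.normal(size=n)
ind2 = 0.9 * ind1 + 0.1 * rng.normal(size=n)  # highly correlated pair
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.5 * ind1 + 2 * e + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X_full = np.column_stack([ones, ind1, ind2, e, cv])  # with both indices
X_reduced = np.column_stack([ones, e, cv])           # without either index

q = 2                           # number of coefficients tested jointly
df_full = n - X_full.shape[1]   # residual degrees of freedom, full model
F = ((rss(X_reduced, dv) - rss(X_full, dv)) / q) / (rss(X_full, dv) / df_full)
print(f"F({q}, {df_full}) = {F:.2f}")
```

The p-value then comes from the F distribution with (q, df_full) degrees of freedom (e.g. `scipy.stats.f.sf`). Such a test can show that the pair of indices matters even when neither individual coefficient clears the 0.05 threshold.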

Questions 3 and 4:

The "significance" of the "main effect" coefficient of a predictor like E that's involved in an interaction is generally not worth evaluating. The whole point of the interaction with CV is that the association of E with DV depends on the level of CV. What's reported for the E coefficient is its association with outcome when the interacting predictor, CV in this case, is at its reference level or 0. What's the point of evaluating the "significance" of the E coefficient (whether it's different from a value of 0) if that coefficient's value depends on how you coded or centered the interacting CV?
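That dependence on coding is just algebra: centering CV shifts the E coefficient by (mean of CV) × (interaction coefficient), while the interaction coefficient itself is unchanged. A small check on simulated data (variable names follow the question; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.0 * e + 0.05 * cv + 0.02 * cv * e + rng.normal(size=n)

def fit(cv_version):
    """OLS fit of DV ~ E + CV + CV:E; returns [intercept, E, CV, CV:E]."""
    X = np.column_stack([np.ones(n), e, cv_version, cv_version * e])
    beta, *_ = np.linalg.lstsq(X, dv, rcond=None)
    return beta

raw = fit(cv)                    # E coefficient = effect of E at CV = 0
centered = fit(cv - cv.mean())   # E coefficient = effect of E at mean CV

print("E coefficient, raw CV:     ", round(raw[1], 3))
print("E coefficient, centered CV:", round(centered[1], 3))
print("CV:E coefficient, raw vs centered:", round(raw[3], 3), round(centered[3], 3))
```

The fitted values and the interaction coefficient are identical in both parameterizations; only the meaning (and hence the value and "significance") of the E main effect changes.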

Question 5:

Why categorize your outcome variable that way? If you have an ordinal outcome, why throw away that extra information?

Questions 6 and 7:

Work with the full model above and base your conclusions on it. For the interaction, don't worry about the individual E and CV coefficients; report results for realistic, illustrative combinations of values.

In response to comment: You have to apply your understanding of the subject matter to decide how to illustrate your findings. For example, if the CV is some type of nuisance variable that you just want to control for, it might be OK just to show predictions at its mean. But as it seems to have an interesting interaction with E, you are probably better off showing a couple of examples of combinations of E and CV. One set of choices might be the 25th and 75th percentiles of CV for each of the two levels of E, if those combinations make sense in your data. That would be 4 examples illustrating the joint contributions to outcome.
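As a sketch of that suggestion, assuming simulated data and a simple DV ~ E + CV + CV:E model, predictions at the 25th and 75th percentiles of CV within each level of E could be generated like this (the quantile choices are just one reasonable option, as noted above):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.0 * e + 0.05 * cv + 0.02 * cv * e + rng.normal(size=n)

# Fit DV ~ E + CV + CV:E by ordinary least squares
X = np.column_stack([np.ones(n), e, cv, cv * e])
beta, *_ = np.linalg.lstsq(X, dv, rcond=None)

# Predictions at the 25th/75th percentile of CV within each level of E
for e_level in (0, 1):
    cv_sub = cv[e == e_level]
    for q in (0.25, 0.75):
        cv_q = np.quantile(cv_sub, q)
        pred = beta @ np.array([1.0, e_level, cv_q, cv_q * e_level])
        print(f"E={e_level}, CV at {int(q * 100)}th pct ({cv_q:.0f}): "
              f"predicted DV = {pred:.2f}")
```

The four predicted values make the joint contribution of E and CV concrete for a reader, without asking anyone to interpret a coefficient at the impossible value CV = 0.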