Regression – Interpretation of Interaction Term and Model in Multiple Linear Regression

interaction regression

I want to compare the effects of two continuous indices (Ind_1 and Ind_2) and a dichotomous variable (E) on my dependent variable (DV, ordinal, 0–30), adjusting for one continuous covariate (CV) for proper interpretation. Ind_1 and Ind_2 are highly correlated and have to be investigated separately. My sample size is around 100. So I have the following two models:

model 1: DV ~ Ind_1 + E + CV
model 2: DV ~ Ind_2 + E + CV

Ind_1 is significant, Ind_2 is not (p = 0.06), and E is significant in both models. CV is not significant in either model.

However, given the design of the study I expect an interaction between E and CV, so I added an interaction term to both models:

model 1_I : DV ~ Ind_1 + E + CV + CV*E
model 2_I : DV ~ Ind_2 + E + CV + CV*E

Now Ind_1 and Ind_2, along with E and the interaction term, are all significant in their respective models. The p-values for Ind_1 and Ind_2 have been halved, and both models fit better, with a higher adjusted R-squared.
Overall, I don't know exactly how to interpret these results. Specifically, I have a few questions:

  1. Should I report all 4 models or just the models with interaction?

  2. Do these results mean both indices are significant predictors of the DV, despite their lack of significance in model 1 and model 2?

  3. How can I interpret the E variable's association with the DV overall? In model 1 and model 2, the effect size of E is in line with other findings; once the interaction term is added, the E coefficient refers to the case CV = 0, which is impossible (CV has a mean of around 200). This complicates the interpretation. Can I just rely on the effect size of E from model 1 and model 2 (which are practically the same) and report that as the independent effect? Does the significance of the E and CV*E coefficients even mean anything now?

  4. Centering my variables results in E no longer being significant in the interaction models, although the interaction remains significant. What does this mean? Does it mean that E does not contribute to the DV? Do I have to center my variables?

  5. I have also built a binary logistic model with the same variables (by introducing a cut-off on the DV), but the interaction effect is not significant there. Should I add the interaction effect to that model anyway, considering it is significant in the linear model?

  6. Is this even worth the added complexity in the models and their interpretation? Should I just report the second index as non-significant?

  7. Can I just conclude, after reporting all 4 models, that both indices and the E variable are significantly associated with the DV, and that the E variable and its possible interactions may be necessary to observe this association (i.e., E also contributes significantly to the DV)?

Best Answer

Questions 1 and 2:

You should not be fitting separate models. Write a single model that includes both Ind_1 and Ind_2, along with the other predictors and the interaction:

DV ~ Ind_1 + Ind_2 + E + CV + CV:E

That allows you to evaluate all the coefficients of interest at once while accounting for all of the predictors together.
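As a minimal sketch of fitting that combined model, here is an ordinary-least-squares fit on simulated data. The variable names follow the question, but the data and coefficient values are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
ind1 = rng.normal(size=n)
ind2 = 0.9 * ind1 + 0.1 * rng.normal(size=n)  # highly correlated with Ind_1
e = rng.integers(0, 2, size=n)                # dichotomous predictor
cv = rng.normal(200, 20, size=n)              # continuous covariate
dv = 2 * ind1 + 3 * e + 0.01 * cv * e + rng.normal(size=n)

# Design matrix for DV ~ Ind_1 + Ind_2 + E + CV + CV:E
X = np.column_stack([np.ones(n), ind1, ind2, e, cv, cv * e])
beta, *_ = np.linalg.lstsq(X, dv, rcond=None)
print(dict(zip(["intercept", "Ind_1", "Ind_2", "E", "CV", "CV:E"],
               np.round(beta, 3))))
```

In a real analysis you would of course use a regression routine that also returns standard errors and diagnostics (e.g. `statsmodels.formula.api.ols` in Python or `lm` in R); the point here is only that all coefficients come from one fit.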

In response to comment: Even if Ind_1 and Ind_2 are correlated, I still recommend a single model. If one predictor is associated with outcome and with a second predictor, then the second predictor is also going to be associated with outcome; the question is how much. Attempts to distinguish between the two predictors because one has a p-value of 0.06 and the other managed to pass the arbitrary 0.05 threshold will tend not to extend well to new data sets. See this page among many others on this site for why you shouldn't confuse such "statistical significance" with importance.

Admittedly, with highly correlated predictors it's possible that in the combined model neither predictor individually will pass the "significance" threshold. A joint test on the two together probably would. Even better, with correlated predictors you can combine them into a single predictor in a way consistent with your understanding of the subject matter. See the sections of Chapter 4 of Frank Harrell's book or class notes on data reduction.
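A joint test of the two correlated indices can be done as a standard nested-model F-test: compare the residual sum of squares of the model with and without both indices. A sketch on simulated data (the `rss` helper and all numbers are illustrative, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
ind1 = rng.normal(size=n)
ind2 = 0.9 * ind1 + 0.1 * rng.normal(size=n)  # highly correlated pair
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.5 * ind1 + 2 * e + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta) ** 2)

ones = np.ones(n)
X_full = np.column_stack([ones, ind1, ind2, e, cv])  # with both indices
X_reduced = np.column_stack([ones, e, cv])           # without either index

q = 2                           # number of coefficients tested jointly
df_full = n - X_full.shape[1]   # residual degrees of freedom, full model
F = ((rss(X_reduced, dv) - rss(X_full, dv)) / q) / (rss(X_full, dv) / df_full)
print(f"F({q}, {df_full}) = {F:.2f}")
```

The p-value then comes from the F distribution with (q, df_full) degrees of freedom (e.g. `scipy.stats.f.sf`). Such a test can show that the pair of indices matters even when neither individual coefficient clears the 0.05 threshold.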

Questions 3 and 4:

The "significance" of the "main effect" coefficient of a predictor like E that's involved in an interaction is generally not worth evaluating. The whole point of the interaction with CV is that the association of E with DV depends on the level of CV. What's reported for the E coefficient is its association with outcome when the interacting predictor, CV in this case, is at its reference level or 0. What's the point of evaluating the "significance" of the E coefficient (whether it's different from a value of 0) if that coefficient's value depends on how you coded or centered the interacting CV?
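That dependence on coding is just algebra: centering CV shifts the E coefficient by (mean of CV) × (interaction coefficient), while the interaction coefficient itself is unchanged. A small check on simulated data (variable names follow the question; the data are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.0 * e + 0.05 * cv + 0.02 * cv * e + rng.normal(size=n)

def fit(cv_version):
    """OLS fit of DV ~ E + CV + CV:E; returns [intercept, E, CV, CV:E]."""
    X = np.column_stack([np.ones(n), e, cv_version, cv_version * e])
    beta, *_ = np.linalg.lstsq(X, dv, rcond=None)
    return beta

raw = fit(cv)                    # E coefficient = effect of E at CV = 0
centered = fit(cv - cv.mean())   # E coefficient = effect of E at mean CV

print("E coefficient, raw CV:     ", round(raw[1], 3))
print("E coefficient, centered CV:", round(centered[1], 3))
print("CV:E coefficient, raw vs centered:", round(raw[3], 3), round(centered[3], 3))
```

The fitted values and the interaction coefficient are identical in both parameterizations; only the meaning (and hence the value and "significance") of the E main effect changes.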

Question 5:

Why categorize your outcome variable that way? If you have an ordinal outcome, why throw away that extra information?

Questions 6 and 7:

Work with the full model above and base your conclusions on it. For the interaction, don't worry about the individual E and CV coefficients; report results for realistic, illustrative combinations of values.

In response to comment: You have to apply your understanding of the subject matter to decide how to illustrate your findings. For example, if the CV is some type of nuisance variable that you just want to control for, it might be OK just to show predictions at its mean. But as it seems to have an interesting interaction with E, you are probably better off showing a couple of examples of combinations of E and CV. One set of choices might be the 25th and 75th percentiles of CV for each of the two levels of E, if those combinations make sense in your data. That would be 4 examples illustrating the joint contributions to outcome.
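As a sketch of that suggestion, assuming simulated data and a simple DV ~ E + CV + CV:E model, predictions at the 25th and 75th percentiles of CV within each level of E could be generated like this (the quantile choices are just one reasonable option, as noted above):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
e = rng.integers(0, 2, size=n)
cv = rng.normal(200, 20, size=n)
dv = 1.0 * e + 0.05 * cv + 0.02 * cv * e + rng.normal(size=n)

# Fit DV ~ E + CV + CV:E by ordinary least squares
X = np.column_stack([np.ones(n), e, cv, cv * e])
beta, *_ = np.linalg.lstsq(X, dv, rcond=None)

# Predictions at the 25th/75th percentile of CV within each level of E
for e_level in (0, 1):
    cv_sub = cv[e == e_level]
    for q in (0.25, 0.75):
        cv_q = np.quantile(cv_sub, q)
        pred = beta @ np.array([1.0, e_level, cv_q, cv_q * e_level])
        print(f"E={e_level}, CV at {int(q * 100)}th pct ({cv_q:.0f}): "
              f"predicted DV = {pred:.2f}")
```

The four predicted values make the joint contribution of E and CV concrete for a reader, without asking anyone to interpret a coefficient at the impossible value CV = 0.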