Hierarchical Regression – Interpreting Hierarchical Regression Models with Interaction Terms

Tags: interaction, interpretation, multiple regression, r, regression

I am running a multiple regression to test my hypothesis, which includes interaction terms. I have some control variables and three key predictors: A, B and C. I used hierarchical regression models, which add the predictors step by step.

Here is my sequence of models:

model 1: include only the control variables to see how they relate to the dependent variable

model 2: add A B and C based on model 1

model 3: add the interaction of A and B based on model 2

model 4: add the interaction of AB and BC based on model 3

Q1: In model 2, predictor B is significant, but it's not significant in models 3 and 4, after the interaction terms were added. Should I say B has a significant impact on my dependent variable?

Q2: The significance of some control variables varies between the different models. How should I combine the results when interpreting them?

Q3: Do I have to include the interaction term of ABC, even though it's not one of my hypotheses?

Thanks a lot for your help!!!

Best Answer

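With respect to your Question 1, the apparent "significance" of an individual coefficient involved in an interaction depends on how its interacting predictor variables are coded. Once the interaction terms are in the model, the coefficient for B describes the association of B with the outcome only when its interacting predictors are at 0 (or at their reference levels), so centering or recoding those predictors changes both the B coefficient and its p-value. Several other pages on this site discuss this point. The change in B's "significance" between model 2 and models 3 and 4 is therefore not a problem in itself and should be ignored.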
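The dependence on coding can be seen in a small numerical sketch. Everything here is hypothetical: the data are noise-free values on a grid, generated from y = 1 + 2A + 0.5B + 1.5AB, and `ols` and `fit` are illustrative helpers written for this example rather than functions from any library. The same model is fitted twice, once with A as recorded and once with A centered at its mean.

```python
def ols(X, y):
    """Least-squares coefficients via the normal equations (small problems only)."""
    p = len(X[0])
    # Augmented system [X'X | X'y].
    M = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Hypothetical noise-free data on a grid: y = 1 + 2*A + 0.5*B + 1.5*A*B
grid = [(a, b) for a in (1, 2, 3, 4, 5) for b in (0, 1, 2)]
y = [1 + 2 * a + 0.5 * b + 1.5 * a * b for a, b in grid]

def fit(a_shift):
    """Fit y ~ A + B + A:B after subtracting a_shift from A.

    Returned coefficients are ordered: intercept, A, B, A:B.
    """
    X = [[1.0, a - a_shift, b, (a - a_shift) * b] for a, b in grid]
    return ols(X, y)

raw = fit(0.0)       # A as recorded
centered = fit(3.0)  # A centered at its mean, which is 3 on this grid

print("B coefficient, raw A:     ", round(raw[2], 6))
print("B coefficient, centered A:", round(centered[2], 6))
print("A:B coefficient, both:    ", round(raw[3], 6), round(centered[3], 6))
```

The B coefficient moves from 0.5 to 5.0 purely because A was re-centered, while the interaction coefficient stays at 1.5. With real, noisy data the p-value for B would shift along with the estimate, which is why a test of that single coefficient says little on its own.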

With respect to Question 2, what you want is a single model that appropriately describes your data. Your multiple models arose from your step-by-step variable selection based on "statistical significance" at each step. That is not a good idea. The "statistical significance" values in such an approach aren't even correct, as they haven't taken into account your use of the outcomes to select the predictors.

Frank Harrell describes a much better way to proceed in his course notes and book, in particular Chapter 4. Decide on the number of degrees of freedom that you can spend on estimating coefficients, decide where to spend them in the model (on non-linear associations of continuous covariates with the outcome, on interactions expected to be of interest, etc.), and spend them in a single model.

With respect to Question 3, see Harrell's modeling approach outlined above. If you don't want to spend extra degrees of freedom on the 3-way interaction you don't necessarily have to. If you have enough data, however, you might find that beneficial so that you can anticipate potential arguments from skeptical reviewers of your work.

Related Solutions

Solved – Interpretation hierarchical regression

Interpretation of hierarchical regression

Model Summary Box:

In the Model Summary box, read the third column, named 'R Square', for each of your models and interpret it like this: the variables entered in Block 1 (the control variables) explained X% of the variance in the DV, where X depends on your output.

After the Block 2 variables (the independent variables) have been included, the model as a whole explained Y% of the variance in the DV.

After adding the Block 3 variable (the interaction term), the model as a whole explained Z% of the variance in the DV.

Now look at the change statistics:

The column labelled R Square Change shows how much R square (explained variation) changed compared with the previous model. For example, for model 1 it is the same as X, for model 2 it is (Y - X), and so on.

To infer whether this change is statistically significant, you need to look at the last column (Sig. F Change).
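For readers who want to check the change statistics by hand, here is a minimal sketch of the standard F statistic for an increase in R square between nested models. The function name `r2_change_f` and all input values are hypothetical. Only the F statistic and its degrees of freedom are computed, since the p-value reported as Sig. F Change requires the F distribution, which the Python standard library does not provide.

```python
def r2_change_f(r2_reduced, r2_full, n, k_full, m):
    """F statistic for the gain in R^2 when m predictors are added.

    n      : sample size
    k_full : number of predictors (excluding the intercept) in the larger model
    m      : number of predictors added in this block
    """
    df1 = m
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Hypothetical example: Block 1 has 2 controls (R^2 = 0.20); Block 2 adds
# 3 predictors for 5 in total (R^2 = 0.30); the sample size is 106.
f_stat, df1, df2 = r2_change_f(0.20, 0.30, n=106, k_full=5, m=3)
print(f"F({df1}, {df2}) = {f_stat:.3f}")
```

The resulting F value would then be compared against an F distribution with (df1, df2) degrees of freedom to obtain the significance of the change.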

The ANOVA table:

It indicates whether each model as a whole is significant.

Hope it helps!

Regression – Interpretation of Interaction Term and Model in Multivariate Linear Regression

Questions 1 and 2:

You should not be doing separate models. Write a single model that includes both Ind_1 and Ind_2, and the other predictors and interaction, like:

DV ~ Ind_1 + Ind_2 + E + CV + CV:E

That allows you to evaluate all the coefficients of interest at once while accounting for all of the predictors together.

In response to comment: Even if Ind_1 and Ind_2 are correlated, I still recommend a single model. If one predictor is associated with outcome and with a second predictor, then the second predictor is also going to be associated with outcome; the question is how much. Attempts to distinguish between the two predictors because one has a p-value of 0.06 and the other managed to pass the arbitrary 0.05 threshold will tend not to extend well to new data sets. See this page among many others on this site for why you shouldn't confuse such "statistical significance" with importance.

Admittedly, with highly correlated predictors it's possible that in the combined model neither predictor individually will pass the "significance" threshold. A joint test on the two together probably would. Even better with correlated predictors, combine them into a single predictor in a way consistent with your understanding of the subject matter. See the sections of Chapter 4 of Frank Harrell's book or class notes on data reduction.

Questions 3 and 4:

The "significance" of the "main effect" coefficient of a predictor like E that's involved in an interaction is generally not worth evaluating. The whole point of the interaction with CV is that the association of E with DV depends on the level of CV. What's reported for the E coefficient is its association with outcome when the interacting predictor, CV in this case, is at its reference level or 0. What's the point of evaluating the "significance" of the E coefficient (whether it's different from a value of 0) if that coefficient's value depends on how you coded or centered the interacting CV?
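As a short worked derivation of that point, consider only the terms involving E and CV (the other predictors are omitted for brevity, and the coefficient symbols are notation introduced here, not taken from any output). The slope for E is a function of CV, and shifting CV by a constant c changes the reported E coefficient:

```latex
\begin{aligned}
\widehat{DV} &= \beta_0 + \beta_E\,E + \beta_{CV}\,CV + \beta_{E:CV}\,(E \times CV) \\
\frac{\partial \widehat{DV}}{\partial E} &= \beta_E + \beta_{E:CV}\,CV \\
CV' = CV - c \;\Rightarrow\; \beta_E' &= \beta_E + \beta_{E:CV}\,c
\end{aligned}
```

So the reported E coefficient is the slope for E at CV = 0 under whatever coding was used, and a test of whether it differs from 0 answers a different question each time CV is re-centered.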

Question 5:

Why categorize your outcome variable that way? If you have an ordinal outcome, why throw away that extra information?

Questions 6 and 7:

Work with the full model above and base your conclusions on it. For the interaction, don't worry about the individual E and CV coefficients; report results for realistic, illustrative combinations of values.

In response to comment: You have to apply your understanding of the subject matter to decide how to illustrate your findings. For example, if the CV is some type of nuisance variable that you just want to control for, it might be OK just to show predictions at its mean. But as it seems to have an interesting interaction with E, you are probably better off showing a couple of examples of combinations of E and CV. One set of choices might be the 25th and 75th percentiles of CV for each of the two levels of E, if those combinations make sense in your data. That would be 4 examples illustrating the joint contributions to outcome.
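The four illustrative combinations could be computed along these lines. This is only a sketch: the coefficients and the CV values are made up, E is assumed to be coded 0/1, the other predictors in the model are left out for brevity, and `predict` is a hypothetical helper rather than part of any fitted-model API.

```python
import statistics

# Hypothetical fitted coefficients for the E, CV and CV:E terms (E coded 0/1).
coef = {"intercept": 2.0, "E": 1.0, "CV": 0.5, "CV:E": -0.3}

# Hypothetical observed CV values.
cv_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles of CV; the first and third cut points are the 25th and 75th percentiles.
quartiles = statistics.quantiles(cv_values, n=4, method="inclusive")
cv_25, cv_75 = quartiles[0], quartiles[2]

def predict(e, cv):
    """Predicted DV for one combination of E and CV."""
    return (coef["intercept"] + coef["E"] * e
            + coef["CV"] * cv + coef["CV:E"] * e * cv)

# One prediction per combination of E level and CV percentile.
predictions = {(e, cv): predict(e, cv)
               for e in (0, 1) for cv in (cv_25, cv_75)}

for (e, cv), value in predictions.items():
    print(f"E={e}, CV={cv}: predicted DV = {value:.2f}")
```

Reporting predictions like these, ideally with confidence intervals from the fitted model, conveys the joint contribution of E and CV more directly than the individual coefficients do.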
