Solved – Is it possible to have negative values in a simple slope analysis for dichotomous predictors

interaction

I am currently using the simple slope analysis to examine the significant interaction effects between my continuous moderator and a dichotomous predictor (values 1 & 2)on a continuous DV. I have 2 continuous covariates in the analysis.

My simple slope graph however showed negative values in the DV (i.e: moderator values at predictor value 2 accounted for negative amounts of DV – the slopes crossed the x-axis towards the bottom right direction). Is this possible, or is there some statistical error? I have seen slopes going into the negative values of DV for continuous predictors, but not for dichotomous predictors.

Best Answer

Overview of simple slopes analysis

The points used to generate a graph for a simple slopes analyses are predicted values of the dependent variable given the model and various values of the predictor variables (see here for an example).

Are negative predicted values possible?

Yes. A regression equation can give rise to negative predicted values.
For example, if the equation was $Y = 1 - 2X -3Z - 1XZ$, then predicted $Y$ at $X=1$ and $Z=1$ is $Y = 1 - 2(1) -3(1) - 1(1)(1)$ or $1 -2 -3 - 1=-5$.

Do negative predicted values imply that you've done something wrong?

Possibly. If some or all of the following apply then negative predicted values might be a red flag on your analysis:
1. the range of the dependent variable is all positive, and particularly if the values are a long way from 0.
2. the correlation between predictors is not huge
3. you have chosen appropriate values of predictors.
When performing a simple slopes analysis typical values for predictor values include:
- continuous predictors that are part of the moderator effect: values drawn from plus or minus one or often two standard deviations above and below the mean or something similar.
- categorical predictors that are part of the moderator effect: each of the values that the categorical predictor takes
- continuous covariates: the mean of the covariate
- categorical covariates: one of the categories

Thus, choosing predictor values consistent with the above prescriptions will generally give rise to predictions on the dependent variable that are in the ballpark of the range of the dependent variable. Of course, there are situations that could legitimately give rise to negative values even when the range of the dependent variable includes all positive values. This might be related to non-normal errors, correlated predictors, etc.

Terminology and Overview

In the context of multiple regression:

a moderator effect is just an interaction between two predictors, typically created by multiplying the two predictors together, often after first centering the predictors.
a covariate is just a predictor that was not used in the formation of the moderator and that is conceptualised as something that needs to be controlled for.

Thus, you should be able to run a hiearchical regression with moderators and covariates in just about any statistical software that supports multiple regression.

Typical approach to testing moderator effect after controlling for covariates

SPSS: If you are doing the hierarchical regression in SPSS, you'd probably enter the predictors in blocks. Here's a tutorial.
R: If you are doing this in R, you'd probably define separate linear models lm each adding additional predictors and use anova to compare the models. Here's a tutorial.

Once you understand hierarchical regression in your chosen tool a simple recipe would be as follows. Let's assume that you have the following variables

main effect precitors: IV1 IV2
interaction effect: multiplication of IV1 and IV2
covariates CV1 CV2

In some cases you may need to create the moderator

If you are using SPSS, you will need to multiply the two predictor variables together (e.g., compute iv1byiv2 = iv1 * iv2.). If you want to interpret the regression coefficients, you may find it useful to center iv1 and iv2 before creating the interaction term.
If you are using R, you can just use the notation iv1*iv2 in the linear model notation.

You can then estimate the models

Block/model 1: Enter covariates m1 <- lm(DV~CV1+CV2)
Block/model 2: Enter main effect predictors m2 <- lm(DV~CV1+CV2+IV1+IV2)
Block/model 3: Enter interaction effect m3 <- lm(DV~CV1+CV2+IV1*IV2)

You can then interpret the significance of the r-square change between block 2 and 3 as a test of whether there is an interaction effect: anova(m2, m3)

Simple slopes analysis

If you want to perform simple slopes analysis, you can take the regression formula provided by the final multiple regression and calculate some appropriate values to plot.

You can do this by hand or you can use predict in R. For example, you might calculate the values predicted by the regression equation using the following values

IV1   IV2   CV1   CV2
-2sd  -2sd  mean  mean
-2sd  +2sd  mean  mean
+2sd  -2sd  mean  mean
+2sd  +2sd  mean  mean

You can then plot these values using whatever plotting tool that you like (e.g., R, SPSS, Excel).

Personally, I find Conditioning Plots a better option than simple slopes analysis. R has the coplot function. The idea is to show a scatter plot of the relationship between IV and DV in a set of arranged scatterplots defined by ranges of the moderator. When I searched, I found an example of using conditioning plots for moderator regression on page 585 of Handbook of Research Methods in Personality Psychology

Solved – How to get slope and standard error at several levels of a continuous by continuous interaction in R

In order to examine simple slopes at different levels of one of the continuous variables, you can simply center the other continuous variable to focus on the slope of interest. In a model with a continuous by continuous interaction, like so: $$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1*x_2$$ the two single predictor coefficients ($\beta_1$ and $\beta_2$) are simple slopes for the predictor when the other predictor (however it is centered) is equal to 0.

So, if I run your practice code above, I get the following output:

Call:
lm(formula = y1 ~ x1 * x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-281.996  -70.148   -3.702   70.190  209.182 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  17.7519    10.8121   1.642    0.104    
x1            1.4175     1.0151   1.397    0.166    
x2            0.8222     1.0614   0.775    0.440    
x1:x2         0.8911     0.1295   6.882 6.04e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 100.6 on 96 degrees of freedom
Multiple R-squared:  0.4283,    Adjusted R-squared:  0.4105 
F-statistic: 23.98 on 3 and 96 DF,  p-value: 1.15e-11

The x1 output gives us the test of the x1 slope at x2 = 0. Thus we get a slope, standard error, and (as a bonus) the test of that parameter estimate compared to 0. If we wanted to get the simple slope of x1 (and standard error and sig. test) when x2 = 6, we simply use a linear transformation to make a value of 6 on x2 the 0 point:

x2.6<- x2-6

By viewing summary stats, we can see that this is the exact same variable as before, but it has been shifted down on the number line by 6 units:

summary(x2)
summary(x2.6)

 > summary(x2)
   Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-31.0400  -5.9520   1.3430   0.8396   8.0090  22.3800 

 > summary(x2.6)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-37.040 -11.950  -4.657  -5.160   2.009  16.380

Now, if we re-run the same model but substitute x2 for our newly centered variable x2.6, we get this:

model1.6<- lm(y1~x1*x2.6)
summary(model1.6)


Call:
lm(formula = y1 ~ x1 * x2.6)

Residuals:  
     Min       1Q   Median       3Q      Max 
-281.996  -70.148   -3.702   70.190  209.182 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  22.6853    12.6384   1.795   0.0758 .  
x1            6.7639     1.2346   5.479 3.44e-07 ***
x2.6          0.8222     1.0614   0.775   0.4404    
x1:x2.6       0.8911     0.1295   6.882 6.04e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 100.6 on 96 degrees of freedom
Multiple R-squared:  0.4283,    Adjusted R-squared:  0.4105 
F-statistic: 23.98 on 3 and 96 DF,  p-value: 1.15e-11

If we compare this output to the old output we can see that the omnibus F is still 23.98, the interaction t is still 6.882 and the slope for x2.6 is still .822 (and nonsignificant). However, our coefficient for x1 is now much larger and significant. This slope is now the simple slope of x1 when x2 is equal to 6 (or when x2.6 = 0). By centering by several different variables, we can test several different simple effects (and obtain slopes and standard errors) without that much work. By using a (dreaded in the R community) for loop to iterate through the list, we can test several different simple effects quite efficiently:

centeringValues<- c(1,2,3,4,5,6) # Creating a vector of values to center around

for(i in 1:length(centeringValues)){     #Making a for loop that iterates through the list
  x<- x2-i         # Creating a predictor that is the newly centered variable
  print(paste0('x.',centeringValues[i])) # printing x.centering value so you can keep track of output
  print(summary(lm(y1~x1*x))[4]) # printing coefficients for the model with the center variable

}

This code first creates a vector of values you want to become the 0 point for the variable you do not want the slope for (in this example, x2). Next, create a for loop that iterates through the positions in this list (i.e. if the list has 3 items, the for loop will iterate through the values 1 to 3). Next, create a new variable that is the centered version of the variable for which you do not want centered slopes (in this case we are interested in simple slopes for x1, so we center x2). Finally, print the coefficients from the model that includes your newly centered variable in place of the raw variable. This results in the following output:

[1] "x.1"
    $coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 18.5741364 10.8815154 1.7069439 9.106513e-02
x1           2.3085985  1.0143100 2.2760286 2.506664e-02
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

[1] "x.2"
$coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 19.3963616 11.0528627 1.7548722 8.247158e-02
x1           3.1996515  1.0299723 3.1065415 2.489385e-03
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

[1] "x.3"
$coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 20.2185867 11.3215341 1.7858522 7.728065e-02
x1           4.0907045  1.0613132 3.8543802 2.096928e-04
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

[1] "x.4"
$coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 21.0408119 11.6808159 1.8013135 7.479290e-02
x1           4.9817575  1.1070019 4.5002249 1.905339e-05
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

[1] "x.5"
$coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 21.8630371 12.1226545 1.8034859 7.444873e-02
x1           5.8728105  1.1653521 5.0395160 2.193149e-06
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

[1] "x.6"
$coefficients
              Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 22.6852623 12.6383944 1.7949481 7.580894e-02
x1           6.7638636  1.2345698 5.4787212 3.439867e-07
x            0.8222252  1.0613590 0.7746909 4.404262e-01
x1:x         0.8910530  0.1294695 6.8823366 6.041102e-10

Here you can see the output provides the coefficients for several tests, but the only thing that changes each time is the slope for x1. The slope for x1 in each output represents the slope for x1 when x2 is equal to whatever centering value we have assigned for that iteration. Hope this helps!