Multiple Regression – Post-hoc Test for Differences in Slopes with Interaction

interactionmultiple regressionpost-hoc

I am looking for a robust way test for differences in slopes in a range of data. Here, I am showing R code to be clear about what I am attempting, but I believe that I am looking for a general answer (which I can then implement in R, if it is not available already).

As an example, I am using Edgar Anderson's "iris" data (builtin in R, and available here as a csv) to build a prediction of Petal.Length by Sepal.Width when I can visually see that the Species have different relationships:

enter image description here

So, I build a linear model with the interaction between Species and Sepal.Width, and can see that there is a significant interaction. In R, the model is built as:

irisLM <-
  lm(Petal.Length ~ Sepal.Width*Species
     , data = iris)

and anova(irisLM) gives the following table:

Response: Petal.Length
                     Df Sum Sq Mean Sq   F value    Pr(>F)    
Sepal.Width           1  85.23  85.232  574.1677 < 2.2e-16 ***
Species               2 355.76 177.879 1198.2894 < 2.2e-16 ***
Sepal.Width:Species   2   1.96   0.979    6.5973  0.001814 ** 
Residuals           144  21.38   0.148                        

I can run summary(irisLM) to get the full linear model as well:

Call:
lm(formula = Petal.Length ~ Sepal.Width * Species, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.03337 -0.22012 -0.03026  0.17149  1.60468 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)   
(Intercept)                    1.18292    0.50072   2.362  0.01949 * 
Sepal.Width                    0.08141    0.14520   0.561  0.57589   
Speciesversicolor              0.75200    0.69983   1.075  0.28437   
Speciesvirginica               2.32798    0.71507   3.256  0.00141 **
Sepal.Width:Speciesversicolor  0.75797    0.22770   3.329  0.00111 **
Sepal.Width:Speciesvirginica   0.60490    0.22408   2.699  0.00778 **

Here, the coefficient Sepal.Width:Speciesversicolor is (effectively) testing the null-hypothesis of no difference in the slope for Iris versicolor and Iris setosa. When there are only two species, that is sufficient, but what about in cases like this where I also want to compare versicolor and virginica?

I know that I can change the order of the factor to treat each level as the baseline, but that seems incomplete, unelegant, and does not correct for multiple testing. (Not to mention that it is cumbersome for data with many levels and/or where levels may change from one analysis to the next.)

I have been searching extensively, and I have not found concrete answers. A few places have suggested using the estimates and standard errors to perform t-tests, but I am not sure that is a valid approach. (Further, it is unclear what degrees of freedom should be used for such tests.) I would hope that there would be a formal post-hoc test like Tukey's HSD for this (my understanding is that Tukey does not apply here because it is not a test of means; in any case, R throws an error if I try it with TukeyHSD(aov(irisLM), "Sepal.Width:Species")).

There are two questions that seem near duplicates, but they don't quite address what I want. This question asks something very similar to what I want, but the answers either suggest changing which level is the baseline, or how to work around the interaction term (but not test it directly). This question has very good answers, but they appear to only apply when both predictors are factors. They do not work for me with a continuous variable.

If it helps, here is a version of the coefficient table that estimates the value of each slope/intercept, instead of showing the difference from the baseline:

without_intercept <-
  lm(Petal.Length ~ Species/Sepal.Width - 1
     , data = iris)

summary(without_intercept)

Gives

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
Speciessetosa                  1.18292    0.50072   2.362 0.019494 *  
Speciesversicolor              1.93492    0.48891   3.958 0.000118 ***
Speciesvirginica               3.51090    0.51049   6.877 1.72e-10 ***
Speciessetosa:Sepal.Width      0.08141    0.14520   0.561 0.575889    
Speciesversicolor:Sepal.Width  0.83938    0.17540   4.785 4.18e-06 ***
Speciesvirginica:Sepal.Width   0.68632    0.17067   4.021 9.30e-05 ***

Best Answer

The lsmeans package does provide for this type of comparison. For the example at hand,

library("lsmeans")
iris.lst <- lstrends(irisLM, ~ Species, var = "Sepal.Width")
iris.lst          # slope estimates and CIs
pairs(iris.lst)   # comparisons