I am looking for a robust way test for differences in slopes in a range of data. Here, I am showing R code to be clear about what I am attempting, but I believe that I am looking for a general answer (which I can then implement in R, if it is not available already).
As an example, I am using Edgar Anderson's "iris" data (builtin in R, and available here as a csv) to build a prediction of Petal.Length
by Sepal.Width
when I can visually see that the Species
have different relationships:
So, I build a linear model with the interaction between Species
and Sepal.Width
, and can see that there is a significant interaction. In R, the model is built as:
irisLM <-
lm(Petal.Length ~ Sepal.Width*Species
, data = iris)
and anova(irisLM)
gives the following table:
Response: Petal.Length
Df Sum Sq Mean Sq F value Pr(>F)
Sepal.Width 1 85.23 85.232 574.1677 < 2.2e-16 ***
Species 2 355.76 177.879 1198.2894 < 2.2e-16 ***
Sepal.Width:Species 2 1.96 0.979 6.5973 0.001814 **
Residuals 144 21.38 0.148
I can run summary(irisLM)
to get the full linear model as well:
Call:
lm(formula = Petal.Length ~ Sepal.Width * Species, data = iris)
Residuals:
Min 1Q Median 3Q Max
-1.03337 -0.22012 -0.03026 0.17149 1.60468
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.18292 0.50072 2.362 0.01949 *
Sepal.Width 0.08141 0.14520 0.561 0.57589
Speciesversicolor 0.75200 0.69983 1.075 0.28437
Speciesvirginica 2.32798 0.71507 3.256 0.00141 **
Sepal.Width:Speciesversicolor 0.75797 0.22770 3.329 0.00111 **
Sepal.Width:Speciesvirginica 0.60490 0.22408 2.699 0.00778 **
Here, the coefficient Sepal.Width:Speciesversicolor
is (effectively) testing the null-hypothesis of no difference in the slope for Iris versicolor and Iris setosa. When there are only two species, that is sufficient, but what about in cases like this where I also want to compare versicolor and virginica?
I know that I can change the order of the factor to treat each level as the baseline, but that seems incomplete, unelegant, and does not correct for multiple testing. (Not to mention that it is cumbersome for data with many levels and/or where levels may change from one analysis to the next.)
I have been searching extensively, and I have not found concrete answers. A few places have suggested using the estimates and standard errors to perform t-tests, but I am not sure that is a valid approach. (Further, it is unclear what degrees of freedom should be used for such tests.) I would hope that there would be a formal post-hoc test like Tukey's HSD for this (my understanding is that Tukey does not apply here because it is not a test of means; in any case, R throws an error if I try it with TukeyHSD(aov(irisLM), "Sepal.Width:Species")
).
There are two questions that seem near duplicates, but they don't quite address what I want. This question asks something very similar to what I want, but the answers either suggest changing which level is the baseline, or how to work around the interaction term (but not test it directly). This question has very good answers, but they appear to only apply when both predictors are factors. They do not work for me with a continuous variable.
If it helps, here is a version of the coefficient table that estimates the value of each slope/intercept, instead of showing the difference from the baseline:
without_intercept <-
lm(Petal.Length ~ Species/Sepal.Width - 1
, data = iris)
summary(without_intercept)
Gives
Coefficients:
Estimate Std. Error t value Pr(>|t|)
Speciessetosa 1.18292 0.50072 2.362 0.019494 *
Speciesversicolor 1.93492 0.48891 3.958 0.000118 ***
Speciesvirginica 3.51090 0.51049 6.877 1.72e-10 ***
Speciessetosa:Sepal.Width 0.08141 0.14520 0.561 0.575889
Speciesversicolor:Sepal.Width 0.83938 0.17540 4.785 4.18e-06 ***
Speciesvirginica:Sepal.Width 0.68632 0.17067 4.021 9.30e-05 ***
Best Answer
The lsmeans package does provide for this type of comparison. For the example at hand,