Regression Analysis – How to Compare Slopes from Multiple Regression Models

Tags: data visualization, hypothesis testing, multivariate analysis, r

I would like to test the difference in response of two variables to one predictor. Here is a minimal reproducible example.

library(nlme) 
## gls is used in the application; lm would suffice for this example
m.set <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "setosa")
m.vir <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "virginica")
m.ver <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "versicolor")

I can see that the slope coefficients are different:

m.set$coefficients
(Intercept) Petal.Width 
  4.7771775   0.9301727
m.vir$coefficients
(Intercept) Petal.Width 
  5.2694172   0.6508306 
m.ver$coefficients
(Intercept) Petal.Width 
   4.044640    1.426365 

I have three questions:

  1. How can I test the difference between slopes?
  2. How can I test the difference between residual variances?
  3. What is a simple, effective way to present these comparisons?

A related question, Method to compare variable coefficient in two regression models, suggests re-running the model with a dummy variable to differentiate the slopes. Are there options that would allow the use of independent data sets?

Best Answer

How can I test the difference between slopes?

Include dummies for species, let them interact with $P_i$, and see whether these interaction terms are significant. Let $L_i$ be the sepal length, $P_i$ the petal width, and $S_1, S_2, S_3$ the dummy variables for the three species. Then compare the model

$$ E(L_i) = \beta_0 + \beta_1 P_i $$

with the model that allows the effect of $P_i$ to be different for each species:

$$ E(L_i) = \alpha_0 + \alpha_1 S_2 + \alpha_2 S_3 + \alpha_3 P_i + \alpha_4 P_iS_2 + \alpha_5 P_i S_3 $$

The GLS estimators are MLEs (provided you fit with `method = "ML"` rather than the REML default, since REML likelihoods are not comparable across models with different fixed effects) and the first model is a submodel of the second, so you can use the likelihood ratio test here. The log-likelihoods can be extracted using the `logLik` function, and the degrees of freedom for the test will be $4$, since you've deleted $4$ parameters to arrive at the submodel.
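A minimal sketch of this test in R (model names `m0` and `m1` are illustrative; note `method = "ML"` for comparable likelihoods):

```r
library(nlme)

## Null model: one common intercept and slope for all species
m0 <- gls(Sepal.Length ~ Petal.Width, data = iris, method = "ML")

## Full model: species-specific intercepts and slopes via interactions
m1 <- gls(Sepal.Length ~ Petal.Width * Species, data = iris, method = "ML")

## Likelihood ratio statistic, chi-squared with 4 degrees of freedom
lr <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
pchisq(lr, df = 4, lower.tail = FALSE)

## anova() carries out the same comparison
anova(m0, m1)
```

A small p-value indicates that at least one species has a different intercept or slope.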

What is a simple, effective way to present the comparison?

I think the most appealing way would be to plot the regression lines for each species all on the same axes, maybe with error bars based on the standard errors. This would make the difference (or non-difference) between the species and their relationship to $P_i$ very apparent.
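For instance, with base graphics (a sketch; the plotting choices here are just one option):

```r
## Scatter plot of the raw data, coloured by species
plot(Sepal.Length ~ Petal.Width, data = iris,
     col = as.integer(Species), pch = 19,
     xlab = "Petal width", ylab = "Sepal length")

## Overlay the fitted regression line for each species
for (sp in levels(iris$Species)) {
  fit <- lm(Sepal.Length ~ Petal.Width, data = iris, subset = Species == sp)
  abline(fit, col = which(levels(iris$Species) == sp))
}

legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```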

Edit: I noticed another question has been added to the body. So, I'm adding an answer to that:

How can I test the difference between residual variances?

For this, you'll need to stratify the data set and fit separate models, since the interaction-based model I suggested constrains the residual variance to be the same in every group. If you fit separate models, this constraint goes away. In that case, you can still use the likelihood ratio test (the log-likelihood for the larger model is now calculated by summing the log-likelihoods from the three separate models). The "null" model depends on what you want to compare it with:

  • If you only want to test the variances, while allowing the coefficients to differ between species, then the "null" model should be the model with the interactions I've written above. The degrees of freedom for the test are then $2$.

  • If you want to test the variances jointly with the coefficients, then the null model should be the first model I've written above. The degrees of freedom for the test are then $6$.
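Both comparisons can be sketched like this (again fitting by ML so the likelihoods are comparable; object names are illustrative):

```r
library(nlme)

## Separate fits: species-specific coefficients AND residual variances
fits <- lapply(split(iris, iris$Species), function(d)
  gls(Sepal.Length ~ Petal.Width, data = d, method = "ML"))
ll.sep <- sum(sapply(fits, function(f) as.numeric(logLik(f))))

## Null (a): interaction model -- same mean structure as the separate
## fits, but a single shared residual variance; df = 2
m.int <- gls(Sepal.Length ~ Petal.Width * Species, data = iris, method = "ML")
lr.a <- 2 * (ll.sep - as.numeric(logLik(m.int)))
pchisq(lr.a, df = 2, lower.tail = FALSE)

## Null (b): one pooled line and one variance; df = 6
m.pool <- gls(Sepal.Length ~ Petal.Width, data = iris, method = "ML")
lr.b <- 2 * (ll.sep - as.numeric(logLik(m.pool)))
pchisq(lr.b, df = 6, lower.tail = FALSE)
```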