Regression Analysis – How to Compare Slopes from Multiple Regression Models

Tags: data visualization, hypothesis testing, multivariate analysis, r

I would like to test the difference in response of two variables to one predictor. Here is a minimal reproducible example.

library(nlme) 
## gls is used in the application; lm would suffice for this example
m.set <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "setosa")
m.vir <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "virginica")
m.ver <- gls(Sepal.Length ~ Petal.Width, data = iris, 
               subset = Species == "versicolor")

I can see that the slope coefficients are different:

m.set$coefficients
(Intercept) Petal.Width 
  4.7771775   0.9301727
m.vir$coefficients
(Intercept) Petal.Width 
  5.2694172   0.6508306 
m.ver$coefficients
(Intercept) Petal.Width 
   4.044640    1.426365 

I have three questions:

  1. How can I test the difference between slopes?
  2. How can I test the difference between residual variances?
  3. What is a simple, effective way to present these comparisons?

A related question, Method to compare variable coefficient in two regression models, suggests re-running the model with a dummy variable to differentiate the slopes. Are there options that would allow the use of independent data sets?

Best Answer

How can I test the difference between slopes?

Include dummies for species, let them interact with $P_i$, and see whether these interaction terms are significant. Let $L_i$ be the sepal length, $P_i$ the petal width, and $S_1, S_2, S_3$ the dummy variables for the three species. Then compare the model

$$ E(L_i) = \beta_0 + \beta_1 P_i $$

with the model that allows the effect of $P_i$ to be different for each species:

$$ E(L_i) = \alpha_0 + \alpha_1 S_2 + \alpha_2 S_3 + \alpha_3 P_i + \alpha_4 P_iS_2 + \alpha_5 P_i S_3 $$

The GLS estimators are MLEs (provided you fit with `method = "ML"` rather than the REML default, since REML likelihoods are not comparable across models with different fixed effects) and the first model is a submodel of the second, so you can use the likelihood ratio test here. The log-likelihoods can be extracted using the `logLik` function, and the degrees of freedom for the test will be $4$, since you've deleted $4$ parameters to arrive at the submodel.
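A minimal sketch of this test in R (model names `m0` and `m1` are illustrative; note `method = "ML"` for comparable likelihoods):

```r
library(nlme)

## Null model: one common intercept and slope for all species
m0 <- gls(Sepal.Length ~ Petal.Width, data = iris, method = "ML")

## Full model: species-specific intercepts and slopes via interactions
m1 <- gls(Sepal.Length ~ Petal.Width * Species, data = iris, method = "ML")

## Likelihood ratio statistic, chi-squared with 4 degrees of freedom
lr <- 2 * (as.numeric(logLik(m1)) - as.numeric(logLik(m0)))
pchisq(lr, df = 4, lower.tail = FALSE)

## anova() carries out the same comparison
anova(m0, m1)
```

A small p-value indicates that at least one species has a different intercept or slope.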

What is a simple, effective way to present the comparison?

I think the most appealing way would be to plot the regression lines for each species all on the same axes, maybe with error bars based on the standard errors. This would make the difference (or non-difference) between the species and their relationship to $P_i$ very apparent.
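For instance, with base graphics (a sketch; the plotting choices here are just one option):

```r
## Scatter plot of the raw data, coloured by species
plot(Sepal.Length ~ Petal.Width, data = iris,
     col = as.integer(Species), pch = 19,
     xlab = "Petal width", ylab = "Sepal length")

## Overlay the fitted regression line for each species
for (sp in levels(iris$Species)) {
  fit <- lm(Sepal.Length ~ Petal.Width, data = iris, subset = Species == sp)
  abline(fit, col = which(levels(iris$Species) == sp))
}

legend("topleft", legend = levels(iris$Species),
       col = seq_along(levels(iris$Species)), pch = 19)
```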

Edit: I noticed another question has been added to the body. So, I'm adding an answer to that:

How can I test the difference between residual variances?

For this, you'll need to stratify the data set and fit separate models, since the interaction-based model I suggested constrains the residual variance to be the same in every group. If you fit separate models, this constraint goes away. In that case, you can still use the likelihood ratio test (the log-likelihood for the larger model is now calculated by summing the log-likelihoods from the three separate models). The "null" model depends on what you want to compare it with:

  • If you only want to test the variances, while allowing the coefficients to differ between species, then the "null" model should be the model with the interactions I've written above. The degrees of freedom for the test are then $2$.

  • If you want to test the variances jointly with the coefficients, then the null model should be the first model I've written above. The degrees of freedom for the test are then $6$.
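Both comparisons can be sketched like this (again fitting by ML so the likelihoods are comparable; object names are illustrative):

```r
library(nlme)

## Separate fits: species-specific coefficients AND residual variances
fits <- lapply(split(iris, iris$Species), function(d)
  gls(Sepal.Length ~ Petal.Width, data = d, method = "ML"))
ll.sep <- sum(sapply(fits, function(f) as.numeric(logLik(f))))

## Null (a): interaction model -- same mean structure as the separate
## fits, but a single shared residual variance; df = 2
m.int <- gls(Sepal.Length ~ Petal.Width * Species, data = iris, method = "ML")
lr.a <- 2 * (ll.sep - as.numeric(logLik(m.int)))
pchisq(lr.a, df = 2, lower.tail = FALSE)

## Null (b): one pooled line and one variance; df = 6
m.pool <- gls(Sepal.Length ~ Petal.Width, data = iris, method = "ML")
lr.b <- 2 * (ll.sep - as.numeric(logLik(m.pool)))
pchisq(lr.b, df = 6, lower.tail = FALSE)
```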