Solved – R mtcars dataset – linear regression of MPG in Auto and Manual transmission mode

hypothesis testinglinear modelrregression

I was looking at the mtcars dataset and exploring the relationship between MPG and the transmission modes (auto/manual). I decided to use the following linear models with the regressors specified in the below R code:

> data(mtcars)
> fit <- lm(mpg ~ I(wt - mean(wt)) + I(qsec - mean(qsec)) + factor(am), data = mtcars)
> round(summary(fit)$coeff, 4)
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)           18.8979     0.7194 26.2707   0.0000
I(wt - mean(wt))      -3.9165     0.7112 -5.5069   0.0000
I(qsec - mean(qsec))   1.2259     0.2887  4.2467   0.0002
factor(am)1            2.9358     1.4109  2.0808   0.0467

From the above, the P-value of the slope coefficient "factor(am)1" is lower than 0.05 and hence we'd reject the null hypothesis and infer that cars of manual transmission has higher MPG value than those of manual transmission.

However, I also tried to explore the equivalent linear model without intercept terms, as per R code below:

> fit2 <- lm(mpg ~ I(wt - mean(wt)) + I(qsec - mean(qsec)) + factor(am)-1, data=mtcars)
> round(summary(fit2)$coeff, 4)
                     Estimate Std. Error t value Pr(>|t|)
I(wt - mean(wt))      -3.9165     0.7112 -5.5069    0e+00
I(qsec - mean(qsec))   1.2259     0.2887  4.2467    2e-04
factor(am)0           18.8979     0.7194 26.2707    0e+00
factor(am)1           21.8338     0.9438 23.1344    0e+00
> confint(fit2)
                          2.5 %    97.5 %
I(wt - mean(wt))     -5.3733342 -2.459673
I(qsec - mean(qsec))  0.6345732  1.817199
factor(am)0          17.4244109 20.371471
factor(am)1          19.9005360 23.767021

From the 95% Confidence Interval constructed, a car (of auto transmission) with average wt and qsec has a MPG interval [17.4244, 20.3714], while a car (of manual transmission) with average wt and qsec has a MPG interval [19.9005. 23.7670].

The two Confidence Intervals overlap, and we failed to reject the null hypothesis where statistically there's no difference between the MPG performance of cars (with auto transmission) and MPG performance of cars (with manual transmission).

I used two equivalent linear models and they gave me different conclusions. Could you enlighten me on what I may have missed here?

Best Answer

The models are equivalent. The misconception here is that overlapping confidence intervals do not mean you "failed to reject the null hypothesis." You do not compare the upper/lower bounds of confidence intervals and call it a day. Thomas' comment links1 to a good general explanation of why this is, though it doesn't directly apply to what's happening under the hood in a regression setting2.

Paul has linked to an explanation of how the lm t-statistic is calculated, and that test is a test of that coefficient against zero (e.g. $\beta_3 = 0$ and $\beta_4 = 0$). But you want to test $\beta_3 - \beta_4 = 0$.

Performing this test will then have the equivalent result to fit1 - there are quite a few R packages that do this type of test - the following uses car:

> library(car)
> linearHypothesis(fit2, "factor(am)0 = factor(am)1")
Linear hypothesis test

Hypothesis:
factor(am)0 - factor(am)1 = 0

Model 1: restricted model
Model 2: mpg ~ I(wt - mean(wt)) + I(qsec - mean(qsec)) + factor(am) - 
    1

  Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
1     29 195.46                              
2     28 169.29  1    26.178 4.3298 0.04672 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

For reference, this type of test (and more generalized versions of it) are called an absolutely silly number of different names depending on software, book, and field. (general linear hypothesis, linear contrasts, differences of least square means, regression Wald tests, more...), if you want to look up the theory and how the manual calculation is done.


1 imgur mirror of contents
2 This is because the estimated coefficients and their standard errors are not independent, and so you have to take that into account when running the test. If we just run a straight t-test (which is what the theory reduces down to in this case) without taking this into account with the given values results in the incorrect:

> delta = fit2$coefficients[4]-fit2$coefficients[3]
> deltstat = delta / sqrt(stderr[4]^2 + stderr[3]^2)
> delpval = 2*pt(deltstat, df=df.residual(fit2), lower.tail = FALSE)
> delpval
 0.01968865