R Programming – How to Use ANOVA and Linear Regression Together

Tags: anova, r, regression

I am new to statistics and I am trying to understand the difference between ANOVA and linear regression. I am using R to explore this. I read various articles about why ANOVA and regression are different but still the same, how they can be visualised, etc. I think I am pretty much there, but one bit is still missing.

I understand that ANOVA compares the variance within groups with the variance between groups to determine whether there is or is not a difference between any of the groups tested. (https://controls.engin.umich.edu/wiki/index.php/Factor_analysis_and_ANOVA)

For linear regression, I found a post in this forum which says that the same thing can be tested by testing whether the slope $b$ is 0.
(Why is ANOVA taught / used as if it is a different research methodology compared to linear regression?)

For more than two groups I found a website stating:

The null hypothesis is: $H_0: \mu_1 = \mu_2 = \mu_3$

The linear regression model is: $y = b_0 + b_1X_1 + b_2X_2 + e$

The output of the linear regression, however, is then the intercept for one group and the differences from this intercept for the other two groups.
(http://www.real-statistics.com/multiple-regression/anova-using-regression/)
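For example, fitting such a model in R on some made-up three-group data (just to see the shape of the output, nothing from the website) gives an intercept plus two differences:

# Made-up data with three groups, just to see the shape of the lm() output
set.seed(1)
y <- c(rnorm(5, 10), rnorm(5, 12), rnorm(5, 14))
g <- factor(rep(c("A", "B", "C"), each = 5))
coef(lm(y ~ g))   # (Intercept) = group A mean, gB and gC = differences from A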

To me, this looks like the intercepts are actually being compared, not the slopes?

Another example where they compare intercepts rather than the slopes can be found here:
(http://www.theanalysisfactor.com/why-anova-and-linear-regression-are-the-same-analysis/)

I am now struggling to understand what is actually being compared in the linear regression: the slopes, the intercepts, or both?

Best Answer

this looks like the intercepts are actually being compared, not the slopes?

Your confusion there relates to the fact that you must be very careful to be clear about which intercepts and slopes you mean (intercept of what? slope of what?).

The role of a coefficient of a 0-1 dummy in a regression can be thought of both as a slope and as a difference of intercepts, simply by changing how you think about the model.
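To see that dual view concretely, here's a tiny illustration (a toy factor, nothing to do with the data below) of how R's default treatment coding turns a two-level grouping factor into exactly such a 0-1 dummy column:

# A two-level factor and the 0-1 dummy column R builds for it by default
g <- factor(c("g1", "g1", "g2", "g2"))
model.matrix(~ g)
#   (Intercept) gg2
# 1           1   0
# 2           1   0
# 3           1   1
# 4           1   1
# (contrast attributes omitted)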

Let's simplify things as far as possible, by considering a two-sample case.

We can still do one-way ANOVA with two samples, but it turns out to be essentially the same as a two-tailed, two-sample t-test (the equal-variance case).

Here's a diagram of the population situation:

[Figure: two group means as regression, population situation]

If $\delta = \mu_2-\mu_1$, then the population linear model is

$y = \mu_1 + \delta x + e$

so that when $x=0$ (which is the case when we're in group 1), the mean of $y$ is $\mu_1 + \delta \times 0 = \mu_1$ and when $x=1$ (when we're in group 2), the mean of $y$ is $\mu_1 + \delta \times 1 = \mu_1 + \mu_2 - \mu_1 = \mu_2$.

That is, the slope coefficient ($\delta$ in this case) and the difference in means (and you might think of those means as intercepts) are the same quantity.


To help with concreteness, here are two samples:

Group1:  9.5  9.8 10.4
Group2: 11.8 13.4 12.5 13.9
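In R, the samples could be set up roughly like this (the names values, group and d are just my choices for the snippets below):

# The two samples as a small data frame (variable names are my own)
values <- c(9.5, 9.8, 10.4,  11.8, 13.4, 12.5, 13.9)
group  <- factor(c("g1", "g1", "g1", "g2", "g2", "g2", "g2"))
d <- data.frame(values, group)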

How do they look?

[Figure: dot plot of the two samples]
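One way to get a similar picture with base graphics (a sketch, using the data frame d defined above):

# Dot plot of the two groups, with the group means marked
stripchart(values ~ group, data = d, vertical = TRUE, pch = 16,
           xlab = "group", ylab = "values")
points(1:2, tapply(d$values, d$group, mean), pch = 4, cex = 2)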

What does the test of difference in means look like?

As a t-test:

    Two Sample t-test

data:  values by group
t = -5.0375, df = 5, p-value = 0.003976
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.530882 -1.469118
sample estimates:
mean in group g1 mean in group g2 
             9.9             12.9 
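That's the pooled (equal-variance) two-sample t-test; with the data frame d above, the call would look something like:

# Pooled two-sample t-test (var.equal = TRUE gives the equal-variance version)
t.test(values ~ group, data = d, var.equal = TRUE)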

As a regression:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.9000     0.4502  21.991 3.61e-06 ***
groupg2       3.0000     0.5955   5.037  0.00398 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7797 on 5 degrees of freedom
Multiple R-squared:  0.8354,    Adjusted R-squared:  0.8025 
F-statistic: 25.38 on 1 and 5 DF,  p-value: 0.003976
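That's the coefficient table from a simple linear model with the group factor as the only predictor; with the assumed data frame d it would be produced along these lines:

# Regression of values on the group factor; the groupg2 coefficient
# is the difference between the two group means
fit <- lm(values ~ group, data = d)
summary(fit)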

We can see in the regression that the intercept term is the mean of group 1, and the groupg2 ('slope') coefficient is the difference in group means. Meanwhile, the p-value for the regression is the same as the p-value for the t-test (0.003976).
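And to tie it back to ANOVA explicitly: asking for the ANOVA table of that same fitted model gives the one-way ANOVA F-test, with the identical p-value (and F equal to the square of the t statistic, since $5.0375^2 \approx 25.38$):

# One-way ANOVA table for the same model: same p-value, F = t^2
anova(fit)
# equivalently: summary(aov(values ~ group, data = d))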