Solved – ANOVA test p-value interpretation

anova, multiple regression, r

> model = lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)
> summary(model)

Call:
lm(formula = Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.86412 -0.21142  0.00315  0.20406  0.73806 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   1.03807    0.28817   3.602 0.000431 ***
Sepal.Length  0.56119    0.06533   8.590 1.16e-14 ***
Petal.Length -0.33527    0.03065 -10.940  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3235 on 147 degrees of freedom
Multiple R-squared:  0.4564,    Adjusted R-squared:  0.449 
F-statistic: 61.71 on 2 and 147 DF,  p-value: < 2.2e-16

I have a regression model $Y_i = \beta_0 + \beta_1 \cdot \text{SepalLength}_i + \beta_2 \cdot \text{PetalLength}_i + \epsilon_i$. Looking at the Pr(>|t|) column, I know that these are the p-values of t-tests for the significance of the corresponding $\beta_j$. For instance, the p-value 1.16e-14 corresponds to a t-test of $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$. The p-value associated with the F statistic (p-value: < 2.2e-16) corresponds to the test of $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1:$ at least one of $\beta_1, \beta_2$ is nonzero.

> anova(model)
Analysis of Variance Table

Response: Sepal.Width
              Df  Sum Sq Mean Sq F value  Pr(>F)    
Sepal.Length   1  0.3913  0.3913   3.738 0.05511 .  
Petal.Length   1 12.5284 12.5284 119.689 < 2e-16 ***
Residuals    147 15.3872  0.1047                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Looking at the ANOVA table, how do I interpret the last column with the p-values? What hypotheses am I testing here?

Best Answer

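That F table is built from sequential (Type I) sums of squares: each term is tested given that the terms listed before it are already in the model. So Sepal.Length is tested as if Petal.Length were not there (its sum of squares is the one you would get from a regression on Sepal.Length alone), while Petal.Length is tested given that Sepal.Length is present. One caveat: every F ratio in the table uses the residual mean square of the full model (0.1047 on 147 degrees of freedom) as its denominator, so the p-value for Sepal.Length is not identical to the one from a regression on Sepal.Length by itself.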
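One way to see the sequential structure is to fit the nested models explicitly and compare them. This is only a sketch; the object names `m0`, `m1`, `m2`, `seq_tab` and `type1` are made up for illustration. When `anova()` is given several `lm` fits, it estimates the error variance from the largest model by default, which is why its F values line up with the Type I table.

```r
# Nested models, with terms added in the same order as in the formula
m0 <- lm(Sepal.Width ~ 1, data = iris)                            # intercept only
m1 <- lm(Sepal.Width ~ Sepal.Length, data = iris)                 # + Sepal.Length
m2 <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)  # + Petal.Length

# Sequential model comparison: m0 vs m1, then m1 vs m2.
# The denominator of each F ratio is the residual mean square of m2.
seq_tab <- anova(m0, m1, m2)
print(seq_tab)

# The Type I table for the full model carries the same F statistics
type1 <- anova(m2)
print(type1)

# For contrast: the F statistic from the Sepal.Length-only fit uses m1's own
# residual variance, so it differs from the first row of the Type I table
f_alone <- summary(m1)$fstatistic[["value"]]
print(f_alone)
```

Rows 2 and 3 of `seq_tab` should match the Sepal.Length and Petal.Length rows of `type1`, while `f_alone` should not match the Sepal.Length row.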

By contrast, the coefficient table from summary() tests each coefficient with all the other terms in the model, regardless of the order in which they appear.
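A quick way to check the role of ordering is to refit the model with the two predictors swapped. The names `fit_a`, `fit_b`, `coef_a` and `coef_b` below are illustrative only, and this is a sketch rather than a full analysis.

```r
# Same model, predictors entered in opposite orders
fit_a <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)
fit_b <- lm(Sepal.Width ~ Petal.Length + Sepal.Length, data = iris)

# Coefficient tables (estimate, std. error, t value, p-value)
coef_a <- coef(summary(fit_a))
coef_b <- coef(summary(fit_b))

# After aligning the rows, the t-tests are the same in both fits
print(all.equal(coef_a, coef_b[rownames(coef_a), ]))

# The sequential ANOVA tables are not: whichever term is entered first
# is tested without adjusting for the other one
print(anova(fit_a))
print(anova(fit_b))
```

In `anova(fit_b)` Sepal.Length is the last term, so its p-value there should agree with the Pr(>|t|) entry for Sepal.Length in the coefficient table.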

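Note that the last term in both tables gives the same p-value (in your example that is Petal.Length, and the two do agree: both are < 2e-16). For a term with a single degree of freedom, the F statistic is the square of the t statistic: $(-10.940)^2 \approx 119.7$.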
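The agreement for the last term can be checked numerically. A minimal sketch, with `fit`, `t_last` and `F_last` as illustrative names:

```r
fit <- lm(Sepal.Width ~ Sepal.Length + Petal.Length, data = iris)

# t statistic for the last term, from the coefficient table
t_last <- coef(summary(fit))["Petal.Length", "t value"]

# F statistic for the same term, from the sequential ANOVA table
F_last <- anova(fit)["Petal.Length", "F value"]

# With one numerator degree of freedom, F equals t squared
print(c(t_squared = t_last^2, F = F_last))

# The two p-values, printed side by side for comparison
print(c(
  t_test = coef(summary(fit))["Petal.Length", "Pr(>|t|)"],
  F_test = anova(fit)["Petal.Length", "Pr(>F)"]
))
```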
