ANOVA – Interpreting R Output

Tags: anova, interpretation, r, self-study

I have a question about how a statistician would normally interpret ANOVA output. Say I have the following ANOVA output from R.

> summary(fitted_data)

Call:
lm(formula = V1 ~ V2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.74004 -0.33827  0.04062  0.44064  1.22737 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
V2           0.03883    0.01277   3.040  0.00292 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.6231 on 118 degrees of freedom
Multiple R-squared: 0.07262,    Adjusted R-squared: 0.06476 
F-statistic:  9.24 on 1 and 118 DF,  p-value: 0.002917 

> anova(fit)
Analysis of Variance Table

Response: V1
           Df Sum Sq Mean Sq F value   Pr(>F)   
V2          1  3.588  3.5878  9.2402 0.002917 **
Residuals 118 45.818  0.3883                    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

From the above, I guess the most important value is Pr(>F), right? This Pr(>F) is less than 0.05 (95% level). How should I "explain" this? Do I explain it in terms of "association", i.e. V2 and V1 are associated (or not)? Or in terms of "significance"? I have always felt that I couldn't understand what people mean when they say "this value is significant...". So what is "significant"? Is there a more intuitive form of explanation, like "I am 95% confident that ..."?

Also, is the Pr(>F) value the only important piece of information, or can I also look at the residuals and the rest of the output to "explain" the result? Thanks.

Best Answer

From the above, I guess the most important value is Pr(>F), right?

Not to me. The idea that the size of the p-value is the most important thing in an ANOVA is pervasive, but I think it is almost entirely misguided. For a start, the p-value is a random quantity (more so when the null is true, in which case it is uniformly distributed between 0 and 1). As such, a smaller p-value may not be particularly informative in any case; and even setting the size of the p-value aside, things like effect sizes are generally much more important.
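As a rough illustration of both points (a sketch, not from your data: it assumes n = 120 independent observations and a true null of no association), a small simulation shows how Pr(>F) behaves when there is genuinely nothing to find, and the last line computes the effect size implied by the ANOVA table you posted:

# Sketch: behaviour of Pr(>F) when the null is actually true.
# Assumptions (not from the question): n = 120, V1 and V2 generated
# independently, so "no association" holds by construction.
set.seed(1)
null_pvals <- replicate(10000, {
  V2 <- rnorm(120)
  V1 <- rnorm(120)                  # no dependence on V2 at all
  anova(lm(V1 ~ V2))[1, "Pr(>F)"]
})
hist(null_pvals, breaks = 20)       # roughly flat: uniform on (0, 1)
mean(null_pvals < 0.05)             # about 0.05, by construction

# Effect size from the ANOVA table in the question:
# R^2 = SS(V2) / (SS(V2) + SS(Residuals))
3.588 / (3.588 + 45.818)            # ~0.073, the same number as the
                                    # Multiple R-squared in summary()

The p-value of 0.003 says the association is unlikely to be pure noise; the roughly 7% of variance explained says how much it matters, and that second number is usually the more interesting one.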

You may like to read around a bit:

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/

http://www.biostat.jhsph.edu/~cfrangak/cominte/goodmanvalues.pdf

http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Ongoing_Controversy

--

I didn't really address how to interpret the output when the p-value is below $\alpha$. Without saying exactly what hypothesis is being tested, talking about "significance" seems pointless; it is preferable to state the conclusion that follows from rejecting the null.

In the case you present it's hard to interpret without context (I don't even know whether V2 is categorical or continuous), but if V2 were continuous I might say something about concluding there's an association between V1 and V2, and if V2 were categorical (0/1) I might talk about a difference in mean V1 between the two categories, and so on; see the sketch below.
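For concreteness, here is a small sketch with made-up data (nothing here comes from your actual V1 and V2; the numbers and group labels are invented) showing how the wording of the conclusion changes with the type of V2:

# Made-up data, for wording only; your real V1/V2 will differ.
set.seed(2)
n <- 120

# Continuous V2: the coefficient is a slope, so rejecting the null is read
# as "there is evidence of a (linear) association between V1 and V2".
V2_cont <- rnorm(n, mean = 25, sd = 5)
V1_a    <- 2 + 0.04 * V2_cont + rnorm(n, sd = 0.6)
anova(lm(V1_a ~ V2_cont))

# Categorical (0/1) V2: the coefficient is a difference between group means,
# so rejecting the null is read as "mean V1 differs between the two groups".
V2_cat <- factor(rep(c("group0", "group1"), each = n / 2))
V1_b   <- 2 + 0.5 * (V2_cat == "group1") + rnorm(n, sd = 0.6)
anova(lm(V1_b ~ V2_cat))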

Now some things NOT to say:

is less than 0.05 (95% level)

Never call p < 0.05 "significant at the 95% level". That's wrong: when you reject at p < 0.05 you are working at the 5% significance level ($\alpha = 0.05$), not a 95% one. Nor indeed should you call it 95% anything else.

like "I am 95% confident that ...." .

Never say that either. It's wrong: a p-value is not the probability that your conclusion is correct, so p < 0.05 does not make you "95% confident" of anything.