I have a question about how a statistician would normally interpret ANOVA output. Say I have this ANOVA output from R:
> summary(fit)
Call:
lm(formula = V1 ~ V2)
Residuals:
     Min       1Q   Median       3Q      Max
-2.74004 -0.33827  0.04062  0.44064  1.22737
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  2.11405    0.32089   6.588  1.3e-09 ***
V2           0.03883    0.01277   3.040  0.00292 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.6231 on 118 degrees of freedom
Multiple R-squared: 0.07262, Adjusted R-squared: 0.06476
F-statistic: 9.24 on 1 and 118 DF, p-value: 0.002917
> anova(fit)
Analysis of Variance Table
Response: V1
           Df Sum Sq Mean Sq F value   Pr(>F)
V2          1  3.588  3.5878  9.2402 0.002917 **
Residuals 118 45.818  0.3883
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
From the above, I guess the most important value is Pr(>F), right? This Pr is less than 0.05 (the 95% level). How should I "explain" this? Do I explain it in terms of "association", i.e., V2 and V1 are associated (or not)? Or in terms of "significance"? I have always felt that I couldn't understand when people say "this value is significant…". So what is "significant"? Is there a more intuitive form of explanation, like "I am 95% confident that …"?
Also, is the Pr value the only important piece of information, or can I also look at the residuals and the rest of the output to "explain" the result? Thanks.
Best Answer
Not to me. The idea that the size of the p-value is the most important thing in an ANOVA is pervasive, but I think it is almost entirely misguided. For a start, the p-value is itself a random quantity (more so when the null is true, in which case it is uniformly distributed between 0 and 1). A smaller p-value is therefore not necessarily more informative, and even beyond the issue of the size of the p-value, things like effect sizes are generally much more important.
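To make the "p-value is a random quantity" point concrete, here is a small simulation sketch (my addition, not part of the original answer, and in Python rather than R): when the null hypothesis is true by construction, the p-value from a test is just a draw from a Uniform(0, 1) distribution.

```python
# Sketch: under a true null, the p-value is uniformly distributed on (0, 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
pvals = np.empty(5000)
for i in range(5000):
    # Both samples come from the SAME normal distribution,
    # so the null hypothesis of equal means is true by construction.
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    pvals[i] = stats.ttest_ind(a, b).pvalue

# If the p-values are Uniform(0, 1), about 5% fall below 0.05 and about
# half fall below 0.5 -- so a "significant" result turns up about 5% of
# the time even though there is nothing to find.
print(np.mean(pvals < 0.05))
print(np.mean(pvals < 0.5))
```

The fractions printed should land near 0.05 and 0.5, which is exactly why the size of a single p-value is a shaky thing to hang an interpretation on.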
You may like to read around a bit:
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1119478/
http://www.biostat.jhsph.edu/~cfrangak/cominte/goodmanvalues.pdf
http://en.wikipedia.org/wiki/Statistical_hypothesis_testing#Ongoing_Controversy
--
I didn't really address interpreting the output when the p-value is below $\alpha$. Without saying exactly what hypothesis is being tested, mentioning "significance" seems pointless; it would be preferable to state the conclusion that follows from rejecting the null.
In the case you present, it's hard to interpret without context (I don't even know whether V2 is categorical or continuous), but if V2 were continuous I might say something about concluding there's an association between V1 and V2. If V2 were categorical (0-1), I might say something about a difference in mean V1 between the two categories, and so on.
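As for "effect sizes are generally much more important": the sums of squares in your ANOVA table already give you one. The proportion of variance in V1 explained by V2 (eta-squared, which equals the Multiple R-squared here since there is a single predictor) can be recovered directly. A quick check (a Python sketch of my own; the numbers are copied straight from your output):

```python
# Effect size reconstructed from the ANOVA table in the question.
ss_model = 3.5878        # Sum Sq for V2
ss_resid = 45.818        # Sum Sq for Residuals
df_model, df_resid = 1, 118

ss_total = ss_model + ss_resid
eta_squared = ss_model / ss_total                         # == Multiple R-squared
f_value = (ss_model / df_model) / (ss_resid / df_resid)   # F statistic
resid_se = (ss_resid / df_resid) ** 0.5                   # residual std. error

print(round(eta_squared, 5))  # 0.07262: V2 explains only ~7% of the variance
print(round(f_value, 2))      # 9.24, matching the F-statistic line
print(round(resid_se, 4))     # 0.6231, matching the summary output
```

So here a "significant" p-value coexists with a small effect: V2 accounts for roughly 7% of the variance in V1, and that is often the more useful thing to report than the p-value alone.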
Now some things NOT to say:
Never call p < 0.05 "significant at the 95% level". That's wrong; nor indeed should you call it 95% anything else. The significance level here is 5% (0.05), not 95%.
Never say "I am 95% confident that …" either. That's also wrong: a p-value is not a confidence statement about the hypothesis.