I'm reading a paper and the author wrote:
The effect of A,B, C on Y was studied through the use of multiple regression analysis. A,B,C were entered into the regression equation with Y as the dependent variable. The analysis of variance is presented in Table 3.
The effect of B on Y was significant, with B correlating .27 with Y.
English is not my mother tongue and I got really confused here.
First, he said he would run a regression analysis, then he showed us the analysis of variance. Why?
And then he wrote about the correlation coefficient. Isn't that from correlation analysis? Or can this word also be used to describe a regression slope?
Best Answer
Analysis of variance (ANOVA) is just a technique that compares the variance explained by the model with the variance the model leaves unexplained. Since regression models have both an explained and an unexplained component, it's natural that ANOVA can be applied to them. In many software packages, ANOVA results are routinely reported with linear regression. Regression is also a very versatile technique. In fact, both the t-test and ANOVA can be expressed in regression form; they are just special cases of regression.
For example, here is a sample regression output. The outcome is miles per gallon of some cars and the independent variable is whether the car was domestic or foreign:
You can see the ANOVA reported at top left. The overall F-statistic is 13.18, with a p-value of 0.0005, indicating that the model is predictive. And here is the ANOVA output:
Notice that you can recover the same F-statistic and p-value there.
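To make the equivalence concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration and are not the data behind the output above; the variable names (`domestic`, `foreign`) are my own. It fits a regression on a 0/1 dummy and runs a one-way ANOVA on the same two groups, and the two F-statistics should coincide.

```python
from statistics import mean

# Made-up miles-per-gallon values, for illustration only.
domestic = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19]
foreign = [17, 23, 25, 23, 35, 24, 21, 21, 25, 28]

y = domestic + foreign
x = [0] * len(domestic) + [1] * len(foreign)  # dummy variable: 1 = foreign
n = len(y)

# --- Simple linear regression of y on the dummy (ordinary least squares) ---
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * xi for xi in x]

ss_model = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained
f_regression = (ss_model / 1) / (ss_resid / (n - 2))

# --- One-way ANOVA on the same two groups ---
groups = [domestic, foreign]
ss_between = sum(len(g) * (mean(g) - y_bar) ** 2 for g in groups)
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
f_anova = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))

print(f"regression F = {f_regression:.4f}")
print(f"ANOVA F      = {f_anova:.4f}")
print(f"slope        = {slope:.4f} (difference between the group means)")
```

With a single dummy predictor, the fitted values are just the two group means, which is why the model sum of squares equals the between-group sum of squares.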
Assuming the analysis involved only B and Y, I would technically not agree with the word choice. In most cases, the slope and the correlation coefficient cannot be used interchangeably. In one special case the two are the same: when both the independent and dependent variables are standardized (that is, expressed as z-scores).
For example, let's correlate miles per gallon and the price of the car:
And here is the same test using the standardized variables. You can see the correlation coefficient remains unchanged:
Now, here is the regression model using the original variables:
... and here is the one with standardized variables:
As you can see, the slope for the original variables is -0.0009192, and the slope for the standardized variables is -0.4686, which is also the correlation coefficient.
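The same relationship can be checked by hand. The sketch below uses made-up price and mileage figures (not the data behind the output above), and the helper names `ols_slope`, `pearson_r`, and `standardize` are my own. It should show that the raw slope depends on the units, while the slope between z-scores equals the correlation coefficient.

```python
from math import sqrt
from statistics import mean

def sample_sd(v):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(v)
    return sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))

def standardize(v):
    """Convert values to z-scores."""
    m, s = mean(v), sample_sd(v)
    return [(vi - m) / s for vi in v]

def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Made-up data, for illustration only.
price = [4100, 4700, 3800, 4800, 7800, 5800, 4500, 5200, 10400, 4100]
mpg = [22, 17, 22, 20, 15, 18, 26, 20, 16, 19]

r = pearson_r(price, mpg)
slope_raw = ols_slope(price, mpg)
slope_std = ols_slope(standardize(price), standardize(mpg))

print(f"correlation          = {r:.4f}")
print(f"slope (raw units)    = {slope_raw:.6f}")
print(f"slope (standardized) = {slope_std:.4f}")
```

The two slopes are linked by `slope_raw = r * sd(mpg) / sd(price)`, so they agree only when both standard deviations are 1, which is exactly what standardizing achieves.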
So, unless A, B, C, and Y are standardized, I would not agree with the article's "correlating." Instead, I'd just opt for "a one-unit increase in B is associated with the average of Y being 0.27 higher."
In more complicated situations, where more than one independent variable is involved, the equivalence described above generally no longer holds.
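As a rough illustration of that last point, the sketch below regresses a standardized Y on two standardized, correlated predictors. The data are made up, and the names `b`, `c`, and `y` merely echo the paper's B, C, and Y. For two predictors, the standardized coefficients can be computed from the pairwise correlations, and the coefficient on B should no longer match the plain correlation between B and Y.

```python
from math import sqrt
from statistics import mean

def pearson_r(u, v):
    """Pearson correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v))
    suu = sum((ui - mu) ** 2 for ui in u)
    svv = sum((vi - mv) ** 2 for vi in v)
    return suv / sqrt(suu * svv)

def standardize(v):
    """Convert values to z-scores (sample standard deviation)."""
    m = mean(v)
    s = sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))
    return [(vi - m) / s for vi in v]

# Made-up data with two correlated predictors, for illustration only.
b = [1, 2, 3, 4, 5, 6, 7, 8]
c = [2, 1, 4, 3, 6, 5, 8, 7]
y = [3, 4, 6, 8, 9, 13, 14, 15]

r_by, r_cy, r_bc = pearson_r(b, y), pearson_r(c, y), pearson_r(b, c)

# Standardized coefficients for a two-predictor regression, obtained by
# solving the normal equations written in terms of correlations.
beta_b = (r_by - r_bc * r_cy) / (1 - r_bc ** 2)
beta_c = (r_cy - r_bc * r_by) / (1 - r_bc ** 2)

# Sanity check: the residuals should be uncorrelated with both predictors.
zb, zc, zy = standardize(b), standardize(c), standardize(y)
resid = [yi - beta_b * bi - beta_c * ci for yi, bi, ci in zip(zy, zb, zc)]

print(f"correlation of B with Y       = {r_by:.4f}")
print(f"standardized coefficient on B = {beta_b:.4f}")
```

If the predictors happened to be uncorrelated with each other (`r_bc` equal to 0), the formulas reduce to `beta_b = r_by`, so the mismatch comes from the overlap between predictors.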