Solved – What does $\beta$ tell us in linear regression analysis

correlation, regression

I came across this note from a book: "..the correlation between the active and passive portfolios is greater when the $\beta$ of the active portfolio is higher.."

The author runs a regression of the active portfolio on the passive portfolio.

This does not make sense to me. If I understand correctly, the beta in a single-variable linear regression is the slope of the best-fit line (the expected change in the dependent variable per unit change in the independent variable), but it does not indicate the strength of that relationship. To judge how strong the relationship between the variables is, we need to look at the correlation or the $R^2$ value. As long as beta is statistically significant, we can look at $R^2$ to determine the strength of the relationship. Here are a couple of examples:

  1. Consider three variables $A_1$, $A_2$ (dependent variables) and $B$ (independent variable). If $A_1$ and $B$ have $\beta=2.0$; correlation=0.8, and $A_2$ and $B$ have $\beta=0.5$; correlation=0.8, then both $A_1$ and $A_2$ are equally well explained by $B$.

  2. Consider two series: $A=\{1,2,3,4,5,6,7,8,9,10\}$ and $B=2\times A$. In this case, the correlation between $A$ and $B$ is 1. Regressing $A$ on $B$ gives a $\beta$ of 0.5, while regressing $B$ on $A$ gives 2.0. Yet each variable is completely explained by the other.
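Both examples can be checked numerically. The following is only a minimal sketch, assuming NumPy is available; the helper `slope_and_corr`, the simulated series `X`, `A1`, `A2`, and the noise scale are hypothetical choices made for illustration, not anything taken from the book.

```python
import numpy as np

def slope_and_corr(x, y):
    """Return the OLS slope of y regressed on x, and the Pearson correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Slope of the least-squares line: cov(x, y) / var(x)
    slope = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    corr = np.corrcoef(x, y)[0, 1]
    return slope, corr

# Example 2: B is an exact multiple of A
A = np.arange(1, 11)
B = 2 * A
beta_B_on_A, r_AB = slope_and_corr(A, B)  # regress B on A
beta_A_on_B, r_BA = slope_and_corr(B, A)  # regress A on B

# Spirit of example 1: rescaling the dependent variable changes beta
# but leaves the correlation untouched (simulated data, assumed noise level)
rng = np.random.default_rng(0)
X = rng.normal(size=200)
A2 = 0.5 * X + rng.normal(scale=0.4, size=200)
A1 = 4.0 * A2
beta1, r1 = slope_and_corr(X, A1)
beta2, r2 = slope_and_corr(X, A2)

print(beta_B_on_A, beta_A_on_B, r_AB)
print(beta1 / beta2, r1 - r2)
```

In this sketch the two slopes for $A$ and $B$ differ by a factor of four while the correlation stays at 1, and `A1` has four times the slope of `A2` with an identical correlation.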

Is my understanding correct? Please highlight if I am missing something here.

Best Answer

I think your understanding of linear regression is fine. One thing that may interest you is that if both of your variables (e.g., $A_1$ and $B$) are standardized, the $\beta$ from a simple regression will equal the correlation coefficient $r$ (which, when squared, gives you the model's $R^2$), but this is not the issue here.

I think what the book is talking about is the measure of volatility used in finance, which unfortunately is also called 'beta'. Although the name is the same, it is not quite the same thing as the $\beta$ from a standard regression model. Note also that neither of these is closely related to beta regression, which is a form of the generalized linear model used when the response variable is a proportion that follows a beta distribution.

I find it unfortunate, and very confusing, that terms such as 'beta' are used differently in different fields, that different people use the same term to mean very different things, and that people sometimes use different terms to mean the same thing, but these are just facts of life.
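The point about standardized variables can be illustrated with a short sketch. It assumes NumPy and uses simulated data; the names `x`, `y`, `zscore`, and the coefficients are hypothetical and chosen only for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)  # assumed data-generating process

def zscore(v):
    """Standardize to zero mean and unit sample standard deviation."""
    return (v - v.mean()) / v.std(ddof=1)

# Slope of the least-squares line on the raw and on the standardized data
beta_raw = np.polyfit(x, y, 1)[0]
beta_std = np.polyfit(zscore(x), zscore(y), 1)[0]

# Pearson correlation of the raw data
r = np.corrcoef(x, y)[0, 1]

print(beta_raw, beta_std, r)
```

Here `beta_std` matches `r` up to floating-point error, while `beta_raw` equals `r` multiplied by the ratio of the sample standard deviations of `y` and `x`. That ratio is why a larger slope on unstandardized data does not by itself imply a larger correlation.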
