Solved – Understanding OLS regression slope formula

data visualization, interpretation, intuition, least squares, regression

I understand the intuition behind the OLS model: minimize the sum of squared residuals. Is there a way, however, to interpret the formula for the slope of the regression line intuitively? That is, $m = r(sd_y/sd_x)$. I know the formula gives me the slope, but how? Put another way, what is the most intuitive way to visualize or think about the slope formula for the regression line?

Best Answer

The correlation coefficient $r$ is a measure that ranges from $-1$ to $+1$. It describes the strength and direction of the linear relationship in a way that can be interpreted independently of the scale of the two variables. Note that when $sd_y=sd_x$, the formula gives $m=r$.

So, $r$ is the slope of the regression line when both $X$ and $Y$ are expressed as z-scores (i.e., standardized). Remember that $r$ is the average of the cross products of the z-scores, that is,

$r=\frac{\sum Z_xZ_y}{N}$
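As a quick numerical check of the formula above, here is a minimal sketch using only the Python standard library. The data values are made up for illustration, and the helper `z_scores` is a hypothetical name, not something from the original answer. Because the formula divides by $N$, the sketch standardizes with the population standard deviation (`pstdev`).

```python
from statistics import mean, pstdev

# Made-up illustrative data (not from the original post).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

def z_scores(values):
    """Standardize with the population SD (divide by N), matching the formula."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

z_x, z_y = z_scores(x), z_scores(y)

# r as the average cross product of the z-scores.
r = sum(a * b for a, b in zip(z_x, z_y)) / len(x)
print(round(r, 6))  # 0.8
```

If the z-scores were computed with the sample standard deviation instead, the matching divisor would be $N-1$ rather than $N$.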

In other words, $r$ is the slope of the regression of $Y$ on $X$ in z-score form. The correlation coefficient tells us how many standard deviations the predicted $Y$ changes when $X$ changes by $1$ standard deviation. When there is no correlation ($r = 0$), the predicted $Y$ changes by zero standard deviations when $X$ changes by $1$ standard deviation. When $r$ is $1$, the predicted $Y$ changes by $1$ standard deviation when $X$ changes by $1$ standard deviation.
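The claim that $r$ is the slope in z-score form can be checked directly. The sketch below standardizes both variables, fits a least-squares slope to the z-scores, and compares it with $r$. The data are again made up, and `z_scores` and `ols_slope` are hypothetical helper names; `ols_slope` uses the textbook least-squares expression (sum of cross deviations over sum of squared $X$ deviations).

```python
from statistics import mean, pstdev

# Made-up illustrative data (not from the original post).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

def z_scores(values):
    """Standardize with the population SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

z_x, z_y = z_scores(x), z_scores(y)

r = sum(a * b for a, b in zip(z_x, z_y)) / len(x)
slope_z = ols_slope(z_x, z_y)  # regression of standardized Y on standardized X

print(abs(slope_z - r) < 1e-9)  # True
```

The two agree because the standardized $X$ values have mean $0$ and their squares sum to $N$, so the least-squares slope reduces to $\sum Z_xZ_y / N$.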

The regression weight $m$ is expressed in raw-score units rather than in z-score units. To move from the correlation coefficient to the regression coefficient, we simply convert the units:

$m=r(sd_y/sd_x)$

This says that the regression weight is equal to the correlation times the standard deviation of $Y$ divided by the standard deviation of $X$. Note that $r$ is the slope in z-score form, that is, when both standard deviations are $1.0$, so their ratio is $1.0$. But we want to know how many raw-score units $Y$ changes per raw-score unit change in $X$. So, to get the new ratio, we multiply by the standard deviation of $Y$ and divide by the standard deviation of $X$; that is, we multiply $r$ by the raw-score ratio of standard deviations.
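To see the unit conversion at work, the following sketch uses made-up data in which $Y$ is on a scale ten times larger than $X$. It computes $m = r(sd_y/sd_x)$ and then checks that this slope gives a smaller sum of squared residuals than nearby slopes. The helper `sse` is a hypothetical name, and it relies on the standard fact that the least-squares line passes through the point of means.

```python
from statistics import mean, pstdev

# Made-up illustrative data: same pattern as a small-scale y, multiplied by 10.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [20.0, 10.0, 40.0, 30.0, 50.0]

mx, my = mean(x), mean(y)
sd_x, sd_y = pstdev(x), pstdev(y)

# r: average cross product of z-scores (unaffected by the rescaling of y).
r = sum((a - mx) / sd_x * (b - my) / sd_y for a, b in zip(x, y)) / len(x)

# Convert the z-score slope to raw-score units.
m = r * sd_y / sd_x

def sse(slope):
    """Sum of squared residuals for a line through (mean x, mean y)."""
    return sum((b - (my + slope * (a - mx))) ** 2 for a, b in zip(x, y))

print(round(r, 6), round(m, 6))  # 0.8 8.0
print(sse(m) < sse(m - 0.1) and sse(m) < sse(m + 0.1))  # True
```

Here $r$ stays at $0.8$ while the raw-score slope becomes $8$, because $sd_y/sd_x = 10$. Using sample rather than population standard deviations would give the same $m$, as long as the same convention is used for both variables, since the ratio is unchanged.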
