Solved – Can the coefficient of determination (R-squared) for a linear regression ever be zero

Tags: r-squared, random-generation, regression


EDIT NOTE: I'm mostly concerned with linear regression with an unconstrained y-intercept, but notes on the constrained-intercept case, if relevant, are also welcome.


I noticed that for linear regression, the coefficient of determination ($R^2$) can be as high as 1 (if the linear regression fits the given data perfectly). But can it ever be zero, i.e. can there be two variables whose linear $R^2$ is zero?

I grabbed some random data for the dependent variable (presumably truly random rather than merely pseudo-random) and plotted it against an independent variable with a linear regression line; you never really get exactly zero for $R^2$ (so I guess I'm partly answering my own question).
[Figure: random data with non-zero R-squared]
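
Here is a minimal sketch of the kind of experiment described above (Python, with hypothetical variable names and a fixed seed): regressing pure noise on an arbitrary independent variable gives a tiny, but essentially never exactly zero, $R^2$.

```python
# Assumed illustration: noise regressed on an arbitrary x gives a tiny,
# but almost never exactly zero, R^2.
import numpy as np

rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)          # arbitrary independent variable
y = rng.normal(size=100)                 # "random" dependent variable

r = np.corrcoef(x, y)[0, 1]              # sample correlation coefficient
print(f"r = {r:.4f}, R^2 = {r**2:.6f}")  # small, but not exactly 0
```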
Why is that? Does this mean that ANY two variables have SOME linear correlation, no matter how minuscule? In other words, can I look at ANY two variables and say there is SOME linear correlation between them?

So the question is NOT whether two variables are linearly correlated, but rather the EXTENT to which a linear relationship explains the association between them… right? And doesn't the same apply to other (non-linear) regression models?

Best Answer

Yes, whenever there is no linear relationship between the variables. This happens, for example, when either X or Y is constant, or when high-high and low-low pairs are exactly balanced by high-low and low-high pairs. Concretely: $X=(1,1,2,2)$, $Y=(1,2,1,2)$, or $X=(-2,-1,0,1,2)$, $Y=X^2$.
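
A quick check of those two examples (a sketch in Python; only `numpy.corrcoef` is assumed):

```python
# Both example data sets from above have sample correlation 0,
# so the linear R^2 is 0 as well.
import numpy as np

x1, y1 = np.array([1, 1, 2, 2]), np.array([1, 2, 1, 2])
x2 = np.array([-2, -1, 0, 1, 2])
y2 = x2 ** 2                             # perfect, but purely non-linear, relationship

for x, y in [(x1, y1), (x2, y2)]:
    r = np.corrcoef(x, y)[0, 1]
    print(f"r = {r:.3f}, R^2 = {r ** 2:.3f}")  # correlation 0 in both cases
```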

Here are some examples; all of these have a correlation of 0, and hence a coefficient of determination of zero:

[Figure: scatter plots of data sets with no correlation]

It's worth noting that as soon as there's any randomness, there's almost certainly going to be some non-zero sample correlation. With a small sample size, that correlation can be quite large; it wouldn't be unusual to see a correlation as high as $\pm 0.3$ with a sample size of 20.
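
As a rough illustration of that last point (a sketch, assuming independent normal samples): with $n = 20$, the sample correlation of two completely unrelated variables exceeds $0.3$ in absolute value roughly one run in five.

```python
# Sketch: sampling variability of the correlation for unrelated variables, n = 20.
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 10_000
rs = np.array([np.corrcoef(rng.normal(size=n), rng.normal(size=n))[0, 1]
               for _ in range(reps)])
print(f"fraction with |r| >= 0.3: {np.mean(np.abs(rs) >= 0.3):.2f}")
```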