R-Squared – Why Is the Coefficient of Determination Less Than or Equal to 1?

Tags: r-squared, self-study

I have been reading about the Coefficient of Determination and am wondering why it is necessarily less than or equal to 1.

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}}$$

I understand that RSS is the sum of the squared differences between each observed value of the dependent variable and its prediction: $\text{RSS} = \sum_i (Y_i - \hat{Y}_i)^2$.

So it makes sense that RSS will be zero if the independent variables perfectly predict the dependent ones.

I understand that TSS is the sum of the squared differences between each observed value and the mean: $\text{TSS} = \sum_i (Y_i - \bar{Y})^2$.

But why is RSS/TSS necessarily at most 1?
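To make the definitions concrete, here is a minimal sketch in Python with NumPy; the data are made up for illustration, and an ordinary least-squares straight-line fit is assumed:

```python
import numpy as np

# Made-up data for illustration
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=3.0, size=x.size)

# Ordinary least-squares straight-line fit
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

rss = np.sum((y - y_hat) ** 2)     # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares
r_squared = 1 - rss / tss

print(f"RSS = {rss:.2f}, TSS = {tss:.2f}, R^2 = {r_squared:.3f}")
```

For an OLS fit with an intercept, this always prints an $R^2$ between 0 and 1, which is exactly what the answer below explains.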

Best Answer

A simple thought experiment will help answer your question.

TSS is $\sum_i (Y_i - \bar{Y})^2$.

Consider using the mean of $Y$ as the "regression line": if $\bar{Y} = 5$, it is simply a horizontal line at height 5 throughout. The squared deviations from this line are exactly $(Y_i - \bar{Y})^2$, so their sum is the TSS. Now, we have two scenarios:

  • The regression line that we estimate is the same as this mean line. In that case, RSS = TSS: the squared deviations from the mean, $(Y_i - \bar{Y})^2$, and the squared residuals, $(Y_i - \hat{Y}_i)^2$, are identical because $\hat{Y}_i = \bar{Y}$ for every $i$.
  • The regression line that we estimate is different from the mean line. In that case, RSS < TSS. The key step is that least squares minimizes the sum of squared residuals over all candidate lines, and the horizontal mean line is one of those candidates (assuming the model includes an intercept). So the fitted line's RSS can never exceed TSS, and because the least-squares solution is unique, RSS is strictly smaller whenever the fitted line differs from the mean line.

So, we only have two possible scenarios: either RSS = TSS or RSS < TSS. In both cases $0 \le \text{RSS}/\text{TSS} \le 1$, and therefore $R^2 = 1 - \text{RSS}/\text{TSS}$ is always between 0 and 1.
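As a numerical check of the two scenarios, here is a short sketch (again Python with NumPy and made-up data) comparing the RSS of the horizontal mean line, which equals TSS by construction, with the RSS of the least-squares line:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 0.8 * x + 2.0 + rng.normal(scale=2.0, size=x.size)

tss = np.sum((y - y.mean()) ** 2)

# Scenario 1: use the mean line itself as the model -> its RSS equals TSS
rss_mean_line = np.sum((y - y.mean()) ** 2)

# Scenario 2: the OLS line minimizes RSS over all straight lines,
# and the mean line (slope 0, intercept = mean) is one such line,
# so its RSS can never exceed TSS
slope, intercept = np.polyfit(x, y, deg=1)
rss_ols = np.sum((y - (slope * x + intercept)) ** 2)

print(f"TSS            = {tss:.2f}")
print(f"RSS(mean line) = {rss_mean_line:.2f}")      # equals TSS
print(f"RSS(OLS line)  = {rss_ols:.2f}")            # <= TSS
print(f"R^2            = {1 - rss_ols / tss:.3f}")  # between 0 and 1
```

Running this shows RSS(mean line) = TSS exactly and RSS(OLS line) ≤ TSS, so $R^2$ lands in $[0, 1]$ just as the argument predicts.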