Regression – Why Define Coefficient of Determination as 1 – RSS/TSS?

Tags: r-squared, regression, regression coefficients

According to Wikipedia, the coefficient of determination is defined as $1 - RSS/TSS$, where RSS is the residual sum of squares ($\sum(y-\hat{y})^2$) and TSS is the total sum of squares ($\sum(y-\bar{y})^2$), with $\hat{y}$ the model prediction and $\bar{y}$ the sample mean.

It can be shown that $1 - RSS/TSS = ESS/TSS$, where ESS is the explained sum of squares ($\sum(\hat{y}-\bar{y})^2$). It is remarked that this equality holds only in some cases, such as simple linear regression with an intercept (are there other cases?).

The right-hand side ($ESS/TSS$) is obviously nonnegative, but the left-hand side is not guaranteed to be. That is why in some cases we see negative coefficients of determination. However, almost all the sources I have read online define the coefficient of determination as $1 - RSS/TSS$.
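For a concrete illustration (a small NumPy sketch with made-up numbers): any model that predicts worse than the sample mean yields a negative value of $1 - RSS/TSS$.

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([4.0, 3.0, 2.0, 1.0])  # deliberately bad predictions, anti-correlated with y

rss = np.sum((y - y_hat) ** 2)     # residual sum of squares = 20
tss = np.sum((y - y.mean()) ** 2)  # total sum of squares = 5
print(1 - rss / tss)               # -3.0: worse than just predicting the mean
```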

My question is: why not define the coefficient of determination as $ESS/TSS$ instead?
In this way it is always nonnegative.

My attempt. I guess that the disadvantage of the alternative definition $ESS/TSS$ is that it can be larger than 1. (The advantage of the usual definition $1-RSS/TSS$ is that it is always at most 1, being 1 minus something nonnegative.) So I compute
$$
TSS - ESS = \sum_i (y_i - \bar{y})^2 - \sum_i (\hat{y}_i - \bar{y})^2 \\
= \sum_i \left(y_i^2 - 2\bar{y}(y_i - \hat{y}_i) - \hat{y}_i^2\right).
$$

Assuming

  1. the mean of the errors is zero (so that the sample mean of $y$ equals the mean of the predicted values), and
  2. the errors have zero correlation with the predicted values,
    we then have
    $$
    TSS - ESS = \sum_i \left((y_i - \hat{y}_i)^2 + 2(\hat{y}_i - \bar{y})(y_i - \hat{y}_i)\right) \\
    = RSS + C = RSS,
    $$

    where the term $C = 2\sum_i (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$ is proportional to the correlation $Corr(\hat{y}, \epsilon)$, with $\epsilon_i = y_i - \hat{y}_i$, and vanishes under assumption 2.
    It seems we always have $TSS = ESS + RSS$ provided the two assumptions hold. So, is it correct that assumptions 1 and 2 are so strong that they often fail to hold, so that $ESS/TSS > 1$?
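In fact, both assumptions hold automatically for ordinary least squares whenever an intercept is included: the normal equations force the residuals to sum to zero and to be orthogonal to the fitted values. A minimal NumPy check on simulated data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + rng.normal(size=100)

# Fit simple linear regression WITH an intercept via least squares.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
eps = y - y_hat

# Assumption 1: residuals have (numerically) zero mean.
print(np.isclose(eps.mean(), 0.0))
# Assumption 2: residuals are orthogonal to (uncorrelated with) the fitted values.
print(np.isclose(np.dot(y_hat, eps), 0.0))

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum(eps ** 2)
print(np.isclose(tss, ess + rss))  # the decomposition TSS = ESS + RSS holds
```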

Best Answer

> It can be shown that $1 - RSS/TSS = ESS/TSS$, .... It is remarked that this equality holds only in some cases, such as simple linear regression with an intercept (are there other cases?).

This is not true. Remaining in the realm of linear regression, this equality holds quite generally: it holds in any multiple regression that includes a constant, not only in the simple (one-regressor) case. In this setting $R^2$ is bounded between $0$ and $1$, which also makes it easy to communicate, especially to non-specialists.
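A quick numerical check of the multiple-regression case (simulated data; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([0.5, -2.0, 1.5]) + rng.normal(size=n)

# Multiple regression with three regressors AND a constant term.
Xc = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
y_hat = Xc @ beta

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

r2_a = 1 - rss / tss
r2_b = ess / tss
print(np.isclose(r2_a, r2_b))  # True: the two forms coincide
print(0.0 <= r2_b <= 1.0)      # True: R^2 lies in [0, 1]
```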

> The right-hand side ($ESS/TSS$) is obviously nonnegative, but the left-hand side is not guaranteed to be. That is why in some cases we see negative coefficients of determination. However, almost all the sources I have read online define the coefficient of determination as $1 - RSS/TSS$.

> My question is: why not define the coefficient of determination as $ESS/TSS$ instead? In this way it is always nonnegative.

The presence of the constant is important and affects the definitions and properties of TSS, ESS, and RSS. See here: https://math.stackexchange.com/questions/2398194/why-is-r-square-not-well-defined-for-a-regression-without-a-constant-term

Note that if the intercept is not included, it is usual to re-open the discussion of the definitions. It is true that we can still consider forms like

$R^2 = 1-RSS/TSS$

or

$R^2 = ESS/TSS$

but the two are no longer equal in general.

So, as you said, in such a situation the form $R^2 = 1-RSS/TSS$ can return negative values. However, the form $R^2=ESS/TSS$ is unbounded too: it can be greater than $1$ (for some insight read here: Can $R^2$ be greater than 1?). So your proposal is not a panacea: in both cases the interpretation is not straightforward.
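For instance, a hand-picked toy example of regression through the origin shows both pathologies at once: the two forms disagree, and $ESS/TSS$ exceeds $1$.

```python
import numpy as np

# Regression THROUGH THE ORIGIN (no intercept) on a toy data set.
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.0, 5.0, 7.0])      # the true line y = 2x + 1 has an intercept

b = np.sum(x * y) / np.sum(x * x)  # least-squares slope without a constant
y_hat = b * x

tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((y_hat - y.mean()) ** 2)
rss = np.sum((y - y_hat) ** 2)

print(1 - rss / tss)  # about 0.946
print(ess / tss)      # about 1.482 -> greater than 1, and not equal to 1 - RSS/TSS
```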

Now I see that you already noted the drawbacks of the form $ESS/TSS$. So you can simply consider the form $1-RSS/TSS$ preferable because it explicitly involves the residuals, a desirable property for a measure of fit.