[Math] why is R-square NOT well-defined for a regression without a constant term

linear regression · MATLAB · regression · statistics

We often include a constant term in a linear regression, such as $y=\beta_1 x+\beta_0$. The $R^2$, or coefficient of determination, is defined as $R^2=1-\frac{SS_{res}}{SS_{tot}}$, where $SS_{res}$ and $SS_{tot}$ are the residual sum of squares and the total sum of squares. Say we now drop the constant term $\beta_0$: does the above formula for calculating $R^2$ still work?

If we try this in MATLAB, we receive a warning that says: "R-square and the F statistic are not well-defined unless X has a column of ones." What does "not well-defined" mean here? The formula seems to work fine.
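For context, here is a minimal sketch of how this warning can be reproduced with `regress` from the Statistics and Machine Learning Toolbox; the data are made up for illustration and are not from the original post:

```matlab
% Minimal sketch: reproduce the MATLAB warning with regress
% (requires the Statistics and Machine Learning Toolbox).
% The data below are illustrative only.
rng(1);
n = 50;
x = rand(n, 1);
y = 5 + 0.1 * x + 0.5 * randn(n, 1);   % data generated WITH an intercept

% With a column of ones: no warning, stats1(1) is the usual R^2
[b1, ~, ~, ~, stats1] = regress(y, [ones(n, 1), x]);

% Without a column of ones: MATLAB warns that R-square and the
% F statistic are not well-defined
[b2, ~, ~, ~, stats2] = regress(y, x);
```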

Best Answer

In a nutshell, the definition of the $R^2$ statistic is based on the orthogonal decomposition of the total sum of squares, namely
$$ SS_{total}=SS_{res}+SS_{reg}. $$
Dividing by $SS_{total}$, you get
$$ \frac{SS_{reg}}{SS_{total}} = \frac{SS_{total}}{SS_{total}} - \frac{SS_{res}}{SS_{total}} = 1 - \frac{SS_{res}}{SS_{total}} = R^2. $$
Now, this decomposition is essentially a consequence of including the constant term $\beta_0$: the corresponding first-order condition gives $\sum_{i=1}^n e_i = 0$, which together with the orthogonality of $e$ and $\hat{y}$ makes the cross term $\sum_{i=1}^n e_i(\hat{y}_i - \bar{y})$ vanish. Without the constant term, $\sum_{i=1}^n e_i \neq 0$ in general, so $SS_{total} \neq SS_{res}+SS_{reg}$, and the quantity $1 - SS_{res}/SS_{total}$ no longer equals $SS_{reg}/SS_{total}$; it can even be negative, which is why it is "not well-defined" as a proportion of explained variance. A very good, more detailed answer can be found here.
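To see this numerically, here is a minimal sketch in plain MATLAB (no toolbox needed; the data are again made up for illustration) showing that, without the constant term, the residuals do not sum to zero and $SS_{reg}+SS_{res}$ no longer equals $SS_{total}$:

```matlab
% Minimal sketch: the sum-of-squares decomposition fails without a constant.
% The data below are illustrative only.
rng(0);
n = 50;
x = rand(n, 1);
y = 5 + 0.1 * x + randn(n, 1);    % true model has a large intercept

b    = x \ y;                     % least-squares fit of y = b*x (no constant)
yhat = x * b;
e    = y - yhat;

SS_res = sum(e.^2);
SS_tot = sum((y - mean(y)).^2);   % centered total sum of squares
SS_reg = sum((yhat - mean(y)).^2);

fprintf('sum of residuals  : %g\n', sum(e));             % not zero
fprintf('SS_reg + SS_res   : %g\n', SS_reg + SS_res);    % differs from SS_tot
fprintf('SS_tot            : %g\n', SS_tot);
fprintf('1 - SS_res/SS_tot : %g\n', 1 - SS_res/SS_tot);  % may even be negative
```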
