Solved – Comparing two regression models by looking at $R^2$

goodness-of-fit, r-squared, regression

I recently found out that there are some situations where we cannot use $R^2$ to compare the goodness of fit of two regression models.

Let

$\begin{aligned}Y &= \beta_1 X_1 + \beta_2 X_2 + u \\ \ln(Y) &= \beta_1 X_1 + \beta_2 X_2\end{aligned}$

be two regressions. I want to compare them by looking at their $R^2$. At first glance I thought that, since $R^2$ tells us how close the explained variation is to the total variation, it could be a good way of comparing models. But apparently I cannot use $R^2$ in this case because their "ESS" are different, and I did not understand why that is a reason.

Best Answer

Is $u$ the residual term? If so, is the second model also supposed to have one?

Choosing whether to log-transform the response variable should be based on the type of relationship you expect between the explanatory and response variable. The explained variance ($R^2$) can be low or high despite the relationship being logarithmic.
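As a concrete illustration of this point, the sketch below (a hypothetical example, not from the question) simulates data whose true relationship is log-linear and then fits both specifications by OLS. Each model's $R^2$ is computed on its own response scale, which is exactly why the two numbers answer different questions:

```python
# Hypothetical illustration: simulate data where ln(Y) is linear in X1, X2,
# then fit both specifications by OLS and compute each model's R^2 on its
# own response scale.
import numpy as np

rng = np.random.default_rng(0)
n = 500
X1 = rng.uniform(0, 2, n)
X2 = rng.uniform(0, 2, n)
# Assumed true relationship: ln(Y) = 0.8*X1 + 0.5*X2 + noise
lnY = 0.8 * X1 + 0.5 * X2 + rng.normal(0, 0.3, n)
Y = np.exp(lnY)

X = np.column_stack([np.ones(n), X1, X2])  # design matrix with intercept

def r_squared(X, y):
    """OLS R^2: 1 - RSS/TSS for the given response y."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - np.sum(resid ** 2) / tss

r2_level = r_squared(X, Y)   # share of the variance of Y explained
r2_log = r_squared(X, lnY)   # share of the variance of ln(Y) explained
print(f"R^2 for the level model (response Y):    {r2_level:.3f}")
print(f"R^2 for the log model (response ln(Y)):  {r2_log:.3f}")
```

Both values can be individually high, but they quantify explained variance of two different response variables, so comparing them directly does not say which model fits "better".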

About R-squared
$R^2$ is not a model selection criterion because it does not take into account the degrees of freedom used by the model (i.e. the number of parameters). However, since these two models have the same number of parameters, that is not the problem here. More important is that they describe something completely different:

At first glance I thought that since $R^2$ explains us how close the explained variation is to the total variation

This is true: $R^2$ is the proportion of variance in the response variable that is explained by the model. However, the $R^2$ of the first model measures the explained variance of $Y$, whereas the $R^2$ of the second measures the explained variance of $\ln(Y)$, a different quantity on a different scale. That is probably why you read that they cannot be compared this way: the ESS (Explained Sum of Squares) in each model refers to a different response.
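If one does want a rough comparison on a common scale, one common workaround (a sketch under assumed simulated data, not something stated in the answer above) is to back-transform the log model's fitted values to the $Y$ scale and compute a pseudo-$R^2$ there, e.g. the squared correlation between observed and back-transformed fitted values:

```python
# Sketch: back-transform the log model's predictions to the Y scale and
# compute a pseudo-R^2 there, so it is on the same scale as the level
# model's R^2. Data-generating process is assumed for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 500
X1 = rng.uniform(0, 2, n)
X2 = rng.uniform(0, 2, n)
lnY = 0.8 * X1 + 0.5 * X2 + rng.normal(0, 0.3, n)
Y = np.exp(lnY)
X = np.column_stack([np.ones(n), X1, X2])

# Fit the log model by OLS.
beta_log, *_ = np.linalg.lstsq(X, lnY, rcond=None)
# Naive back-transform (note: exp of the fitted mean understates E[Y];
# a smearing or lognormal correction would refine this).
Y_hat = np.exp(X @ beta_log)

# Pseudo-R^2 on the Y scale: squared correlation between observed Y and
# the back-transformed fitted values.
pseudo_r2 = np.corrcoef(Y, Y_hat)[0, 1] ** 2
print(f"pseudo-R^2 on the Y scale: {pseudo_r2:.3f}")
```

This pseudo-$R^2$ refers to the variance of $Y$ itself, so it can be set beside the level model's $R^2$, with the caveat noted in the comment about the bias of the naive exponential back-transform.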
