Solved – Comparing residuals between OLS and non-OLS regressions

loss-functions, regression

Suppose you want to estimate a linear model with $n$ observations of the response and $p$ predictors (so $p+1$ coefficients, including the intercept):
$$\mathbb{E}(y_i) = \beta_0 + \sum_{j=1}^p \beta_j x_{ij}$$

One way to do this is the ordinary least squares (OLS) solution: choose the coefficients so that the sum of squared errors is minimized:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right)^2 $$
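A minimal sketch of the OLS fit, on simulated data (the data-generating coefficients here are an assumed example, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Design matrix with an intercept column: columns are (1, x_1, ..., x_p).
A = np.column_stack([np.ones(n), X])

# OLS: the least-squares solver minimizes the sum of squared errors directly.
beta_ols, *_ = np.linalg.lstsq(A, y, rcond=None)
sse_ols = np.sum((y - A @ beta_ols) ** 2)
print(beta_ols, sse_ols)
```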

Alternatively, you could use another loss function, such as the sum of absolute deviations (least absolute deviations), so that:

$$(\beta_0,\beta_1,\cdots,\beta_p)^T = \underset{\beta_0,\beta_1,\cdots,\beta_p}{\arg \min} \sum_{i=1}^{n} \left| y_i - \beta_0 - \sum_{j=1}^p \beta_j x_{ij} \right| $$
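The same fit under the absolute-deviation loss, reusing `A` and `y` from the sketch above. Median regression (quantile regression at $q = 0.5$) minimizes exactly this loss, so statsmodels' `QuantReg` can be used:

```python
import statsmodels.api as sm

# Median regression = least absolute deviations.
res_lad = sm.QuantReg(y, A).fit(q=0.5)
beta_lad = res_lad.params
sad_lad = np.sum(np.abs(y - A @ beta_lad))  # sum of absolute deviations
print(beta_lad, sad_lad)
```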

Suppose you have found the parameters for both models and want to choose the model with the smaller value of its loss function. How can you compare the minimum values attained by the loss functions in general, not just in this specific case? (We could also try other $L_p$-based loss functions.) There seems to be a difference in scale: one loss is in squared units of the response, while the other is in the units of the response itself.

Best Answer

(Converting my comment into an answer.)

I think you cannot compare fits that come from different loss functions, because they are answers to different questions. Once you decide that a given loss function is the appropriate one for your situation, the fit follows from that decision; folding the attained minimum back in to validate the choice of loss function would be circular. If you have some other criterion under which both fits can be evaluated, you could use that, but you need to have defined it in advance.
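A minimal sketch of that circularity, continuing with `A`, `y`, `beta_ols`, and `beta_lad` from the sketches above: each fit attains the smaller value of its own loss by construction, so comparing attained minima never favors one loss function over the other. The held-out squared-error criterion at the end is an assumed example of a pre-specified external criterion, not a recommendation.

```python
def sse(beta):
    return np.sum((y - A @ beta) ** 2)

def sad(beta):
    return np.sum(np.abs(y - A @ beta))

print("SSE: ols =", sse(beta_ols), " lad =", sse(beta_lad))  # OLS wins here
print("SAD: ols =", sad(beta_ols), " lad =", sad(beta_lad))  # LAD wins here

# An external criterion defined in advance can compare the two fits, e.g.
# squared error on fresh data from the same process (assumed choice).
X_new = rng.normal(size=(50, p))
y_new = 1.0 + X_new @ np.array([2.0, -1.0]) + rng.normal(size=50)
A_new = np.column_stack([np.ones(50), X_new])
print("held-out SSE: ols =", np.sum((y_new - A_new @ beta_ols) ** 2),
      " lad =", np.sum((y_new - A_new @ beta_lad) ** 2))
```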