Solved – the difference between “coefficient of determination” and “mean squared error”

r-squaredregression

For regression problem, I have seen people use "coefficient of determination" (a.k.a R squared) to perform model selection, e.g., finding the appropriate penalty coefficient for regularization.

However, it is also common to use "mean squared error" or "root mean squared error" as a measure of regression accuracy.

So what is the main difference between these two? Could they be used interchangeably for "regularization" and "regression" tasks?
And what are the main usage of each in practice, such as in machine learning, data mining tasks?

Best Answer

$R^2=1-\frac{SSE}{SST}$, where $SSE$ is the sum of squared error (residuals or deviations from the regression line) and $SST$ is the sum of squared deviations from the dependent's $Y$ mean.

$MSE=\frac{SSE}{n-m}$, where $n$ is the sample size and $m$ is the number of parameters in the model (including intercept, if any).

$R^2$ is a standardized measure of degree of predictedness, or fit, in the sample. $MSE$ is the estimate of variance of residuals, or non-fit, in the population. The two measures are clearly related, as seen in the most usual formula for adjusted $R^2$ (the estimate of $R^2$ for population):

$R_{adj}^2=1-(1-R^2)\frac{n-1}{n-m}=1-\frac{SSE/(n-m)}{SST/(n-1)}=1-\frac{MSE}{\sigma_y^2}$.

Best Answer

Related Solutions

Solved – Comparison of predictive models

Solved – What are the assumptions of ridge regression and how to test them

What is an assumption of a statistical procedure?

Assumptions of penalized regression techniques

But what about the mathematical result that ridge regression always beats OLS?

Okay, but how do I know if I can apply ridge regression or not?

Related Question