Solved – Fraction of variance unexplained

residuals, variance

I'm trying to understand fraction of unexplained variance on wikipedia: https://en.wikipedia.org/wiki/Fraction_of_variance_unexplained

$$\text{unexplained variance} (y, \hat y) =\frac{\operatorname{Var}\{y-\hat y\}}{\operatorname{Var}\{y\}}$$


Correct me if I'm wrong: it basically computes the variance of the difference between the real value $y$ and the predicted value $\hat y$, relative to the variance of $y$ itself.

In a linear regression setup the model is $y=mx+c+e$, where $e$ is the error term (residual or unexplained error), and the prediction is $\hat y=mx+c$. So if $e$ is $0$ for every observation, then the unexplained variance would be $0$, since the residuals have no variance.

However, $\operatorname{unexplained variance} (y, \hat y) =\dfrac{\operatorname{Var}\{y-\hat y\}}{\operatorname{Var}\{y\}}= \dfrac{SS_\text{err}}{SS_\text{tot}}$ is the part that confuses me.

$$SS_\text{err}\ (\text{residual sum of squares})=\sum_{i=1}^n (y_i-\hat y_i)^2$$

$SS_\text{tot}\ (\text{total sum of squares})=\sum_{i=1}^n (y_i-\bar y)^2$, where $\bar y$ is the mean of the $y_i$.
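To make the two sums of squares concrete, here is a minimal numeric sketch with made-up observations and predictions (the values are illustrative, not from the question):

```python
# Toy check of the two sums of squares (example data, chosen arbitrarily).
y = [3.0, 5.0, 7.0, 9.0]          # observed values
y_hat = [2.8, 5.3, 6.9, 9.1]      # predictions from some model

n = len(y)
y_bar = sum(y) / n                # the single mean, used in every term of SS_tot

ss_err = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
ss_tot = sum((yi - y_bar) ** 2 for yi in y)

print(ss_err, ss_tot)             # 0.15 20.0
```

Note that $\bar y$ is one number (the sample mean), not a per-observation quantity.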


$\operatorname{Var}\{y-\hat y\}=E[((y-\hat y)-\sum_{i=0}^n(y-\hat y))^2]$ — somehow this does not look like the residual sum of squares?

Best Answer

You did most of the work yourself. From the final line you wrote:

$\operatorname{Var}\{y-\hat y\}=E[((y-\hat y)-\sum_{i=0}^n(y-\hat y))^2]$

This isn't quite correct; remember that the variance of some $X$ is the expected value of $X^2$ minus the square of the expected value of $X$, i.e. $var[X]=E[X^2]-E[X]^2$.

Therefore the variance of $y-\hat y$ is:

$var[y-\hat y]= E[(y-\hat y)^2]-E[y-\hat y]^2$

$var[y-\hat y]= \frac{1}{n} \sum_{i=1}^n(y_i-\hat y_i)^2- \left(\frac{1}{n}\sum_{i=1}^n(y_i-\hat y_i)\right)^2$

In linear regression (with an intercept term), the fitted parameters overestimate some of the $y$ and underestimate others, and the overestimation and underestimation cancel exactly. Therefore, when you sum the residuals $y_i-\hat{y}_i$ over all observations the total is $0$, so the second term above vanishes, which simplifies things a lot. Although I mentioned linear regression, most other forms of regression that include an intercept behave the same way.
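You can verify the residuals-sum-to-zero property numerically; here is a sketch using a least-squares line fit on synthetic data (the slope, intercept, and noise level are arbitrary choices):

```python
import numpy as np

# Fit a line with an intercept by least squares and check the residuals cancel.
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0 + rng.normal(scale=3.0, size=x.size)  # noisy line (made-up data)

m, c = np.polyfit(x, y, 1)        # least-squares slope and intercept
residuals = y - (m * x + c)

print(residuals.sum())            # ~0, up to floating-point error
```

This is a consequence of the normal equations: with an intercept in the model, one of them forces $\sum_i (y_i-\hat y_i)=0$ exactly.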

$var[y-\hat y]= \frac{1}{n} \sum_{i=1}^n(y_i-\hat y_i)^2$

Clearly $var[y-\hat{y}]=\frac{1}{n}SS_{err}$

Likewise, $var[y]=\frac{1}{n}\sum_{i=1}^n(y_i-\bar y)^2=\frac{1}{n}SS_{tot}$

Hence $\frac{var[y-\hat{y}]}{var[y]}= \frac{SS_{err}}{SS_{tot}}$, since the factors of $\frac{1}{n}$ cancel.
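As a sanity check of the whole identity, this sketch fits a line to synthetic data and computes the fraction of variance unexplained both ways — as a ratio of (population) variances and as a ratio of sums of squares (all data and parameters here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.5 * x - 2.0 + rng.normal(scale=2.0, size=x.size)  # synthetic data

m, c = np.polyfit(x, y, 1)        # least-squares fit with intercept
y_hat = m * x + c

# Population variances (ddof=0), matching the 1/n factors in the derivation.
fvu_var = np.var(y - y_hat) / np.var(y)
fvu_ss = np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print(fvu_var, fvu_ss)            # the two values agree
```

The agreement relies on the residuals summing to zero; for a model without an intercept the variance ratio and the sum-of-squares ratio can differ.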