Solved – Is $R^2$ interpretable as the proportion of *variation* explained, or the proportion of *variance* explained?

regression terminology

What is the correct term to be used in the title phrase? Wikipedia says:

$\dots$ The coefficient of determination $R^2$ is a measure of the global fit of
the model. Specifically, $R^2$ is an element of $[0, 1]$ and represents the
proportion of variability in $Y_i$ that may be attributed to some linear
combination of the regressors (explanatory variables) in X.

$R^2$ is often interpreted as the proportion of response variation
"explained" by the regressors in the model. Thus, $R^2 = 1$ indicates
that the fitted model explains all variability in y, while $R^2 = 0$
indicates no 'linear' relationship (for straight line regression, this
means that the straight line model is a constant line (slope = 0,
intercept = $\bar{y}$) between the response variable and regressors). An
interior value such as $R^2 = 0.7$ may be interpreted as follows:
"Seventy percent of the variance in the response variable can be
explained by the explanatory variables. The remaining thirty percent
can be attributed to unknown, lurking variables or inherent
variability."

Are any of variability, variance, or variation malapropisms? Which is standard terminology?

Best Answer

I've heard all three, but I don't particularly like any of them (with variance at the bottom of the list). Really, I prefer calling $R^{2}$ a measure of how well our model fits the data (and if you really want to use $R^{2}$, use its adjusted version instead).

The reason I don't like calling it a variance is that only one term in the expression for it is proportional to a variance. Its form is $1-\frac{SS_{E}}{SS_{T}}$, right? In that expression only $SS_{T}$ is proportional to a variance. Considering, as the comment claimed, that variance is a well-defined formal quantity, it doesn't seem accurate to call what we get from $R^{2}$ a proportion of variance explained.
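To make the expression concrete, here is a minimal sketch in plain Python that computes $SS_{E}$, $SS_{T}$, and $R^{2} = 1-\frac{SS_{E}}{SS_{T}}$ for a straight-line fit. The data and the helper names (`sums_of_squares`, `r_squared`) are made up for illustration and are not from the question. The sketch also checks the one relationship relied on above: $SS_{T}$ divided by $n-1$ is the sample variance of the response.

```python
from statistics import mean, variance


def sums_of_squares(y, y_hat):
    """Return (SS_E, SS_T) for observations y and fitted values y_hat."""
    y_bar = mean(y)
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_t = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return ss_e, ss_t


def r_squared(y, y_hat):
    """R^2 = 1 - SS_E / SS_T."""
    ss_e, ss_t = sums_of_squares(y, y_hat)
    return 1.0 - ss_e / ss_t


# Hypothetical data, chosen only to illustrate the arithmetic.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least squares for a straight line with an intercept.
x_bar, y_bar = mean(x), mean(y)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

ss_e, ss_t = sums_of_squares(y, y_hat)
r2 = r_squared(y, y_hat)

# SS_T is (n - 1) times the sample variance of y.
sample_var = variance(y)

print(f"SS_E = {ss_e:.4f}, SS_T = {ss_t:.4f}, R^2 = {r2:.4f}")
print(f"SS_T / (n - 1) = {ss_t / (len(y) - 1):.4f}, sample variance = {sample_var:.4f}")
```

Passing the constant prediction $\bar{y}$ as `y_hat` makes $SS_{E} = SS_{T}$, so `r_squared` returns 0, which matches the slope-zero case described in the Wikipedia passage quoted in the question.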
