Solved – How to quantify bias and variance in simple linear regression

bias, machine learning, mse, regression, variance

In the context of predictive modeling, how can I calculate the bias and variance of a given model (e.g. simple linear regression)? I know that the bias and variance of an estimator (here, a linear regression model) for a single prediction are:

$Bias(\hat Y)=E \hat Y-Y$

$Var(\hat Y) = E(E\hat Y-\hat Y)^2$

and that the Mean Squared Error can be decomposed into

$MSE = \text{Bias}^2 + \text{Var} + \text{irreducible error}$

But these are all theoretical formulas, and I can't see how to apply any of them to evaluate my linear regression model. To my understanding, these quantities can only be calculated if I know the true distribution of $\hat Y$ for a given $X$, which we never do when working with real, sampled data. From this question, I learnt that the bias of a single prediction isn't something you can calculate, because you would need to know the true distribution of the estimator (the model). As for the variance of my estimator, I still don't know whether it can be calculated or not.

Let's say I have $\hat Y = 0.3 + 0.7X$. For $X=5$, I know that the actual value is $Y=4$, while my estimator/model predicts $\hat Y=3.8$. For this single prediction, can I calculate the variance of my model? My goal is to decompose the MSE for this single prediction into bias and variance.
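Concretely, the prediction and its squared error for this single point are
$$\hat Y = 0.3 + 0.7 \times 5 = 3.8, \qquad (Y - \hat Y)^2 = (4 - 3.8)^2 = 0.04,$$
and I would like to split that $0.04$ into a bias part and a variance part.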

My question, then, is: how are these formulas useful in practice if we aren't able to quantify them?

Best Answer

Quoting from ISLR, pages 33 to 34, on the bias-variance tradeoff:

... the expected test MSE, for a given value $x_0$, can always be decomposed into the sum of three fundamental quantities: the variance of $\hat f(x_0)$, the squared bias of $\hat f(x_0)$ and the variance of the error terms $\epsilon$. That is, $$ E\left( y_0 - \hat f(x_0)\right)^2 = \text{Var}\left( \hat f(x_0) \right) + \left[ \text{Bias} \left( \hat f(x_0) \right) \right]^2 + \text{Var}(\epsilon)$$ Here the notation $E\left( y_0 - \hat f(x_0)\right)^2$ defines the expected test MSE, and refers to the average test MSE that we would obtain if we repeatedly estimated $f$ using a large number of training sets, and tested each at $x_0$. The overall expected test MSE can be computed by averaging $E\left( y_0 - \hat f(x_0)\right)^2$ over all possible values of $x_0$ in the test set.

So the random variable in this context is the fitted value $\hat f(x_0)$ at a given $x_0$, as it varies over a series of training sets.
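When the population is known, you can compute these quantities directly by simulation. Here is a minimal sketch, assuming a made-up data-generating process $y = 2 + 3x + \epsilon$ with $\epsilon \sim N(0, 1)$ (purely illustrative) and using NumPy's `polyfit` for the simple linear regression; it checks that variance $+$ bias$^2$ $+$ $\text{Var}(\epsilon)$ matches the expected test MSE at a fixed $x_0$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up data-generating process, chosen only for illustration:
# y = 2 + 3x + eps, eps ~ N(0, 1), so the true f(x0) is known exactly.
def true_f(x):
    return 2.0 + 3.0 * x

sigma_eps = 1.0   # sd of the irreducible noise
x0 = 5.0          # fixed test point
n_train = 50      # size of each training set
n_sims = 10_000   # number of repeated training sets

# Fit simple linear regression on many independent training sets
# and record the fitted value at x0.
preds = np.empty(n_sims)
for i in range(n_sims):
    x = rng.uniform(0, 10, size=n_train)
    y = true_f(x) + rng.normal(0.0, sigma_eps, size=n_train)
    slope, intercept = np.polyfit(x, y, deg=1)
    preds[i] = intercept + slope * x0

variance = preds.var()                       # Var(f_hat(x0))
bias_sq = (preds.mean() - true_f(x0)) ** 2   # Bias(f_hat(x0))^2

# Expected test MSE at x0: pair each fit with an independent test draw y0.
y0 = true_f(x0) + rng.normal(0.0, sigma_eps, size=n_sims)
expected_test_mse = np.mean((y0 - preds) ** 2)

print(f"variance                : {variance:.4f}")
print(f"bias^2                  : {bias_sq:.4f}")
print(f"Var(eps)                : {sigma_eps ** 2:.4f}")
print(f"sum of the three        : {variance + bias_sq + sigma_eps ** 2:.4f}")
print(f"expected test MSE at x0 : {expected_test_mse:.4f}")
```

With real data you don't have `true_f` or fresh training sets, which is where the bootstrap comes in.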

If you are willing to apply the bootstrap principle (the population is to your data set as your data set is to bootstrap samples from it), Dave's initial sense of how to proceed was correct. You repeat the modeling process on a set of bootstrap resamples from your data set, representing multiple training sets. You evaluate bias, variance, and error with respect to the full data set, representing the population. You do that over the range of $x_0$ values of interest, and average.
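A minimal sketch of that recipe in Python, with the caveat that the data `x`, `y` and the helper `bootstrap_bias_variance` are placeholders, the full-data fit stands in for the true $f$, and the full-data residual variance stands in for $\text{Var}(\epsilon)$:

```python
import numpy as np

def bootstrap_bias_variance(x, y, x_grid, n_boot=2000, seed=0):
    """Bootstrap estimates of bias, variance, and irreducible error for
    simple linear regression predictions at the points in x_grid.

    The full data set plays the role of the population: the full-data fit
    stands in for the true f, and its residual variance stands in for
    Var(eps). Bootstrap resamples play the role of repeated training sets.
    """
    rng = np.random.default_rng(seed)
    n = len(x)

    # Reference ("population") fit on the full data set
    slope_full, intercept_full = np.polyfit(x, y, deg=1)
    ref_pred = intercept_full + slope_full * x_grid
    resid = y - (intercept_full + slope_full * x)
    var_eps = resid.var(ddof=2)                # residual variance ~ Var(eps)

    # Refit on bootstrap resamples ("training sets") and predict on the grid
    preds = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)       # resample rows with replacement
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        preds[b] = intercept + slope * x_grid

    bias = preds.mean(axis=0) - ref_pred       # estimated bias at each grid point
    variance = preds.var(axis=0)               # estimated variance at each grid point
    return bias, variance, var_eps

# Hypothetical usage with made-up data (x, y are placeholders for your sample):
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([1.1, 1.9, 2.8, 4.2, 4.0, 6.1, 6.3, 7.8])
bias, variance, var_eps = bootstrap_bias_variance(x, y, x_grid=np.array([5.0]))
print(f"bias^2    at x0=5: {bias[0] ** 2:.4f}")
print(f"variance  at x0=5: {variance[0]:.4f}")
print(f"Var(eps) estimate: {var_eps:.4f}")
```

To follow the last step of the recipe, pass a grid of $x_0$ values of interest and average the resulting bias$^2$ and variance estimates across the grid.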

This is only an estimate of the true bias and variance of your modeling process, but it might be the closest that you can get without having access to the full population for testing and multiple samples from the population for training.
