Regression – Comprehensive Understanding of Residual Standard Error (RSE)

Tags: linear model, regression, residuals, standard deviation, standard error

I am reading the book "An Introduction to Statistical Learning" and I have trouble understanding its explanation of the RSE (Residual Standard Error). This is what the book says:

"The RSE is an estimate of the standard deviation of \epsilon . Roughly speaking, it is the average amount that the response will deviate from the true regression line. It is computed using the formula :"

$$\mathrm{RSE} = \sqrt{\frac{1}{n-2}\,\mathrm{RSS}} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

What I don't understand here is the definition of the RSE as the "standard deviation of $\epsilon$".

If Residual $= y_i - \hat{y}_i$,

and, going by my assumption, $\epsilon_i$ is also $= y_i - \hat{y}_i$,

isn't the RSE formula just computing the root mean square of $\epsilon$?

If $\epsilon_i = y_i - \hat{y}_i$,

then the standard deviation of $\epsilon$ would be

$$\sqrt{\frac{\sum\left[(y_i - \hat{y}_i) - \overline{(y - \hat{y})}\right]^2}{n-2}},$$

which is not what the formula above does.

So, technically, I think the RSE is just the root mean square of the residuals ($\epsilon$), and it would be wrong to call it the "standard deviation of $\epsilon$" if we go by the actual formula for a standard deviation.

Or, in my opinion, it should mean the standard deviation of the predicted $\hat{y}$ values rather than of $\epsilon$.

So, is this a misuse of the term "standard deviation" in the book, or am I missing something?

Please correct me or help me understand.

Best Answer

I believe this is because in OLS (with an intercept) the average of the residuals is exactly $0$. Hence the formula for the standard deviation of the residuals ($\epsilon$) can be written as:

$$\sqrt{\frac{\sum (\epsilon_i - \mu_{\epsilon})^2}{n-2}}$$

but since in OLS $\mu_{\epsilon}=0$ by construction (see the explanation of why this is so in this Mathematics.SE answer, and the short sketch below), you are left with:

$$\sqrt{\frac{\sum (\epsilon_i - 0)^2}{n-2}} = \sqrt{\frac{\sum \epsilon_i^2}{n-2}} = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n-2}},$$

since $\epsilon_i = y_i - \hat{y}_i$. (Strictly speaking, the $\epsilon_i$ in the book are the unobservable errors around the true regression line, while $y_i - \hat{y}_i$ are the residuals that estimate them; the RSE is computed from the residuals and is an estimate of the standard deviation of $\epsilon$, which is exactly what the book says.)
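As for why $\mu_{\epsilon}=0$ whenever the model includes an intercept, a one-line sketch for simple linear regression: the first normal equation of least squares (the derivative of the residual sum of squares with respect to the intercept, set to zero) forces the residuals to sum to zero:

$$\frac{\partial}{\partial \hat{\beta}_0}\sum_{i}\left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2 = -2\sum_{i}\epsilon_i = 0 \quad\Rightarrow\quad \mu_{\epsilon} = 0.$$

And a quick numeric check, in case it helps: a minimal Python sketch (assuming numpy is available; the simulated data, seed, and coefficients are made up for illustration) that fits a line by OLS and computes the book's RSE formula next to the mean-subtracted standard-deviation formula. The two agree because the residual mean vanishes.

```python
import numpy as np

# Made-up data for illustration: y = 2 + 3x + noise with true sigma = 1.5.
rng = np.random.default_rng(0)
n = 50
x = rng.uniform(0, 10, n)
y = 2.0 + 3.0 * x + rng.normal(scale=1.5, size=n)

# OLS fit with an intercept (column of ones), via least squares.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

print(resid.mean())  # ~0 (up to floating-point error), by construction

# Book's RSE formula: sqrt(RSS / (n - 2)).
rse = np.sqrt(np.sum(resid**2) / (n - 2))
# "Textbook" standard-deviation formula, subtracting the residual mean.
sd = np.sqrt(np.sum((resid - resid.mean()) ** 2) / (n - 2))
print(rse, sd)  # essentially identical, since the residual mean is zero
```

Both numbers should also land near the true error standard deviation of $1.5$, which is the sense in which the RSE estimates the standard deviation of $\epsilon$.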