[Math] When residual standard error is equal to standard deviation of dependent variable in linear regression

regression, statistics

When is the residual standard error equal to the standard deviation of the dependent variable in linear regression? Could someone provide some information on this topic and an explanation?

Best Answer

When you perform a regression, the model uses the estimated parameters to produce predicted values. These can be interpreted as the averages of the responses we would observe if we replicated the study with the same X values an infinite number of times.

The differences between these "predicted" values and the "observed" ones (used to fit the model) are called "residuals". In ordinary least squares regression, the underlying errors are assumed to be independent and normally distributed with mean $0$ and a common standard deviation, and the residuals serve as estimates of those errors.
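In symbols, under the usual linear model with $p$ predictors, this setup reads

$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0,\sigma^2),$$

with fitted values $\hat{y}_i$ and residuals $e_i = y_i - \hat{y}_i$ standing in for the unobserved errors $\varepsilon_i$.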

The "residual standard error" (a measure given by most statistical softwares when running regression) is an estimate of this standard deviation, and substantially expresses the variability in the dependent variable "unexplained" by the model. Accordingly, decreasing values of the RSE indicate better model fitting, and vice versa. The relationship between the RSE and the SD of the dependent variable is $RSE=\sqrt{1-R^2}SD$, where $R^2$ is the coefficient of determination. Also note that $R^2$ expresses the proportion of the variance in the dependent variable that is "explained" by the model.

Thus, the RSE can equal the SD of the dependent variable only in a theoretical model where $R^2=0$, i.e., a model with no relationship between the dependent variable and the independent ones. In most real models, since $R^2>0$, the RSE is lower than the SD.
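
If you want to check this numerically, here is a minimal sketch in Python using NumPy on simulated data; the sample size, coefficients, and noise level are arbitrary choices for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 1                            # observations and number of predictors (illustrative)
    x = rng.normal(size=n)
    y = 2.0 + 1.5 * x + rng.normal(size=n)   # simulated dependent variable

    # Ordinary least squares fit
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    ssr = np.sum(resid ** 2)                 # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)        # total sum of squares
    r2 = 1 - ssr / sst

    rse = np.sqrt(ssr / (n - p - 1))         # residual standard error
    sd = np.sqrt(sst / (n - 1))              # sample SD of the dependent variable

    print(rse)                                                     # residual standard error
    print(np.sqrt(1 - r2) * sd)                                    # approximate relationship
    print(np.sqrt((n - 1) / (n - p - 1)) * np.sqrt(1 - r2) * sd)   # exact, with df correction

The last two printed values bracket the RSE: the approximation $\sqrt{1-R^2}\,SD$ is slightly below it, and the version with the degrees-of-freedom factor matches it exactly.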