Solved – How exactly are standardized residuals calculated?

Tags: excel, regression, residuals

I'm working on a model and, for the moment, I prefer to work solely in Excel. I've been double-checking the results of the linear model in JMP, Minitab, and Statistica, and have (more or less) been getting the same answers.

One thing that's coming out odd, though, is my standardized residuals. I'm getting very different answers from Excel's regression routine, and I know it has to do with how I'm calculating them:

The standard deviation of our population varies relative to the output, so we work in terms of the relative standard deviation. We have an assumed %RSD of 5% (based on a lot of previous work; we also have reason to assume normality). From this I standardize the residuals as $\frac{x-u}{u\cdot \mathrm{RSD}}$, where $x$ is the observed value and $u$ is the predicted value, so $x-u$ is the residual.

Note that $u\cdot \mathrm{RSD} = s$, so this is just a z-score. The problem is that the values Excel gives me for the standardized residuals are very different from mine. That isn't exactly surprising, since I'm using a varying standard deviation, but Excel's values don't seem to be tied to the reality of the data: an observation can be off by as much as 50% (around 6 standard deviations by my reckoning) and the standardized residual Excel reports is only about 2 or 3.
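For concreteness, here's a rough sketch in Python of what I'm doing. The observed and predicted values below are made-up placeholders; only the 5% RSD is real:

```python
import numpy as np

# My standardization: z = (x - u) / (u * RSD), with s = u * RSD.
# The 5% RSD is our assumed relative standard deviation; the numbers
# below are placeholders, not my actual data.
RSD = 0.05

observed  = np.array([102.0,  95.0, 210.0, 48.0])   # x, observed values
predicted = np.array([100.0, 100.0, 200.0, 50.0])   # u, predicted values

residuals = observed - predicted        # x - u
z = residuals / (predicted * RSD)       # (x - u) / (u * RSD)
print(z)                                # -> [ 0.4, -1.0, 1.0, -0.8]
```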

Anyway, I'm having a really hard time finding out exactly how the residuals are standardized in a linear regression. Any help would be appreciated.

Best Answer

The statistical tools in Excel have always been black boxes. There's nothing for it but to do some forensic reverse-engineering. By performing a simple regression in Excel 2013 on the data $x=(1,2,3,4,5,6,7,8,9)$ and $y=(2,1,4,3,6,5,9,8,7)$, and requesting "standardized residuals" in the dialog, I obtained output that states:

  • The "Standard Error" is $1.3723\ldots$.

  • There are $9$ observations.

  • The residuals $r_i$ are $(0.5333\ldots, -1.35, \ldots, 0.35, -1.533\ldots)$.

  • The corresponding "Standard Residuals" are $(0.4154\ldots, -1.0516\ldots, \ldots, 0.2726\ldots, -1.1944\ldots)$.
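(As a cross-check, these quantities are easy to reproduce outside Excel. Here is a quick sketch in Python of the same ordinary least-squares fit with an intercept; only the fitting tool differs from what Excel does.)

```python
import numpy as np

# Reproduce the regression quantities quoted above (a Python sketch, not Excel):
# ordinary least squares of y on x with an intercept.
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([2, 1, 4, 3, 6, 5, 9, 8, 7], dtype=float)
n = len(x)

slope, intercept = np.polyfit(x, y, 1)   # least-squares line
r = y - (intercept + slope * x)          # residuals r_i

print(np.round(r, 4))                                 # 0.5333, -1.35, ..., 0.35, -1.5333
print(np.round(np.sqrt(np.sum(r**2) / (n - 2)), 4))   # "Standard Error": 1.3723
```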

Since "standardized" values are typically numbers divided by some estimate of their standard error, I compared these "Standard Residuals" to the residuals and to the "Standard Error." Knowing that various formulas for variances are sums of squares of residuals $r_i$ divided variously by $n$ (the number of data) or $n-p$ (the number of data reduced by the number of variables, in this case two: one for the intercept and a second for the slope), I squared everything in sight. It became immediately obvious that Excel is computing the "Standard Residual" as

$$\frac{r_i}{\sqrt{\frac{1}{n-1}\sum_{i=1}^n r_i^2}}.$$

This formula reproduced Excel's output exactly: not even a trace of floating-point roundoff error.

The denominator is exactly what Excel's STDEV function would return for the residuals (their mean is zero in a regression with an intercept). For residuals from a mean, its square is an unbiased estimate of their variance. For residuals in a regression, however, it has no standard meaning or value. It's garbage! But now you know how to compute it...
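If you want to check this outside Excel, here is a quick sketch in Python: the fit is an ordinary least-squares line with an intercept, and the last lines apply the formula above to the residuals.

```python
import numpy as np

# Excel's "Standard Residual": r_i / sqrt( sum(r_i^2) / (n - 1) ).
# Because OLS residuals from a model with an intercept sum to zero,
# the denominator equals STDEV(residuals).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([2, 1, 4, 3, 6, 5, 9, 8, 7], dtype=float)
n = len(x)

slope, intercept = np.polyfit(x, y, 1)
r = y - (intercept + slope * x)

denom = np.sqrt(np.sum(r**2) / (n - 1))   # same as np.std(r, ddof=1) here
standard_residuals = r / denom
print(standard_residuals)                 # matches the "Standard Residuals" quoted above
```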