Solved – does rstandard standardize in z

diagnosticlmrresidualsstandardization

I'm new to R, so please be gentle.

I was under the impression that rstandard(model) returns the z-scores of the residuals in model. However, when I standardized the residuals myself in z, the result was different. In fact, rstandard(model) had a mean different from 0 and had a standard deviation different from 1. The differences seem to be nonnegligible. What exactly does rstandard(model) do? Or am I doing something wrong here?

> x = rnorm(30, mean = 100, sd = 15)
> y = rnorm(30, mean = 100, sd = 15)
> model = lm(y ~ x)
> z.res = residuals(model)/sd(residuals(model)) #standardizing it myself
> rstandard(model) - z.res #difference between rstandard and what i did
            1             2             3             4             5 
-4.422354e-04  1.556269e-04 -4.576832e-03 -1.274350e-03  1.048068e-01 
            6             7             8             9            10 
-2.333922e-02  1.820134e-02 -3.307542e-03  3.368978e-02 -1.804108e-04 
           11            12            13            14            15 
-1.100621e-01 -1.343715e-03 -1.300427e-03  1.509862e-03  3.246602e-03 
           16            17            18            19            20 
 3.734255e-03 -1.821539e-06 -1.153190e-02 -1.713254e-06 -2.185101e-02 
           21            22            23            24            25 
-2.681935e-02  2.562472e-03 -4.721114e-02 -1.084481e-04 -3.430827e-03 
           26            27            28            29            30 
 4.149684e-04  7.705807e-04  2.166815e-03  2.537837e-02  4.182761e-04 
> mean(z.res) 
[1] -9.428041e-18 
#as expected of z-scores, mean is 0
> sd(z.res)
[1] 1 
#as expected of z-scores standard deviation is 1
> mean(rstandard(model)) 
[1] -0.001990908
#not really 0
> sd(rstandard(model)) 
[1] 1.019699
#not really 1

Also, the way I understood Standardized residuals in R's lm output, rstandard is actually studentized residuals. But isn't there already rstudent?

I'm using R version 2.14.1 in Xubuntu 12.04.

Thank you.

Best Answer

rstandard() produces standardised residuals via normalisation to unit variance using the overall error variance of the residuals/model.

rstudent() produces Studentized residuals in the same way, but it uses a leave-one-out estimate of the error variance.

The key line in rstandard() is

res <- infl$wt.res/(sd * sqrt(1 - infl$hat))

where sd is defined as

sqrt(deviance(model)/df.residual(model))

where model is the object returned by lm. But note this is not the same as sd(resid(model))

Note that the sd is also scaled by the hat values $1 - h_{ii}$ which together explain the discrepancy with your values.

The key line in rstudent() is

res <- res/(infl$sigma * sqrt(1 - infl$hat))

which is almost the same but sd is replaced via the leave-one-out estimate of the error variance (sd) infl$sigma