I'm new to R, so please be gentle.
I was under the impression that rstandard(model)
returns the z-scores of the residuals in model
. However, when I standardized the residuals myself in z, the result was different. In fact, rstandard(model)
had a mean different from 0 and had a standard deviation different from 1. The differences seem to be nonnegligible. What exactly does rstandard(model)
do? Or am I doing something wrong here?
> x = rnorm(30, mean = 100, sd = 15)
> y = rnorm(30, mean = 100, sd = 15)
> model = lm(y ~ x)
> z.res = residuals(model)/sd(residuals(model)) #standardizing it myself
> rstandard(model) - z.res #difference between rstandard and what i did
1 2 3 4 5
-4.422354e-04 1.556269e-04 -4.576832e-03 -1.274350e-03 1.048068e-01
6 7 8 9 10
-2.333922e-02 1.820134e-02 -3.307542e-03 3.368978e-02 -1.804108e-04
11 12 13 14 15
-1.100621e-01 -1.343715e-03 -1.300427e-03 1.509862e-03 3.246602e-03
16 17 18 19 20
3.734255e-03 -1.821539e-06 -1.153190e-02 -1.713254e-06 -2.185101e-02
21 22 23 24 25
-2.681935e-02 2.562472e-03 -4.721114e-02 -1.084481e-04 -3.430827e-03
26 27 28 29 30
4.149684e-04 7.705807e-04 2.166815e-03 2.537837e-02 4.182761e-04
> mean(z.res)
[1] -9.428041e-18
#as expected of z-scores, mean is 0
> sd(z.res)
[1] 1
#as expected of z-scores standard deviation is 1
> mean(rstandard(model))
[1] -0.001990908
#not really 0
> sd(rstandard(model))
[1] 1.019699
#not really 1
Also, the way I understood Standardized residuals in R's lm output, rstandard
is actually studentized residuals. But isn't there already rstudent
?
I'm using R version 2.14.1 in Xubuntu 12.04.
Thank you.
Best Answer
rstandard()
produces standardised residuals via normalisation to unit variance using the overall error variance of the residuals/model.rstudent()
produces Studentized residuals in the same way, but it uses a leave-one-out estimate of the error variance.The key line in
rstandard()
iswhere
sd
is defined aswhere
model
is the object returned bylm
. But note this is not the same assd(resid(model))
Note that the
sd
is also scaled by the hat values $1 - h_{ii}$ which together explain the discrepancy with your values.The key line in
rstudent()
iswhich is almost the same but
sd
is replaced via the leave-one-out estimate of the error variance (sd)infl$sigma