Solved – Weighted regression for categorical variables

anova, regression, residuals, weighted-regression

I have been trying to use weighted regression on some data with three categorical factors, fitting the model with the

lm(y ~ A*B*C) 

command in R.

This was to address the problem that my residual plot does not show constant variance: for one of the factors, the variance at two of its levels is much smaller than at the remaining level and than in the other factors. So I defined weights, in a vector ww, one per observation, so that the corresponding observations do not contribute as much as the others, and used

lm(y ~ A*B*C, weights=ww)

My problem is that I get exactly the same residuals when I use weights as when I don't. Is this supposed to happen? Is it because the variables A, B, and C are all categorical?

It sort of makes sense, because in the unweighted model the regression coefficients (the betas in the linear model) are just the cell means and the differences from the reference level, and I think that will also be the case for the weighted fit, but I am not sure.
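That intuition can be checked directly. The sketch below (toy data; all names are illustrative) fits a saturated factorial model with and without weights. Because the fitted values of a saturated model are the cell means, and a weighted cell mean equals the unweighted one whenever the weights are constant within each cell, the residuals come out identical:

```r
# Toy data: a full 2x2x2 factorial with 5 replicates per cell
set.seed(1)
dat <- expand.grid(A = factor(1:2), B = factor(1:2), C = factor(1:2))
dat <- dat[rep(1:8, each = 5), ]
dat$y <- rnorm(nrow(dat))

# Weights that are constant WITHIN each cell (here they depend only on A)
ww <- ifelse(dat$A == "1", 1, 0.1)

f0 <- lm(y ~ A * B * C, data = dat)
f1 <- lm(y ~ A * B * C, data = dat, weights = ww)

all.equal(unname(resid(f0)), unname(resid(f1)))  # TRUE: identical residuals
```

If the weights varied *within* cells, the weighted cell means (and hence the residuals) would differ.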

Can anyone help me with this? I am a bit lost, i.e. can someone tell me whether I should NOT be using weighted least squares to fix problems with heterogeneous residual variance? (I have also tried transforming the response variable to $\log(y)$; it helped but did not fix the problem.)

Best Answer

You should not define the weights by hand. Use the gls function from the nlme package (see its help page; you probably want the option weights = varIdent(form = ~ 1 | group)) to estimate the variance structure, and then use Pearson residuals (the raw residuals divided by the fitted standard deviation) to check the model.
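A minimal sketch of that approach, assuming the data live in a data frame dat and that the heteroscedasticity is driven by the levels of A (swap in whichever factor, or interaction of factors, is appropriate):

```r
library(nlme)

# Fit the same mean model, but let the residual variance differ
# by level of A; gls estimates one variance ratio per level.
fit <- gls(y ~ A * B * C, data = dat,
           weights = varIdent(form = ~ 1 | A))

summary(fit)  # includes the estimated variance ratios per level of A

# Pearson (standardized) residuals: raw residuals divided by the
# fitted standard deviation -- these should now look homoscedastic.
plot(fitted(fit), resid(fit, type = "pearson"))
```

If the Pearson residual plot still shows structure, the grouping in varIdent may need to be refined, e.g. form = ~ 1 | A * B.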