Solved – Spread-Level Plot versus Power Transformation Functions in R

data transformation, heteroscedasticity, r, time series

I'm having trouble interpreting the results from the Spread-Level Plot function in R (car package). The documentation says:

PowerTransformation
spread-stabilizing power transformation, calculated as 1 – slope of the line fit to the plot.

This is not explicit enough for me. Should this transformation be applied to every variable in the regression?

For example, assume I have an lm object given by:

myFit <- lm(y ~ x1 + x2)

Then I use Spread-Level Plot:

slp(myFit)

If the 'suggested power transformation' is 0.5, then does that imply a homoscedastic model could be fit using one of the following?

refitA <- lm(sqrt(y) ~ sqrt(x1) + sqrt(x2))
refitB <- lm(sqrt(y) ~ x1 + x2)
refitC <- lm(sqrt(y) ~ sqrt(x1 + x2))

If I understand correctly, refitA would be the suggested model to approximate homoscedasticity. On the other hand, if I only want to transform the LHS, I would use the powerTransform function (also from the car package). That is, an "estimated transform parameter" of 0.5 from the powerTransform function would imply that refitB is homoscedastic.

Is this correct?

Thanks!

Best Answer

The idea is to identify a possible transformation of the response that reduces the heteroskedasticity, assuming the model fitted well enough for the spread and level to have been estimated with sufficient accuracy.
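For intuition, the suggested power can be approximated by hand. The sketch below uses simulated data (an assumption for illustration, not the asker's data) in which the noise standard deviation grows like the square root of the mean. It regresses the log absolute studentized residuals on the log fitted values and takes 1 minus the slope. As far as I know, car fits a robust line by default, so its number may differ somewhat from this ordinary least-squares version; the car call at the end only runs if the package is installed.

```r
# Simulated data (hypothetical): noise sd proportional to sqrt(mean of y)
set.seed(1)
n  <- 5000
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
mu <- 20 + 5 * x1 + 3 * x2
y  <- mu + sqrt(mu) * rnorm(n)

myFit <- lm(y ~ x1 + x2)

# Spread (absolute studentized residuals) against level (fitted values)
spread <- abs(rstudent(myFit))
level  <- fitted(myFit)
stopifnot(all(level > 0))  # the log scale needs positive fitted values

# Straight line on the log-log scale; suggested power = 1 - slope
slFit <- lm(log(spread) ~ log(level))
suggestedPower <- 1 - unname(coef(slFit)[2])
print(suggestedPower)

# Optional comparison with car, if it is available
if (requireNamespace("car", quietly = TRUE)) {
  slpResult <- car::spreadLevelPlot(myFit)
  print(slpResult$PowerTransformation)
}
```

Only the residuals and fitted values of the response enter this calculation, which is why the suggested power is read as a transformation of y rather than of the predictors.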

Which is to say, try refitB, but beware that if the original model was reasonably linear in untransformed X, the new one generally won't be.
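One way to check for that loss of linearity is to add curvature terms to refitB and see whether they matter. This is a minimal sketch on simulated data (hypothetical, with y linear in x1 and x2 on the raw scale); the quadratic terms are just one convenient choice of curvature check, not the only one.

```r
# Simulated data (hypothetical): y is linear in x1 and x2 on the raw scale
set.seed(1)
n  <- 5000
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
mu <- 20 + 5 * x1 + 3 * x2
y  <- mu + sqrt(mu) * rnorm(n)
stopifnot(all(y > 0))  # sqrt() needs a positive response

# Response-only transformation, as in refitB from the question
refitB <- lm(sqrt(y) ~ x1 + x2)

# Same model with quadratic terms added to pick up curvature
refitBcurved <- lm(sqrt(y) ~ x1 + x2 + I(x1^2) + I(x2^2))

# F test: do the curvature terms improve the transformed model?
curvTest <- anova(refitB, refitBcurved)
print(curvTest)
```

A small p-value here suggests that sqrt(y) is no longer linear in the untransformed predictors, even though y was.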

Things to watch out for: a possible need to transform X as well, an interaction appearing where there wasn't one, or the loss of an interaction that was there.
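The interaction point can be illustrated the same way. In the sketch below (simulated, hypothetical data with a purely additive mean on the raw scale), the x1:x2 term is examined before and after taking the square root of the response.

```r
# Simulated data (hypothetical): additive in x1 and x2, no true interaction
set.seed(1)
n  <- 5000
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
mu <- 20 + 5 * x1 + 3 * x2
y  <- mu + sqrt(mu) * rnorm(n)
stopifnot(all(y > 0))  # sqrt() needs a positive response

# Interaction term on the raw scale and on the square-root scale
rawInt  <- lm(y ~ x1 * x2)
sqrtInt <- lm(sqrt(y) ~ x1 * x2)

pRaw  <- summary(rawInt)$coefficients["x1:x2", "Pr(>|t|)"]
pSqrt <- summary(sqrtInt)$coefficients["x1:x2", "Pr(>|t|)"]
print(c(raw = pRaw, sqrt = pSqrt))
```

Because the square root of an additive mean is not additive, the transformed model can show an interaction that the raw-scale model does not have.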

If the noise level is high, you may not be able to tell linear from nonlinear easily, at least not without a loess fit or a similar smooth to pick the pattern out of the noise.
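A smooth of the residuals against the fitted values might look like the following. This is a sketch on simulated data (hypothetical) using base R's loess with its default span, which is an arbitrary choice here; a systematic bend in the red curve away from the zero line would hint at nonlinearity left in refitB.

```r
# Simulated data (hypothetical), same structure as refitB in the question
set.seed(1)
n  <- 2000
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 10)
mu <- 20 + 5 * x1 + 3 * x2
y  <- mu + sqrt(mu) * rnorm(n)
stopifnot(all(y > 0))  # sqrt() needs a positive response

refitB <- lm(sqrt(y) ~ x1 + x2)

# Residuals against fitted values, smoothed with loess
diagData <- data.frame(fit = fitted(refitB), res = resid(refitB))
sm       <- loess(res ~ fit, data = diagData)
smoothed <- predict(sm)          # smooth evaluated at the observed fitted values
ord      <- order(diagData$fit)  # sort so the curve is drawn left to right

plot(diagData$fit, diagData$res, col = "grey60",
     xlab = "Fitted values", ylab = "Residuals")
lines(diagData$fit[ord], smoothed[ord], col = "red", lwd = 2)
abline(h = 0, lty = 2)
```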