Solved – Resolving heteroskedasticity in linear regression models

heteroscedasticityrregressionweighted-regression

I am working on an age estimation method using 4 types of biological measurements as age predictors. I am using RStudio.
So far, I have good results when I use linear regression (lm(age~predictor)), but I am encountering heteroskedasticity, and therefore cannot build prediction intervals for my models.

I have tried transformations to normalize the predictors using ln, inverse, and square root, but to no avail.
I have found a paper explaining the wls function, and I have used it in my models with the weight: $$\frac 1 {1+\frac{\text{predictor}^2} 2}$$
This has given me better age predictions, but does not solve the heteroskedasticity problem.

I have done some research, and apparently, one of my options is to create homoscedastic groups in my data by finding the data points where the residual variances change.
For that, I have used the breakpoints function of strucchange, which gave me 5 breakpoints by default.
I now want to give 6 different weights (weights are $\frac 1 {\text{var(age)}}$ of each interval) to my 6 intervals of data, but I cannot find a function to do that. I would greatly appreciate any help on the subject.
Thanks.

Best Answer

Perhaps you can use the rlm() function from the MASS package. I believe it's intended for datasets affected by outliers, but it's at least worth a try with your dataset.

Here is a paper that describes the details of the function.

Description: Fit a linear model by robust regression using an M estimator.

Details: Fitting is done by iterated re-weighted least squares (IWLS).

# Package examples
require(MASS)
summary(rlm(stack.loss ~ ., stackloss))
rlm(stack.loss ~ ., stackloss, psi = psi.hampel, init = "lts")
rlm(stack.loss ~ ., stackloss, psi = psi.bisquare)

# Using the diamonds data set
diam_rlm <- rlm(price ~ ., diamonds)
summary(diam_rlm)

# A plot as example
ggplot(diamonds, aes(carat,price)) +
  geom_point() +
  stat_smooth(method = "rlm") +
  xlim(0,2.9)

enter image description here

For the diamonds dataset, i'm sure you could improve the model with a transformation. However, as you aren't saying this works with your data i left the diamonds data "as is".