I am working on an age estimation method using 4 types of biological measurements as age predictors. I am using RStudio.
So far, I have good results when I use linear regression (lm(age~predictor)
), but I am encountering heteroskedasticity, and therefore cannot build prediction intervals for my models.
I have tried transformations to normalize the predictors using ln, inverse, and square root, but to no avail.
I have found a paper explaining the wls
function, and I have used it in my models with the weight: $$\frac 1 {1+\frac{\text{predictor}^2} 2}$$
This has given me better age predictions, but does not solve the heteroskedasticity problem.
I have done some research, and apparently, one of my options is to create homoscedastic groups in my data by finding the data points where the residual variances change.
For that, I have used the breakpoints function of strucchange, which gave me 5 breakpoints by default.
I now want to give 6 different weights (weights are $\frac 1 {\text{var(age)}}$ of each interval) to my 6 intervals of data, but I cannot find a function to do that. I would greatly appreciate any help on the subject.
Thanks.
Best Answer
Perhaps you can use the
rlm()
function from the MASS package. I believe it's intended for datasets affected by outliers, but it's at least worth a try with your dataset.Here is a paper that describes the details of the function.
Description: Fit a linear model by robust regression using an M estimator.
Details: Fitting is done by iterated re-weighted least squares (IWLS).
For the diamonds dataset, i'm sure you could improve the model with a transformation. However, as you aren't saying this works with your data i left the diamonds data "as is".