The first thing to note is that the estimators in the linear regression model are not particularly sensitive to heavy tails in the error distribution (so long as the error variance is finite). Fitting a standard linear regression to data with excessively heavy tails means that data points in the tails are penalised excessively, but the coefficient estimators are still usually quite reasonable. The main drawback in this situation is that prediction intervals will be too short, since they do not account for the heavy tails.
If you would like to adapt your model to deal with the heavier tails, you can use the heavyLm function in the heavy package in R. This function fits a linear model using the T-distribution as the error distribution, which allows you to use an error distribution with heavier tails than the normal. The only drawback of the package is that it requires you to specify the degrees-of-freedom parameter for the error distribution, rather than estimating it from the data. However, with some creative looping, you could even estimate this parameter if you wanted to. In any case, this model should allow you to get estimates for a linear regression where the error distribution has heavier tails than the normal distribution, and so your corresponding residual density plot and residual QQ plot should be close to the stipulated error distribution.
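The "creative looping" idea can be sketched in base R without the heavy package: maximise the t log-likelihood of the regression directly with optim over a grid of degrees-of-freedom values, and keep the df with the best profile likelihood. This is an illustrative sketch, not the internals of heavyLm; the simulated data, the df grid, and the log-scale parameterisation are all assumptions made for the example.

```r
# Sketch: linear regression with t-distributed errors, profiling over df.
# (Illustrative base-R version of what heavyLm does for a fixed df.)
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rt(n, df = 5)            # true model with t(5) errors

# Negative log-likelihood for y = b0 + b1*x + s*T(df) errors
negll <- function(par, df, x, y) {
  b0 <- par[1]; b1 <- par[2]; s <- exp(par[3])   # s > 0 via log-scale
  -sum(dt((y - b0 - b1 * x) / s, df = df, log = TRUE) - log(s))
}

dfs  <- c(2, 3, 5, 10, 30)                # candidate degrees of freedom
fits <- lapply(dfs, function(df)
  optim(c(0, 0, 0), negll, df = df, x = x, y = y))
best <- which.min(sapply(fits, `[[`, "value"))
dfs[best]                                 # df with best profile likelihood
fits[[best]]$par[1:2]                     # intercept and slope estimates
```

The same loop extends to a finer df grid, or to optimising over df jointly with the other parameters.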
Cauchy. The reason for the strange histogram from Cauchy data is precisely that
you are getting many extreme values in the tails--too sparse and too extreme to show
well on your histogram. A data summary or boxplot might be more useful to visualize
what's going on.
set.seed(999)
x = rcauchy(10000)
summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-5649.323 -0.970 0.021 -0.037 1.005 2944.847
x.trnk = x[abs(x) < 200] # omit a few extreme values for hist
length(x.trnk)
[1] 9971
par(mfrow=c(2,1))
dcauchy(0)
[1] 0.3183099 # Height needed for density plot in histogram
hist(x.trnk, prob=T, br=100, ylim=c(0,.32), col="skyblue2")
curve(dcauchy(x), add=T, col="red", n=10001)
boxplot(x.trnk, horizontal=T, col="skyblue2", pch=20)
par(mfrow=c(1,1))
The standard Cauchy distribution (no parameters specified) is the same as Student's t
distribution with DF = 1. The density function integrates to $1,$ as appropriate, but
its tails are so heavy that the integral for its 'mean' diverges, so its mean doesn't
exist. One speaks of its median as the center of the distribution.
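The nonexistent mean shows up directly in simulation (a small base-R check, not part of the answer above, with sample sizes chosen for illustration): sample means of Cauchy draws never settle down as you repeat the experiment, while sample medians do.

```r
# Repeated samples of 1000 Cauchy draws: the means are wildly unstable
# (a mean of Cauchy variables is itself Cauchy), the medians are not.
set.seed(999)
means   <- replicate(1000, mean(rcauchy(1000)))
medians <- replicate(1000, median(rcauchy(1000)))
c(sd(means), sd(medians))   # spread of the means is vastly larger
```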
Student's t, DF = 10. There is nothing particularly unusual about Student's t distribution with DF = 10.
Its tails are somewhat heavier than for standard normal, but not so much heavier
that it's hard to make a useful histogram (no truncation needed). And its mean is $\mu=0.$
y = rt(10000, 10)
summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-5.988219 -0.698855 -0.006711 -0.005902 0.685740 6.481538
dt(0,10)
[1] 0.3891084
par(mfrow=c(2,1))
hist(y, prob=T, br=30, ylim=c(0,.4), col="skyblue2")
curve(dt(x,10), add=T, col="red", n=10001)
boxplot(y, horizontal=T, col="skyblue2", pch=20)
par(mfrow=c(1,1))
The distribution $\mathsf{T}(10)$ is sufficiently heavy-tailed that
samples from it as large as $n=10\,000$ tend to show many boxplot outliers---as seen above. In a simulation of $100\,000$ samples of size $10\,000,$
almost every sample showed at least one outlier and the average number
of outliers per sample was more than 180. [This simulation runs slowly
because each sample of $10\,000$ needs to be sorted in order to determine
its outliers.]
set.seed(2020)
nr.out = replicate(10^5, length(boxplot.stats(rt(10000,10))$out))
mean(nr.out)
[1] 188.5043
mean(nr.out>0)
[1] 1
Best Answer
You could use iterative feasible generalized least squares.
Start by setting the weight for each data point to 1 (i.e. no weighting), and use the following algorithm:
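A minimal sketch of one common version of that iteration: fit a weighted regression, model the error variance from the residuals, update the weights to the inverse of the fitted variance, and repeat until the weights stabilise. The variance model used here (log squared residuals regressed on x) and the simulated data are illustrative assumptions, not the only choice.

```r
# Iterative feasible GLS: alternate between fitting the mean model with
# current weights and re-estimating the error variance from residuals.
set.seed(42)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)   # error variance grows with x

w <- rep(1, n)                            # start with no weighting
for (i in 1:20) {
  fit   <- lm(y ~ x, weights = w)
  e2    <- residuals(fit)^2
  vfit  <- lm(log(e2) ~ x)                # model the (log) error variance
  w_new <- 1 / exp(fitted(vfit))          # weights = inverse fitted variance
  if (max(abs(w_new - w)) < 1e-8) break   # stop when weights stabilise
  w <- w_new
}
coef(fit)                                 # estimates of the mean model
```

With heteroskedasticity of this form, the weighted fit recovers the coefficients more efficiently than plain OLS would.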