Solved – Linear Regression with heavy tailed noise

heavy-tailed

The model is linear: $y_i = a\cdot x_i + b + e_i,~ i = 1,2,\ldots,N$. The noise is known to be heavy-tailed. However, the distribution of the noise conditional on $x$ is the same for all data points. How should I model the data-generating process? Should I use a Student-t distribution for the noise? Should I use an M-estimator in R?
Facts:
1. OLS is not to be used.
2. The noise distribution can depend on $x$, but the noise is independent across samples.
3. Conditional on the value of $x$, the noise has mean 0.
More clarification: the distribution of the noise at $x$, i.e., $N(x)$, is a function of $x$. However, for different data samples $i=1,2,\ldots,M$, the distribution of the noise samples remains the same.

Best Answer

You could use iterative feasible generalized least squares.

Start by setting weights for each datapoint to 1, i.e. no weighting, and use the following algorithm:

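  1. Fit a weighted regression model for each dataset using the current weights.
  2. Create a single dataset combining the squared residuals, $e_i^2$, and their respective $x$ values.
  3. Fit $e_i^2 = c\cdot x_i + d$, where $c$ and $d$ are separate from the $a$ and $b$ of the main model. If the noise is zero mean, the expected value of $e_i^2$ is the variance of the error at $x_i$, so the fitted line estimates that variance.
  4. Update your weights using the squared-error prediction model: set each weight to the inverse of the predicted variance at $x_i$.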
  5. Go back to 1 until convergence.
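
The steps above can be sketched in Python with NumPy. This is a minimal illustration, not a definitive implementation: it assumes a single dataset rather than several, a variance model that is linear in $x$, and noise with finite variance. The function name `iterative_fgls` and the parameters `n_iter`, `tol`, and `var_floor` are hypothetical choices made for this example. The floor on the predicted variance is an added safeguard, since a linear fit to squared residuals can predict zero or negative values.

```python
import numpy as np

def iterative_fgls(x, y, n_iter=20, tol=1e-8, var_floor=1e-8):
    """Sketch of iterative feasible GLS for y = a*x + b.

    Assumes the error variance is roughly linear in x (an assumption
    of this example). Returns (mean_coef, var_coef), each ordered as
    [slope, intercept].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])  # design matrix: columns [x, 1]
    w = np.ones_like(x)                        # start with no weighting
    prev = None
    for _ in range(n_iter):
        # Step 1: weighted least squares, done by scaling rows by sqrt(w).
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        # Steps 2-3: regress the squared residuals on x.
        resid_sq = (y - X @ coef) ** 2
        var_coef, *_ = np.linalg.lstsq(X, resid_sq, rcond=None)
        # Step 4: weights are the inverse of the predicted variance,
        # floored so that they stay positive and finite.
        var_hat = np.maximum(X @ var_coef, var_floor)
        w = 1.0 / var_hat
        # Step 5: stop once the regression coefficients stabilize.
        if prev is not None and np.max(np.abs(coef - prev)) < tol:
            break
        prev = coef
    return coef, var_coef
```

Calling `iterative_fgls(x, y)` on two equal-length arrays returns the estimated slope and intercept together with the coefficients of the variance model. With very heavy tails the squared residuals are themselves noisy, so the variance fit in step 3 may be unstable; that is a limitation of this sketch.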