Linear regression with changing variance


I want to perform linear regression on some data. For every value of x, the data values are distributed normally across y, around some mean. However, the variance increases linearly as x increases. I made this example graph:

[Figure: linear regression graph]

Blue is the regression line, red are data points, black shows the normal distribution, and green visualizes the variance increasing.

How can I calculate a regression for the change in variance, while also performing a linear regression of the data? The data is heteroscedastic, and I've read up on methods for doing linear regression on such data. However, I haven't found anything on estimating the actual change in variance of the data.

I haven't studied stats rigorously, so any simple explanations or resources I could look at further would be appreciated.

More Details:

The original dataset follows $y = a/x + b$. The variance as $x$ changes follows a similar model $s^2 = c/x + d$. I transformed the data using $x' = 1/x$ to make the data linear (just to simplify the problem). Here is a sample graph (left is transformed, right is original):

[Figure: sample data, transformed (left) and original (right)]
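
Written out, the model described above in the transformed variable $x' = 1/x$ would be the following. The index $i$ over data points and the error term $\varepsilon_i$ are notation introduced here for illustration, and normality is assumed as stated in the question:

$$y_i = a x'_i + b + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0,\, s_i^2), \qquad s_i^2 = c x'_i + d$$

The goal is then to estimate both the mean parameters $(a, b)$ and the variance parameters $(c, d)$.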

Best Answer

This sounds like a special case of heteroscedasticity.

There are two issues:

  1. What estimator should you use in the presence of heteroscedasticity?
  2. How should you calculate your standard errors?

The most straightforward thing to do is to run a regular regression but use heteroscedasticity-robust standard errors. As @Glen_b suggests in the comments, though, you can probably do better than this by efficiently exploiting the known structure of your problem.

What estimator to use?

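  • You could just run an ordinary least squares (OLS) regression. Under heteroscedasticity the coefficient estimates remain unbiased, but they are no longer efficient.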

  • You could run weighted least squares (WLS), an application of generalized least squares (GLS). The loose idea is to give more weight to observations with low-variance error terms, typically by weighting each observation by the inverse of its error variance.

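    • Since you probably don't know ex ante how the variance of the error term varies with $x$, you probably have to do something like feasible GLS (FGLS): estimate the variance function from the data, for example from the squared residuals of a first-pass regression, and then use the estimated variances to form the weights.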
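
One way the feasible GLS idea could be sketched, assuming (as in the question) that the error variance is a linear function of the regressor: fit OLS, regress the squared residuals on $x$ to get a variance line, then refit with weights equal to the inverse of the fitted variance. The function name `fgls_linear_variance`, its parameters, and the synthetic data are hypothetical and only for illustration. This is a minimal sketch, not a definitive implementation.

```python
import numpy as np

def fgls_linear_variance(x, y, n_iter=3, min_var=1e-8):
    """Fit y = a*x + b, assuming Var(error | x) = c*x + d.

    Returns (a, b) for the mean line and (c, d) for the variance line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept

    def fit_variance(beta):
        # Regress squared residuals on x to estimate the variance line.
        resid_sq = (y - X @ beta) ** 2
        return np.linalg.lstsq(X, resid_sq, rcond=None)[0]

    # Step 1: plain OLS for the mean line.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        # Step 2: estimate the variance function from current residuals.
        gamma = fit_variance(beta)
        # Clip fitted variances so the weights stay positive and finite.
        var_hat = np.clip(X @ gamma, min_var, None)
        # Step 3: weighted least squares, implemented by scaling each row
        # by 1 / sqrt(variance) and solving an ordinary least squares problem.
        w = 1.0 / np.sqrt(var_hat)
        beta = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    # Final variance line based on the residuals of the last mean fit.
    gamma = fit_variance(beta)
    return beta, gamma

# Synthetic data (made up for this example): mean 2x + 1, variance 0.5x + 0.2.
rng = np.random.default_rng(0)
x = rng.uniform(1.0, 10.0, size=20_000)
y = 2.0 * x + 1.0 + rng.normal(size=x.size) * np.sqrt(0.5 * x + 0.2)

(a, b), (c, d) = fgls_linear_variance(x, y)
print(f"mean line:     a={a:.3f}, b={b:.3f}")
print(f"variance line: c={c:.3f}, d={d:.3f}")
```

The second pair of coefficients is the "regression for the change in variance" the question asks about. The estimate of the variance line is noisier than the estimate of the mean line, so it needs more data to pin down.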

If you run a regular OLS regression, you should not use the usual standard errors, which assume homoscedasticity. Instead, use heteroscedasticity-robust standard errors (also known as White or sandwich standard errors). Most statistics packages can compute these.
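
To make the difference concrete, here is a small NumPy sketch that computes both the classical standard errors and heteroscedasticity-robust ones for a simple regression. It uses the HC1 variant (White's sandwich estimator with an $n/(n-k)$ degrees-of-freedom correction). The function name `ols_with_robust_se` and the synthetic data are hypothetical. In practice you would normally rely on your statistics package's built-in option rather than hand-rolled code.

```python
import numpy as np

def ols_with_robust_se(x, y):
    """OLS for y = a*x + b with classical and HC1 robust standard errors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([x, np.ones_like(x)])  # columns: slope, intercept
    n, k = X.shape

    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical covariance: sigma^2 * (X'X)^-1, valid under homoscedasticity.
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Sandwich covariance: (X'X)^-1 [X' diag(e_i^2) X] (X'X)^-1,
    # scaled by n / (n - k) (the HC1 correction).
    meat = X.T @ (X * (resid ** 2)[:, None])
    cov_robust = (n / (n - k)) * (XtX_inv @ meat @ XtX_inv)
    se_robust = np.sqrt(np.diag(cov_robust))
    return beta, se_classical, se_robust

# Synthetic data (made up for this example): variance grows linearly with x.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=5_000)
y = 2.0 * x + 1.0 + rng.normal(size=x.size) * np.sqrt(0.5 * x + 0.2)

beta, se_classical, se_robust = ols_with_robust_se(x, y)
print("coefficients (slope, intercept):", beta)
print("classical SE:", se_classical)
print("robust SE:   ", se_robust)
```

The point estimates are the same either way; only the standard errors, and therefore the confidence intervals and tests, change.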
