Multicollinearity vs Autocorrelation – Understanding the Differences

Tags: autocorrelation, multicollinearity, multiple-regression

I am learning about multiple linear regression (MLR) and I don't quite understand the difference between these two terms. Could someone explain it to me?
Additionally, if I build a model and want to validate the conditions necessary for it to be a good model, should I test for both (multicollinearity and autocorrelation), or only one of them?
I believe this might be a very basic question; I hope you can help me. Thanks!

Best Answer

Multicollinearity is high correlation between two or more regressors (independent variables).

For example, in a regression:

$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + e_i$$

multicollinearity would be present if $x_{1i}$ and $x_{2i}$ were highly correlated with each other. In fact, one way to detect multicollinearity is to run an auxiliary regression of one independent variable on the other:

$$x_{1i} = \alpha_0 + \alpha_1 x_{2i} + \epsilon_i$$

collect the $R^2$ and calculate the variance inflation factor $VIF = \frac{1}{1-R^2}$. Higher values of VIF indicate stronger multicollinearity; values above 5 or 10 (different textbooks offer slightly different rules of thumb) are usually taken to indicate problematic multicollinearity.
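To make this concrete, here is a minimal sketch in Python using statsmodels, with simulated placeholder data (the variable names and parameter values are just for illustration). It computes the VIF both "by hand" via the auxiliary regression above and via the `variance_inflation_factor` helper:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated placeholder data with two highly correlated regressors
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# "Manual" VIF: auxiliary regression of x1 on x2, then 1 / (1 - R^2)
aux = sm.OLS(x1, sm.add_constant(x2)).fit()
vif_manual = 1.0 / (1.0 - aux.rsquared)

# Same idea via statsmodels' helper (expects the full design matrix)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
vif_x1 = variance_inflation_factor(X.values, 1)  # column 1 corresponds to x1

print(vif_manual, vif_x1)  # values above 5-10 suggest problematic multicollinearity
```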

Multicollinearity itself does not lead to biased results, but it inflates the variance (and hence the standard errors) of the coefficient estimates, so you would want to avoid it if possible.

Autocorrelation might refer either to autocorrelation in the errors, or more generally to time series models where variables are related to their own past realizations. From your question I assume you want to know about the former rather than the latter.

When it comes to autocorrelation in the error term, it means that different errors are correlated with each other (usually across time, although spatial autocorrelation can sometimes exist as well). A simple example would be a time series model such as:

$$y_t = \gamma_0 + \gamma_1 x_t +\varepsilon_t$$

where $\varepsilon_t$ would be given by:

$$\varepsilon_t = \rho \varepsilon_{t-1} +v_t$$

with $\rho \neq 0$. The above essentially means that today's errors are correlated with yesterday's errors.
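As a small illustration (simulated data, arbitrary parameter values), the following Python snippet generates AR(1) errors with $\rho = 0.8$ and checks that consecutive errors are indeed correlated:

```python
import numpy as np

rng = np.random.default_rng(1)
T, rho = 500, 0.8          # sample size and autocorrelation coefficient (arbitrary choices)

eps = np.zeros(T)
v = rng.normal(size=T)     # white-noise innovations v_t
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + v[t]   # epsilon_t = rho * epsilon_{t-1} + v_t

# Sample correlation between epsilon_t and epsilon_{t-1} should be close to rho
print(np.corrcoef(eps[1:], eps[:-1])[0, 1])
```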

Autocorrelation in itself does not bias the coefficients of the model of interest, but it does bias the estimated standard errors, so without detecting and correcting for autocorrelation you would not be able to make correct inference (i.e. the $t$-statistics and $p$-values will be wrong even if $\gamma$ is estimated correctly).

There are various tests for autocorrelation, for example the Durbin-Watson test for first-order autocorrelation or the Breusch-Godfrey test for higher-order autocorrelation, to name just two. You can read more about these tests in Verbeek (2008), A Guide to Modern Econometrics, 4th ed., p. 116.
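For completeness, here is a rough sketch of how both tests could be run in Python with statsmodels; the data-generating process below is only a placeholder that builds in AR(1) errors so the tests have something to detect:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
T, rho = 500, 0.8
x = rng.normal(size=T)

# Build AR(1) errors and a simple outcome y_t = 1 + 0.5 x_t + eps_t (placeholder values)
eps = np.zeros(T)
v = rng.normal(size=T)
for t in range(1, T):
    eps[t] = rho * eps[t - 1] + v[t]
y = 1.0 + 0.5 * x + eps

res = sm.OLS(y, sm.add_constant(x)).fit()

# Durbin-Watson: values near 2 suggest no first-order autocorrelation,
# values well below 2 suggest positive autocorrelation
print(durbin_watson(res.resid))

# Breusch-Godfrey: tests for autocorrelation up to a chosen lag order
lm_stat, lm_pvalue, f_stat, f_pvalue = acorr_breusch_godfrey(res, nlags=2)
print(lm_pvalue, f_pvalue)
```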