Standard error and multicollinearity

linear regression, statistics

"The two most correlated features are A and B, and neither of them is significant in the linear regression. This could be due to the inflation of standard errors caused by multicollinearity."

Two questions regarding the statement above:

  1. Why does multicollinearity cause the inflation of standard errors?
  2. Why can the inflation of standard errors cause variables to be insignificant?

Thanks!

Best Answer

  1. A regression is meant to measure the relationship between each independent variable and the outcome, holding all other covariates constant. If you have multiple independent variables that are highly correlated, the estimates of the individual relationships become more difficult to tease apart, since your data contains little variation in one variable while the other is held constant. This doesn't affect the point estimates of the coefficients, but it makes the ones for highly correlated variables more uncertain, hence the higher standard errors. The inflation is quantified by the variance inflation factor: the standard error of $\hat{\beta}_i$ scales with $\sqrt{\text{VIF}_i} = \sqrt{1/(1 - R_i^2)}$, where $R_i^2$ is from regressing variable $i$ on the other covariates.

  2. Statistical significance in linear regression is typically measured by a $t$-statistic of the form $$ t = \frac{\hat{\beta}_i}{\text{se}_i} , $$ where $\text{se}_i$ is the standard error of the estimate of $\beta_i$. If the standard error is higher, the observed $t$ statistic will be lower, potentially to the point that the estimate of $\beta_i$ is no longer statistically significant.
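Both effects are easy to see in a simulation. The sketch below (a minimal illustration with simulated data, not from the original question; the variable names and coefficients are made up) fits the same two-predictor model twice, once with uncorrelated predictors and once with correlation 0.98, and compares the resulting standard errors and $t$-statistics for predictor A:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def fit_ols(X, y):
    """Plain OLS: coefficients, standard errors, and t-statistics."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - X.shape[1])   # unbiased residual variance
    se = np.sqrt(sigma2 * np.diag(XtX_inv))     # se_i = sqrt(sigma^2 [X'X]^-1_ii)
    return beta, se, beta / se

def simulate(rho):
    """y depends equally on predictors A and B, whose correlation is rho."""
    a = rng.standard_normal(n)
    b = rho * a + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    y = 1.0 + 0.5 * a + 0.5 * b + rng.standard_normal(n)
    X = np.column_stack([np.ones(n), a, b])     # intercept, A, B
    return fit_ols(X, y)

_, se_lo, t_lo = simulate(rho=0.0)    # uncorrelated predictors
_, se_hi, t_hi = simulate(rho=0.98)   # highly correlated predictors

print(f"se(A): rho=0.00 -> {se_lo[1]:.3f}, rho=0.98 -> {se_hi[1]:.3f}")
print(f" t(A): rho=0.00 -> {t_lo[1]:.2f}, rho=0.98 -> {t_hi[1]:.2f}")
```

With $\rho = 0.98$ the theoretical inflation factor is $\sqrt{1/(1 - 0.98^2)} \approx 5$, so the standard error of A's coefficient is several times larger and its $t$-statistic correspondingly smaller, even though the true coefficient (0.5) is unchanged.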
