How Does a Generalized Linear Model Handle Collinear Predictors?

Tags: generalized-linear-model, multicollinearity

In an ordinary least squares GLM with two nearly collinear predictors, how is their shared variance reflected in the parameter estimates? My understanding is that each parameter estimate reflects the unique effect of its predictor, i.e., controlling for all other predictors.

Consider a case where the predictors are so highly correlated that the "unique" effect of each predictor is small, but their shared variance explains much of the variance in the outcome. How is this shared variance allocated between the two predictors?

Best Answer

Let's predict income with two highly positively correlated variables: Years of work experience and number of carrots eaten in one's lifetime. Let's ignore omitted variable bias issues. Also, let's say years of work experience has a much greater impact on income than carrots eaten.

Your beta parameter estimates would be unbiased, but the standard errors of the parameter estimates would be greater than if the predictors were not correlated. Collinearity does not violate any assumptions of GLMs (unless there is perfect collinearity).
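One standard way to quantify this inflation of the standard errors (not spelled out in the answer itself, but consistent with it) is the variance inflation factor. For predictor $j$ in an OLS fit,

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{(1 - R_j^2)\sum_i (x_{ij} - \bar{x}_j)^2},$$

where $R_j^2$ is the $R^2$ from regressing $x_j$ on the other predictors and $1/(1 - R_j^2)$ is the VIF. As the two predictors approach perfect correlation, $R_j^2 \to 1$ and the standard errors blow up, while the point estimates stay unbiased.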

Collinearity is fundamentally a data problem. In small datasets, you might not have enough information to estimate the beta coefficients precisely. In large datasets, you likely will. Either way, you can interpret the beta parameters and the standard errors just as if collinearity were not an issue; just be aware that some of your parameter estimates might not be significant.

In the event your parameter estimates are not significant, get more data. Dropping a variable that belongs in your model guarantees that the remaining estimates are biased. For example, if you were to drop the years-of-experience variable, the carrots-eaten coefficient would become positively biased because it "absorbs" the impact of the dropped variable.
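For the direction of that bias, the standard omitted-variable algebra (my addition, not part of the original answer) makes it concrete: if the true model is $\text{income} = \beta_1 \cdot \text{experience} + \beta_2 \cdot \text{carrots} + \varepsilon$ and you regress income on carrots alone, the fitted slope converges to

$$\beta_2 + \beta_1 \, \frac{\operatorname{Cov}(\text{experience}, \text{carrots})}{\operatorname{Var}(\text{carrots})}.$$

With $\beta_1 > 0$ and the two variables positively correlated, the extra term is positive, so the carrots-eaten coefficient is biased upward.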

To answer the shared variance question, here is a fun test you can do in a statistical program of your choice (a runnable sketch follows the list):

  • Make two highly correlated variables (x1 and x2)
  • Add an error term (normally distributed, zero mean)
  • Create y by adding x1 to the error term (i.e., the true beta values for x1 and x2 are 1 and 0, respectively)
  • Regress y on x1 and x2 with a large data set.
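Here is a minimal sketch of that simulation in Python (the answer leaves the software unspecified; NumPy, statsmodels, and the particular noise levels are my assumptions):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100_000                          # large data set, as suggested above

# Two highly correlated predictors: x2 is x1 plus a little independent noise.
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # corr(x1, x2) is roughly 0.995

# y depends only on x1, so the true betas are 1 for x1 and 0 for x2.
y = x1 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()
print(fit.params)   # roughly [0, 1, 0]: intercept, x1, x2
print(fit.bse)      # standard errors for x1 and x2, inflated by the collinearity
```

With a large n the coefficient on x2 lands near zero despite the huge shared variance, which is the point of the exercise.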

Although x1 and x2 share a very large amount of variance, only x1 has a ceteris paribus, marginal effect on y. Holding x1 constant and changing x2 does nothing to the expected value of y, so the regression attributes the effect to x1 alone: with enough data the estimates converge to the true values of 1 and 0, and the shared variance is irrelevant.