Solved – Collinearity in polynomial regression

multicollinearity, r, regression

I ran some experiments to determine the influence of x on z. The results show that z is also correlated with y. To estimate the influence of x without ignoring y's influence, I tried to find a model z(x, y) with polynomial terms (up to 3rd power):

I compared some fits and chose
fit1: $z = B_0 + B_1 x + B_2 x^2 + B_3 y + B_4 y^2$

This fit seems good (adj. R² = 84%, all p's ≪ 0.05).
But looking at my standardized regression coefficients (betas), to estimate the influence of x and y on z, I get betas with absolute values greater than 1.
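
A minimal R sketch of how betas like these can be computed (simulated stand-in data; the variable names and generating coefficients are illustrative only, not my actual experiment):

    set.seed(1)
    n <- 100
    x <- runif(n, 5, 15)                      # simulated predictors (illustrative)
    y <- runif(n, 5, 15)
    z <- 2 + 0.5*x - 0.02*x^2 + 0.8*y - 0.03*y^2 + rnorm(n)

    # Standardized betas: z-score every term (including the squares) and refit
    dat_std <- as.data.frame(scale(cbind(z = z, x = x, x2 = x^2,
                                         y = y, y2 = y^2)))
    fit1 <- lm(z ~ x + x2 + y + y2, data = dat_std)
    coef(fit1)                                # these are the standardized betas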

So my questions are:

  1. This is due to the collinearity, right?
  2. Is collinearity a problem in this case? (Do I have to use orthogonal polynomials?)
  3. Are betas with absolute values above 1 valid for estimating the "size" of the influences?

Thanks in advance!

Best Answer

Introducing higher-powered terms will invariably create issues of multicollinearity. This is essentially unavoidable, so it is best to include such terms only when they are theoretically justifiable. (That's the standard caveat spiel... now that that's out of the way...)
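
To see why, note that a positive-valued predictor is almost perfectly correlated with its own powers. A quick R illustration (simulated data, purely for demonstration):

    set.seed(1)
    x <- runif(100, 5, 15)   # any predictor measured well away from zero
    cor(x, x^2)              # ~0.99: linear and quadratic terms nearly collinear
    cor(x, x^3)              # higher powers stay strongly correlated too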

Though you cannot have a standardized coefficient greater than one in a bivariate regression, it is possible in multiple regression.  If you do observe them, then you would justifiably be wary of the results.  One possible strategy is to rerun the regression with mean-centered values.  Instead of using $x$ and $y$, replace them with $x_c = x - \bar{x}$ and $y_c = y - \bar{y}$.  Then you can use the higher powers of these “smaller” variables: $x_c^2$ or $y_c^3$.  This may address the problem here, as the collinearity may decrease (depending on the distributions of $x$ and $y$) if you are using just the quadratic terms.
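
A quick R illustration of the centering trick (again with simulated data; for a roughly symmetric predictor, the centered value and its square are nearly uncorrelated):

    set.seed(1)
    x   <- runif(100, 5, 15)
    x_c <- x - mean(x)       # mean-centered predictor
    cor(x, x^2)              # raw term vs. its square: nearly collinear
    cor(x_c, x_c^2)          # after centering: close to zero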

Hope this helps...and always happy to elaborate further if need be.
