PLS Regression and collinearity

multicollinearity, multivariate analysis, partial least squares, regression

From what I know, PLS regression is used when there are more variables than observations and when multicollinearity exists among the independent variables. I have data for a regression model that show no signs of multicollinearity among the independent variables, but the relationship between the independent variables and the dependent variable is not very strong: there is a clear positive relationship, with correlations between the individual independent variables and the dependent variable ranging from 0.1 to 0.5.

I would like to know whether the concept of PLS regression extends to collinearity in general. That is, if the relationship between the independent variables and the dependent variable is not strong, is PLS regression a recommended model, or are other models more suitable?

Best Answer

PLS isn't necessarily a cure for insufficient sample size (though I recognize you're not exactly claiming otherwise). See Westland (2010) for a discussion of the factors that determine the necessary sample size, which often turns out to be much larger than the samples used in published articles. He discusses PLS too. One notable argument is that:

In PLS, the sample size question is probably both less relevant and less critical, because hypothesis testing is better left to LISREL and systems of equation approaches.

Some collinearity is generally expected among independent variables in basically any general(ized) linear model or structural equation model. It's mostly a problem when collinearity is very strong. Several methods exist for managing excessive collinearity, including ridge and principal components regression. Low collinearity is not a problem for any of these methods, including PLS. That they are somewhat more robust against high multicollinearity does not imply that they are biased by a lack thereof. Hence whether PLS is advisable in your circumstance probably depends on other factors.
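To illustrate the point that low collinearity does not hurt these methods, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the sample size, the weak effect sizes in `beta_true`, the ridge penalty `lam=1.0`, and the hand-written `pls1` helper (a textbook single-response PLS, not the implementation of any particular library). With nearly uncorrelated predictors, ridge lands very close to OLS, and PLS with all components reproduces the OLS fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 4
X = rng.normal(size=(n, p))                  # independent draws, so collinearity is low
beta_true = np.array([0.3, 0.2, 0.4, 0.1])   # hypothetical weak effects
y = X @ beta_true + rng.normal(size=n)

# Center both sides so that no intercept term is needed.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

def ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def ridge(X, y, lam):
    """Ridge coefficients from the closed-form solution."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def pls1(X, y, k):
    """Single-response PLS with k components, returned as regression coefficients."""
    Xk, yk = X.copy(), y.copy()
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ yk                  # weight vector: direction of max covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # component scores
        tt = t @ t
        p_load = Xk.T @ t / tt         # X loadings
        q_k = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p_load)  # deflate X
        yk = yk - q_k * t              # deflate y
        W.append(w)
        P.append(p_load)
        q.append(q_k)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.solve(P.T @ W, q)

b_ols = ols(Xc, yc)
b_ridge = ridge(Xc, yc, lam=1.0)
b_pls_full = pls1(Xc, yc, k=p)   # all components
b_pls_one = pls1(Xc, yc, k=1)    # a single component

print("OLS        ", np.round(b_ols, 3))
print("ridge      ", np.round(b_ridge, 3))
print("PLS (k=p)  ", np.round(b_pls_full, 3))
print("PLS (k=1)  ", np.round(b_pls_one, 3))
```

In this setup the methods agree closely because there is little collinearity for them to correct. The weak predictor-response relationship shows up as a low R-squared for every method, which none of them can fix.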

You might try calculating the variance inflation factors (VIFs) for your set of regressors. These give a useful index of how much collinearity exists among them; it is often not as bad as one initially fears. This was the case in my own research: incidentally, concern about multicollinearity in my regressions is what led me here to Cross Validated several months ago.
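A variance inflation factor can be computed by regressing each regressor on all the others and taking 1 / (1 - R²). The sketch below does this with plain NumPy; the function name `variance_inflation_factors` and the simulated matrices `X_indep` and `X_coll` are hypothetical and stand in for your own data.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X, where VIF_j = 1 / (1 - R_j^2) and
    R_j^2 comes from regressing column j on the other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        ss_res = resid @ resid
        ss_tot = ((target - target.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Simulated example (assumed data, for illustration only).
rng = np.random.default_rng(1)
X_indep = rng.normal(size=(500, 3))                   # unrelated regressors
x_dup = X_indep[:, 0] + 0.1 * rng.normal(size=500)    # near-copy of the first column
X_coll = np.column_stack([X_indep, x_dup])

vif_indep = variance_inflation_factors(X_indep)
vif_coll = variance_inflation_factors(X_coll)

print("VIFs, independent regressors:", np.round(vif_indep, 2))
print("VIFs, with a near-duplicate: ", np.round(vif_coll, 2))
```

A VIF near 1 means a regressor is nearly unrelated to the others. A commonly quoted rule of thumb treats values above roughly 5 or 10 as a sign of problematic collinearity, though the cutoff is a convention rather than a hard threshold.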

Reference
Westland, J. C. (2010). Lower bounds on sample size in structural equation modeling. Electronic Commerce Research and Applications, 9(6), 476–487.
