Solved – Using PCA vs Linear Regression

pca regression

I'm looking to analyze data from a study. Previous, similar studies have used either PCA or hierarchical linear regression to analyze their data, and I've used both methods before. From my understanding, PCA breaks the data down into principal components and is useful for learning which factors may be strong indicators of our dependent variable, while linear regression can be used to compare correlations.

How should I approach this? If I simply want to find out what correlates most strongly with my study's dependent variable, what would be the best option? Can I use PCA first and then hierarchical linear regression?

Best Answer

PCA does not involve a dependent variable: all the variables are treated the same. It is primarily a dimension reduction method.

Factor analysis also doesn't involve a dependent variable, but its goal is somewhat different: It is to uncover latent factors.

Some people use either the components or the factors (or a subset of them) as independent variables in a later regression. This can be useful if you have a lot of IVs: If you want to reduce the number while losing as little variance as possible, that's PCA. If you think these IVs represent some factors, that's FA.
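As a rough illustration of using components as IVs in a later regression, here is a minimal sketch with NumPy. The data are synthetic, and the helper names `pca_scores` and `fit_ols` are made up for this example; the choice of two components is an assumption, not a recommendation.

```python
import numpy as np

def pca_scores(X, n_components):
    """Return the first principal component scores and their variance shares."""
    Xc = X - X.mean(axis=0)                       # centre each column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T             # project onto leading directions
    explained = (S ** 2) / np.sum(S ** 2)         # share of total variance per component
    return scores, explained[:n_components]

def fit_ols(Z, y):
    """Ordinary least squares with an intercept; returns coefficients and R^2."""
    A = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2

# Synthetic data (an assumption for illustration): 10 IVs driven by 2 hidden variables.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, 10))
X = latent @ loadings + 0.1 * rng.normal(size=(n, 10))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.5 * rng.normal(size=n)

# Step 1: PCA on the IVs only; the DV plays no part here.
scores, explained = pca_scores(X, n_components=2)

# Step 2: regress the DV on the retained component scores.
coef, r2 = fit_ols(scores, y)

print("variance share kept:", explained.sum())
print("R^2 of DV on components:", r2)
```

Note that the DV only enters in the second step, so how well the components predict it depends entirely on how the data happen to be structured.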

If you think there are factors, then it may be best to use FA; but if you are just trying to reduce the number of variables, then there is no guarantee that the components will relate well to the DV. Another method is partial least squares, which does use the DV when constructing its components.
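To make the "no guarantee" point concrete, here is a small sketch on synthetic data where the high-variance IV is unrelated to the DV. It compares the first principal component with the first partial least squares component, computed by hand as the direction whose weights are proportional to each IV's covariance with the DV. The variable names and the data-generating setup are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Assumed setup: one high-variance IV unrelated to y, one low-variance IV that drives y.
big = 3.0 * rng.normal(size=n)
small = 1.0 * rng.normal(size=n)
X = np.column_stack([big, small])
y = 3.0 * small + 0.5 * rng.normal(size=n)

Xc = X - X.mean(axis=0)
yc = y - y.mean()

# First principal component: direction of maximum variance in X, ignoring y.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# First PLS component (single DV): weights proportional to cov(X_j, y).
w = Xc.T @ yc
w = w / np.linalg.norm(w)
pls1 = Xc @ w

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

print("|corr(PC1, y)|  :", abs_corr(pc1, y))
print("|corr(PLS1, y)| :", abs_corr(pls1, y))
```

In this contrived case the first principal component follows the high-variance IV and says little about the DV, while the PLS component leans toward the IV that actually covaries with it. Real data will rarely be this clear-cut.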