Solved – Recover data after PCA

I have a linear regression problem with about 120 predictors, and I tried to reduce the number of predictors.
First, I tried to remove multicollinearity by calculating the variance inflation factor (VIF) for each predictor. This left me with about 20 predictors (hopefully no longer collinear).
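A minimal sketch of the VIF screening step. It assumes the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others, and an iterative "drop the worst predictor" rule. The cutoff of 10 and the helper names `vif` and `drop_high_vif` are hypothetical choices, not necessarily what was used here.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n samples x p predictors)."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def drop_high_vif(X, threshold=10.0):
    """Drop the column with the largest VIF until every VIF is <= threshold.

    Returns the indices (into the original X) of the kept columns.
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        v = vif(X[:, keep])
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        del keep[worst]
    return keep

# Simulated example: column 4 is almost exactly column 0 + column 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + X[:, 1] + 0.01 * rng.normal(size=200)
kept = drop_high_vif(X)
print(kept)
```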
Then I used PCA to reduce the dimensionality even further. Because the predictors' variances are very different from one another, I used the correlation matrix rather than the covariance matrix.
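A sketch of PCA on the correlation matrix with NumPy, on simulated predictors with very different scales. The helper `pca_corr` is hypothetical; it uses `np.corrcoef` and `np.linalg.eigh`, and re-sorts because `eigh` returns eigenvalues in ascending order. Working with the correlation matrix is equivalent to running PCA on standardized predictors, so the eigenvalues sum to the number of predictors.

```python
import numpy as np

def pca_corr(X):
    """PCA on the correlation matrix of X (rows = observations).

    Returns eigenvalues in decreasing order and the matching eigenvectors
    as the columns of V.
    """
    R = np.corrcoef(X, rowvar=False)    # p x p correlation matrix
    eigvals, V = np.linalg.eigh(R)       # R is symmetric; eigh gives ascending order
    order = np.argsort(eigvals)[::-1]    # largest eigenvalue first
    return eigvals[order], V[:, order]

# Simulated predictors whose variances differ by orders of magnitude.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) * np.array([1.0, 10.0, 100.0, 0.1, 5.0, 50.0])
eigvals, V = pca_corr(X)
print(eigvals)
```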

I can get the 'final' data by multiplying my original data by the eigenvectors that belong to the largest eigenvalues, right?
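A minimal sketch of that projection, assuming a correlation-matrix PCA as described above. One detail matters: because the eigenvectors come from the correlation matrix, they apply to the standardized predictors (centered and divided by their standard deviations), not to the raw data. The function name `pca_scores` and the simulated data are hypothetical.

```python
import numpy as np

def pca_scores(X, k):
    """Project X onto its first k principal components (correlation-matrix PCA)."""
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)              # sample SD, so cov(Z) equals the correlation matrix
    Z = (X - mean) / sd                     # standardized data
    eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    Vk = V[:, order[:k]]                    # p x k: eigenvectors of the k largest eigenvalues
    scores = Z @ Vk                         # n x k: the new, reduced predictors
    return scores, Vk, mean, sd

# Simulated, correlated predictors (20 columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20)) @ rng.normal(size=(20, 20))
scores, Vk, mean, sd = pca_scores(X, 3)
```

The component scores are mutually uncorrelated, and their variances equal the chosen eigenvalues. Keeping `mean` and `sd` makes it possible to apply the same transformation to new observations.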

In the end, I want to find out which of the original predictors are left and how I can recover the 'new' original data. For some reason, however, I cannot recover the correct numbers.
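A sketch of the reverse direction, under the same assumptions (correlation-matrix PCA; the helper `pca_reconstruct` and the data are hypothetical). Mapping k component scores back gives an approximation of all 20 original variables, not a subset of them. The approximation is exact only when k equals the number of variables, and the numbers come out in the original units only if the standardization is undone. Skipping either the standardization or its reversal is one plausible reason for numbers that do not match.

```python
import numpy as np

def pca_reconstruct(X, k):
    """Approximate X from its first k correlation-matrix principal components.

    Standardize -> project onto top-k eigenvectors -> map back with the
    transpose (the eigenvectors are orthonormal) -> undo the standardization.
    """
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    Z = (X - mean) / sd
    eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    Vk = V[:, np.argsort(eigvals)[::-1][:k]]
    scores = Z @ Vk                    # n x k component scores
    Z_hat = scores @ Vk.T              # back to the p standardized variables
    return Z_hat * sd + mean           # back to the original units

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 20)) @ rng.normal(size=(20, 20)) + 5.0
X_hat3 = pca_reconstruct(X, 3)         # rank-3 approximation of all 20 columns
```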

Best Answer

PCA does not get rid of any of the variables, although some may be very unimportant. Suppose you use the first three components from your PCA: each of these is a linear combination of all 20 variables you put into it, so no original predictor is left out.
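To illustrate on simulated data (the data and dimensions are made up): the loadings of the first three components form a 20 x 3 matrix, and in general every entry is nonzero, so each component draws on all 20 variables.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 20)) @ rng.normal(size=(20, 20))  # 20 correlated predictors (simulated)

eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
loadings = V[:, np.argsort(eigvals)[::-1][:3]]  # 20 x 3: weight of each variable in PC1..PC3

# Each column is a weighted sum over all 20 variables; none is dropped.
for j in range(3):
    top = np.argsort(np.abs(loadings[:, j]))[::-1][:5]
    print(f"PC{j + 1}: largest |loadings| on variables {top.tolist()}")
```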

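If you want to be able to interpret the importance of the original variables in the regression, I don't think PCA is the way to go. I would consider one of the penalized regression methods, such as the LASSO or least angle regression (LAR), which can drop unimportant predictors entirely by setting their coefficients to zero.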
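A minimal LASSO sketch by cyclic coordinate descent in plain NumPy, on simulated data where only the first three of 20 predictors matter. The function names, the penalty `lam=0.5`, and the data are illustrative assumptions; in practice a library implementation such as scikit-learn's `sklearn.linear_model.Lasso` or `LassoLars` would be the usual choice.

```python
import numpy as np

def soft_threshold(z, g):
    """S(z, g) = sign(z) * max(|z| - g, 0)."""
    return np.sign(z) * max(abs(z) - g, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 on standardized X
    and centered y, so no intercept term is needed.
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # ddof=0, so x_j . x_j / n == 1
    yc = y - y.mean()
    b = np.zeros(p)
    r = yc.copy()                               # residual yc - Xs @ b
    for _ in range(n_iter):
        for j in range(p):
            rho = Xs[:, j] @ r / n + b[j]       # correlation of x_j with its partial residual
            new = soft_threshold(rho, lam)
            r -= Xs[:, j] * (new - b[j])        # keep the residual in sync with b
            b[j] = new
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] + 2 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(size=200)
b = lasso_cd(X, y, lam=0.5)
print(np.flatnonzero(b))  # indices of predictors kept in the model
```

Unlike the PCA components, the result refers directly to the original predictors: a coefficient of exactly zero means that predictor was dropped. The coefficients are on the standardized scale.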