Solved – Evaluate the relative importance of variables using PCA

dimensionality-reduction, feature-selection, feature-engineering, pca

I have a large set (100+) of variables on which I'm performing PCA. The PCA returns me a list of components, each of which is in turn a list of the weights to be placed on my variables in a linear combination.

What I want to do is use these numbers to evaluate the relative importance of each variable in describing the dataset. I don't want to just look at the absolute values of the weights in the first principal component, as that would ignore all the other components. However, I also don't want to just sum each variable's weights across all the principal components, since that doesn't take into account the fact that earlier components are more important than later ones.

One idea I've been told is, for each variable, to take a weighted average of its weights across components, where each weight is multiplied by the proportion of variance that the corresponding component explains. The logic is that this gives the earlier components (i.e. the ones with larger variance) more influence than the later ones (the ones with smaller variance). Would this be a sound idea? Is there another, better solution?
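For concreteness, a minimal R sketch of that weighted-average idea (the use of prcomp and this particular scoring are my own illustration of the proposal, not an established method; X stands for a numeric matrix or data frame holding the 100+ variables):

# X: numeric matrix or data frame of the variables (hypothetical input)
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# proportion of total variance explained by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# weighted average of absolute loadings, weighted by variance explained
importance <- abs(pca$rotation) %*% var_explained
sort(importance[, 1], decreasing = TRUE)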

Best Answer

A problem with your question is illustrated in the example below.

The points vary mostly in two directions (the cloud is roughly disk-shaped), so the data can be reduced to two dimensions without losing much information, at least in terms of variance. (That small remaining bit of variation may still be important, so first decide whether the viewpoint "amount of variance = information/importance" applies to your problem.)

Rather than selecting the largest PCs, PC1 and PC2 (which are transformations of x1, x2 and x3), one may instead choose a subset of the original variables that also describes a large amount of the variation. This reduces the number of x's, which is useful, for instance, when measuring them takes time or storing them takes space. This seems to be your goal.

Note in this example that x1 and x2 correlate strongly with the PCs and have high weights. Yet it is better to select the pair {x1, x3} or {x2, x3}. This is because x1 and x2 correlate strongly with each other: once one of them has been selected as important, the other adds little extra value (the contrast x1 - x2 correlates only with the small-variance PC3). A quick numerical check of this appears after the R code below.

What PCA does is show you a structure of perpendicular (orthogonal) components, generated in order of decreasing variance. The goal of a PCA is to reveal an underlying structure in a complex system of many variables. It does not tell you how to reduce the dimensionality by selecting fewer variables; instead, it reduces the dimensionality by a transformation, which still requires all of the original variables.

What you could do instead is write an algorithm that selects and swaps variables until a maximum of explained variance is achieved.
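A minimal sketch of such an algorithm, as a greedy forward selection (the scoring rule, the fraction of total variance recovered when every variable is regressed on the selected subset, is one reasonable choice among several; X is assumed to be a numeric data frame of candidate variables):

# score of a subset: fraction of total variance recovered when every
# variable is regressed on the selected columns
explained <- function(X, selected) {
  if (length(selected) == 0) return(0)
  fitted_var <- sapply(names(X), function(v) {
    var(fitted(lm(X[[v]] ~ ., data = X[selected])))
  })
  sum(fitted_var) / sum(sapply(X, var))
}

# greedy forward selection: repeatedly add the variable that raises the score most
select_vars <- function(X, k) {
  selected <- character(0)
  for (i in seq_len(k)) {
    remaining <- setdiff(names(X), selected)
    scores <- sapply(remaining, function(v) explained(X, c(selected, v)))
    selected <- c(selected, remaining[which.max(scores)])
  }
  selected
}

# e.g. select_vars(data.frame(x1, x2, x3), 2) with the simulated data below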

[animated 3D scatter plot: example of data with two strongly correlated parameters]

R code to generate the image

#generating three random PC's
set.seed(1)
PC1 = rnorm(100,0,1)
PC2 = rnorm(100,0,0.5)
PC3 = rnorm(100,0,0.1)

#transformation back into underlying parameters
x1 = PC1 - PC3  
x2 = PC1 + PC3
x3 = 0.2*PC1 + PC2

#plotting
library("plot3D")
# render one png frame for each rotation angle theta (0, 3, ..., 360 degrees)
for (theta in c(0:120)*3) {
  # zero-pad the angle so the frame files sort in the right order
  if (theta < 10)                 {n = paste0('000', theta)}
  if (theta >= 10 && theta < 100) {n = paste0('00', theta)}
  if (theta >= 100)               {n = paste0('0', theta)}

  name = paste0("~/Desktop/gifs/image_", n, ".png")
  png(name)
  scatter3D(x1, x2, x3, xlab = "x1", ylab = "x2", zlab = "x3",
            col = 1, pch = 19, theta = theta, phi = 30)
  dev.off()
}

system("convert ~/Desktop/gifs/image*.png -delay 1 -loop 0 ~/Desktop/gifs/3D.gif")