Solved – Validating principal component analysis

pca

I wanted to run a small experiment to make sure I understand PCA correctly. My dataset contains 8 columns. The first two columns are randomly generated in Excel with =RANDBETWEEN(4, 5), and the other six columns are generated the same way but with =RANDBETWEEN(1, 3).

When I run PCA on this, I do not get the results I expect. I expected a high eigenvalue for a component that is a combination of the first two columns, with low loadings on the other columns. This is my code in R:

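# Read the simulated data (8 numeric columns)
sensex.dat = read.csv('C:/Study/_SEM4/brand man/emperical/dice.csv', header = TRUE)
# Eigendecomposition of the covariance matrix (the columns are not scaled here)
sensex.cov = cov(sensex.dat)
sensex.eigen = eigen(sensex.cov, symmetric = TRUE)
sensex.eigen$values   # eigenvalues: variance along each component
sensex.eigen$vectors  # eigenvectors: loadings of each column on each component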

Best Answer

As others have told you, PCA does not look for amplitude: the level of the values in a column does not matter. In fact, it is standard procedure to normalize your variables before a PCA, which, by the way, your code does not do. What PCA looks for is correlation between the columns.
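To illustrate the point, here is a minimal sketch that simulates data like the question's and runs a normalized PCA. The helper `randbetween`, the row count `n`, and the seed are assumptions made for illustration (the original CSV is not available); `prcomp` with `scale. = TRUE` is used as one common way to normalize, not necessarily what the answer had in mind.

```r
set.seed(1)   # arbitrary seed, only so the sketch is reproducible
n <- 1000     # assumed number of rows

# Hypothetical stand-in for Excel's RANDBETWEEN(lo, hi)
randbetween <- function(n, lo, hi) sample(lo:hi, n, replace = TRUE)

# Two columns drawn from 4..5 and six columns drawn from 1..3,
# all generated independently of each other
dice <- cbind(replicate(2, randbetween(n, 4, 5)),
              replicate(6, randbetween(n, 1, 3)))

# center = TRUE subtracts the column means, scale. = TRUE divides by
# the column standard deviations, so this is PCA on the correlation matrix
dice.pca <- prcomp(dice, center = TRUE, scale. = TRUE)

# Squared standard deviations are the eigenvalues of cor(dice)
dice.pca$sdev^2
```

Because the eight columns are independent, their correlation matrix is close to the identity, so no component should stand out: the eigenvalues all come out near 1, regardless of which range each column was drawn from.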

You would get the result you are after by:

  1. Randomly generating a column
  2. Generating a second random column with similar parameters and adding the first column to it. In your example, this would be first column + randbetween.
  3. Generating additional uncorrelated columns as in step 1
  4. Normalizing the columns and then computing the eigenvalues and eigenvectors
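
The four steps above can be sketched in R roughly as follows. The helper `randbetween`, the row count `n`, the seed, and the column names are hypothetical choices for illustration; normalization is done here by taking the eigendecomposition of the correlation matrix, which is equivalent to standardizing each column first.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n <- 1000     # assumed number of rows

# Hypothetical stand-in for Excel's RANDBETWEEN(lo, hi)
randbetween <- function(n, lo, hi) sample(lo:hi, n, replace = TRUE)

# Step 1: a random column
x1 <- randbetween(n, 1, 3)

# Step 2: a second column = first column + a fresh random draw,
# which makes it correlated with x1
x2 <- x1 + randbetween(n, 1, 3)

# Step 3: six more columns, generated independently as in step 1
others <- replicate(6, randbetween(n, 1, 3))

dat <- cbind(x1, x2, others)
colnames(dat) <- paste0("V", 1:8)

# Step 4: normalize and decompose. Using cor() instead of cov()
# amounts to standardizing every column before the PCA.
dat.eigen <- eigen(cor(dat), symmetric = TRUE)

dat.eigen$values        # one eigenvalue clearly above the rest
dat.eigen$vectors[, 1]  # first component loads mainly on V1 and V2
```

With this construction the population correlation between the first two columns is 1/sqrt(2), so the largest eigenvalue of the population correlation matrix is 1 + 1/sqrt(2), about 1.71, and its eigenvector puts equal weight on the first two columns. Sample estimates will vary somewhat around these values.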