Solved – Applying principal component analysis on variables of different dimensions

pca

Is it possible to perform principal component analysis on variables that have different numbers of rows of data?

For instance, suppose there are just three variables (Var1, Var2, Var3), where Var1 has 100 rows of data, Var2 has 50 rows, and Var3 has 20 rows. Can PCA be performed on such a data set? If so, do I simply pad the empty entries with zeros, or is some sort of transformation needed?

Had all variables been the same length (e.g. Var1, Var2, and Var3 all have 100 rows of data), there would be no issue. Unfortunately, my data set is not so simple.

Please advise. Thanks!

Best Answer

You are going to have issues performing PCA with this much missing data. The analysis will either exclude cases with missing values, leaving only the rows in which every variable is observed, or some type of imputation will need to be performed. Say you have 100 rows and Var3 has values in only 20 of them (i.e. only 20% of rows have values): accurate imputation is likely to be difficult, which in turn undermines the validity of the PCA results.
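To make the two options concrete, here is a minimal sketch in Python using only NumPy. The data are simulated and purely hypothetical (three correlated variables with 100, 50, and 20 observed rows), mean imputation stands in for "some type of imputation", and the helper pca_explained_variance_ratio is a name invented for this illustration. It is not a recommendation of either approach, only a way to see what each one does to the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 rows, three variables driven by one shared factor.
n = 100
latent = rng.normal(size=n)
X = np.column_stack([
    latent + 0.3 * rng.normal(size=n),  # Var1: all 100 rows observed
    latent + 0.3 * rng.normal(size=n),  # Var2: only the first 50 rows observed
    latent + 0.3 * rng.normal(size=n),  # Var3: only the first 20 rows observed
])
X[50:, 1] = np.nan
X[20:, 2] = np.nan


def pca_explained_variance_ratio(data):
    """PCA via SVD of the column-centered data; returns explained variance ratios."""
    centered = data - data.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Option 1: exclude cases with missing values (listwise deletion).
complete_rows = ~np.isnan(X).any(axis=1)
X_complete = X[complete_rows]

# Option 2: a deliberately simple imputation (assumption: column-mean fill).
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

ratio_complete = pca_explained_variance_ratio(X_complete)
ratio_imputed = pca_explained_variance_ratio(X_imputed)

print("rows kept by listwise deletion:", X_complete.shape[0])
print("explained variance ratio (complete cases):", ratio_complete.round(3))
print("explained variance ratio (mean-imputed):", ratio_imputed.round(3))

# Mean imputation shrinks the variance of the sparsely observed column,
# which is one way the imputation step can distort the PCA.
print("Var3 variance, observed rows only:", np.nanvar(X[:, 2]).round(3))
print("Var3 variance, after mean imputation:", np.var(X_imputed[:, 2]).round(3))
```

Listwise deletion keeps only the rows where all three variables are present, so most of Var1 and Var2 is discarded. Mean imputation keeps every row, but 80% of Var3 becomes a constant, which deflates its variance and its covariance with the other variables.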
