Using Principal Component Analysis to determine variable interaction

pca

I'm characterizing a communication channel by measuring bit error rate (ber) as a function of 15 channel settings: ber = Func(settings[1:15]). The dataset is about 100,000 samples.

I wonder if I can use PCA to explore interaction of settings and their effect on the bit error rate. Specifically:

  1. How should I prepare the data for PCA: do I run PCA on {ber, settings[1:15]}, or only on settings[1:15]?
    I've seen PCA used on "symmetric" data, such as images, text, etc.

  2. What do the projections of settings[1:15] onto the principal components tell me in this case?

  3. What do outlier datapoints in principal component space tell me?

Best Answer

Principal Component Analysis is an unsupervised method: it does not use a dependent variable, so you would run it on the independent variables only (the settings, in your case).

Once you have your principal components, you can partition the variance of your independent variables into orthogonal projections. That is, a certain amount of variance goes into component 1 (a linear combination of your variables), which is orthogonal to the other components, and its scores are uncorrelated with theirs.

Usually, one uses PCA to replace a large number of variables by a much smaller number of principal components that are (i) uncorrelated and (ii) contain most of the variance.
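As a rough illustration of what that means in practice, here is a minimal sketch of PCA run on the settings only, using plain NumPy and an SVD. The data are synthetic stand-ins (the sample count, the random values and the correlation between the first two settings are all assumptions made for the example, not your measurements).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the real measurements: 1000 samples x 15 settings.
n_samples, n_settings = 1000, 15
settings = rng.normal(size=(n_samples, n_settings))
# Assumption for illustration: make two settings correlated so PCA has structure to find.
settings[:, 1] = 0.8 * settings[:, 0] + 0.2 * settings[:, 1]

# Standardize each setting (zero mean, unit variance) so that units do not dominate.
z = (settings - settings.mean(axis=0)) / settings.std(axis=0)

# PCA via SVD: the rows of vt are the principal directions (loadings).
u, s, vt = np.linalg.svd(z, full_matrices=False)
explained_variance_ratio = s**2 / np.sum(s**2)

# Scores: the projection of every sample onto every component.
scores = z @ vt.T

# Check the two properties: orthonormal loadings and uncorrelated scores.
loadings_gram = vt @ vt.T
score_cov = np.cov(scores, rowvar=False)
off_diagonal = score_cov - np.diag(np.diag(score_cov))

print(np.round(explained_variance_ratio, 3))
```

Note that ber never enters this computation, which is exactly why the components by themselves say nothing about the effect of the settings on the error rate.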

However, you have a large number of samples and only a small number of variables, you have a dependent variable, and you want to "explore interaction of settings and their effect on the bit error rate". Given all that, I don't think PCA is the right method for you. You should consider linear modelling, linear discriminant analysis, or maybe a method like PLS (partial least squares, which is to PCA what linear regression is to the calculation of a mean). A supervised machine learning method (such as SVM, random forests, or PLS-DA) is another option.
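To make the linear modelling suggestion concrete, here is a minimal sketch of a linear model with all pairwise interaction terms, fitted by ordinary least squares in NumPy. Everything about the data is hypothetical: the response is taken to be log10(ber), and the "true" dependence (main effects of settings 0 and 3, plus an interaction between settings 0 and 1) is planted only so the fit has something to recover.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in: 5000 samples x 15 settings, each scaled to [-1, 1].
n_samples, n_settings = 5000, 15
settings = rng.uniform(-1.0, 1.0, size=(n_samples, n_settings))

# Hypothetical ground truth (an assumption for this example only):
# log10(ber) has main effects of settings 0 and 3 and a 0-by-1 interaction.
log_ber = (-6.0
           + 1.0 * settings[:, 0]
           - 0.5 * settings[:, 3]
           + 2.0 * settings[:, 0] * settings[:, 1]
           + rng.normal(scale=0.1, size=n_samples))

# Design matrix: intercept, 15 main effects, and all 105 pairwise products.
pairs = list(itertools.combinations(range(n_settings), 2))
interactions = np.column_stack([settings[:, i] * settings[:, j] for i, j in pairs])
design = np.column_stack([np.ones(n_samples), settings, interactions])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(design, log_ber, rcond=None)
main_coef = coef[1:1 + n_settings]
interaction_coef = coef[1 + n_settings:]

# The pair of settings with the largest fitted interaction coefficient.
strongest = pairs[int(np.argmax(np.abs(interaction_coef)))]
print("strongest interaction (0-based setting indices):", strongest)
```

With about 100,000 samples and only 121 coefficients, a fit like this is cheap, and the interaction coefficients answer the "which settings interact" question directly. On real data you would also want standard errors or a held-out check before trusting any single coefficient.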
