PCA/MFA for (graphical) dimension reduction: what to do with very small explained variance

Tags: dimensionality-reduction, pca, variance

I ran a Multiple Factor Analysis (MFA) on a data set with 3,924 rows and 96 columns. Six of the columns are unordered categorical variables with 12-14 categories each; the rest are numeric, mean-centered and scaled to unit standard deviation. My goal is dimension reduction: I want to visualize the results of PAM clustering by plotting the first two or three dimensions, coloring the points by assigned partition and highlighting each medoid.

I found that no single dimension of the PCA space explains more than a small fraction of the variance in the data:

       eigenvalue percentage of variance cumulative percentage of variance
comp 1  1.0350075               2.466873                          2.466873
comp 2  0.8243004               1.964666                          4.431539
comp 3  0.8093599               1.929057                          6.360596
comp 4  0.7587070               1.808329                          8.168924
comp 5  0.6495978               1.548274                          9.717198
comp 6  0.6328384               1.508329                         11.225527

What should I make of this situation? Can I still use the first two PCA dimensions as a quick 2D approximation of the data set, or will they just fail to represent the data accurately?

Is there an alternative dimension reduction technique I could or should use? All of the reviews of nonlinear dimension reduction I've read were somewhat equivocal about its usefulness compared to PCA, except on fabricated data like the Swiss roll data set, so I've been hesitant to use those methods.

Edit: here are the PCA results from just the numerical variables:

        eigenvalue percentage of variance cumulative percentage of variance
comp 1   5.1704992              5.7449991                          5.744999
comp 2   4.0469449              4.4966055                         10.241605
comp 3   3.8800122              4.3111247                         14.552729
comp 4   3.0606430              3.4007144                         17.953444
comp 5   2.7176048              3.0195609                         20.973005
comp 6   2.4725503              2.7472781                         23.720283
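
As a sanity check on the numeric-only table, the percentages can be reproduced from the eigenvalues alone. This is a minimal sketch, assuming the 90 numeric columns (96 minus the 6 categorical ones) are all standardized, so that the total variance equals 90; the helper name `explained_variance` is made up for illustration and is not part of any package.

```python
# Eigenvalues copied from the numeric-only PCA table above.
eigenvalues = [5.1704992, 4.0469449, 3.8800122, 3.0606430, 2.7176048, 2.4725503]

# Assumption: 96 columns minus 6 categorical ones leaves 90 standardized
# numeric variables, so the sum of all eigenvalues (total variance) is 90.
total_variance = 96 - 6


def explained_variance(eigs, total):
    """Return (percentages, cumulative percentages) for the given eigenvalues."""
    percentages = [100.0 * e / total for e in eigs]
    cumulative = []
    running = 0.0
    for p in percentages:
        running += p
        cumulative.append(running)
    return percentages, cumulative


percentages, cumulative = explained_variance(eigenvalues, total_variance)

# Print the rows in the same layout as the table for easy comparison.
for i, (e, p, c) in enumerate(zip(eigenvalues, percentages, cumulative), start=1):
    print(f"comp {i}  {e:10.7f}  {p:9.6f}  {c:10.6f}")
```

Under that assumption the computed values agree with the table, which suggests the reported percentages are simply each eigenvalue divided by the number of standardized variables. A 2D plot of the first two components would then show roughly 10% of the total variance.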

Best Answer

Despite the name multiple factor analysis (MFA), the analysis you've performed looks to me like a standard PCA approach (or, at best, FA via PCA), which focuses on principal components. Instead, I suggest you use exploratory factor analysis (EFA), followed by confirmatory factor analysis (CFA); both take a latent-variable approach. EFA serves as an alternative dimensionality reduction technique, with the added benefit of uncovering a latent factor structure, which has more explanatory power. Let me know if you need further help.
