Solved – PCA vs FA vs ICA for dimensionality reduction in questionaire data

dimensionality reductionfactor analysisindependent component analysispcasurvey

I am trying to identify personality traits underlying the multidimensional data from a questionnaire. In more abstract terms this means reducing the dimensionality of my data from N-dimensional (where N is the number of questions) to a more manageable number (usually chosen based on how much variance these dimensions may contain). A key thing to note is that given the fuzzy nature of personality traits it is expected that these dimensions are not orthogonal.

Generally psychologists like to do what I described above via Factor Analysis. I have a basic understanding of the distinctions between PCA, FA, and ICA. I am also aware that ICA is not commonly used for dimensionality reduction.

I have constructed a set of 2D data points distributed normal-ish along two non-orthogonal dimensions to assess the suitability of these methods. The full script for generating the data and plotting the figure can be found here. Admittedly this is about re-mapping the dimensionality, but reducing it would require data of a higher dimensionality than I can nicely plot.

An example of the sort of figure the script would produce is displayed below:

enter image description here

  • The second Factor for FA is [0,0]. This does not change even if I manually require the function to return two factors. Why does FA try to squeeze everything into one factor (when it is obvious that is not the latent variable generating my data)? I heard one of the strengths of FA was that it could return non-orthogonal dimensions. Why is that not happening here?
  • ICA seems to be doing the right job here. So why is it not used to re-map questionnaire data to more meaningful dimensions? I have heard ICA components are unordered – is that part of the issue? If so, why can't one determine how much of the variance each component explains, and order them accordingly?

So, why would anyone rather use FA than ICA when analyzing questionnaire data?

Best Answer

I was curious about your question, because I had never even heard of Independent Component Analysis (ICA), but I use factor analysis all the time. So looking up ICA, I found that one of the key assumptions was that "the values in each source signal have non-Gaussian distributions" (Wikipedia). This doesn't seem like a very helpful assumption if we're trying to discern or confirm a latent construct -- like a personality trait, if we're assuming that our item-responses are being drawn from a normal distribution, or that our latent construct is normally distributed. As such, ICA seems to be used for things like studying radio signals, and not personality traits.