Solved – Polychoric PCA and component loadings in Stata

Tags: categorical-data, correlation, ordinal-data, pca, stata

I’m using Stata 12.0, and I’ve downloaded the polychoricpca command written by Stas Kolenikov, which I wanted to use with data that includes a mix of categorical and continuous variables. Given the number of variables (around 25), my hunch is that I will need to generate more than 3 components. Ultimately, I would like to generate a handful of meaningful components (rather than dozens of variables) and use the components as independent variables in logistic regression.

Using polychoricpca, I am able to generate a table showing the eigenvalues and the eigenvectors (loadings) for each variable, but for the first three components only. polychoricpca appears to call these loadings “scoring coefficients” and produces one for every level of the variable, so a variable with three categories gets three scoring coefficients (“loadings”). Never having worked with polychoric PCA before, I’m used to seeing only one loading per variable/item. I want to examine these coefficients (“loadings”) to try to understand what the components are and how they might be labelled.

My questions:

(1) What if it looks as if I should generate 4 components? It seems as if I wouldn’t be able to examine and understand what that 4th component is because I can’t see how each of the items load on that 4th component, only the first 3. Is there a way to see how each item loads on more than the first three components?

(2) Can I simply use the polychoric correlation matrix combined with Stata’s pcamat command to examine how each item loads on each component (the eigenvector table)? I thought this might be a way to examine loadings if I have more than 3 components. The idea came from this UCLA stats help post on using factormat with a polychoric correlation matrix. pcamat in Stata, however, produces only 1 loading (coefficient) per variable, not 1 loading for every level of the variable. Any thoughts on whether it would be appropriate just to report the single loading from pcamat?

Best Answer

Although polychoricpca can store the scores for any number of components in variables, it displays the weights (scoring coefficients) only for the first three. Because the weights are important for a meaningful interpretation of the components, you can use the generated score variables to recover them.
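
A minimal sketch of both routes follows, not a definitive recipe. The variable names (v1, v2, v3 as ordinal items, x1 as a continuous variable) and the score prefix pc are hypothetical placeholders. The regression step rests on the assumption that each score is an additive sum of per-category weights across items; if that holds, the fit should be essentially perfect and the coefficients give the weights relative to each item's base category.

```stata
* Hypothetical variable names: v1-v3 are ordinal, x1 is continuous.
* Substitute your own variable list.

* Route A: store four component scores as pc1 ... pc4
polychoricpca v1 v2 v3 x1, score(pc) nscore(4)

* Recover the weights behind the 4th component from its score variable.
* Assumption: the score is additive across items, so a regression on full
* sets of category indicators reproduces the weights (up to base categories).
regress pc4 i.v1 i.v2 i.v3 c.x1

* Route B: ordinary PCA on the polychoric correlation matrix
polychoric v1 v2 v3 x1
local N = r(sum_w)      // number of observations used
matrix R = r(R)         // polychoric/polyserial correlation matrix
pcamat R, n(`N') components(4)
```

Route B reports one loading per variable because pcamat only sees the correlation matrix; that loading describes the continuous latent variable assumed to underlie each ordinal item, not its individual categories. If pcamat complains that the matrix is not positive semidefinite, its forcepsd option may help.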