Is it acceptable to rotate factors with PCA for binary data?

Tags: binary data, factor analysis, pca, rotation

What issues, if any, might there be in rotating factors in order to obtain factor/component loadings of binary data? Is it acceptable to rotate the factors when doing a traditional PCA? (Assuming I’m using a polychoric correlation matrix.)

I am not a statistician, but an applied educational researcher making sure my approach is appropriate to the problem at hand.

My doubt comes from a note on the SAS support site that advises using PROC PRINCOMP or PROC CORRESP instead of PROC FACTOR (and, in all cases, taking care to factor a tetrachoric correlation matrix). One important difference is that PROC PRINCOMP and PROC CORRESP do not implement factor rotation, whereas PROC FACTOR does; since a PCA can also be specified within PROC FACTOR, it would otherwise seem preferable for exploratory analysis. It occurred to me that perhaps rotation of factors is inappropriate for binary data, but I cannot find a discussion of this in the literature or on stats forums.


The problem, in case you care to know the context: We have data from several hundred colleges around the U.S. regarding what programmatic/curricular elements they implement in certain kinds of cohort educational programs. 24 variables, all dichotomous. We are looking to find patterns of how those elements tend to group together (in addition to a separate latent class analysis or cluster analysis to categorize the programs). Factor analysis is not appropriate since there is not a theoretical underlying latent structure that these elements are “measuring”.

Best Answer

Rotation of extracted factors or principal components is not prohibited for binary data: what gets rotated is the loading matrix, not the data. Besides, the data for which the loadings are computed are no longer treated as binary once you compute tetrachoric (polychoric) correlations for them, because those correlations estimate the association between the continuous latent variables assumed to underlie the dichotomous items.
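To make the point about rotating the loading matrix concrete, here is a minimal sketch in Python with NumPy. The 6x6 matrix `R` is a made-up stand-in for a tetrachoric correlation matrix (it is not your data), and `varimax` is a hand-written helper implementing the usual SVD-based varimax iteration, not a function taken from SAS or any particular library. The sketch only illustrates that an orthogonal rotation changes the loadings while leaving the communalities and the reproduced correlations untouched.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (variables x components) loading matrix.

    Hypothetical helper written for this illustration.
    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Target matrix whose SVD gives the next orthogonal rotation
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        # Stop once the criterion no longer improves noticeably
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Assumed example: two blocks of three items each, standing in for a
# tetrachoric correlation matrix (0.6 within a block, 0.2 between blocks).
within, between = 0.6, 0.2
R = np.full((6, 6), between)
R[:3, :3] = within
R[3:, 3:] = within
np.fill_diagonal(R, 1.0)

# PCA on the correlation matrix: keep the two largest components.
eigvals, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

rotated, rotation = varimax(loadings)

print("Unrotated loadings:\n", np.round(loadings, 3))
print("Varimax-rotated loadings:\n", np.round(rotated, 3))

# Rotation touches only the loadings: communalities and the reproduced
# correlation matrix are the same before and after.
print(np.allclose((loadings**2).sum(axis=1), (rotated**2).sum(axis=1)))
print(np.allclose(loadings @ loadings.T, rotated @ rotated.T))
```

Nothing in this computation depends on whether `R` came from binary or continuous variables; the rotation step sees only the loading matrix.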

Unlike factor analysis, a typical application of PCA only rarely involves interpreting the components. Components are usually seen as derived, simplifying variables that merely summarize the multivariate cloud; they do not claim to be latent essences behind the observed variables, as factors do. Factors govern the covariation among the variables and always call for interpretation. Rotation aids interpretation. That's probably why PRINCOMP doesn't offer rotation and FACTOR does, as you say.