Normally when I do factor analysis, I have a whole bunch of variables that need to be reduced. But here I only have two binary variables (yes/no) that I need to reduce into one interval factor. Is Principal Components / Factor Analysis appropriate for this? When I do it, my extraction communalities are really high. I might need a reference to back this up for reviewers.
Solved – How to factor analyze two binary variables only
binary data, correspondence-analysis, factor analysis, pca, references
Related Solutions
You might read my own current opinion about binary variables here. In short, it is not a sin to use binary variables with PCA if you use the analysis simply as a variable-reduction technique (for example, for plotting purposes) without attempting to interpret the components as latent features. If you go as far as interpreting them, you had better use factor analysis in the proper sense, not PCA; and then binary variables pose a problem, since factor analysis assumes continuous variables, which binary variables clearly are not.
Two or three items per factor is a question of identification of your CFA (confirmatory FA) model.
Let us for simplicity assume that the model is identified by setting the variance of each factor to 1. Assume also that there are no correlated measurement errors.
A single factor model with two items has two loadings and two error variances to be estimated = 4 parameters, but there are only 3 non-trivial entries in the variance-covariance matrix, so you don't have enough information to estimate the four parameters that you need.
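The counting argument above can be written down as a few lines of Python. This is only a sketch under the assumptions already stated (factor variance fixed to 1, no correlated measurement errors); the function name `one_factor_df` is made up for illustration and does not come from any package.

```python
def one_factor_df(n_items: int) -> int:
    """Degrees of freedom of a single-factor model (hypothetical helper).

    Assumes the factor variance is fixed to 1 and there are no
    correlated measurement errors, as in the text.
    """
    # Non-redundant entries of the variance-covariance matrix:
    # n_items variances plus n_items * (n_items - 1) / 2 covariances.
    n_moments = n_items * (n_items + 1) // 2
    # Free parameters: one loading and one error variance per item.
    n_params = 2 * n_items
    return n_moments - n_params


for p in (2, 3, 4):
    print(p, "items -> df =", one_factor_df(p))
```

A negative result means there are fewer pieces of information than parameters, which is the two-item situation described above; zero means the counts match exactly; a positive result means there is information to spare.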
A single factor model with three items has three loadings and three error variances. The variance-covariance matrix has six non-redundant entries, and careful analytic examination shows that the model is exactly identified, and you can algebraically express the parameter estimates as functions of the variance-covariance matrix entries. With more items per single factor, you have an overidentified model (more non-redundant variances and covariances than free parameters, so the degrees of freedom are positive), which usually means you are good to go.
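To make the three-item case concrete, here is a minimal sketch of the algebraic solution. With the factor variance fixed to 1, each covariance is the product of two loadings and each variance is a squared loading plus an error variance, so the loadings can be recovered from ratios of covariances. The function name `three_indicator_solution` and the numeric values are illustrative assumptions, and the sketch assumes all three covariances are non-zero.

```python
import numpy as np


def three_indicator_solution(S):
    """Solve a one-factor, three-item model exactly (hypothetical helper).

    Assumes factor variance = 1, no correlated errors, and
    non-zero covariances between all pairs of items.
    """
    S = np.asarray(S, dtype=float)
    s12, s13, s23 = S[0, 1], S[0, 2], S[1, 2]
    if s12 == 0 or s13 == 0 or s23 == 0:
        raise ValueError("all three covariances must be non-zero")
    # cov(i, j) = lam_i * lam_j, so e.g. lam_1^2 = s12 * s13 / s23.
    lam_sq = np.array([s12 * s13 / s23, s12 * s23 / s13, s13 * s23 / s12])
    if np.any(lam_sq < 0):
        raise ValueError("covariance signs are inconsistent with one factor")
    lam = np.sqrt(lam_sq)
    # Loadings are identified only up to a global sign; fix lam_1 > 0
    # and let the covariances with item 1 set the other signs.
    lam[1] *= np.sign(s12)
    lam[2] *= np.sign(s13)
    # var(i) = lam_i^2 + theta_i, so the error variance is the remainder.
    theta = np.diag(S) - lam_sq
    return lam, theta


# Build a covariance matrix from made-up parameters and recover them.
true_lam = np.array([0.8, 0.6, 0.5])
true_theta = np.array([0.36, 0.64, 0.75])
S = np.outer(true_lam, true_lam) + np.diag(true_theta)
lam, theta = three_indicator_solution(S)
print(lam, theta)
```

With only two items there is a single covariance, so no such ratio can be formed, which is another way to see why the two-item model is not identified on its own.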
With more than one factor, the CFA model is always identified with 3+ items per factor (because a simple measurement model is identified for each factor, so roughly speaking you can get predictions for each factor and estimate their covariances based on that). However, a CFA with two items per factor is identified provided that each factor has a non-zero covariance with at least one other factor in the population. (Otherwise, the factor in question falls out of the system, and a two-item single factor model is not identified.) The proof of identification is rather technical, and requires a good understanding of matrix algebra.
Bollen (1989) fully and thoroughly discusses the issues of identification of CFA models in chapter 7. See p. 244 specifically regarding three- and two-indicator rules.
Best Answer
Three is normally considered the minimum number of variables needed to conduct factor analysis; among other places, this is stated in the Wikipedia article (which has a reference) and enforced in some (most? all?) statistical software.
There is no reason, however, that you can't do principal components analysis (which is not the same as factor analysis, although closely related) to identify which principal component explains most of the variance, even if you only have two binary variables. The correlation between the two can still be calculated.
See, for example, the example below, where Bin1 and Bin3 are correlated binary variables. The first principal component explains most of the variance, and naturally is equally weighted on both of the original variables.
The scatterplot of the two binary correlated variables (points are jittered):
Component eigenvalues, left, and biplot (loadings+scores), right:
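The equal weighting can be checked numerically. The sketch below is not the data behind the plots above: it simulates two correlated binary variables (reusing the names Bin1 and Bin3, with an arbitrary 20% disagreement rate and seed, both assumptions) and runs PCA on their correlation matrix with NumPy. For a 2×2 correlation matrix with correlation r, the eigenvalues are 1 + r and 1 − r, and the first component loads equally on both variables.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
n = 500

# Simulated data (an assumption, not the original example's data):
# Bin3 copies Bin1 and then disagrees with it about 20% of the time.
bin1 = rng.integers(0, 2, size=n)
flip = rng.random(n) < 0.2
bin3 = np.where(flip, 1 - bin1, bin1)

X = np.column_stack([bin1, bin3]).astype(float)
R = np.corrcoef(X, rowvar=False)  # 2x2 correlation matrix
r = R[0, 1]

# PCA on the correlation matrix: eigh returns eigenvalues in
# ascending order, so reverse to put the largest component first.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("correlation:", round(r, 3))
print("eigenvalues:", eigvals)            # equal to 1 + r and 1 - r
print("first component weights:", eigvecs[:, 0])
print("share of variance:", eigvals[0] / eigvals.sum())

# Component scores: project the standardized variables on the weights.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scores = Z @ eigvecs[:, 0]
```

Because there are only four possible (Bin1, Bin3) combinations, `scores` can take at most four distinct values.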