I have some dichotomous data (only binary variables), and my boss asked me to perform a factor analysis using the tetrachoric correlation matrix. I’ve previously been able to teach myself how to run different analyses based on the examples here, at UCLA’s stats site, and at other sites like it, but I can’t seem to find a step-by-step example of a factor analysis on dichotomous data (binary variables) using R.
I did see chl's response to a somewhat similar question and I also saw ttnphns' answer, but I am looking for something even more spelled out: a step-by-step example I can work with.
Does anyone here know of such a step-by-step example of a factor analysis on binary variables using R?
Update 2012-07-11 22:03:35Z
I should also add that I am working with an established instrument that has three dimensions, to which we have added some additional questions, and we now hope to find four distinct dimensions.
Furthermore, our sample size is only $n=153$, and we currently have $19$ items. I compared our sample size and our number of items to a number of psychology articles and we are definitely at the lower end, but we wanted to try it anyway.
That said, this is not important for the step-by-step example I am looking for, and caracal’s example below looks really amazing. I will work my way through it using my data first thing in the morning.
Best Answer
I take it the focus of the question is less on the theoretical side, and more on the practical side, i.e., how to implement a factor analysis of dichotomous data in R.
First, let's simulate 200 observations from 6 variables, coming from 2 orthogonal factors. I'll take a couple of intermediate steps and start with multivariate normal continuous data that I later dichotomize. That way, we can compare Pearson correlations with polychoric correlations, and compare the factor loadings from the continuous data with those from the dichotomous data and with the true loadings.
Now simulate the actual data from the model $x = \Lambda f + e$, with $x$ being the observed variable values of a person, $\Lambda$ the true loadings matrix, $f$ the latent factor score, and $e$ iid, mean 0, normal errors.
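A minimal sketch of such a simulation in base R follows. The seed, the values in `Lambda`, and the choice of standard normal factor scores are assumptions made for illustration; because the factors are drawn independently, they are orthogonal in expectation.

```r
set.seed(123)                        # arbitrary seed, for reproducibility
N <- 200                             # number of observations
P <- 6                               # number of variables
Q <- 2                               # number of factors

# assumed true P x Q loadings matrix (illustrative values)
Lambda <- matrix(c( 0.7, -0.4,
                    0.8,  0.0,
                   -0.2,  0.9,
                   -0.3,  0.4,
                    0.3,  0.7,
                   -0.8,  0.1), nrow = P, ncol = Q, byrow = TRUE)

FF <- matrix(rnorm(N * Q), nrow = N, ncol = Q)   # independent standard normal factor scores
E  <- matrix(rnorm(N * P), nrow = N, ncol = P)   # iid, mean 0, normal errors
X  <- FF %*% t(Lambda) + E                       # row i holds x_i = Lambda f_i + e_i
colnames(X) <- paste0("X", seq_len(P))
Xdf <- as.data.frame(X)                          # the same data as a data frame
```

Each row of `X` is one simulated person; `Xdf` holds the same values in a data frame for functions that expect one.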
Do the factor analysis for the continuous data. The estimated loadings are similar to the true ones when ignoring the irrelevant sign.
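As a rough illustration of this step, the sketch below runs `fa()` from package `psych` on continuous data. The data are re-simulated inside the snippet so that it runs on its own; the seed and the values in `Lambda` are assumptions made for illustration.

```r
library(psych)                       # provides fa()

# re-simulate illustrative continuous data (assumed loadings, arbitrary seed)
set.seed(123)
N <- 200; P <- 6; Q <- 2
Lambda <- matrix(c(0.7, -0.4,  0.8, 0,  -0.2, 0.9,
                   -0.3, 0.4,  0.3, 0.7,  -0.8, 0.1),
                 nrow = P, ncol = Q, byrow = TRUE)
X <- matrix(rnorm(N * Q), N, Q) %*% t(Lambda) + matrix(rnorm(N * P), N, P)
colnames(X) <- paste0("X", seq_len(P))

# factor analysis of the continuous data, based on Pearson correlations
faCont <- fa(X, nfactors = 2, rotate = "varimax")
print(faCont$loadings, digits = 2)
```

Because `fa()` works on the correlation matrix, its loadings are on the standardized scale. With unit error variance in this sketch, compare the pattern and relative size of the estimates with `Lambda` rather than their exact values, and ignore the sign of whole columns.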
Now let's dichotomize the data. We'll keep the data in two formats: as a data frame with ordered factors, and as a numeric matrix.
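One way to do this is a median split per variable, sketched below in base R. The median as the cut point, the level labels, and the simulated input are assumptions for illustration; any other threshold would work the same way.

```r
# re-simulate illustrative continuous data (assumed loadings, arbitrary seed)
set.seed(123)
N <- 200; P <- 6; Q <- 2
Lambda <- matrix(c(0.7, -0.4,  0.8, 0,  -0.2, 0.9,
                   -0.3, 0.4,  0.3, 0.7,  -0.8, 0.1),
                 nrow = P, ncol = Q, byrow = TRUE)
X   <- matrix(rnorm(N * Q), N, Q) %*% t(Lambda) + matrix(rnorm(N * P), N, P)
Xdf <- as.data.frame(X)

# median split: each variable becomes an ordered factor with two levels
Xdi <- lapply(Xdf, function(x) {
  cut(x, breaks = c(-Inf, median(x), Inf),
      labels = c("low", "high"), ordered_result = TRUE)
})
Xdidf  <- as.data.frame(Xdi)         # data frame of ordered factors
XdiNum <- data.matrix(Xdidf)         # numeric matrix with codes 1 (low) and 2 (high)
```

The data frame of ordered factors suits functions that pick the correlation type from the variable class, while the numeric matrix suits functions that expect plain numbers.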
`hetcor()` from package `polycor` gives us the polychoric correlation matrix we'll later use for the FA.

Now use the polychoric correlation matrix to do a regular FA. Note that the estimated loadings are fairly similar to the ones from the continuous data.
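The two steps can be sketched as follows, assuming packages `polycor` and `psych` are installed. For variables with only two categories, the polychoric correlation is the tetrachoric correlation. The simulated input, seed, and loadings are illustrative assumptions.

```r
library(polycor)                     # hetcor()
library(psych)                       # fa()

# re-simulate illustrative data and dichotomize at the median
set.seed(123)
N <- 200; P <- 6; Q <- 2
Lambda <- matrix(c(0.7, -0.4,  0.8, 0,  -0.2, 0.9,
                   -0.3, 0.4,  0.3, 0.7,  -0.8, 0.1),
                 nrow = P, ncol = Q, byrow = TRUE)
X <- matrix(rnorm(N * Q), N, Q) %*% t(Lambda) + matrix(rnorm(N * P), N, P)
Xdidf <- as.data.frame(lapply(as.data.frame(X), function(x)
  cut(x, breaks = c(-Inf, median(x), Inf), ordered_result = TRUE)))

# polychoric correlations between the ordered factors (ML estimation)
pc <- hetcor(Xdidf, ML = TRUE)

# regular FA on the correlation matrix; n.obs is needed for fit statistics
faPC <- fa(r = pc$correlations, nfactors = 2, n.obs = N, rotate = "varimax")
print(faPC$loadings, digits = 2)
```

Passing `n.obs` matters here because `fa()` cannot infer the sample size from a correlation matrix alone.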
You can skip the step of calculating the polychoric correlation matrix yourself, and directly use `fa.poly()` from package `psych`, which does the same thing in the end. This function accepts the raw dichotomous data as a numeric matrix.

EDIT: For factor scores, look at package `ltm`, which has a `factor.scores()` function specifically for polytomous outcome data. An example is provided on this page -> "Factor Scores - Ability Estimates".

You can visualize the loadings from the factor analysis using `factor.plot()` and `fa.diagram()`, both from package `psych`. For some reason, `factor.plot()` accepts only the `$fa` component of the result from `fa.poly()`, not the full object.

Parallel analysis and a "very simple structure" analysis provide help in selecting the number of factors. Again, package `psych` has the required functions. `vss()` takes the polychoric correlation matrix as an argument.

Parallel analysis for polychoric FA is also provided by the package `random.polychor.pa`.

Note that the functions `fa()` and `fa.poly()` provide many, many more options to set up the FA. In addition, I edited out some of the output which gives goodness of fit tests etc. The documentation for these functions (and package `psych` in general) is excellent. This example here is just intended to get you started.
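To make the direct route from raw data, the plots, and the factor-number diagnostics concrete, here is a hedged sketch using `psych` only. As far as I know, recent releases of `psych` document `fa.poly()` as deprecated in favor of `fa()` with its `cor` argument, so the sketch uses `fa(..., cor = "tet")`; that substitution, the 0/1 coding, the seed, the loadings, and `n = 3` in `vss()` are my assumptions.

```r
library(psych)   # fa(), tetrachoric(), fa.plot(), fa.diagram(), vss(), fa.parallel()

# re-simulate illustrative data and dichotomize at the median (0/1 coding)
set.seed(123)
N <- 200; P <- 6; Q <- 2
Lambda <- matrix(c(0.7, -0.4,  0.8, 0,  -0.2, 0.9,
                   -0.3, 0.4,  0.3, 0.7,  -0.8, 0.1),
                 nrow = P, ncol = Q, byrow = TRUE)
X <- matrix(rnorm(N * Q), N, Q) %*% t(Lambda) + matrix(rnorm(N * P), N, P)
colnames(X) <- paste0("X", seq_len(P))
Xbin <- apply(X, 2, function(x) as.integer(x > median(x)))

# FA directly from the raw 0/1 data, based on tetrachoric correlations
faTet <- fa(Xbin, nfactors = 2, rotate = "varimax", cor = "tet")
print(faTet$loadings, digits = 2)

# visualize the loadings
fa.plot(faTet, cut = 0.5)            # scatter plot of the loadings
fa.diagram(faTet)                    # path diagram of items and factors

# help with choosing the number of factors
rho <- tetrachoric(Xbin)$rho                     # tetrachoric correlation matrix
vss(rho, n = 3, n.obs = N, rotate = "varimax")   # very simple structure
fa.parallel(Xbin, cor = "tet", fa = "fa")        # parallel analysis on the raw data
```

With real data, replace `Xbin` with your own numeric matrix of 0/1 responses and set `nfactors` to the number of dimensions you want to examine.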