Solved – How to use variables derived from factor analysis as predictors in logistic regression

factor analysis, logistic

Context

I have a survey that asks 11 questions about self-efficacy.
Each question has 3 response options (disagree, agree, strongly agree).
Nine questions ask about self-esteem.
I have used a factor analysis of the 11 self-efficacy items and extracted two factors.

$x_1$ to $x_{11}$ denote the 11 self-efficacy questions in the survey, and $f_1$ ($x_1$ to $x_6$) and $f_2$ ($x_7$ to $x_{11}$) denote the two factors I got from the factor analysis.
$y$ is the dependent variable.

Then I created two new variables:

   f1 = mean(x1 to x6);
   f2 = mean(x7 to x11).
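
As a minimal sketch of these two definitions, assuming the responses are coded 1 = disagree, 2 = agree, 3 = strongly agree (a coding the question does not state), the per-respondent scores could be computed as follows. The helper `subscale_means` and the example respondent are hypothetical.

```python
from statistics import mean

# Assumed coding (not stated in the question):
# 1 = disagree, 2 = agree, 3 = strongly agree.
def subscale_means(responses):
    """Return (f1, f2) for one respondent's 11 self-efficacy answers x1..x11."""
    if len(responses) != 11:
        raise ValueError("expected 11 item responses")
    f1 = mean(responses[0:6])    # x1 to x6
    f2 = mean(responses[6:11])   # x7 to x11
    return f1, f2

# Hypothetical respondent, for illustration only.
respondent = [3, 2, 3, 3, 2, 2, 1, 2, 1, 1, 2]
f1, f2 = subscale_means(respondent)
print(f1, f2)
```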

So the logistic regression would look like this:

   logit(P(y=1)) = a + b*f1 + c*f2 + ...

My questions:

  • Can I use these two factors as predictor variables in my multivariate logistic regression model?
  • Should I calculate the mean of the items in each factor and use this mean as a continuous variable in my logistic regression model?
  • Is this an appropriate use of factor analysis?

Best Answer

If I understand you correctly, you are using FA to extract two subscales from your 11-item questionnaire. They are supposed to reflect some specific dimensions of self-efficacy (for example, self-regulatory vs. self-assertive efficacy).

Then, you are free to use individual mean (or sum) scores computed on the two subscales as predictors in a regression model. In other words, instead of considering 11 item scores, you are now working with 2 subscores, computed as described above for each individual. The only assumption made is that those scores reflect one's location on a "hypothetical construct" or latent variable, defined as a continuous scale.
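
To make this concrete, here is a rough sketch of a logistic regression with the two subscale means as continuous predictors. Everything in it is an assumption for illustration: the data are simulated, the coefficients in `true_beta` are invented, and `fit_logistic` is a hand-rolled Newton-Raphson fit rather than anything used in the question. In practice a library routine (for example the `Logit` model in statsmodels) would normally do the fitting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated subscale means on the 1..3 range (illustrative, not survey data).
f1 = rng.uniform(1, 3, size=n)
f2 = rng.uniform(1, 3, size=n)

# Simulate a binary outcome from assumed coefficients a=-4, b=1.5, c=0.5.
true_beta = np.array([-4.0, 1.5, 0.5])
X = np.column_stack([np.ones(n), f1, f2])    # columns: intercept, f1, f2
p = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p)

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        grad = X.T @ (y - mu)                          # score vector
        hess = X.T @ (X * (mu * (1 - mu))[:, None])    # information matrix
        beta = beta + np.linalg.solve(hess, grad)      # Newton step
    return beta

beta_hat = fit_logistic(X, y)
print(dict(zip(["a", "b", "c"], beta_hat.round(2))))
```

The estimated `b` and `c` are log-odds changes per one-unit increase in the corresponding subscale mean, holding the other subscale fixed.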

As @JMS said, there are other issues that you might further clarify, especially which kind of FA was done. A subtle issue is that measurement error will not be accounted for by a standard regression approach. An alternative is to use Structural Equation Models or any latent variable model (e.g. those coming from the IRT literature), but here the regression approach should provide a good approximation. The analysis of ordinal variables (Likert-type items) has been discussed elsewhere on this site.

However, in current practice, your approach is what is commonly found when validating a questionnaire or constructing scoring rules: we use a weighted or unweighted combination of item scores (hence, they are treated as numeric variables) to report individual location on the latent trait(s) under consideration.
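
The difference between an unweighted and a weighted combination can be sketched as follows. The item scores and the `weights` vector are hypothetical (the weights are only loosely in the spirit of factor loadings and do not come from the question), and the weighted score here is a plain weighted mean, not a model-based factor score.

```python
import numpy as np

# Hypothetical item scores for one subscale (x1..x6), coded 1..3;
# each row is one respondent.
items = np.array([
    [3, 2, 3, 3, 2, 2],
    [1, 1, 2, 1, 2, 1],
    [2, 3, 3, 2, 3, 3],
])

# Hypothetical weights, for illustration only.
weights = np.array([0.8, 0.7, 0.6, 0.7, 0.5, 0.4])

unweighted = items.mean(axis=1)               # simple mean score per respondent
weighted = items @ weights / weights.sum()    # weighted mean, same 1..3 scale

print(unweighted)
print(weighted)
```

Dividing by `weights.sum()` keeps the weighted score on the original response scale, which makes the two versions directly comparable.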