Solved – How can Factor Analysis be used to remove questions from a survey

confirmatory-factor, dimensionality reduction, factor analysis, likert

Suppose I have a psychological questionnaire asking 30 questions about a person's mental health (on a Likert-scale 1-7). These 30 questions fall into 7 separate, but correlated categories.

The questionnaire has been used for several years, but I would like to develop a shorter version of it to improve the respondent experience (and simply to reduce survey length).

Suppose I have data collected from around 1000 individuals, and would like to use that information to reduce survey length.

My goal is to remove questions from the survey that might be adding to the survey length but not providing any additional information.

I read that Factor Analysis can be used in cases like this, but am not sure how to apply it in my scenario.

What would the steps look like using Factor Analysis for removing redundant questions from the questionnaire to shorten the survey?

In simple terms, how can Factor Analysis be used to remove questions from my survey without losing relevant information? Would you be able to provide a reference to a book/website that explains Factor Analysis for this purpose?

Here's my understanding so far, but I am not sure these are the right steps or if I am missing anything:

  1. Run Exploratory Factor Analysis on data to identify factor patterns (would CFA be more appropriate here?)
  2. Remove items that do not load on any factors

How can I identify whether a question is not needed? For example, if it is almost the same as another question on the survey.

Best Answer

If this analysis is for a serious research project, I would recommend not removing any questions. As you said, "The questionnaire has been used for several years," which suggests the items have worked reasonably well so far. Unless you have evidence that particular questions are not valid measurement items, it is better to keep them. Another reason is that your 30 questions fall into 7 categories (I will call them subconstructs), so each subconstruct has only about 4 items on average. Removing questions from the current pool could leave some subconstructs with too few items to measure them accurately. It is generally hard to measure a construct well with just 2 or 3 items (Cronbach's alpha is usually low with so few). You need to consider how removing questions would affect your measurement model; a valid measurement model is much more important than a shorter survey.
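To make the concern about short subscales concrete, here is a minimal Python sketch (not part of the original answer) that computes Cronbach's alpha for one subconstruct and then uses the Spearman-Brown formula to project reliability if the subscale were shortened. The function names and the response data are hypothetical, and the projection assumes the removed items are comparable in quality to the ones that remain.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one subscale.

    items: list of item columns; each column holds the responses of the
    same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items of the subscale.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def spearman_brown(reliability, length_ratio):
    """Projected reliability when the scale length is multiplied by length_ratio."""
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


# Hypothetical 1-7 Likert responses: 4 items of one subconstruct, 6 respondents.
items = [
    [5, 3, 6, 2, 4, 7],
    [4, 3, 6, 1, 5, 6],
    [5, 2, 7, 2, 4, 6],
    [6, 4, 5, 3, 3, 7],
]

alpha = cronbach_alpha(items)
projected = spearman_brown(alpha, 2 / 4)  # cutting the subscale from 4 items to 2
print(f"alpha with 4 items: {alpha:.2f}")
print(f"projected with 2 items: {projected:.2f}")
```

With real data you would run this once per subconstruct and compare the projected value against whatever reliability level your field treats as acceptable.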

The only removal I would recommend is dropping whole subconstructs (categories) that are not very relevant to your analysis. Not all categories are equally important, so keep only those that matter for your study. That way you remove the questions tied to the unimportant categories while still being able to build valid measurement models for the important ones.

Technically speaking, you can conduct a pilot study to collect data with all questions. If you already collected the full data in a previous study, it is fine to reuse it. Then run a factor analysis in a statistics package such as R to see whether some questions do not load highly on their categories. For an example, refer to https://www.statmethods.net/advstats/factor.html. Questions with small loadings (< 0.4) can be removed from your measurement model. Once you have a measurement model that fits well without these questions, you have a smaller set of questions to use in your final study.
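As a rough illustration of the loading check, the Python sketch below takes a table of standardized loadings (as you might copy from your factor-analysis output), flags items whose strongest loading is below 0.4, and reports categories that would be left with fewer than 3 items. The item names, category names, loading values, and the 3-item minimum are all hypothetical; the 0.4 cutoff is the one mentioned above and is a convention rather than a strict rule.

```python
THRESHOLD = 0.4  # loading cutoff from the answer; a convention, not a rule
MIN_ITEMS = 3    # assumed minimum number of items per category

# Hypothetical standardized loadings: item -> {category: loading}
loadings = {
    "q1": {"anxiety": 0.72, "sleep": 0.10},
    "q2": {"anxiety": 0.65, "sleep": 0.05},
    "q3": {"anxiety": 0.31, "sleep": 0.22},
    "q4": {"anxiety": 0.08, "sleep": 0.81},
    "q5": {"anxiety": 0.12, "sleep": 0.77},
    "q6": {"anxiety": -0.02, "sleep": 0.35},
}


def split_items(loadings, threshold=THRESHOLD):
    """Return (items kept per category, items flagged for removal)."""
    keep, drop = {}, []
    for item, by_category in loadings.items():
        # Assign each item to the category it loads on most strongly.
        category, value = max(by_category.items(), key=lambda kv: abs(kv[1]))
        if abs(value) < threshold:
            drop.append(item)
        else:
            keep.setdefault(category, []).append(item)
    return keep, drop


keep, drop = split_items(loadings)
# Categories that would end up with too few items after the removal.
thin = [category for category, kept in keep.items() if len(kept) < MIN_ITEMS]

print("candidates for removal:", drop)
print("items kept per category:", keep)
print("categories left with too few items:", thin)
```

The last check reflects the earlier caution: a low loading is a reason to look at an item, but dropping it may leave its category too thin to measure, so the measurement model should be refit and checked before any item is removed for good.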
