Solved – How to reduce number of items using factor analysis, internal consistency, and item response theory in conjunction

factor-analysis, latent-variable, psychometrics, reliability, validity

I am in the process of empirically developing a questionnaire, and I will be using arbitrary numbers in this example to illustrate. For context, I am developing a psychological questionnaire aimed at assessing thought patterns commonly identified in individuals who have anxiety disorders. An item could look like "I need to check the oven repeatedly because I can't be sure it's off".

I have 20 questions (5-point Likert) which may reflect one or two factors (note that in reality I have closer to 200 questions, comprising 10 scales, and each scale may reflect two factors). I am willing to remove about half the items, leaving 10 questions spread over one or two factors.

I am familiar with exploratory factor analysis (EFA), internal consistency (Cronbach's alpha), and item characteristic curves in item response theory (IRT). I can see how I would use any one of these methods to determine which items are the "worst" within any single scale. I appreciate that each method also answers different questions, although they may lead to similar results, and I am not sure which "question" is most important.

Before we start, let's make sure I know what I am doing with each of these methods individually.

  • Using EFA, I would identify the number of factors, and remove the items that load the least (let's say <.30) on their respective factor or that cross-load substantially across factors.

  • Using internal consistency, I would remove items that have the worst "alpha if item deleted". I could do so assuming one factor in my scale, or do it after an initial EFA to identify the number of factors and subsequently run my alpha for each factor.

  • Using IRT, I would remove items that do not assess the factor of interest along their (5-point Likert) response options. I would be eyeballing item characteristic curves, essentially looking for a 45-degree line running from option 1 on the Likert scale all the way up to 5 along the latent score. I could do so assuming one factor, or do it after an initial EFA to identify the number of factors and subsequently run the curves for each factor.
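To check my understanding of the alpha mechanics, here is a minimal pure-Python sketch with invented data (in practice I would call something like R's psych::alpha rather than code this by hand):

```python
# Minimal sketch of Cronbach's alpha and "alpha if item deleted".
# The toy data below are made up: 4 items, 6 respondents, 5-point Likert.
from statistics import variance

def cronbach_alpha(items):
    """items: one list of respondent scores per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum(variance(col) for col in items) / variance(totals))

data = [
    [4, 5, 3, 4, 5, 4],   # item 1
    [4, 4, 3, 5, 5, 4],   # item 2
    [3, 5, 2, 4, 4, 3],   # item 3
    [2, 3, 3, 4, 3, 3],   # item 4: deliberately out of step with the rest
]

print("alpha, all items: %.3f" % cronbach_alpha(data))       # ~0.80
for i in range(len(data)):
    reduced = data[:i] + data[i + 1:]
    print("alpha if item %d deleted: %.3f" % (i + 1, cronbach_alpha(reduced)))
# Deleting item 4 raises alpha (~0.88), so it is the candidate for removal.
```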

I am unsure which of these methods to use in order to best identify which items are the "worst". I use "worst" in a broad sense, meaning that the item would be a detriment to the measure, either in terms of reliability or validity, both of which are equally important to me. Presumably I can use them in conjunction, but I am not sure how.

If I were to go ahead with what I know now and give it my best shot, I would do the following:

  1. Do an EFA to identify the number of factors. Also delete items with bad loadings on their respective factors, since I don't want items that load badly regardless of how they would do in other analyses.
  2. Do IRT and remove bad items judged by that analysis as well, if any remain from the EFA.
  3. Simply report Cronbach's alpha and don't use that metric as a means to delete items.

Any general guidelines would be greatly appreciated!

Here is also a list of specific questions that you can perhaps answer:

  1. What is the practical difference between removing items based on factor loadings and removing items based on Cronbach's alpha (assuming you use the same factor layout for both analyses)?

  2. Which should I do first? Assuming I do EFA and IRT with one factor, and both identify different items that should be removed, which analysis should have priority?

I am not hard set on doing all of these analyses, although I will report Cronbach's alpha regardless. I feel like doing just IRT would leave something missing, and likewise for just EFA.

Best Answer

I don't have any citations, but here's what I'd suggest:

Zeroth: If at all possible, split the data into a training and test set.
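In code, that split can be as simple as shuffling respondents (a sketch; the 100 respondent IDs and the 50/50 ratio are arbitrary assumptions):

```python
# Sketch of the zeroth step: hold out a test set before any exploration.
import random

def train_test_split(rows, test_fraction=0.5, seed=42):
    """Shuffle respondent rows and split them into two disjoint sets."""
    rng = random.Random(seed)
    shuffled = rows[:]                     # copy; original order untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

respondents = list(range(100))             # stand-in for response vectors
train, test = train_test_split(respondents)
print(len(train), len(test))               # 50 50
assert not set(train) & set(test)          # the halves are disjoint
```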

First, do EFA. Look at various solutions to see which ones make sense, based on your knowledge of the questions. You'd have to do this before Cronbach's alpha, or you won't know which items go into which factor. (Running alpha on ALL the items is probably not a good idea.)
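Here's a rough illustration of the loading idea; real EFA needs proper estimation and rotation (e.g. psych::fa in R), so the first principal component of the item correlation matrix, found by power iteration, is only a crude stand-in for one-factor loadings. All data are invented:

```python
# Crude stand-in for one-factor loadings, in pure Python.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def first_component_loadings(corr, iters=200):
    """Dominant eigenvector of corr, scaled by sqrt(eigenvalue)."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(corr[0][j] * v[j] for j in range(k)) / v[0]
    return [x * math.sqrt(eigval) for x in v]

items = [
    [4, 5, 3, 4, 5, 4, 2, 5],
    [4, 4, 3, 5, 5, 4, 2, 4],
    [3, 5, 2, 4, 4, 3, 1, 5],
    [2, 3, 3, 4, 3, 3, 3, 2],   # weak item: barely tracks the others
]
corr = [[pearson(a, b) for b in items] for a in items]
loadings = first_component_loadings(corr)
for i, ld in enumerate(loadings, 1):
    print("item %d loading ~ %.2f" % (i, ld))
# Items 1-3 load strongly; item 4 does not and would be a deletion candidate.
```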

Next, run alpha and delete items that have much poorer correlations than the others in each factor. I wouldn't set an arbitrary cutoff; I'd look for ones that are much lower than the others. See if deleting those makes sense.
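One way to see "much poorer correlations" concretely is the corrected item-rest correlation: each item against the sum of the other items in its factor. A sketch with invented data and no fixed cutoff; the point is to spot an item whose correlation is far below the rest:

```python
# Sketch of corrected item-rest correlations.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def item_rest_correlations(items):
    out = []
    for i in range(len(items)):
        rest = items[:i] + items[i + 1:]
        rest_total = [sum(v) for v in zip(*rest)]
        out.append(pearson(items[i], rest_total))
    return out

items = [
    [4, 5, 3, 4, 5, 4, 2, 5],
    [4, 4, 3, 5, 5, 4, 2, 4],
    [3, 5, 2, 4, 4, 3, 1, 5],
    [2, 3, 3, 4, 3, 3, 3, 2],   # stands out: near-zero item-rest correlation
]
for i, r in enumerate(item_rest_correlations(items), 1):
    print("item %d item-rest r = %.2f" % (i, r))
```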

Finally, choose items with a variety of "difficulty" levels from IRT.
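Fitting a graded response model takes specialised software (R's mirt or ltm packages, for example), but a crude observable proxy for difficulty is the proportion of respondents scoring at or above each Likert category. A sketch with invented data:

```python
# Crude stand-in for IRT "difficulty": cumulative endorsement proportions.

def endorsement(scores, n_categories=5):
    """Proportion scoring >= c for each category c = 2..n_categories."""
    n = len(scores)
    return [sum(s >= c for s in scores) / n for c in range(2, n_categories + 1)]

easy_item = [4, 5, 3, 4, 5, 4, 2, 5]   # most respondents endorse it
hard_item = [2, 3, 1, 2, 3, 2, 1, 3]   # few respondents endorse it

print(endorsement(easy_item))   # [1.0, 0.875, 0.75, 0.375]
print(endorsement(hard_item))   # [0.75, 0.375, 0.0, 0.0]
```

Keeping items whose endorsement curves are spread across the range, rather than all bunched at one end, gives the scale information at different levels of the latent trait.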

Then, if possible, redo this on the test set, but without doing any exploring. That is, see how well the result found on the training set works on the test set.