Solved – Item analysis for a Likert-type questionnaire – item discrimination, point-biserial, factor analysis, Cronbach's alpha, other diagnostics

factor-analysis, likert, psychometrics, scale-construction

I'm working on a questionnaire that is intended to measure two constructs. There are presently 26 items in the questionnaire (11 for construct 1 and 15 for construct 2), and each item has five available responses: "strongly disagree", "disagree", "neutral", "agree", "strongly agree".

In our first iteration of the questionnaire, we used ten items for each construct and had about 60 respondents. After the results came in, we made some judgments based on item discrimination (dichotomized), point-biserial correlation, Cronbach's alpha after item deletion, and factor analysis, and dropped several of the questions and replaced them with others.

We now have 68 responses to the present questionnaire (the one with 26 items). I'd like to make sure I cover my bases as we do the reliability and validity analysis, and am specifically wondering whether we should carry out additional diagnostics or approach the ones we've already calculated in different ways. Our approach to identifying potentially problematic items is to look for the following (code sketches of these diagnostics follow the list):

  1. item discriminations below .3 on the dichotomized responses, where "agree"/"strongly agree" is coded as 1 and the other responses as 0, and we take the difference between the top third and the bottom third of respondents.

  2. low point-biserial (Spearman's) correlations between items and the total score for the associated construct (total score = mean response for that construct). We don't have a threshold defining "low" … and is Spearman's or Pearson's the appropriate measure?

  3. items whose inclusion causes Cronbach's alpha to decrease, identified by running "Cronbach's alpha if item deleted" tests. Would Guttman's G6 or McDonald's omega be more appropriate here?

  4. Factor analysis: we've been looking for items with low loadings on either dimension. Here, we're wondering about two things:
    a) whether to "swap" items from their intended construct to the one they appear to load on more strongly. Is that a defensible approach, even if the item doesn't have face validity with respect to its intended construct?
    b) if an item loads strongly on both dimensions (where it appears there's little separation between them), is that a problem? I didn't think it was, but it seemed worth asking.
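
For concreteness, here is a minimal sketch of how we compute diagnostics 1-3 (Python). The setup is an assumption for illustration: `df` is a pandas DataFrame with one row per respondent and items coded 1-5, and `construct_items` is a placeholder list of the column names for one construct.

```python
from scipy.stats import spearmanr


def dichotomized_discrimination(df, item, construct_items):
    """#1: proportion endorsing the item among the top third of respondents
    minus the proportion among the bottom third, with "agree"/"strongly
    agree" (4 or 5) coded as 1 and everything else as 0."""
    total = df[construct_items].mean(axis=1)      # construct total score
    lo, hi = total.quantile([1 / 3, 2 / 3])       # cut points for the thirds
    endorsed = (df[item] >= 4).astype(int)
    return endorsed[total >= hi].mean() - endorsed[total <= lo].mean()


def item_total_spearman(df, item, construct_items):
    """#2: Spearman correlation between the item and the construct's
    total score (mean response across the construct's items)."""
    total = df[construct_items].mean(axis=1)
    rho, _ = spearmanr(df[item], total)
    return rho


def cronbach_alpha(df, items):
    """Raw-score Cronbach's alpha:
    (k / (k - 1)) * (1 - sum of item variances / variance of the sum)."""
    k = len(items)
    return (k / (k - 1)) * (
        1 - df[items].var(ddof=1).sum() / df[items].sum(axis=1).var(ddof=1)
    )


def alpha_if_deleted(df, items):
    """#3: alpha recomputed with each item dropped in turn; an item whose
    removal raises alpha is a deletion candidate."""
    return {i: cronbach_alpha(df, [c for c in items if c != i]) for i in items}
```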
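And a sketch of #4, assuming the third-party `factor_analyzer` package (any EFA routine would do) and a placeholder `all_items` list holding the 26 column names. We use an oblique (oblimin) rotation so the two constructs are allowed to correlate:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(df[all_items])                 # all 26 items at once

loadings = pd.DataFrame(fa.loadings_, index=all_items, columns=["F1", "F2"])
print(loadings.round(2))              # scan for low loadings (4a) and for
                                      # items loading on both factors (4b)
```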

If anyone has any thoughts on these questions, or if an important procedure comes to mind that we haven't considered, I'd love to hear it. Thanks in advance, and I apologize if this post is not properly directed!

Best Answer

Your method #1 loses information by dichotomizing in two different ways: once when the item responses are collapsed to 1/0, and again when respondents are split into top and bottom thirds. I'd instead look at each item's correlation with the sum of all the other items (in software such as SPSS this is called the "Corrected Item-Total Correlation"). For #2, where you have done something close to this, you could make a case for either Spearman's or Pearson's, and they'll hardly differ, since with a 1-5 per-item range there shouldn't be many extreme outliers. You'll have to establish your own threshold, I'm afraid: how exacting do you want to be? How desirable is it to preserve a large number of items for your scales? And how concerned are you about your case-to-item ratio?
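
As a sketch (assuming the same kind of DataFrame as in your question), the corrected item-total correlation is just each item against the sum of the remaining items in its scale:

```python
import pandas as pd


def corrected_item_total(df, items, method="pearson"):
    """Correlate each item with the sum of the *other* items in its scale;
    pass method="spearman" for the rank-based version."""
    return pd.Series({
        item: df[item].corr(
            df[[c for c in items if c != item]].sum(axis=1), method=method
        )
        for item in items
    })
```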

As for your questions about factor analysis: yes, building the scales on empirical criteria can be defensible, just as building them on a priori ideas about which item belongs to which dimension can be. Good research will hopefully reconcile any conflicts between the two. Items with multiple high loadings are a problem if you want uncorrelated factors, something that is often unrealistic in opinion research. At a more general level, I think you have some sense that factor analysis and scale development are best seen as a largely creative process in which there are many subjective decisions to make and often much work to do in justifying them!
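
One way to check how realistic uncorrelated factors would be for your data is to fit an oblique rotation and look at the factor correlation directly. Sticking with the `factor_analyzer` setup from your question (and assuming, as I recall, that oblique fits expose the factor correlation matrix as `phi_`):

```python
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=2, rotation="oblimin")
fa.fit(df[all_items])
print(fa.phi_)   # factor correlation matrix under the oblique rotation;
                 # a sizeable off-diagonal entry means cross-loadings
                 # are to be expected rather than treated as defects
```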
