Solved – Reliability testing for questionnaire with nominal multiple choice questions (one possible answer)

categorical-data · cronbachs-alpha · reliability

I have used a questionnaire to collect data from 200 people.

This questionnaire was obtained from a published research paper in which researchers had used it in a specific population. It involves 34 items that are nominal multiple choice questions with one possible answer.

For example: Which factor do you consider?

  1. X
  2. Y
  3. Z

The questions do not all have the same number of possible answers: question 1 has 3 options, question 2 has 4, and so on.

According to the paper, Cronbach's $\alpha$ was 0.62.

I wanted to use the questionnaire in a different population.
I carried out a pilot study in 20 people and made some changes. My colleague in charge of the statistics calculated Cronbach's $\alpha$. It was 0.65. When I finished the study, gathering data from 200 people, the calculated Cronbach's $\alpha$ was 0.57.

However, I myself think that Cronbach's $\alpha$ is not suitable in this case, as my questionnaire uses neither Likert-type nor any other scaled items. I believe it was not suitable in the original published paper either. Unfortunately, time has passed and test-retest can no longer be performed.

So, do you think that Cronbach's $\alpha$ can be applied in this questionnaire?
What is the correct way of coding these data (different codings yield different values of Cronbach's $\alpha$)?
What other reliability tests can be performed in this case?

Best Answer

As the data stand, Cronbach's alpha is not applicable, either to the instrument as a whole (grouping together all 34 items) or to any subset of items. Alpha requires that each item be scored numerically -- perhaps on a scale such as 1-5, but at least on a "scale" from 0 to 1. With nominal answer choices this is not workable except, possibly, with some additional steps.

Is there some thread of responses running through the various items that, if selected, would express some common opinion or theme? For example, if a preference for Beethoven is reflected in Question 1 Choice A (for short, "Q1-A"), Q2-C, Q3-B, and Q4-B, then you could, as @Jeremy Miles seems to be suggesting, score those responses as "1," and all other responses in those four items as "0." Then the scores of those four items would allow computation of Alpha.
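To make the recoding idea concrete, here is a minimal sketch in Python. The respondents, item names, and "Beethoven" answer key below are invented for illustration (matching the hypothetical Q1-A, Q2-C, Q3-B, Q4-B pattern above); each item is recoded 1 if the respondent picked the theme choice, else 0, and alpha is then computed with the usual formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma^2_i}{\sigma^2_{\text{total}}}\right)$.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns.

    items: list of k columns, each a list of numeric scores,
    one entry per respondent (all columns the same length).
    Uses population variances, as is conventional for alpha.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(pvariance(col) for col in items)   # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical raw nominal responses (5 respondents, 4 items).
responses = [
    {"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "B"},
    {"Q1": "A", "Q2": "C", "Q3": "A", "Q4": "B"},
    {"Q1": "B", "Q2": "C", "Q3": "B", "Q4": "A"},
    {"Q1": "A", "Q2": "A", "Q3": "B", "Q4": "B"},
    {"Q1": "C", "Q2": "B", "Q3": "A", "Q4": "A"},
]

# The choices that express the common theme ("preference for Beethoven").
theme = {"Q1": "A", "Q2": "C", "Q3": "B", "Q4": "B"}

# Recode each item to 1 (theme choice) / 0 (any other choice).
items = [[1 if r[q] == theme[q] else 0 for r in responses]
         for q in ["Q1", "Q2", "Q3", "Q4"]]

alpha = cronbach_alpha(items)
print(f"alpha = {alpha:.3f}")
```

Note that alpha here measures the internal consistency of the *recoded* 0/1 scale, not of the original nominal items, so it should be reported for that derived scale specifically.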

Perhaps some other theme could be given similar treatment, using some other set of items. Then you would have a second scale for which to report alpha.

Without such scoring, it makes little sense to try to measure reliability via internal consistency. There are, however, other types of reliability: in addition to Jeremy's point about test-retest, it is conceivable, though a long shot, that inter-rater reliability or even parallel-forms reliability could apply.