Solved – Confidence Interval of Categorical Data with Multiple responses

categorical data, confidence interval, many-categories, multivariate analysis

What is the best practice for calculating confidence intervals for categorical data with multiple responses? All of the tutorials and information I find online use boolean examples. (Do you support this candidate, yes or no? Do you like this product, yes or no?)

I would like to know how to deal with a question such as: which is your favorite food: Watermelon, Lemons, Strawberries, or Prunes? (pick one). I would survey a random sample of the population.

Should I simply calculate the sample percentage of "Lemons" versus the total sample size and extract the confidence interval using only that information, or is there a way to incorporate the percentages of each response in the confidence interval calculation?

Best Answer

If there is just one category of interest, e.g. lemons, there's no problem with "binarizing" the response (lemons vs. everything else) and extracting a CI for the population proportion who like lemons. However, if you do this for each category, you might end up with something misleading: the numbers of responses in the categories are dependent (they must sum to the sample size), and you are making multiple probability statements at once, so the probability that every proportion lies in its interval simultaneously could be very different from the nominal level of each individual interval. You could instead report a "confidence region" or "simultaneous confidence intervals," which capture this dependency better. See Sison and Glaz.
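To make the difference concrete, here is a minimal sketch in Python that compares separate 95% intervals with simultaneous ones. Note that this is not the Sison and Glaz method: it swaps in a simpler, more conservative approach, Wilson score intervals with a Bonferroni correction (each of the k intervals is built at level alpha / k). The survey counts and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(count, n, alpha):
    """Wilson score interval for a single proportion at level 1 - alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = count / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def simultaneous_intervals(counts, alpha=0.05):
    """Bonferroni-corrected Wilson intervals (an assumption of this sketch,
    not Sison-Glaz): all k intervals hold jointly with probability of
    roughly at least 1 - alpha."""
    n = sum(counts.values())
    k = len(counts)
    return {name: wilson_interval(c, n, alpha / k) for name, c in counts.items()}


# Hypothetical survey results (illustrative numbers only).
counts = {"Watermelon": 40, "Lemons": 25, "Strawberries": 30, "Prunes": 5}
n = sum(counts.values())

# "Binarized" intervals: each one is valid on its own, but not jointly.
separate = {name: wilson_interval(c, n, 0.05) for name, c in counts.items()}
joint = simultaneous_intervals(counts, alpha=0.05)

for name in counts:
    lo_s, hi_s = separate[name]
    lo_j, hi_j = joint[name]
    print(f"{name:12s} separate: ({lo_s:.3f}, {hi_s:.3f})  "
          f"simultaneous: ({lo_j:.3f}, {hi_j:.3f})")
```

The simultaneous intervals come out wider than the separate ones, which is the price of making one joint statement about all four proportions. Methods designed for multinomial data, such as Sison and Glaz, typically give tighter intervals than Bonferroni when there are many categories.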
