Choosing alternatives to Poisson regression for overdispersed count data

count-data, poisson-distribution

I'm currently analyzing data from a series of behavioral experiments that all use the following measure. Participants are asked to select clues that (fictitious) other people could use to help solve a series of 10 anagrams. The participants are led to believe that these other people will either gain or lose money, depending on their performance in solving the anagrams. The clues vary in how helpful they are. For example, for the anagram NUNGRIN (an anagram of RUNNING), the three clues might be:

  1. Moving quickly (unhelpful)
  2. What you do in a marathon race (helpful)
  3. Not always a healthy hobby (unhelpful)

To form the measure, I count the number of times (out of 10) a participant chooses an unhelpful clue for the other person. Across the experiments, I'm using a variety of manipulations intended to affect the helpfulness of the clues that people select.

Because the unhelpfulness measure is fairly strongly positively skewed (a large proportion of people choose the most helpful clue for all 10 anagrams), and because the measure is a count variable, I've been using a Poisson generalized linear model to analyze these data. However, after doing some more reading on Poisson regression, I discovered that because the Poisson model assumes the variance equals the mean rather than estimating the variance separately, it often underestimates the variance in a set of data, which makes its standard errors too small. I've started to investigate alternatives to Poisson regression, such as quasi-Poisson regression or negative binomial regression. However, I admit that I'm rather new to these kinds of models, so I'm coming here for advice.
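As a rough diagnostic for the overdispersion described above, one can compare the spread of the counts with what a Poisson model implies. The sketch below is plain Python under simplifying assumptions: an intercept-only model and a made-up list `counts` (not data from these experiments); `poisson_dispersion` is an illustrative helper, not a library function. In the intercept-only case the Pearson chi-square divided by its residual degrees of freedom equals the sample variance-to-mean ratio, and values well above 1 point to overdispersion. With predictors, the same statistic would use each observation's fitted mean and n - p degrees of freedom.

```python
from statistics import mean


def poisson_dispersion(counts):
    """Pearson chi-square divided by residual df for an intercept-only
    Poisson model. Values well above 1 suggest overdispersion."""
    mu = mean(counts)  # maximum-likelihood Poisson mean with no predictors
    chi2 = sum((y - mu) ** 2 / mu for y in counts)  # Pearson chi-square
    return chi2 / (len(counts) - 1)  # one parameter (the intercept) estimated


# Hypothetical unhelpful-clue counts (out of 10): many zeros, a long right tail
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 10, 10]
print(round(poisson_dispersion(counts), 2))
```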

Does anybody have any recommendations about which model to use for this kind of data? Are there any other considerations I should be aware of, for example whether one model is more powerful than another? What sort of diagnostics should I look at to determine whether the model I select is handling my data appropriately?

Best Answer

Your outcome is the number of unhelpful clues out of 10 (equivalently, the number of helpful clues out of 10), which is a binomial random variable. So you should analyze it with some sort of binomial regression, probably quasi-binomial to allow for overdispersion. Note that the Poisson and the misleadingly named negative binomial distributions are suited to unbounded count data.
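To make the suggestion concrete, here is a minimal sketch of what a quasi-binomial fit does, written in plain Python with no statistics library. It assumes a single hypothetical 0/1 predictor `condition` standing in for one experimental manipulation, and made-up data; `fit_quasibinomial` is an illustrative helper, not a library function. It fits an ordinary logistic regression for "successes out of 10" by iteratively reweighted least squares, then estimates the dispersion as the Pearson chi-square over residual degrees of freedom and scales the standard error by its square root. In R the corresponding model would typically be written `glm(cbind(y, 10 - y) ~ condition, family = quasibinomial)`.

```python
import math


def fit_quasibinomial(y, x, trials=10, max_iter=50, tol=1e-10):
    """Logistic regression of y (successes out of `trials`) on a single
    predictor x, fitted by IRLS, with a quasi-binomial dispersion estimate."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the weighted least-squares sums X'WX and X'Wz
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for yi, xi in zip(y, x):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = trials * p * (1.0 - p)  # binomial working weight
            z = eta + (yi - trials * p) / w  # working response on logit scale
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break

    # Pearson chi-square and information matrix at the final estimates
    chi2 = s_w = s_wx = s_wxx = 0.0
    for yi, xi in zip(y, x):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = trials * p * (1.0 - p)
        chi2 += (yi - trials * p) ** 2 / w
        s_w += w
        s_wx += w * xi
        s_wxx += w * xi * xi
    det = s_w * s_wxx - s_wx ** 2
    se_b1 = math.sqrt(s_w / det)  # SE assuming dispersion = 1 (plain binomial)
    phi = chi2 / (len(y) - 2)  # Pearson dispersion, 2 parameters estimated
    return {"b0": b0, "b1": b1, "phi": phi,
            "se_b1_binomial": se_b1,
            "se_b1_quasi": se_b1 * math.sqrt(phi)}  # SE inflated by sqrt(phi)


# Hypothetical data: unhelpful clues chosen (out of 10) in two conditions
condition = [0] * 8 + [1] * 8
unhelpful = [0, 0, 0, 1, 2, 6, 9, 10, 0, 1, 3, 5, 7, 8, 10, 10]
fit = fit_quasibinomial(unhelpful, condition)
print(fit)
```

The coefficient estimates are the same as for an ordinary binomial fit; only the standard errors change, scaled by the square root of the estimated dispersion. One thing worth checking with these data: if every participant in some condition chooses 0 unhelpful clues (or all 10), the logit estimate for that condition diverges.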