There is a maximum possible number of counted answers, set by the number of questions asked. Although one can model this as a Poisson counting process, a Poisson process has no theoretical upper limit on the count: its support is $[0,\infty)$. A discrete distribution with finite support, e.g., the beta-binomial, might therefore be more appropriate, as it has a more flexible shape. However, that is just a guess, and in practice I would search for an answer to a more general question using brute force...
Rather than check for overdispersion, which is not guaranteed to lead to a useful answer (although one can examine indices of dispersion to quantify it), I would suggest searching for a best-fitting distribution using the discrete-distribution option of a fit-quality search program, e.g., Mathematica's FindDistribution routine. That type of search does a fairly exhaustive job of guessing which known distribution(s) work best, not only mitigating overdispersion but also modelling many other data characteristics, e.g., goodness of fit as measured a dozen different ways.
To further examine my candidate distributions, I would post hoc examine the residuals to check homoscedasticity and/or distribution type, and also consider whether the candidate distributions correspond to a physical explanation of the data. The danger of this procedure is identifying a distribution that is inconsistent with the best model of an expanded data set. The danger of not doing a post hoc procedure is to a priori assign an arbitrarily chosen distribution without proper testing (garbage in, garbage out). The strength of the post hoc approach is that it limits fitting error, and that is also its weakness: it may understate the modelling error through pure chance, since many distribution fits are attempted. That, then, is the reason for examining residuals and considering physicality. The top-down, a priori approach offers no such post hoc check on reasonableness; the only way to compare the physicality of models with different distributions is to compare them post hoc. Hence the nature of physical theory: we test a hypothetical explanation of the data with many experiments before we accept it as exhausting the alternative explanations.
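As a minimal sketch of such a brute-force comparison in R (using MASS's `fitdistr` as a stand-in for Mathematica's FindDistribution, and R's built-in `InsectSprays` counts as example data), one can fit several candidate discrete distributions and compare them by AIC:

```r
# Brute-force comparison of candidate discrete distributions by AIC.
# MASS ships with R; InsectSprays is a built-in example data set.
library(MASS)

x <- InsectSprays$count

fit_pois <- fitdistr(x, "Poisson")
fit_nb   <- fitdistr(x, "negative binomial")

# Lower AIC = better trade-off between fit quality and complexity.
AIC(fit_pois)
AIC(fit_nb)
```

This only compares two candidates; a fuller search would loop over more discrete families, and the post hoc residual and physicality checks above would then be applied to the winners.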
Best Answer
A common way (not necessarily the best --- what's 'best' depends on your criteria for bestness) to decide this would be to see if there's overdispersion in a Poisson model (e.g. by looking at the residual deviance).
For example, look at
summary(glm(count ~ spray, data = InsectSprays, family = poisson))
- this has a residual deviance of 98.33 on 66 df. That's about 50% larger than we'd expect, so it's probably big enough that it could matter for your inference. [If you want a formal test, compute
pchisq(98.33, 66, lower.tail = FALSE)
but formal testing of assumptions is generally answering the wrong question.] So I'd be inclined to consider a negative binomial for that case.
More generally, if you're not reasonably confident that the Poisson makes sense, you could simply use negative binomial as a default, since it encompasses the Poisson as a limiting case.
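For the example above, the negative binomial fit is a one-liner with MASS (which ships with R):

```r
# Negative binomial regression for the same InsectSprays example.
library(MASS)

fit_nb <- glm.nb(count ~ spray, data = InsectSprays)
summary(fit_nb)  # the estimated theta quantifies the extra-Poisson variation
```

As theta grows large, the negative binomial approaches the Poisson, which is why it makes a safe default.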
Yes, that's appropriate, and it would be my first instinct to include sampling time as an offset (rather than a predictor), since count would be expected to simply be proportional to the length of the sampling interval.
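A small sketch of the offset idea, on simulated data (the `sampling_time` variable is invented for illustration): including `offset(log(sampling_time))` in a Poisson GLM constrains the expected count to be proportional to the sampling interval, rather than estimating a free coefficient for it.

```r
# Simulated counts collected over unequal sampling intervals.
set.seed(1)
d <- data.frame(
  sampling_time = rep(c(1, 2, 5), each = 20),
  group = rep(c("A", "B"), 30)
)
d$count <- rpois(60, lambda = 3 * d$sampling_time)

# offset(log(sampling_time)) models the rate per unit time;
# no coefficient is estimated for the offset term.
fit <- glm(count ~ group + offset(log(sampling_time)),
           data = d, family = poisson)
coef(fit)
```

By contrast, entering `log(sampling_time)` as an ordinary predictor would estimate its slope from the data, which is only appropriate if you doubt the proportionality assumption.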