Regression model for count data with restricted upper bound

count-data, generalized-linear-model

What is the recommended model for count data with a known upper bound on the number of counts? I know that the Poisson is used for count data, and the negative binomial in case of overdispersion. If I am not mistaken, though, both assume a non-negative count with no upper limit (0 to ∞).

More details: the dependent variable $y$ is the number of events attended per customer per year. There is a maximum of 25 events per year, which is the upper bound on the count. Customers attend 1 to 25 events, so $y$ is an integer in $[1, 25]$.

The independent variable $x$ is categorical. It represents the customer's location and can take three values (local, non-local and foreigner). I want to use glm(y ~ x) to predict the number of events attended from the value of $x$ (I am using R).

Also note that I have subtracted 1 from the number of attended events, so $y$ now lies between 0 and 24. This avoids the need for a zero-truncated model if I use a Poisson or negative binomial. I then intend to apply the inverse link function to get the predicted number of events and add one to it.
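The plan above (shift down by one, fit, apply the inverse link, add one back) can be sketched in a few lines. This is a minimal Python sketch rather than R. The toy data and the helper names `fit_poisson_log_link` and `predict_events` are hypothetical. It relies on a standard property: for a Poisson GLM with log link and a single categorical predictor, the fitted mean in each group equals that group's sample mean, so no iterative fitting is needed. In R, the corresponding fit would be something like `glm(I(y - 1) ~ x, family = poisson)` followed by `predict(..., type = "response") + 1`.

```python
import math
from collections import defaultdict


def fit_poisson_log_link(xs, ys):
    """Poisson GLM with log link and one categorical predictor (one dummy per group).

    The maximum-likelihood fitted mean for each group is the group's sample
    mean of the shifted response, so the fitted linear predictor is log(mean).
    """
    totals = defaultdict(float)
    sizes = defaultdict(int)
    for x, y in zip(xs, ys):
        totals[x] += y - 1  # shift 1..25 down to 0..24
        sizes[x] += 1
    # A group whose shifted counts are all zero has no finite MLE;
    # math.log would raise ValueError for it.
    return {g: math.log(totals[g] / sizes[g]) for g in totals}


def predict_events(eta):
    """Apply the inverse link exp() and undo the shift by adding one."""
    return math.exp(eta) + 1


# Hypothetical toy data (not from the question).
xs = ["local", "local", "local", "non-local", "non-local", "foreigner", "foreigner"]
ys = [10, 4, 7, 2, 3, 1, 2]

etas = fit_poisson_log_link(xs, ys)
print({g: predict_events(eta) for g, eta in etas.items()})
```

With this setup the back-transformed predictions are the group means of the original $y$ (7, 2.5 and 1.5 here, up to floating-point rounding). The shift changes the distributional assumption, not these point predictions. The Poisson model also still assigns some probability to counts above the upper bound.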

Best Answer

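It seems like a binomial GLM might work reasonably well (as Nick also mentioned). It treats each customer's count as the number of successes out of a fixed number of trials set by the upper bound.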
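One way to set up the binomial GLM is sketched below. It assumes (this is not stated in the answer) that the shifted count $y - 1$ is treated as successes out of 24 trials, with a logit link and $x$ as the only predictor. In R this would be something like `glm(cbind(y - 1, 25 - y) ~ x, family = binomial)`. The Python below is a self-contained sketch with hypothetical names (`fit_binomial_logit`, `predict_events`, `TRIALS`) and toy data. With one dummy per group, each group's fitted probability is its total successes divided by its total trials.

```python
import math
from collections import defaultdict

TRIALS = 24  # assumption: shifted count y - 1 is "successes" out of 24 trials


def fit_binomial_logit(xs, ys, trials=TRIALS):
    """Binomial GLM (logit link) with one categorical predictor.

    With one dummy per group, the MLE of each group's success probability is
    total successes / total trials in that group; eta is its logit.
    """
    successes = defaultdict(int)
    sizes = defaultdict(int)
    for x, y in zip(xs, ys):
        k = y - 1  # shift 1..25 down to 0..24
        if not 0 <= k <= trials:
            raise ValueError(f"count {y} outside 1..{trials + 1}")
        successes[x] += k
        sizes[x] += 1
    etas = {}
    for g in successes:
        p = successes[g] / (trials * sizes[g])
        if p in (0.0, 1.0):
            raise ValueError(f"group {g!r}: logit undefined at p = {p}")
        etas[g] = math.log(p / (1 - p))
    return etas


def predict_events(eta, trials=TRIALS):
    """Inverse logit gives p; the expected count is trials * p, plus one to undo the shift."""
    p = 1 / (1 + math.exp(-eta))
    return trials * p + 1


# Hypothetical toy data (not from the question).
xs = ["local", "local", "local", "non-local", "non-local", "foreigner", "foreigner"]
ys = [10, 4, 7, 2, 3, 1, 2]

etas = fit_binomial_logit(xs, ys)
print({g: predict_events(e) for g, e in etas.items()})
```

The predicted counts equal the group means of $y$ (7, 2.5 and 1.5 for this toy data). Any GLM with a canonical link and a single categorical predictor reproduces these. What the binomial model adds is the variance $24p(1-p)$, which respects the bound: its predicted distribution puts no probability above 25 events.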

Failing that, consider a quasi-binomial GLM (to pick up the extra dispersion caused by a mix of $p$'s across people), a beta-binomial model, or a binomial GLMM (generalized linear mixed model).
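A common check for whether the plain binomial is enough is the Pearson dispersion statistic: the Pearson chi-square divided by the residual degrees of freedom. This is also the dispersion that R's `summary()` reports for a `quasibinomial` fit, e.g. `summary(glm(cbind(y - 1, 25 - y) ~ x, family = quasibinomial))`. The minimal Python sketch below assumes the same shifted setup ($y - 1$ successes out of 24 trials, one categorical predictor). The helper `pearson_dispersion` and the toy data are hypothetical.

```python
from collections import defaultdict

TRIALS = 24  # assumption: shifted count y - 1 out of 24 trials


def pearson_dispersion(xs, ys, trials=TRIALS):
    """Pearson chi-square / residual df for a binomial GLM with one categorical predictor.

    Values near 1 are consistent with a plain binomial. Values well above 1
    suggest overdispersion (e.g. different p's across customers in a group).
    """
    shifted = defaultdict(list)
    for x, y in zip(xs, ys):
        shifted[x].append(y - 1)  # shift 1..25 down to 0..24
    chi2 = 0.0
    for g, ks in shifted.items():
        p = sum(ks) / (trials * len(ks))  # fitted probability for group g
        mean = trials * p
        var = trials * p * (1 - p)        # binomial variance per customer
        if var == 0:
            raise ValueError(f"group {g!r} has p = {p}; Pearson residuals undefined")
        chi2 += sum((k - mean) ** 2 / var for k in ks)
    df = len(ys) - len(shifted)           # observations minus one parameter per group
    if df <= 0:
        raise ValueError("no residual degrees of freedom")
    return chi2 / df


# Hypothetical toy data (not from the question).
xs = ["local", "local", "local", "non-local", "non-local", "foreigner", "foreigner"]
ys = [10, 4, 7, 2, 3, 1, 2]

print(pearson_dispersion(xs, ys))
```

For this toy data the estimate is about 1.34. Seven observations are far too few to conclude anything, but on real data a value well above 1 would point towards the quasi-binomial or beta-binomial options.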

There are questions on this site relating to each of these.
