Solved – Choosing reasonable parameters for a negative binomial distribution

maximum likelihoodnegative-binomial-distributionpython

My data is a list of observations and a count for each observation. The data is overdispersed, the mean is ~1,200 and the variance is ~18,000,000. I want to use a negative binomial model to assign p-values for each observation. I tried doing this with a Poisson model (I know the number of trials and the probability of success for each trial) but the p-values became so small for many of the observations that python interpreted the number as 0. Values that were just a couple hundred from the mean were called highly significant because the variance was being underestimated by the model.

Is there an easy to implement way to estimate the parameters for the negative binomial using my data in python? I also need help developing an intuition for what r and p represent in the negative binomial when it is really a mixed Poisson Gamma distribution.

Best Answer

For maximum-likelihood estimation, you'll need to solve the score equations numerically: see http://en.wikipedia.org/wiki/Negative_binomial_distribution#Maximum_likelihood_estimation. (Or directly maximize the log-likelihood.)

For method-of-moments estimation, following e.g. the parametrization given here, substitute the sample mean & variance for the population mean $\operatorname{E}Y$ & variance $\operatorname{Var}Y$, & solve for the parameters $\mu$ & $\theta$. In this case the estimates are

$$\tilde\mu = \bar y$$

$$ {\tilde\theta}= \frac{\bar y^2}{s_y^2 - \bar y} $$

where $\bar y$ is the sample mean, & $s_y^2$ the sample variance.

Related Question