Solved – Statsmodels’ Negative Binomial: after .fit_regularized(), how to turn PMF into PPF to get the discrete values

Tags: density function, negative-binomial-distribution, scipy, statsmodels

I used the statsmodels package to fit a Negative Binomial model to my data, which contains ~1500 samples with 21 covariates. Since my data are overdispersed because my dependent variable (y) is skewed, I used the fit_regularized function (with the normal .fit(), the numerical solvers (newton, nm, cg, ...) do not converge).

When I plot the fitted data it looks more like a probability mass function (PMF), because the predicted values are not integers (as is usually the case with NB) but floats. I thought I could use scipy.stats.nbinom.ppf() to turn these floats into integer values (as I previously did with Poisson and its mu mean values), but scipy.stats.nbinom.ppf() does not take a mu; it takes q, n, and p parameters. And I can't make sense of these parameters.

If I use q=0.95, p=0.5, and n=predicted values, I get the following plot:

[Plot: original distribution (grey) vs. ppf-converted predictions (orange)]

And if I use q=0.5, p=0.35, and n=predicted values, I get the following plot:

[Plot: original distribution (grey) vs. ppf-converted predictions (orange)]

It seems that p and q control the shape of the ppf-converted distribution, but I do not understand the rationale behind it. What is p? What is q?

My question is: what is the proper way of turning the "orange" distribution into a discrete one, so that it resembles the original "grey" distribution as much as possible? It does not feel right to just try values until I force the "orange" to look like the "grey", but I don't know a proper way of turning the PMF (if it's a PMF) into the PPF. Does anybody know how to do this, or whether I am on the right track? I only want the discrete values, as with Poisson.

Thanks!

Best Answer

NegativeBinomial regression is usually done in a different parameterization from the standard negative binomial distribution.

The predicted value is the mean, or expected value, of the distribution, usually modeled with a log link. The second parameter is the dispersion. As with Poisson, the mean of the negative binomial is a continuous quantity, so the predictions are floats even though the observed counts are integers.

The parameters have to be converted from the regression parameterization to the usual distribution parameterization which then can be used, for example, with the scipy.stats distribution.
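As a minimal sketch of that conversion, assuming the NB2 variance function Var(y) = mu + alpha * mu**2 (the default loglike_method='nb2' of statsmodels' NegativeBinomial, where alpha is the last estimated parameter): scipy.stats.nbinom takes a size n and a probability p, and matching the mean and variance gives n = 1 / alpha and p = n / (n + mu). In nbinom.ppf(q, n, p), q is the quantile level (0.5 for the median), not a distribution parameter, and the predicted mean enters through p rather than being passed as n. The values of mu_hat and alpha_hat below are made up for illustration, and nb2_to_scipy is a hypothetical helper name.

```python
import numpy as np
from scipy import stats


def nb2_to_scipy(mu, alpha):
    """Map NB2 regression parameters (mean mu, dispersion alpha) to
    scipy.stats.nbinom's (n, p), assuming Var(y) = mu + alpha * mu**2."""
    mu = np.asarray(mu, dtype=float)
    n = 1.0 / alpha      # "size" parameter; does not have to be an integer
    p = n / (n + mu)     # success probability, one value per predicted mean
    return n, p


# Hypothetical predicted means and dispersion estimate, for illustration only.
mu_hat = np.array([0.8, 2.5, 7.0])
alpha_hat = 0.6

n, p = nb2_to_scipy(mu_hat, alpha_hat)
dist = stats.nbinom(n, p)

median_counts = dist.ppf(0.5)    # q = 0.5: integer-valued conditional medians
upper_counts = dist.ppf(0.95)    # q = 0.95: integer-valued 95th percentiles

# One random integer draw per predicted mean, e.g. to compare with a histogram
# of the observed counts.
rng = np.random.default_rng(0)
simulated = dist.rvs(size=mu_hat.shape, random_state=rng)
```

Which discrete summary to use depends on the goal: ppf at a fixed q gives a quantile of each observation's conditional distribution, while random draws are closer to what one would overlay on the histogram of the observed data.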

Specific to statsmodels:

The NegativeBinomial model has no conversion function built in. A script and a link to a Stack Overflow question are here: https://github.com/statsmodels/statsmodels/issues/106#issuecomment-43961704.

Since version 0.9.0, statsmodels has a generalized version, NegativeBinomialP, which has an extended predict method and a parameter conversion function. See the "which" parameter in http://www.statsmodels.org/dev/generated/statsmodels.discrete.discrete_model.NegativeBinomialP.predict.html and the convert_params method, which is currently used only internally: http://www.statsmodels.org/dev/generated/statsmodels.discrete.discrete_model.NegativeBinomialP.convert_params.html

(Full support for these conversions is still work in progress in statsmodels.)
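A rough sketch of how these pieces might be combined, assuming statsmodels 0.9.0 or later. The data are simulated purely for illustration, and because convert_params is described above as internal, its behavior may change between versions. Here predict(which="prob") is expected to return the predicted probabilities of the counts 0, 1, 2, ... for each observation, and convert_params to return the scipy-style (size, prob) pair.

```python
import numpy as np
from scipy import stats
from statsmodels.discrete.discrete_model import NegativeBinomialP

# Simulated data for illustration only (not the data from the question).
rng = np.random.default_rng(12345)
nobs = 500
x = rng.normal(size=nobs)
exog = np.column_stack([np.ones(nobs), x])    # constant plus one covariate
true_mu = np.exp(0.5 + 0.3 * x)
true_alpha = 0.5
size_true = 1.0 / true_alpha
endog = rng.negative_binomial(size_true, size_true / (size_true + true_mu))

# p=2 selects the NB2 variance function, Var(y) = mu + alpha * mu**2.
model = NegativeBinomialP(endog, exog, p=2)
res = model.fit(disp=0)

mu_hat = res.predict()                  # fitted means (floats)
prob_hat = res.predict(which="prob")    # predicted P(y = k), one row per observation

# Convert regression parameters to scipy's (size, prob) for each observation.
size_hat, p_hat = model.convert_params(res.params, mu_hat)
fitted_dist = stats.nbinom(size_hat, p_hat)
median_counts = fitted_dist.ppf(0.5)    # integer-valued conditional medians
```

Averaging prob_hat over the rows gives a predicted count distribution that can be compared directly with the observed frequencies of y.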

A notebook illustrating some of the new count model features is here: https://gist.github.com/josef-pkt/c932904296270d75366a24ee92a4eb2f
