Solved – Exemplar MLE for negative binomial

maximum likelihoodnegative-binomial-distribution

I recently compared MLE estimates for a negative binomial fit using two different pieces of software, and got different results. I'd like to determine which (if either) is correct.

To do that, I'd like an example of a simple set of data along with the corresponding MLE solution. I'd really like an analytic proof of the optimality of the MLE solution, since determining it with some third software library seems a bit circular. However, we are free to choose any data that might make the analysis work out nicely.

For the record, I noticed the discrepancy through these snippets of code (first library in R, second library in Python). The data was {2,4} (i.e., I only fit two data points).

library(MASS)
x=fitdistr(c(2,4),"Negative Binomial")
p = x$estimate['size'][[1]] / (x$estimate['size'][[1]] + x$estimate['mu'][[1]])
# Results:
# > p
# [1] 0.9708738
# > x
#      size         mu    
#   100.00020     3.00000 
#  (505.70845) (  1.24298)

and

import fit_nbinom
x = fit_nbinom.fit_nbinom(np.array([2,4]))
mu = ((1.0-x['prob'])/x['prob'])  * x['size']
# Results:
# In []: x
# Out[]: {'prob': 0.98892868281051893, 'size': 270.37321834245245}
#
# In []: mu
# Out[]: 3.026899423427535

Best Answer

The negative binomial density is:

$$f(x) = \Gamma(x + n )/(\Gamma(n) x!) p^n (1-p)^x$$

So for your sample:

$$L([2,4]) = f(2)f(4) \propto \Gamma(2 + n )\Gamma(4 + n )/2\Gamma(n)( p^{2n} (1-p)^6 ) $$

which we are to maximize

The mean is 3 by default as others have noted. Define a few useful R wrappers:

loglik <- function(size) sum(dnbinom(c(2,4), size=size, mu=3, log=T))

and just visualize what happens over a range of "size"

plot(1:1000, sapply(1:1000, loglik), type='l', xlab='Size parameter', ylab='Trace of log likelihood')

You can see the likelihood approaches an asymptote and has no unique, finite maximum.

Related Solutions

Negative Binomial Distribution – Understanding Key Parameters

You should look further down the Wikipedia article on the NB, where it says "gamma-Poisson mixture". While the definition you cite (which I call the "coin-flipping" definition since I usually define it for classes as "suppose you want to flip a coin until you get $k$ heads") is easier to derive and makes more sense in an introductory probability or mathematical statistics context, the gamma-Poisson mixture is (in my experience) a much more generally useful way to think about the distribution in applied contexts. (In particular, this definition allows non-integer values of the dispersion/size parameter.) In this context, your dispersion parameter describes the distribution of a hypothetical Gamma distribution that underlies your data and describes unobserved variation among individuals in their intrinsic level of contact. In particular, it is the shape parameter of the Gamma, and it may be helpful in thinking about this to know that the coefficient of variation of a Gamma distribution with shape parameter $\theta$ is $1/\sqrt{\theta}$; as $\theta$ becomes large the latent variability disappears and the distribution approaches the Poisson.

Solved – Use of a negative binomial model for fitting alternative splicing event

Some comments first:

You need a testable hypothesis before getting p-values. That means that you need to describe what the data would look like without and with the effect of interest. You can do parameter estimation without hypothesis testing, but then you need to specify what parameter is of interest, and there are no p-values.
While your values are counts, they represent the number of reads falling into each category with a fixed total per experimental run. Or more precisely, the total is random, but you want to condition on it (you keep saying things like "out of a sample of b events"). If so, your counts do not have a negative binomial distribution, but rather a binomial distribution.

The following assumes such a conditioning on the total number of reads per experimental run. The right analysis approach still depends on whether your goal is to estimate hte proportion of exon-skipping reads for each junction (with a confidence interval) or do you want to be able to estimate the entire distribution of the number of exon-skipping reads given the total number of reads. You seem to be asking for the latter, but I am not sure whether that's what you really need, as it is not clear what are you planning to do with the results of the analysis.

You have very little information to inform the model, so you have to make major assumptions. Based on a preliminary analysis using quasi-binomial regression it appears that there might be overdispersion: the probability of an exon-skipping read varies somewhat between the replicates. If that's true, binomial regression cannot be used, but you could consider beta-binomial regression. Here I will show the binomial regression.

First, you have to set up the data so that counts from the same replicate are in the same row.

d1 <- data.frame(Rep=1:3, Skip=c(8,0,0), Normal=c(12,6,8))

And then we can use a binomial glm:

m1 <- glm(cbind(Skip,Normal) ~ 1, data=d1, family=binomial)
summary(m1)



Call:
glm(formula = cbind(Skip, Normal) ~ 1, family = binomial, data = d1)

Deviance Residuals: 
     1       2       3  
 1.634  -1.794  -2.072  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -1.1787     0.4043  -2.915  0.00355 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 10.18  on 2  degrees of freedom
Residual deviance: 10.18  on 2  degrees of freedom
AIC: 15.613

Number of Fisher Scoring iterations: 4

Note that the p-value shown is probably meaningless to you. It tests whether the parameter is 0, and that corresponds to testing whether the probability of exon-skipping is 0.5.

The predict function can be used to get the predicted probability of exon-skipping for each replicate, and the dbinom function can can be used to get the individual response probabilities:

p1 <- predict(m1, type="response")
> p1
        1         2         3 
0.2352941 0.2352941 0.2352941 
> newtotal <- 10
> dbinom(0:newtotal, size=newtotal, p=p1[1])
 [1] 6.838240e-02 2.104074e-01 2.913333e-01 2.390427e-01 1.287153e-01 4.752565e-02
 [7] 1.218606e-02 2.142605e-03 2.472236e-04 1.690418e-05 5.201286e-07

So, for example, the probability of having 0 exon-skipping reads out of 10 reads at junciton 1 is estimated to be 0.06828.

Best Answer

Related Solutions

Negative Binomial Distribution – Understanding Key Parameters

Solved – Use of a negative binomial model for fitting alternative splicing event

Related Question