Estimate MLE of discrete distribution with two parameters in R

maximum-likelihood, negative-binomial-distribution, numerics, optimization, r

I want to estimate the MLE of a discrete distribution in R using a numeric method.

My data looks like this:

data1 <- c(5, 2, 2, 3, 0, 2, 1, 2, 4, 4, 1)

If we assume it follows a negative binomial distribution, how do we do this in R? There are a lot of tutorials about estimating the MLE for one parameter, but in this case there are two parameters (as in a negative binomial distribution).

And it involves a gamma function, which makes it more complicated.

Any ideas?

Edit: I wish to use optim in R or other methods.

log-likelihood:
[image of the attempted log-likelihood from the original post]

I'm having trouble with constructing it in R.

Best Answer

Firstly, your equation for the log-likelihood function for the negative binomial distribution looks wrong to me, and it's not clear how your $\beta$ enters into the parameterisation of the distribution. In any case, I will show you how to do this kind of problem using the standard parameterisation of the negative binomial distribution. If $X_1,...,X_n \sim \text{IID NegBin}(r, \theta)$ then you should have:

$$\begin{align} \ell_\mathbf{x} (r, \theta) &= \sum_{i=1}^n \log \text{NegBin}(x_i |r, \theta) \\[6pt] &= \sum_{i=1}^n \log \Bigg( \frac{\Gamma(x_i+r)}{x_i! \Gamma(r)} (1-\theta)^r \theta^{x_i} \Bigg) \\[6pt] &= \sum_{i=1}^n \log \Gamma(x_i+r) - \sum_{i=1}^n \log (x_i!) - n \log \Gamma(r) + nr \log (1-\theta) + \log (\theta) \sum_{i=1}^n x_i \\[6pt] &= \sum_{i=1}^n \log \Gamma(x_i+r) - n \tilde{x}_n - n \log \Gamma(r) + nr \log (1-\theta) + n \bar{x}_n \log (\theta), \\[16pt] \end{align}$$
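As a quick sanity check, this log-likelihood is easy to compute directly in R and compare against the built-in density dnbinom; in that function's parameterisation the prob argument corresponds to $1-\theta$ here. A minimal sketch (the helper name loglike.nb and the parameter values are just illustrative choices, and the data is the vector from the question with the apparent missing comma filled in):

#Log-likelihood under the parameterisation above
loglike.nb <- function(x, r, theta) {
  n <- length(x)
  sum(lgamma(x+r)) - sum(lfactorial(x)) - n*lgamma(r) +
    n*r*log(1-theta) + log(theta)*sum(x) }

#Check against the built-in density (note prob = 1-theta)
x <- c(5, 2, 2, 3, 0, 2, 1, 2, 4, 4, 1)
loglike.nb(x, r = 3, theta = 0.4)
sum(dnbinom(x, size = 3, prob = 1-0.4, log = TRUE))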

where $\bar{x}_n \equiv \sum_{i=1}^n x_i / n$ and $\tilde{x}_n \equiv \sum_{i=1}^n \log (x_i!) / n$. The first-order partial derivatives of this function are:

$$\begin{align} \frac{\partial \ell_\mathbf{x}}{\partial r} (r, \theta) &= \sum_{i=1}^n \psi(x_i+r) - n \psi(r) + n \log (1-\theta), \\[12pt] \frac{\partial \ell_\mathbf{x}}{\partial \theta} (r, \theta) &= - \frac{nr}{1-\theta} + \frac{n \bar{x}_n}{\theta}. \\[6pt] \end{align}$$
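These score functions can also be checked numerically against finite differences of the log-likelihood. A rough sketch at an arbitrary parameter point (the helper names score.r, score.theta and ll are illustrative only):

#Score functions for the negative binomial log-likelihood
score.r     <- function(x, r, theta) { sum(digamma(x+r)) - length(x)*digamma(r) +
                                       length(x)*log(1-theta) }
score.theta <- function(x, r, theta) { -length(x)*r/(1-theta) + length(x)*mean(x)/theta }

#Compare against central finite differences (prob = 1-theta in dnbinom)
x  <- c(5, 2, 2, 3, 0, 2, 1, 2, 4, 4, 1)
ll <- function(r, theta) { sum(dnbinom(x, size = r, prob = 1-theta, log = TRUE)) }
h  <- 1e-6
c(score.r(x, 3, 0.4),     (ll(3+h, 0.4) - ll(3-h, 0.4))/(2*h))
c(score.theta(x, 3, 0.4), (ll(3, 0.4+h) - ll(3, 0.4-h))/(2*h))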

The second partial derivatives show that the log-likelihood is concave, so the MLE occurs at the critical points of the function. The MLE for the probability parameter is $\hat{\theta}(r) = \bar{x}_n/(r + \bar{x}_n)$, and you can use this explicit form to write the profile log-likelihood:

$$\begin{align} \hat{\ell}_\mathbf{x} (r) &\equiv \ell_\mathbf{x} (r, \hat{\theta}(r)) \\[12pt] &= \sum_{i=1}^n \log \Gamma(x_i+r) - n \tilde{x}_n - n \log \Gamma(r) + nr \log \bigg( \frac{r}{r+\bar{x}_n} \bigg) + n \bar{x}_n \log \bigg( \frac{\bar{x}_n}{r+\bar{x}_n} \bigg) \\[16pt] &= \sum_{i=1}^n \log \Gamma(x_i+r) - n \tilde{x}_n - n \log \Gamma(r) + nr \log (r) + n \bar{x}_n \log (\bar{x}_n) - n(r+\bar{x}_n) \log (r+\bar{x}_n), \\[16pt] \end{align}$$

and its derivative:

$$\begin{align} \frac{d \hat{\ell}_\mathbf{x}}{dr} (r) &= \sum_{i=1}^n \psi(x_i+r) - n \psi(r) + n \log (r) - n \log (r+\bar{x}_n). \\[16pt] \end{align}$$

(This derivative matches the initial partial derivative with the substituted MLE for the probability parameter.) In order to compute the MLE we need to maximise the profile log-likelihood function, which is equivalent to finding the solution to its critical point equation. We first find the MLE for $r$ and then use this to get the MLE for $\theta$ from its explicit form. This can be done using standard optimisation or root-finding functions.
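For instance, here is a rough sketch of the root-finding approach, solving the critical point equation for the profile log-likelihood with the base uniroot function. The helper name profile.score, the simulated data and the bracketing interval are arbitrary choices for illustration, and the interval may need widening for other datasets:

#Profile score: derivative of the profile log-likelihood with respect to r
profile.score <- function(r, x) {
  n     <- length(x)
  bar.x <- mean(x)
  sum(digamma(x+r)) - n*digamma(r) + n*log(r) - n*log(r+bar.x) }

#Simulate over-dispersed example data (theta = 0.8 in our parameterisation)
x     <- rnbinom(n = 1000, size = 3, prob = 0.2)

#Solve for the MLE of r, then recover the MLE of theta from its explicit form
MLE.r <- uniroot(profile.score, interval = c(1e-3, 1e3), x = x)$root
MLE.t <- mean(x)/(MLE.r + mean(x))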


Objective function: For this kind of constrained optimisation problem it is best to first convert to unconstrained optimisation, so we will optimise the transformed parameter $\phi \equiv \log r$, which has the (minimising) objective function:

$$\begin{align} F_\mathbf{x}(\phi) \equiv - \hat{\ell}_\mathbf{x} (e^\phi) &= - \sum_{i=1}^n \log \Gamma(x_i+e^\phi) + n \tilde{x}_n + n \log \Gamma(e^\phi) - n \phi e^\phi \\[6pt] &\quad - n \bar{x}_n \log (\bar{x}_n) + n(e^\phi+\bar{x}_n) \log (e^\phi+\bar{x}_n), \\[16pt] \frac{d F_\mathbf{x}}{d\phi}(\phi) &= - e^\phi \sum_{i=1}^n \psi(x_i+e^\phi) + n e^\phi \psi(e^\phi) - n (\phi+1) e^\phi \\[6pt] &\quad + n e^\phi (1+\log (e^\phi+\bar{x}_n)). \\[16pt] \end{align}$$

Minimising this objective function will give you the MLE $\hat{\phi}$, from which you can then compute $\hat{r}$ and $\hat{\theta}$. We implement this below and give an example. (You should also note that there are certain pathological cases in this estimation problem. The negative binomial distribution has a variance that is never smaller than its mean, so it has difficulty fitting any dataset whose sample variance is smaller than its sample mean. In some cases, which occur with non-zero probability even under the model, the inferential problem will lead to the estimates $\hat{\phi} = \hat{r} = \infty$ and $\hat{\theta} = 0$ (see e.g., here). In this case your numerical search for the MLE will technically "fail", but it will stop after giving you a "large" value for $\hat{\phi}$ and a "small" value for $\hat{\theta}$.)
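As an aside, the data vector in the question looks like it may be one of these borderline cases. Assuming the gap in data1 is a missing comma, a quick check suggests the sample variance is slightly smaller than the sample mean, so don't be surprised if the numerical search below runs off towards a large $\hat{r}$ on that data:

#Quick check of the question's data (assuming the missing comma is a typo)
data1 <- c(5, 2, 2, 3, 0, 2, 1, 2, 4, 4, 1)
mean(data1)    #About 2.36
var(data1)     #About 2.25 --- smaller than the mean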


Implementation in R: We can implement the computation of the MLE in R by using the nlm function for nonlinear minimisation. For this iterative optimisation we will use the method-of-moments estimator as the starting value (see this related question for the MOM estimators). Here is a function that computes the MLE of the parameters of the negative binomial for any valid input for the observed data vector x.

MLE.nbinom <- function(x) {
  
  #Check input
  if (!is.numeric(x))           stop('Error: Data x should be non-negative integers')
  if (any(x != as.integer(x)))  stop('Error: Data x should be non-negative integers')
  if (min(x) < 0)               stop('Error: Data x should be non-negative integers')
  
  #Compute sample quantities
  n       <- length(x)
  bar.x   <- mean(x)
  tilde.x <- sum(lfactorial(x))/n

  #Set objective function
  NEGLOGLIKE <- function(phi) {
    r     <- exp(phi)
    OUT   <- - sum(lgamma(x+r)) + n*tilde.x + n*lgamma(r) - 
               n*r*phi - n*bar.x*log(bar.x) + n*(r+bar.x)*log(r+bar.x)
    D1    <- - r*sum(digamma(x+r)) + n*r*digamma(r) - n*r*(phi+1) + 
               n*r*(1+log(r+bar.x))
    attr(OUT, 'gradient') <- D1
    OUT }
  
  #Compute MLE
  START.r <- max(bar.x^2/(var(x) - bar.x), 1e-4)
  NLM     <- nlm(f = NEGLOGLIKE, p = log(START.r))
  MLE.r   <- exp(NLM$estimate)
  MLE.t   <- bar.x/(MLE.r + bar.x)
  MAX.LL  <- -NLM$minimum
  
  #Give output
  list(MLE.r = MLE.r, MLE.theta = MLE.t, loglike.max = MAX.LL) }

Now let's try this function on some simulated data from the negative binomial distribution. (Note that the base function rnbinom uses a slightly different parameterisation from our density above; its prob argument corresponds to $1-\theta$ here.)

#Generate data from negative binomial distribution
set.seed(1)
r     <- 3
theta <- 0.8
x     <- rnbinom(n = 10^6, size = r, prob = 1-theta)

#Compute the MLE
MLE.nbinom(x)

$MLE.r
[1] 2.998328

$MLE.theta
[1] 0.8001082

$loglike.max
[1] -3341525

As you can see, our MLE function comes reasonably close to recovering the true parameters used to generate the data. (You should note that the MLEs are biased but consistent estimators in this case.) With a bit more work you could compute the relevant second-order partial derivatives and use these to compute the estimated covariance matrix (and hence standard errors) for the estimator. I will leave this as an exercise for the reader.
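Finally, since the question mentions optim specifically, here is a rough equivalent sketch that minimises the same objective with optim instead of nlm (the function name MLE.nbinom.optim is just an illustrative label). Requesting the Hessian of the negative profile log-likelihood also gives one simple route to an approximate standard error for $\hat{\phi}$ on the log scale, if you want to pursue the exercise above:

MLE.nbinom.optim <- function(x) {

  #Compute sample quantities
  n       <- length(x)
  bar.x   <- mean(x)
  tilde.x <- sum(lfactorial(x))/n

  #Set objective function (same negative profile log-likelihood as above)
  NEGLOGLIKE <- function(phi) {
    r <- exp(phi)
    - sum(lgamma(x+r)) + n*tilde.x + n*lgamma(r) -
      n*r*phi - n*bar.x*log(bar.x) + n*(r+bar.x)*log(r+bar.x) }

  #Compute MLE with optim (BFGS) and request a numerical Hessian
  START.r <- max(bar.x^2/(var(x) - bar.x), 1e-4)
  OPT     <- optim(par = log(START.r), fn = NEGLOGLIKE,
                   method = 'BFGS', hessian = TRUE)
  MLE.r   <- exp(OPT$par)

  #Give output (SE.phi is an approximate standard error for phi = log(r))
  list(MLE.r = MLE.r, MLE.theta = bar.x/(MLE.r + bar.x),
       loglike.max = -OPT$value, SE.phi = sqrt(1/OPT$hessian[1, 1])) }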
