R Programming – Log-Transformation in Negative Log-Likelihood for Negative Binomial Distribution

Tags: likelihood, maximum-likelihood, optimization, r

I am estimating the probability parameter $p$ of the negative binomial distribution by minimizing the negative log-likelihood, while trying to avoid numerical errors. I am not 100% sure whether this procedure is valid, in particular whether I am allowed to perform the transformation $d = \log(odds)$ and, if yes, why. My general question is:
is the procedure below correct?

Let us assume that $x_1, x_2, \dots, x_n$ is a random sample from a negative binomial distribution with $r=5$ and $p=4/5$.

# rnbinom()'s `prob` argument corresponds to 1 - p in the notation used below
X <- rnbinom(5000, size = 5, prob = 1 - (4/5))

The probability mass function of the negative binomial distribution is given by:
$$
P(X = x_i) = \binom{x_i + r - 1}{x_i} (1-p)^r p^{x_i} = \frac{\Gamma(x_i + r)}{\Gamma(x_i+1)\Gamma(r)} (1-p)^r p^{x_i}.
$$

and the log-likelihood:
$$
\log \prod \limits_i P(X_i = x_i) = \log \left[ \prod \limits_i \frac{\Gamma(x_i + r)}{\Gamma(x_i+1)\Gamma(r)} (1-p)^r p^{x_i} \right]=
\sum \limits_i \log \left( \frac{\Gamma(x_i + r)}{\Gamma(x_i+1)\Gamma(r)} (1-p)^r p^{x_i} \right)
$$

$$
= \sum_i \left[ \log\Gamma(x_i +r) - \log\Gamma(x_i + 1) - \log\Gamma(r) + r\log(1-p) + x_i \log(p) \right]
$$
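As a quick numerical check (my addition, not part of the original derivation), this hand-written log-likelihood can be compared against R's built-in dnbinom(..., log = TRUE); note again that dnbinom()'s prob argument plays the role of $1-p$ in this notation:

r <- 5
p <- 4/5

# Hand-written log-likelihood from the derivation above
ll_manual <- sum(lgamma(X + r) - lgamma(X + 1) - lgamma(r) + r*log(1 - p) + X*log(p))

# Built-in version of the same quantity
ll_builtin <- sum(dnbinom(X, size = r, prob = 1 - p, log = TRUE))

all.equal(ll_manual, ll_builtin)  # should be TRUE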

We assume that $r$ is known and we want to estimate $p$ from the sample using optim (method L-BFGS-B).

# Negative log-likelihood of p (r fixed at 5); optim() minimises this by default
logLikelihood <- function(p){
  r <- 5
  y <- sapply(X, function(x) lgamma(x+r) + r*log(1-p) + x*log(p) - lgamma(x+1) - lgamma(r))
  -sum(y)
}
# p is constrained to (0, 1), so box constraints are required
optim(0.5, logLikelihood, method="L-BFGS-B", lower=0.00001, upper=0.99999)

Result: [optim output showing the estimated $p$; screenshot not reproduced here]
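For reference, a small usage sketch that assigns the result so the estimate can be inspected (the variable name fit_p is mine, not from the original post):

fit_p <- optim(0.5, logLikelihood, method = "L-BFGS-B", lower = 0.00001, upper = 0.99999)
fit_p$par          # estimated p, expected to be close to 4/5
fit_p$convergence  # 0 indicates successful convergence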

Since $p$ is constrained to the bounded interval $p \in (0,1)$, this may cause problems during optimisation. It may be a better idea to transform $p$. Let us denote $odds = \frac{p}{1-p}$. Then $p = \frac{odds}{odds+1}$. In our case $odds = 4$.
Substituting $odds$ for $p$ we arrive at:
$$
\sum_i \left[ \log\Gamma(x_i +r) - r\log(odds+1) + x_i \log\left(\frac{odds}{odds+1}\right) - \log\Gamma(x_i + 1) - \log\Gamma(r) \right]
$$
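For completeness, a minimal sketch of this intermediate, odds-parametrised negative log-likelihood (the function name oddsLogLikelihood and the starting value are mine, not from the original post). Note that $odds$ still has a boundary at zero, which is what the next transformation removes:

oddsLogLikelihood <- function(odds){
  r <- 5
  y <- lgamma(X + r) - r*log(odds + 1) + X*log(odds/(odds + 1)) - lgamma(X + 1) - lgamma(r)
  -sum(y)
}
# odds lives in (0, Inf), so a lower bound is still needed
optim(1, oddsLogLikelihood, method = "L-BFGS-B", lower = 0.00001)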

We make the transformation $d = \log(odds)$. [Are we allowed to do this?]

Then we arrive at the log-likelihood given by:
$$
\sum_i \left[ \log\Gamma(x_i +r) - r\log(e^d+1) + x_i \log\left(\frac{e^d}{e^d+1}\right) - \log\Gamma(x_i + 1) - \log\Gamma(r) \right]
$$

Recalling the definition of the sigmoid function:
$$ -\log(1+e^d) = \log\left(\frac{1}{e^{d}+1}\right) = \log \textrm{sigmoid}(-d), \;\;\;\; \log \left(\frac{e^{d}}{e^{d}+1}\right) = \log \textrm{sigmoid}(d)$$
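Since the sigmoid is the standard logistic CDF, base R's plogis() computes these log-sigmoid terms directly via its log.p argument, which avoids explicitly forming $e^d$ in user code. A minimal sketch of the identities above (the value of d is arbitrary and mine):

d <- 3  # any real value

log_sig_pos <- plogis(d, log.p = TRUE)    # log sigmoid(d)  = -log(1 + e^{-d})
log_sig_neg <- plogis(-d, log.p = TRUE)   # log sigmoid(-d) = -log(1 + e^{d})

all.equal(log_sig_pos, log(exp(d)/(exp(d) + 1)))  # should be TRUE
all.equal(log_sig_neg, -log(1 + exp(d)))          # should be TRUE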

We arrive at the log-likelihood:
$$
\sum_i \left[ \log\Gamma(x_i+r) + r\log( \textrm{sigmoid}(-d)) + x_i\log( \textrm{sigmoid}(d)) - \log\Gamma(x_i + 1) - \log\Gamma(r) \right]
$$

# sigmoid() is not in base R, so define it (equivalently, plogis() could be used)
sigmoid <- function(z) 1/(1 + exp(-z))

# Negative log-likelihood as a function of d = log(odds)
sigmoid_logLikelihood <- function(o){
  r <- 5
  y <- sapply(X, function(x) lgamma(x+r) + r*log(sigmoid(-o)) + x*log(sigmoid(o)) - lgamma(x+1) - lgamma(r))
  -sum(y)
}
# d is unbounded, so the box constraints are no longer essential; they are kept as in the original call
fit <- optim(0.5, sigmoid_logLikelihood, method="L-BFGS-B", lower=0.00001, upper=100)
fit$par       # estimate of d
exp(fit$par)  # estimate of odds = e^d

The result is $e^{\hat d} = 3.96292$ (so $\hat d \approx \log 4$), which is approximately equal to $odds = 4$, as we expected.

My general question is:
is the above procedure correct? If yes, why are we allowed to make the transformation $d = \log(odds)$?

Best Answer

The MLE of $p$ is available in closed form. There is no need to run an algorithm to maximize the likelihood.

The negative binomial distribution is an example of a linear exponential family and, for all linear exponential families, the maximum likelihood estimate of the expected value is the sample mean. Hence $$\hat\mu=\frac{r\hat p}{1-\hat p} = \bar x,$$ where $\bar x$ is the sample mean and $\hat p$ is the MLE of $p$. In other words, $$\hat d= \log(\bar x /r),$$ where $\hat d$ is the MLE of $d$.
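With the simulated sample X from the question, the closed-form estimates can be computed directly; a minimal sketch (the variable names and the comparison with the optim results are mine):

r <- 5
mu_hat   <- mean(X)                    # MLE of the mean
odds_hat <- mu_hat / r                 # MLE of odds = p/(1-p)
p_hat    <- odds_hat / (odds_hat + 1)  # MLE of p
d_hat    <- log(odds_hat)              # MLE of d = log(odds)
c(p = p_hat, odds = odds_hat, d = d_hat)

These should agree, up to optimisation tolerance, with the values found by optim above.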

To answer your question about transformations, it is possible and valid to reparametrize any parameter with a one-to-one function of that parameter when computing MLEs. In your case $d = \log\{p/(1-p)\}$ is a monotonic, differentiable function of $p$, so it causes no problems. It is a standard procedure in many statistical applications to transform parameters onto the whole real line in this way to avoid constraints.
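To spell out the reasoning step (my addition): because the map between $p \in (0,1)$ and $d \in \mathbb{R}$ is a bijection, the reparametrised likelihood takes exactly the same set of values as the original one, so maximising either gives the same fit:
$$
\tilde L(d) = L\!\left(\frac{e^d}{1+e^d}\right), \qquad \sup_{d \in \mathbb{R}} \tilde L(d) = \sup_{p \in (0,1)} L(p), \qquad \hat d = \log\!\left(\frac{\hat p}{1-\hat p}\right).
$$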

Another transformation that is even more useful in many applications is $$\mu = \frac{rp}{1-p}$$ and $$\phi = 1/r$$ so that $$E(X)=\mu$$ and $${\rm var}(X)=\mu+\phi\mu^2.$$
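In R this mean parametrisation is supported directly: dnbinom() and rnbinom() accept a mu argument in place of prob. A minimal check against the setup in the question:

r  <- 5
p  <- 4/5
mu <- r * p / (1 - p)   # = 20 for the example above

# The prob and mu parametrisations of dnbinom() agree
all.equal(dnbinom(0:10, size = r, prob = 1 - p),
          dnbinom(0:10, size = r, mu = mu))  # should be TRUE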