I'm going to answer this using a Poisson model, which is precisely a negative binomial model without overdispersion, because the math is simpler. The Poisson model gives the probability of observing a particular non-negative integer count $y_i$ as
$$P(y_i|X) = \dfrac{\exp(-\lambda_i)\lambda_i ^{y_i}}{y_i!}$$
The conditional mean of this distribution is $\lambda_i$:
$$E[y_i|x_i] = \lambda_i = \exp(x_i\beta)$$
$$\log \lambda_i = x_i\beta$$
The conditional variance of the Poisson model is also $\lambda_i$, while the variance of the negative binomial model is $\lambda_i + \alpha \lambda_i^2$. This is the only practical difference between the two models for the purposes of this answer.
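As a quick sanity check, here is a minimal R sketch comparing the two variance functions by simulation (the parameter values are arbitrary; R's `rnbinom()` uses `size` $= 1/\alpha$):

```r
## Poisson: Var = lambda. Negative binomial (NB2): Var = lambda + alpha*lambda^2.
set.seed(1)
lambda <- 4; alpha <- 0.5
var(rpois(1e6, lambda))                         # ~ lambda = 4
var(rnbinom(1e6, mu = lambda, size = 1/alpha))  # ~ lambda + alpha*lambda^2 = 12
```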
This is effectively a log-linear model, so the marginal effect of $x$ on the expected count is
$$\dfrac{\partial E[y|x]}{\partial x} = \dfrac{\partial\lambda_i}{\partial x} = \beta\exp(x_i\beta) = \beta\lambda_i,$$
and a one-unit increase in $x$ multiplies the expected count by $\exp(\beta)$. So if you have a negative $\beta$ on a dummy variable $x$, you can say that, on average, switching $x$ on changes the expected count by $(\exp(\beta)-1)\cdot 100$ percent, which is approximately $\beta\cdot 100$ percent when $|\beta|$ is small.
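To make this concrete, here is a small R sketch; the data (`quine` from `MASS`) and the model are purely illustrative choices:

```r
library(MASS)

## Illustrative negative binomial fit; Sex is a dummy (F/M) in quine.
fit <- glm.nb(Days ~ Sex + Age, data = quine)
b <- coef(fit)["SexM"]

exp(b)               # multiplicative change in E[y] when the dummy switches on
(exp(b) - 1) * 100   # exact percent change; roughly 100*b when |b| is small
```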
The negative binomial distribution parametrized by mean and size is given by
$$ \DeclareMathOperator{\P}{\mathbb{P}}
\P (X=k) = \binom{k+m-1}{k}\left( \frac{m}{m+\mu} \right)^m \left( \frac{\mu}{m+\mu} \right)^k
$$
for the outcome $k$ a nonnegative integer, with $\mu>0$ the mean and $m>0$ the size. I will do the calculations in Maple.
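As a sanity check, this pmf agrees with R's `dnbinom()` when `size = m` and `mu` is the mean (a sketch with arbitrary parameter values):

```r
## The pmf above, term by term, against dnbinom(k, size = m, mu = mu).
mu <- 3; m <- 1.5; k <- 0:10
pmf <- choose(k + m - 1, k) * (m / (m + mu))^m * (mu / (m + mu))^k
all.equal(pmf, dnbinom(k, size = m, mu = mu))  # TRUE
```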
The Fisher information matrix (of size $2\times 2$) has components
$I_{\mu\mu}, I_{\mu m} \text{ and } I_{m m}$ given by
$$ \DeclareMathOperator{\E}{\mathbb{E}}
I_{ij}=-\E\left\{ \frac{\partial^2}{\partial \theta_i \partial\theta_j}\log f(X;\theta)|\theta \right\}
$$ where $\theta=(\mu, m)$. Then we find (Maple code at the end of the post)
$$
I_{\mu\mu}=\frac{m}{(m+\mu)\mu}
$$
The off-diagonal term is simplest: it reduces to zero! That is the beauty of the mean parametrization
$$
I_{\mu m}=0
$$ showing that $\mu$ and $m$ are orthogonal parameters.
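Both entries can be spot-checked numerically, e.g. in R with finite differences (a sketch; the parameter values, step size, and truncation point are arbitrary choices):

```r
## Spot-check I_mumu = m/((m+mu)*mu) and I_mum = 0 by finite differences.
mu <- 3; m <- 1.5; h <- 1e-3
k <- 0:10000                            # truncated support, tail mass negligible
p <- dnbinom(k, size = m, mu = mu)      # P(X = k)
ll <- function(mu, m) dnbinom(k, size = m, mu = mu, log = TRUE)

## I_mumu = -E[ d^2 log f / d mu^2 ]
-sum(p * (ll(mu + h, m) - 2 * ll(mu, m) + ll(mu - h, m)) / h^2)
m / ((m + mu) * mu)                     # should agree

## I_mum = -E[ d^2 log f / (d mu d m) ]: approximately 0
-sum(p * (ll(mu + h, m + h) - ll(mu + h, m - h) -
          ll(mu - h, m + h) + ll(mu - h, m - h)) / (4 * h^2))
```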
For the last term, the result involves the trigamma function, written $\Psi(1,\cdot)$ (the second derivative of the log of the gamma function), and is a somewhat complex infinite series which must be evaluated numerically:
$$
I_{mm}=\sum_{k=0}^\infty \binom{k+m-1}{k}\left\{ -m^{m-1}\mu^k (m+\mu)^{-m-2-k} \left( m(m+\mu)^2 \Psi(1,k+m) -m(m+\mu)^2 \Psi(1,m) +mk+\mu^2 \right)
\right\}
$$
A concise form can be derived either by simplifying the expression that Maple gives above:
\begin{align*}
I_{mm} =& -\sum_{k=0}^\infty\binom{k+m-1}{k}\left(\frac{m}{m+\mu}\right)^m \left(\frac{\mu}{m+\mu}\right)^k\{\frac{1}{(m+\mu)^2m}\left(m(m+\mu)^2\Psi(1,k+m)-m(m+\mu)^2\Psi(1,m)+m k+\mu^2\right)\}\\
=& -\mathbb{E}\left(\frac{1}{(m+\mu)^2m}\left(m(m+\mu)^2\Psi(1,X+m)-m(m+\mu)^2\Psi(1,m)+m X+\mu^2\right)\right)\\
=& -\mathbb{E}\left(\frac{1}{(m+\mu)^2m}\{m(m+\mu)^2(\Psi(1,X+m) - \Psi(1,m))+m X +\mu^2\}\right)\\
=& -\mathbb{E}\left(\Psi(1,X+m) - \Psi(1,m)\right) - \frac{\mu}{m(m+\mu)}
\end{align*}
where $X$ follows a negative binomial distribution with mean $\mu$ and size $m$.
Or directly from the definition of Fisher information:
\begin{align*}
I_{mm} =& - \mathbb{E}\frac{\partial^2}{\partial m^2}\ln \mathbb{P}(X;\mu,m) \\
=& - \mathbb{E}\frac{\partial}{\partial m} \{\Psi(X+ m) - \Psi( m) + \ln\frac{ m}{ m+\mu} + \frac{\mu -X }{ m+ \mu}\}\\
=& - \mathbb{E} \{\frac{\partial}{\partial m}(\Psi(X+ m) - \Psi( m)) + \frac{1}{ m}-\frac{1}{ m+\mu}-\frac{\mu - X}{( m+\mu)^2}\}\\
=& -\mathbb{E}\frac{\partial}{\partial m}\left(\Psi(X+ m) - \Psi( m)\right) -\frac{\mu}{ m( m+\mu)} \\
=& -\mathbb{E}\left(\Psi(1,X+ m) - \Psi(1, m)\right) -\frac{\mu}{ m( m+\mu)}
\end{align*}
where $\Psi(\cdot)$ is the digamma function (the first derivative of the log of the gamma function).
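The final expression is straightforward to evaluate numerically, e.g. in R, where $\Psi(1,\cdot)$ is `trigamma()` (a sketch; parameter values are arbitrary and the series is truncated):

```r
## I_mm = -E[Psi(1, X+m) - Psi(1, m)] - mu/(m*(m+mu)), checked against a
## finite-difference version of -E[d^2 log f / dm^2].
mu <- 3; m <- 1.5
k <- 0:10000
p <- dnbinom(k, size = m, mu = mu)

I_mm <- -sum(p * (trigamma(k + m) - trigamma(m))) - mu / (m * (m + mu))

h <- 1e-3
ll <- function(m) dnbinom(k, size = m, mu = mu, log = TRUE)
I_fd <- -sum(p * (ll(m + h) - 2 * ll(m) + ll(m - h)) / h^2)

c(series = I_mm, finite_diff = I_fd)   # the two should agree closely
```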
Below is some Maple code (with its output linearized as comments):
```
f := binomial(k+m-1,k)*(m/(m+mu))^m * (mu/(m+mu))^k;
#   f := binomial(k+m-1, k) * (m/(m+mu))^m * (mu/(m+mu))^k

lf := ln( binomial(k+m-1,k) ) + m*ln( m/(m+mu) ) + k*ln( mu/(m+mu) ) assuming m>0, mu>0;
#   lf := ln(binomial(k+m-1, k)) + m*ln(m/(m+mu)) + k*ln(mu/(m+mu))

simplify( -sum(f*diff(lf,mu,mu), k=0..infinity) ) assuming m>0, mu>0;
#   m/((m+mu)*mu)

simplify( -sum(f*diff(lf,mu,m), k=0..infinity) ) assuming m>0, mu>0;
#   0

simplify( -sum(f*diff(lf,m,m), k=0..infinity) ) assuming m>0, mu>0;
#   -sum( m^(m-1) * (m+mu)^(-m-2-k) * mu^k * binomial(k+m-1, k)
#         * ( m*(m+mu)^2*Psi(1,k+m) - m*(m+mu)^2*Psi(1,m) + m*k + mu^2 ),
#         k = 0..infinity )
```
Best Answer

The `$theta` from a fitted `glm.nb()` corresponds to the `size` in `dnbinom()`. As a simple example, let's replicate the fitted log-likelihood from scratch, using the `quine` data from `MASS`:
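A minimal sketch (the original code is not shown, so the model formula here is an illustrative assumption):

```r
library("MASS")

## Illustrative negative binomial fit to the quine data.
fit <- glm.nb(Days ~ Sex + Age + Eth + Lrn, data = quine)
fit$theta    # estimated theta
logLik(fit)  # fitted log-likelihood
```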
And this value of the log-likelihood can be obtained by summing the `dnbinom(..., log = TRUE)` values:
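A sketch along those lines, with `theta` supplied as the `size` and the fitted means as `mu`:

```r
## theta plays the role of size; mu is the fitted mean of each observation.
sum(dnbinom(quine$Days, size = fit$theta, mu = fitted(fit), log = TRUE))
## should match logLik(fit) above
```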
Doubling the weight of all observations leaves all parameter estimates (including `theta`) unchanged:
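A sketch, reusing the illustrative formula from above:

```r
## Same fit with every observation's prior weight doubled.
fit2 <- glm.nb(Days ~ Sex + Age + Eth + Lrn, data = quine,
               weights = rep(2, nrow(quine)))
all.equal(coef(fit), coef(fit2))    # coefficients unchanged
all.equal(fit$theta, fit2$theta)    # theta unchanged too
```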