Solved – Switch from Modelling a Process using a Poisson Distribution to a Negative Binomial Distribution

kalman-filter, negative-binomial-distribution, poisson-process, state-space-models

$\newcommand{\P}{\mathbb{P}}$We have a random process that may occur multiple times (or not at all) in a set period of time $T$. We have a data feed from a pre-existing model of this process, which provides the probability of a number of events occurring in the period $0 \leq t < T$. This existing model is old, and we need to run live checks on the feed data for estimation errors. The old model producing the data feed (which provides the probability of $n$ events occurring in the time remaining $t$) is approximately Poisson distributed.

So to check for anomalies/errors, we let $t$ be the time remaining and $X_t$ be the total number of events that will occur in that remaining time. The old model provides the estimates $\P(X_t \leq c)$, so under our assumption $X_t\sim \operatorname{Poisson}(\lambda_{t})$ we have:
$$
\P(X_t \leq c) = e^{-\lambda_t}\sum_{k=0}^c\frac{\lambda_t^k}{k!}\,.
$$
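As a quick numerical sanity check (a sketch only; `lam_t` and `c` below are illustrative values, not feed data), the formula can be verified against scipy's Poisson CDF:

```python
# Sketch: verify the Poisson tail formula above against scipy's CDF.
# lam_t and c are illustrative values, not feed data.
import math
from scipy.stats import poisson

lam_t, c = 4.2, 6

direct = math.exp(-lam_t) * sum(lam_t**k / math.factorial(k) for k in range(c + 1))
assert abs(direct - poisson.cdf(c, lam_t)) < 1e-12
```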
To derive our event rate $\lambda_t$ from the output of the old model (observations $y_{t}$), we use a state-space approach and model the observation relationship as:
$$
y_t = \lambda_t + \varepsilon_t\quad (\varepsilon_t \sim N(0, H_t))\,.
$$
We filter the observations from the old model, using a state-space [constant speed decay] model for the evolution of $\lambda_t$, to obtain the filtered state $E(\lambda_t|Y_t)$, and we flag an anomaly/error in the estimated event frequency from the feed data if $E(\lambda_t|Y_t) < y_t$.
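For concreteness, here is a minimal sketch of the filtering step. It assumes a local-level (random-walk) state as a stand-in for the constant-speed-decay dynamics above; the noise variances `q` and `h` and the data are illustrative assumptions, not the production model:

```python
import numpy as np

def filter_lambda(y, q=0.1, h=1.0, p0=10.0):
    """Filtered means E(lambda_t | Y_t) from a local-level Kalman filter:
    lambda_t evolves as a random walk with variance q (a stand-in for the
    constant-speed-decay dynamics) and y_t = lambda_t + eps_t, eps_t ~ N(0, h)."""
    lam, p = y[0], p0                 # rough initialisation at the first observation
    filtered = []
    for obs in y:
        p = p + q                     # predict: random-walk state variance grows by q
        k = p / (p + h)               # Kalman gain
        lam = lam + k * (obs - lam)   # update with the new observation
        p = (1.0 - k) * p
        filtered.append(lam)
    return np.array(filtered)

# Flag a feed anomaly when the filtered rate falls below the observed value
y = np.array([3.9, 4.1, 4.0, 6.5, 4.2])
lam_hat = filter_lambda(y)
flags = lam_hat < y
```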

This approach works very well at picking up errors in the estimated event counts over the full time period $T$, but not so well if we want to do the same over a shorter period $0 \leq t < \sigma$ where $\sigma < \frac{2}{3} T$. To get around this, we want to switch to the negative binomial distribution, so that we now assume $X_t\sim \operatorname{NB}(r, p)$ and have:
$$
\P(X_{t} \leq c) = p^{r}\sum_{k = 0}^{c} (1 - p)^{k}\binom{k + r - 1}{r - 1},
$$
where the single parameter $\lambda$ is now replaced by the pair $r$ and $p$. This should be straightforward to implement, but I am having some difficulty with the interpretation, and thus have some questions I'd like you to help with (a moment-matching sketch follows the list):

1. Can we merely set $p = \lambda$ in the negative binomial distribution? If not, why not?

2. Assuming we can set $p = f(\lambda)$ where $f$ is some function, how can we correctly set $r$ (do we need to fit $r$ using past data sets)?

3. Is $r$ dependent on the number of events we expect to occur during a given process?
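For reference on questions 1 and 2, one common approach (an assumption on my part, not something prescribed by the feed) is moment matching: under the tail formula above, $E[X_t] = r(1-p)/p$, so matching the filtered Poisson mean $\lambda_t$ gives $p = r/(r + \lambda_t)$. In particular, $p$ is a probability in $(0,1)$, not a rate, so $p = \lambda$ cannot be a valid identification. A sketch:

```python
# Sketch: moment-matching the NB parameters to the filtered Poisson rate.
# Under the tail formula above (pmf ~ binom(k+r-1, r-1) p^r (1-p)^k),
# E[X_t] = r(1-p)/p, so matching E[X_t] = lambda_t gives p = r/(r + lambda_t).
def p_from_lambda(lam_t, r):
    # p is a probability in (0, 1); it cannot simply equal the rate lambda_t
    return r / (r + lam_t)

# r then controls the overdispersion: Var[X_t] = lambda_t * (1 + lambda_t / r),
# which recovers the Poisson variance in the limit r -> infinity.
```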


Addendum on extracting estimates for $r$ (and $p$):

I am aware that if we in fact had this problem reversed, with the observed event counts for each process, we could adopt the maximum likelihood estimator for $r$ and $p$. (Note that the derivation below follows the convention $\P(k; r, p) = \frac{\Gamma(k + r)}{k!\,\Gamma(r)}\,p^{k}(1 - p)^{r}$, so the roles of $p$ and $1 - p$ are swapped relative to the tail formula above.) The maximum likelihood estimator only exists for samples whose sample variance is larger than the sample mean, but when that holds we can write the likelihood function for $N$ independent, identically distributed observations $k_1, k_2, \ldots, k_{N}$ as:
$$
L(r, p) = \prod_{i = 1}^{N}\P(k_i; r, p),
$$
from which we can write the log-likelihood function as:
$$
l(r, p) = \sum_{i = 1}^{N} \ln(\Gamma(k_i + r)) - \sum_{i = 1}^{N} \ln(k_{i}!) - N\ln(\Gamma(r)) + \sum_{i = 1}^{N} k_i \ln(p) + N r\ln(1 - p).
$$
To find the maximum we take the partial derivatives with respect to $r$ and $p$ and set them equal to zero:
\begin{align*}
\partial_{r}\, l(r, p) &= \sum_{i = 1}^{N} \psi(k_i + r) - N\psi(r) + N\ln(1 - p), \\
\partial_{p}\, l(r, p) &= \sum_{i = 1}^{N} \frac{k_i}{p} - \frac{N r}{1 - p}\,.
\end{align*}
Setting $\partial_{r} l(r, p) = \partial_{p} l(r, p) = 0$, the second equation gives $p = \displaystyle\sum_{i = 1}^{N} k_i \Big/ \Big(N r + \sum_{i = 1}^{N} k_i\Big)$; substituting this into the first, we find:
$$
\partial_{r} l(r, p) = \sum_{i = 1}^{N} \psi(k_i + r) - N \psi(r) + N\ln\left(\frac{r}{r + \sum_{i = 1}^{N} k_i / N}\right) = 0.
$$
This equation cannot be solved for $r$ in closed form, but it can be solved numerically, e.g. with Newton's method (or via EM); a numerical sketch follows. In any case, this reversed situation is not ours: although we could use past data to obtain a static $r$ and $p$, that is of little real use because, for our process, we need to adapt these parameters in time, as we did with the Poisson model.
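For completeness, a sketch of that static fit, solving the profiled score equation for $r$ with a bracketing root-finder (the bracket endpoints are assumptions and may need adjusting for a given data set):

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def fit_nb_mle(k, r_lo=1e-6, r_hi=1e6):
    """Static MLE of (r, p) from iid counts k, under the pmf used in the
    derivation above: P(k; r, p) = Gamma(k+r)/(k! Gamma(r)) p^k (1-p)^r.
    Only valid when the sample variance exceeds the sample mean."""
    k = np.asarray(k, dtype=float)
    n, kbar = len(k), k.mean()

    def score_r(r):
        # The profiled score equation in r derived above
        return digamma(k + r).sum() - n * digamma(r) + n * np.log(r / (r + kbar))

    r_hat = brentq(score_r, r_lo, r_hi)       # numerical root; no closed form exists
    p_hat = k.sum() / (n * r_hat + k.sum())   # p from the partial derivative in p
    return r_hat, p_hat
```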

Best Answer

The negative binomial distribution is closely related to the binomial probability model. It is applicable when the following assumptions (conditions) hold:

1. The experiment is repeated under the same conditions until a fixed number of successes, say $r$, is achieved.

2. The result of each trial can be classified into one of two categories, success or failure.

3. The probability $p$ of success is the same for every trial.

4. Each trial is independent of all the others.

The first condition is the only key factor differentiating the binomial from the negative binomial.
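As a sketch of that description (all values illustrative), a Monte Carlo check that "failures before the $r$-th success" under conditions 1 to 4 matches scipy's negative binomial:

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(0)
r, p = 5, 0.4   # fixed number of successes, per-trial success probability

def failures_before_rth_success(r, p):
    # Independent Bernoulli(p) trials under identical conditions (conditions
    # 1-4 above), repeated until the r-th success; count the failures.
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

draws = [failures_before_rth_success(r, p) for _ in range(100_000)]
# The empirical mean should be close to the negative binomial mean r(1-p)/p
print(np.mean(draws), nbinom.mean(r, p))
```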
