Solved – Maximum likelihood estimation of a Poisson binomial distribution

bernoulli-process, binomial-distribution, independence, poisson-binomial-distribution

According to Wikipedia,

the Poisson binomial distribution is the discrete probability distribution of a sum of independent Bernoulli trials that are not necessarily identically distributed

In other words, the Bernoulli trials have different probabilities $p_{1},\dots,p_{n}$.

Suppose I have a task where the participants are presented with stimuli that fall into either category A or category B. The experiment goes on for $N$ trials, and for each trial the participants report which category is the correct one.

The trials are independent, but the probability of a correct response most likely differs from trial to trial. If I have a large number of trials (e.g. 500), is it possible to find meaningful maximum likelihood estimates of the distribution parameters (the $p_{i}$'s)?

How many participants would I need?

I would be interested in testing whether the ordinary binomial or the Poisson binomial is a better fit to the data I have at hand. Before doing so, however, I was wondering whether it is even theoretically possible to find a plausible answer to this question.

Best Answer

As I see your problem, you have $K$ individuals completing $N$ trials that result in binary outcomes (success or failure). So you are dealing with $N\times K$ random variables $X_{ij}$, and you are interested in estimating the probability of success $p_i$ for each trial.

The first thing to notice is that you assume here that participants are exchangeable, i.e. that there are no more or less skilled participants - is this assumption correct for your data? What comes to mind is that for data like yours, Item Response Theory models would be better suited. Using such models you could assume that the tasks vary in their difficulty and that the participants vary in their skills (e.g. using a simple Rasch model).
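If you want to explore that route, a Rasch-type model can be expressed as a logistic regression with additive person and item effects (joint maximum likelihood). Below is a minimal sketch only; the long-format column names `correct`, `participant` and `trial` are hypothetical, and statsmodels is my choice here, not something your data requires:

```python
# Sketch: Rasch-style fit as a logistic regression with person and item effects.
# Assumes a long-format DataFrame with hypothetical columns
# "correct" (0/1), "participant" (id), "trial" (id); requires statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

def fit_rasch_style(df: pd.DataFrame):
    # On the logit scale this estimates participant "ability" and trial
    # "difficulty" effects (up to reparametrization of the dummy coding).
    model = smf.logit("correct ~ C(participant) + C(trial)", data=df)
    return model.fit(disp=False)

# result = fit_rasch_style(df); print(result.summary())
```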

But let's stick to what you said and assume that participants are exchangeable and you are interested only in per-trial probabilities. As others have already noticed, you are not dealing here with the Poisson binomial distribution, since that distribution describes the sum of successes from $N$ independent Bernoulli trials with different probabilities $p_1,\dots,p_N$. To use it you would have to introduce a new random variable defined as $Y_j = \sum_{i=1}^N X_{ij}$, i.e. the total number of successes per participant. As noted by Xi'an, the parameters $p_i$ are not identifiable from such sums, and if you have data on the result of each trial by each participant, it is better to think of the data as Bernoulli variables parametrized by $p_i$.

From what you are saying, you would like to test whether the Poisson binomial distribution fits the data better than the ordinary binomial. As I read it, you want to test whether the trials differ in their probabilities of success, versus whether the probability of success is the same for each trial (since $p_1 = p_2 = \dots = p_N$ gives an ordinary binomial distribution). Put another way, your null hypothesis is that not only the participants but also the trials are exchangeable, so identifying particular trials tells us nothing about the data since they all have the same probability of success. Stated like this, the null hypothesis leads directly to a permutation test, where you randomly "shuffle" your $N\times K$ matrix and compare a statistic computed on the permuted data to the same statistic computed on the unshuffled data. As the statistic to compare I would use the combined variance

$$ \sum_{i=1}^N \hat p_i(1-\hat p_i) $$

where $\hat p_i$ is the probability of success for the $i$-th trial estimated from the data (the mean over participants). In the case of equal $\hat p_i$'s this would reduce to $N \hat p(1-\hat p)$.
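One possible implementation of this permutation test, as a sketch only: it assumes the responses are stored in a NumPy array `X` of shape $N \times K$ with trials in rows and participants in columns, and it uses a left-tailed p-value because spreading the $\hat p_i$'s around a common mean can only lower the concave statistic $\sum_i \hat p_i(1-\hat p_i)$.

```python
import numpy as np

def combined_variance(X: np.ndarray) -> float:
    # X has shape (N, K): rows are trials, columns are participants.
    p_hat = X.mean(axis=1)                       # per-trial success proportions
    return float(np.sum(p_hat * (1.0 - p_hat)))  # sum_i p_hat_i * (1 - p_hat_i)

def permutation_test(X: np.ndarray, n_perm: int = 2000, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    observed = combined_variance(X)
    flat = X.ravel()
    null = np.array([
        combined_variance(rng.permutation(flat).reshape(X.shape))
        for _ in range(n_perm)
    ])
    # Left tail: heterogeneous p_i's spread the p_hat_i's, and since p(1-p)
    # is concave this pushes the observed statistic below the shuffled ones.
    return float((np.sum(null <= observed) + 1) / (n_perm + 1))
```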

To illustrate this, I conducted a simulation with three different scenarios: (a) all $p_i = 0.5$, (b) the $p_i$'s come from a Beta(1000, 1000) distribution, (c) they come from a uniform distribution. In the first case the $p_i$'s are all equal; in the second case they are "random" but grouped around a common mean; and in the third case they are totally "random". The plot below shows the distribution of the test statistic under the null hypothesis (i.e. computed on shuffled data); the red lines show the combined variance computed on the unshuffled data. As you can see, the combined variance of the unshuffled data falls within the null distribution in the first case (the test is not significant), only slightly approaches it in the second case (a significant difference from the null), and in the third case the red line is not even visible on the plot because it is that far from the null distribution (a significant difference).

[Figure: results of the simulation under the three scenarios]
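A simulation of this kind can be reproduced along the following lines (a sketch, not the original code used for the plot; it reuses `combined_variance` and `permutation_test` from the snippet above, and the numbers of trials and participants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
N, K = 100, 50  # arbitrary numbers of trials and participants

scenarios = {
    "(a) all p_i = 0.5":          np.full(N, 0.5),
    "(b) p_i ~ Beta(1000, 1000)": rng.beta(1000, 1000, size=N),
    "(c) p_i ~ Uniform(0, 1)":    rng.uniform(0.0, 1.0, size=N),
}

for name, p in scenarios.items():
    # Each trial i yields K independent Bernoulli(p_i) responses.
    X = rng.binomial(1, p[:, None], size=(N, K))
    print(name, "permutation p-value:", permutation_test(X))
```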

So the test correctly identified the "all the same $p_i$'s" scenario (a), but it did not consider the "similar but not the same" scenario (b) to satisfy the criterion of equality. The question is whether you want to be that strict about it. Nonetheless, this is a direct implementation of a test for your hypothesis: it compares the basic quantity that lets you distinguish an ordinary binomial from a Poisson binomial (their variances).

There are of course lots of other possibilities, more or less appropriate depending on your problem, e.g. comparing individual confidence intervals, pairwise $z$-tests, ANOVA, some kind of logistic regression model, etc. However, as I said before, this sounds more like a problem for Item Response Theory models, and assuming equal skills across participants sounds risky.
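For example, the logistic-regression route could be a likelihood-ratio test of a model with a trial factor against an intercept-only model (i.e. per-trial probabilities versus one common probability). A sketch, again assuming a hypothetical long-format data frame with columns `correct` and `trial`, and using statsmodels:

```python
import scipy.stats as st
import statsmodels.formula.api as smf

def lr_test_trials(df):
    # Full model: each trial has its own success probability.
    full = smf.logit("correct ~ C(trial)", data=df).fit(disp=False)
    # Null model: one common success probability (ordinary binomial).
    null = smf.logit("correct ~ 1", data=df).fit(disp=False)
    lr_stat = 2 * (full.llf - null.llf)
    dof = full.df_model - null.df_model
    return lr_stat, st.chi2.sf(lr_stat, dof)  # statistic and p-value
```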
