Solved – Testing for independence between variables in a Poisson distribution

independence, poisson distribution

I'm trying to figure out how to test if variables that I think follow a Poisson distribution are independent (which is a requirement of the Poisson distribution), or if it matters.

Let's assume I count the number of shoppers, X, that come into my store in every 6-hour timeframe. After a month of doing that, I want to determine which of those timeframes saw a significantly smaller or larger number of shoppers. I can calculate a lambda (the average of X per timeframe), plot a time series, and compare the distribution of the counts to a Poisson distribution with the same lambda; it's likely going to be similar.

But, for Xi to be Poisson-distributed, they need to be independent. In most examples of Poisson-related work on time series (e.g., http://pymc-devs.github.io/pymc/tutorial.html), the variables are more obviously independent: mining disasters, car crashes, text messages received, etc. In an example like mine, they might not be; for example, a large number of shoppers going into a store at once could motivate others to go into the store as well.

How do I test for independence, beyond making assumptions about the situation? A chi-square test seems most obvious, but there are no degrees of freedom to run a chi-square test.

[It probably doesn't matter for this question, but my goal is to run a very similar analysis to the one detailed in the pymc link — to set up a Markov Chain model that shows when a likely change in frequency/lambda occurred. That model has the same independence requirement though.]

Best Answer

> I'm trying to figure out how to test if variables that I think follow a Poisson distribution are independent (which is a requirement of the Poisson distribution)

You seem to be conflating the Poisson process with the Poisson distribution here.

For a Poisson process to really be a Poisson process, the numbers of events in non-overlapping intervals have to be independent, and then if the rate is constant (and the remaining assumptions are true), you'll get Poisson observations (the count of events in the process over a given interval will be Poisson). However, that doesn't imply that observations that are Poisson-distributed are automatically independent; it's possible to construct situations where Poisson data over time are serially dependent, say (and it's possible to find data where such a dependent-Poisson model is a plausible description of the data).
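One standard way to build such a series is a Poisson INAR(1) model based on binomial thinning. The sketch below is only an illustration of that idea, not necessarily how any data in this answer were generated; the function names and the choice `alpha=0.5` are assumptions made for the example. Each observation keeps every unit of the previous count with probability `alpha` and adds fresh Poisson arrivals with mean `(1 - alpha) * lam`, so every $y_t$ is marginally Poisson with mean `lam` while the lag-1 autocorrelation equals `alpha`.

```python
import numpy as np


def simulate_poisson_inar1(n, lam, alpha, seed=None):
    """Simulate a stationary INAR(1) series with Poisson(lam) marginals.

    y_t = (binomial thinning of y_{t-1} with probability alpha) + arrivals_t,
    where arrivals_t ~ Poisson((1 - alpha) * lam) independently of the past.
    """
    rng = np.random.default_rng(seed)
    y = np.empty(n, dtype=np.int64)
    y[0] = rng.poisson(lam)  # start from the stationary distribution
    for t in range(1, n):
        survivors = rng.binomial(y[t - 1], alpha)    # units carried over
        arrivals = rng.poisson((1.0 - alpha) * lam)  # fresh, independent units
        y[t] = survivors + arrivals
    return y


def lag1_correlation(y):
    """Sample Pearson correlation between y_t and y_{t-1}."""
    y = np.asarray(y, dtype=float)
    return np.corrcoef(y[:-1], y[1:])[0, 1]


# Illustrative settings: lam matches the example below, alpha is arbitrary.
y = simulate_poisson_inar1(n=5000, lam=240, alpha=0.5, seed=0)
print("mean:", y.mean())
print("variance:", y.var())
print("lag-1 correlation:", lag1_correlation(y))
```

The sample mean and variance should both come out close to `lam`, as they would for independent Poisson counts, so looking at the marginal distribution alone would not reveal the dependence.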

> But, for Xi to be Poisson-distributed, they need to be independent.

Indeed, we can't even read this as confusion over Poisson process vs Poisson distribution -- this is just not true.

> In most examples of Poisson-related work on time series (e.g., http://pymc-devs.github.io/pymc/tutorial.html),

Most you're aware of, perhaps. I don't have data from which to conclude what might be most common.

> the variables are obviously independent: mining disasters, car crashes, text messages received, etc.

Actually, I can see several ways that there might be dependence in those (partly depending on what you condition on). Independence might be a plausible model, but that doesn't make it necessarily true.

> In an example like mine, they might not be; for example, a large number of shoppers going into a store at once could motivate others to go into the store as well.

And, for example, one mine disaster might prompt safety reviews in others, making them negatively dependent (a disaster might well make further disasters less likely for a period), and similarly there are possible dependencies for the others. I don't see how your assertion that they're obviously independent is justified.

> How do I test for independence, beyond making assumptions about the situation?

But you cited an obvious assumption to consider - that there could be serial dependence. That's reasonably easy to test for, in a number of ways.

If you want to test for every possible kind of dependence you have some problems, but there's no great difficulty checking for simple forms of serial dependence.

The first thing to consider would be a plot of $y_t$ vs $y_{t-1}$. Here are data that are (by construction) identically distributed Poisson (all with $\lambda=240$):

[Figure: serially correlated Poisson data]

The sample Pearson correlation is 0.543 (the Spearman correlation is 0.526, Kendall's tau is 0.371). You might - as an example - formally test a null of independence against the alternative of lag-1 dependence via a permutation test, but if you're just trying to check model assumptions or find a good model, formal testing is probably not the most effective choice.
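A permutation test of that kind can be sketched as follows. This is a minimal version under stated assumptions: the function name, the two-sided form, the use of the lag-1 Pearson correlation as the test statistic, and the simulated `counts` are all choices made for the example, not something taken from the data above. Shuffling the series destroys any time ordering, so the shuffled lag-1 correlations show what that statistic looks like when the observations are exchangeable.

```python
import numpy as np


def lag1_permutation_test(y, n_perm=5000, seed=None):
    """Two-sided permutation test of independence against lag-1 dependence.

    Returns the observed lag-1 Pearson correlation and a permutation p-value.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)

    def lag1_corr(series):
        return np.corrcoef(series[:-1], series[1:])[0, 1]

    observed = lag1_corr(y)
    exceed = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(y)  # returns a shuffled copy of the series
        if abs(lag1_corr(shuffled)) >= abs(observed):
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value


# Illustrative data only: independent Poisson counts, so the null holds here.
rng = np.random.default_rng(0)
counts = rng.poisson(240, size=120)
r, p = lag1_permutation_test(counts, n_perm=2000, seed=1)
print(f"lag-1 correlation = {r:.3f}, permutation p-value = {p:.3f}")
```

Note that the null being tested is exchangeability, which includes the observations being identically distributed. A trend or a shift in $\lambda$ over time would also tend to produce a large lag-1 correlation, so a small p-value does not by itself distinguish serial dependence from a changing rate.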
