You have probably solved it by now, but I'll leave this here to help anyone who is lost, like I was. The difference is the null hypothesis.
scipy.stats.chi2_contingency, from Scipy:
"Chi-square test of independence of variables in a contingency table"
In this test you are testing whether there is a relationship between two or more variables. This is called the chi-square test for independence, also known as Pearson's chi-square test or the chi-square test of association. The null hypothesis, in your example, is "there is no effect of group on the choice of equipment".
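As a minimal sketch of that test (the counts below are made up purely for illustration), a contingency table of group vs. equipment choice goes straight into `chi2_contingency`:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table:
# rows = groups, columns = equipment chosen.
table = np.array([[30, 14, 6],
                  [20, 26, 14]])

# Null hypothesis: group and equipment choice are independent.
chi2, p, dof, expected = chi2_contingency(table)

# dof = (rows - 1) * (cols - 1) = 2; a small p suggests association.
print(chi2, p, dof)
```

Note that `chi2_contingency` computes the expected frequencies for you from the table's margins; you only supply the observed counts.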
scipy.stats.chisquare, from Scipy:
"The chi square test tests the null hypothesis that the categorical
data has the given frequencies."
Here you are comparing an observed frequency with an expected frequency. So the null hypothesis is that "there is no difference between the observed and the expected frequencies". The test compares the observed sample distribution with an expected probability distribution. This is called the chi-square goodness-of-fit test.
I'm trying to figure out how to test if variables that I think follow a Poisson distribution are independent (which is a requirement of the Poisson distribution)
You seem to be conflating the Poisson process with the Poisson distribution here.
For the Poisson process to really be a Poisson process, it has to be independent, and then if the rate is constant (and the remaining assumptions are true), you'll get Poisson observations (the count of events in the process over a given interval will be Poisson). However that doesn't imply that observations that are Poisson-distributed are automatically independent; it's possible to construct situations where Poisson data over time is serially dependent, say (and it's possible to find data where such a dependent-Poisson model is a plausible description of the data).
But, for $X_i$ to be Poisson-distributed, they need to be independent.
Indeed, we can't even read this as confusion over Poisson process vs Poisson distribution -- this is just not true.
In most examples of Poisson-related work on time series (e.g., http://pymc-devs.github.io/pymc/tutorial.html),
Most you're aware of, perhaps. I don't have data from which to conclude what might be most common.
the variables are obviously independent: mining disasters, car crashes, text messages received, etc.
Actually, I can see several ways that there might be dependence in those (partly depending on what you condition on). Independence might be a plausible model, but that doesn't make it necessarily true.
In an example like mine, they might not be; for example, a large number of shoppers going into a store at once could motivate others to go into the store as well.
And, for example, one mine disaster might prompt safety reviews in others, making them negatively dependent (a disaster might well make further disasters less likely for a period), and similarly there are possible dependencies for the others. I don't see how your assertion that they're obviously independent is justified.
How do I test for independence, beyond making assumptions about the situation?
But you cited an obvious assumption to consider - that there could be serial dependence. That's reasonably easy to test for, in a number of ways.
If you want to test for every possible kind of dependence you have some problems, but there's no great difficulty checking for simple forms of serial dependence.
The first thing to consider would be a plot of $y_t$ vs $y_{t-1}$. Here are data that are (by construction) identically distributed Poisson (all with $\lambda=240$):
[scatterplot of $y_t$ against $y_{t-1}$]
The sample Pearson correlation is 0.543 (the Spearman correlation is 0.526, Kendall's tau is 0.371). You might - as an example - formally test a null of independence against the alternative of lag-1 dependence via a permutation test, but if you're just trying to check model assumptions or find a good model, formal testing is probably not the most effective choice.
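A sketch of how such data and such a test can be produced (this is an assumed construction for illustration; the answer does not say how its data were generated): taking $y_t = e_t + e_{t-1}$ with $e_t$ i.i.d. Poisson(120) makes every $y_t$ exactly Poisson(240) — a sum of independent Poissons is Poisson — while adjacent values share a term, giving a theoretical lag-1 correlation of 0.5. A permutation test then compares the observed lag-1 correlation with its distribution under random reorderings of the series:

```python
import numpy as np

rng = np.random.default_rng(0)

# Serially dependent but identically Poisson(240) data:
# y_t = e_t + e_{t-1}, e_t iid Poisson(120).
e = rng.poisson(120, size=501)
y = e[1:] + e[:-1]

# Observed lag-1 Pearson correlation.
r_obs = np.corrcoef(y[:-1], y[1:])[0, 1]

# Permutation test: under the null of independence, every ordering
# of the series is equally likely, so shuffle and recompute.
n_perm = 2000
perm_r = np.empty(n_perm)
for i in range(n_perm):
    yp = rng.permutation(y)
    perm_r[i] = np.corrcoef(yp[:-1], yp[1:])[0, 1]

# Two-sided p-value with the usual +1 correction.
p_value = (1 + np.sum(np.abs(perm_r) >= abs(r_obs))) / (n_perm + 1)
print(r_obs, p_value)
```

With 500 points and a true lag-1 correlation near 0.5, the observed correlation sits far outside the permutation distribution and the p-value is essentially as small as the number of permutations allows.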
Best Answer
At that sort of sample size and probability, the C-test should be okay.
Since this is just a binomial test, you can test it using a binomial test in Scipy: `x` is 127, `n` is 127+316 and `p` is 35465/(35465+79076). There are other tests for this situation; vanilla R offers an exact Poisson test, for example.
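In Scipy that conditional (C-) test is a one-line call; a minimal sketch with the numbers above, conditioning on the total count so that $X \sim \mathrm{Binomial}(n = x_1 + x_2,\ p = t_1/(t_1 + t_2))$ under the null of equal rates:

```python
from scipy.stats import binomtest

x1, x2 = 127, 316        # observed counts
t1, t2 = 35465, 79076    # exposures

# C-test: conditional on x1 + x2, x1 is binomial with
# p = t1 / (t1 + t2) under the null of equal Poisson rates.
res = binomtest(x1, n=x1 + x2, p=t1 / (t1 + t2))
print(res.pvalue)
```

(`binomtest` is the current name in scipy; the older `binom_test` function is deprecated.)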
See also:
Krishnamoorthy K. and J. Thomson (2004),
"A more powerful test for comparing two Poisson means,"
Journal of Statistical Planning and Inference, 119, pp 23–35
which indicates that an unconditional test will tend to have greater power (as is generally the case for unconditional tests in this sort of situation).
It is, of course, not exact.
In response to comments:
You're now taking $X$ to be binomial, not Poisson.
I followed your assertion that $X$ was Poisson (in which case the 35465 and 79076 are simply exposures) and showed how to do the corresponding C-test.
If you want to treat the 35465 & 79076 as numbers of trials you don't need the C-test at all. You just do a straight two-sample binomial test on the trials and successes you have.
Like so (this is in R):
Incidentally, this p-value is very similar to what you get with the C-test.
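The R code itself is not reproduced above. As a hedged Python stand-in, an equivalent two-sample comparison can be done with a chi-square test on the 2×2 table of successes and failures; without continuity correction this matches the standard two-proportion z-test (the chi-square statistic on a 2×2 table equals $z^2$):

```python
import numpy as np
from scipy.stats import chi2_contingency

# 2x2 table: successes vs failures for the two samples,
# treating 35465 and 79076 as numbers of trials.
table = np.array([[127, 35465 - 127],
                  [316, 79076 - 316]])

# correction=False: plain Pearson chi-square, equivalent to the
# usual two-proportion z-test.
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(p)
```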