Solved – equivalent of t-test for Binomial/Poisson variables

p-value, poisson distribution, skellam-distribution, t-test

I have to estimate and explain conversion rates that can be extremely low, on a limited dataset.

Because I have very few observations, a normal approximation would give me a poor estimate: the population size times the conversion rate is too small for my binomial distributions to be close to normal.

So I was wondering what kind of test I could apply to compare these rates?

==> The question I need to answer: how confident are we that the conversion rate of A is higher than that of B?

I'm wary of using a t-statistic because I don't know how close we are to the normal regime. A typical example would be:

sample A = 100 000 tries, 20 successes
sample B = 100 000 tries, 15 successes

We assume Success(A) and Success(B) are independent binomial variables with parameters 100 000 and lambda(A) (resp. lambda(B)).

I thought of several variants:

I was thinking of setting H0 = {lambda(A) = lambda(B) = average conversion rate of both samples}
and testing with p-value = P(Success(A) - Success(B) > observed value), approximating A and B as Poisson.
In my example, under H0, lambda(A) = lambda(B) = 0.000175, and Success(A) - Success(B) follows a Skellam distribution. However, is there a way to compute its cumulative distribution function? And is my hypothesis on the average conversion rate too strong an assumption?

-> I guess I could also look for the lambda that maximizes the p-value, but that is even more complicated to solve theoretically.

-> I also wondered whether I should use a one-sided or a two-sided confidence interval.
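
On the Skellam variant above: one way to evaluate its distribution function numerically, sketched here in base R, is to sum over the values of one Poisson count and use `ppois` for the other. The helper name `skellam_upper_tail` and the plug-in mean of 100 000 × 0.000175 = 17.5 successes per sample are assumptions made for illustration. Note that the tail computed is P(D >= d), not the strict inequality.

```r
# P(X1 - X2 >= d) for independent X1 ~ Poisson(mu1), X2 ~ Poisson(mu2).
# skellam_upper_tail is a hypothetical helper name, not a base R function.
skellam_upper_tail <- function(d, mu1, mu2) {
  # Truncate the sum where the Poisson(mu2) mass becomes negligible.
  k <- 0:qpois(1e-12, mu2, lower.tail = FALSE)
  # P(X2 = k) * P(X1 >= k + d), summed over k.
  sum(dpois(k, mu2) * ppois(k + d - 1, mu1, lower.tail = FALSE))
}

mu <- 100000 * 0.000175   # expected successes per sample under H0
p_one_sided <- skellam_upper_tail(20 - 15, mu, mu)
print(p_one_sided)
```

Because the common rate is estimated and then treated as known, this p-value is only approximate. Conditioning on the total number of successes avoids that plug-in step.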

Basically, I'm having trouble adapting the t-statistic method to a heteroskedastic, discrete variable, so I'm asking myself fundamental questions about p-values.

Any source on this (i.e. what happens before the central limit theorem comes into play) would also be welcome.

First post here; don't hesitate to tell me if another exchange is more suitable for my question.

Best Answer

In statistical terms, you observe two independent Binomial random variables $X_1 \sim \text{Bin}(n_1,p_1)$ and $X_2 \sim \text{Bin}(n_2,p_2)$ and want to test the null hypothesis $H_0 : p_1=p_2$. Fisher's exact test is appropriate here. In your example you have $n_1=n_2=100000$ and observe $X_1=20$ and $X_2=15$. The P-value can be computed in R as follows:

fisher.test(matrix(c(20,15,100000-20,100000-15),2,2))

giving $P=0.4995$ in your example. Since the number of trials (100000) in each case is large compared to the number of successes, the related test for Poisson random variables gives practically the same result:

poisson.test(c(20,15))

giving $P=0.4996$.
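
To see why the two tests agree so closely: for two Poisson counts observed over equal exposure, conditioning on the total of 35 successes makes the first count Binomial(35, 1/2) under the null. As far as I know, this conditional binomial test is what `poisson.test` carries out for a two-sample comparison. A minimal check, with variable names chosen here for illustration:

```r
# Two-sample Poisson comparison with equal exposures.
p_poisson <- poisson.test(c(20, 15))$p.value

# Under H0 and given 20 + 15 = 35 total successes, the count in A
# is Binomial(35, 1/2); test the observed 20 against that.
p_conditional <- binom.test(20, 20 + 15, p = 0.5)$p.value

print(c(poisson = p_poisson, conditional = p_conditional))
```

Fisher's exact test conditions on the same total but uses a hypergeometric distribution, which is nearly indistinguishable from this binomial when the successes are a tiny fraction of the trials.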

Edit: These computations are based on a two-sided alternative but can easily be adapted if a one-sided test is desired.
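
For the one-sided question asked here (is A's rate higher than B's?), both functions accept an `alternative` argument. A sketch, assuming A is listed first so that `"greater"` means A's rate exceeds B's; the variable names are illustrative:

```r
# Rows are samples A and B; columns are successes and failures.
tab <- matrix(c(20, 15, 100000 - 20, 100000 - 15), 2, 2)

# H1: odds ratio > 1, i.e. A converts at a higher rate than B.
p_fisher_greater <- fisher.test(tab, alternative = "greater")$p.value

# H1: rate ratio A/B > 1, using the Poisson approximation.
p_poisson_greater <- poisson.test(c(20, 15), alternative = "greater")$p.value

print(c(fisher = p_fisher_greater, poisson = p_poisson_greater))
```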
