2 sample Z test for non normally distributed data

hypothesis testing, z-test

I'm trying to analyse the impact of a marketing campaign and I'm looking to identify the most appropriate statistical test. I have one group of customers who received the campaign (approx 100,000 people) and a random control (approx 50,000 customers) who received nothing. I want to measure whether there is any difference in the average spend from both groups in the period following the campaign launch.

The vast majority of customers in both groups will spend nothing after the campaign. Only approximately 2% of the customers in each group will spend anything. So the spend variable in both groups is not normally distributed: it will have a huge spike at 0, corresponding to the large number of people who spend nothing.

As the sample sizes are very large (100,000 and 50,000), can I just use a 2 sample z test as normal, relying on the central limit theorem? Or will the fact that the variable being tested is highly skewed mean that a 2 sample z test is invalid?

Best Answer

As noted in a comment, a z-test is not strictly appropriate in this case, since it assumes a normal distribution and your data are obviously not normally distributed.

EDIT: When can you use a z-test? It is true that many test statistics approach normality as the number of samples grows. The question is whether your mean amount spent will be approximately normally distributed at your sample sizes. Given what you describe (the vast majority, 98% of individuals, spending nothing), that is open to question. I am at heart an experimentalist, so here is what I'd do: repeatedly draw random samples of 10,000 from your data, compute the mean of each, and check on a Q-Q plot whether those means are normally distributed. If the approximation is good enough at 10K samples, it will be even better at 50K.
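The resampling check described above could be sketched roughly as follows. The data here are simulated, not real: the 2% purchase rate comes from the question, while the gamma-distributed amounts and the names `spend`, `means` and `qq_corr` are assumptions made purely for illustration. Instead of drawing a plot, the sketch summarizes the Q-Q comparison by the correlation between the sorted means and the standard normal quantiles.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# Hypothetical stand-in for the real data: about 2% of customers spend
# a gamma-distributed amount, everyone else spends exactly 0.
n_customers = 50_000
buys = rng.random(n_customers) < 0.02
spend = np.where(buys, rng.gamma(shape=2.0, scale=10.0, size=n_customers), 0.0)

# Repeatedly draw 10,000 customers (with replacement) and record the mean.
n_draws, sub_n = 1_000, 10_000
means = np.array([
    spend[rng.integers(0, n_customers, size=sub_n)].mean()
    for _ in range(n_draws)
])

# Q-Q comparison: sorted means against standard normal quantiles.
probs = (np.arange(1, n_draws + 1) - 0.5) / n_draws
theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
observed = np.sort(means)

# A correlation close to 1 means the Q-Q points lie near a straight line.
qq_corr = np.corrcoef(theoretical, observed)[0, 1]

# Skewness of the sample means; 0 for a perfectly symmetric distribution.
centered = means - means.mean()
skewness = (centered**3).mean() / (centered**2).mean() ** 1.5

print(f"Q-Q correlation: {qq_corr:.4f}, skewness of means: {skewness:.3f}")
```

With real data, replace the simulated `spend` array with the observed spend values; plotting `theoretical` against `observed` gives the Q-Q plot itself.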

One possibility is, of course, a non-parametric alternative such as the Mann-Whitney U test. With these numbers, you could also safely use a randomization (permutation) approach.
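A randomization approach for the difference in mean spend might look like the sketch below. The function name `permutation_test_mean_diff`, the simulated groups and the number of permutations are all illustrative assumptions; the group sizes are kept smaller than in the question so the example runs quickly.

```python
import numpy as np

def permutation_test_mean_diff(treated, control, n_perm=1000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = treated.mean() - control.mean()

    pooled = np.concatenate([treated, control])
    n_t = treated.size
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels and recompute the difference.
        shuffled = rng.permutation(pooled)
        diff = shuffled[:n_t].mean() - shuffled[n_t:].mean()
        if abs(diff) >= abs(observed):
            count += 1

    # Add-one correction so the estimated p-value is never exactly 0.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value

# Simulated example (assumed numbers, not real data): ~2% of customers buy.
rng = np.random.default_rng(1)

def simulate(n, buy_rate):
    buys = rng.random(n) < buy_rate
    return np.where(buys, rng.gamma(shape=2.0, scale=10.0, size=n), 0.0)

campaign = simulate(20_000, 0.022)
control = simulate(10_000, 0.020)

diff, p = permutation_test_mean_diff(campaign, control)
print(f"difference in mean spend: {diff:.4f}, permutation p-value: {p:.4f}")
```

The test makes no normality assumption; it only relies on the group labels being exchangeable under the null hypothesis, which is plausible when the control group was chosen at random.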

However, whatever p-value you obtain, your results may not be very meaningful, precisely because the large sample size allows you to detect even very small effects. If the amount spent is, say, in the \$10-20 range, would an average difference of \$0.05 matter?

In other words, your statistical test may be overpowered.

Rather than focusing on the test, you could focus on the effect size – or actual difference (and estimate its confidence intervals).
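One way to estimate the actual difference together with a confidence interval is a percentile bootstrap, sketched below. This is one possible choice, not the only one; the function name `bootstrap_mean_diff_ci`, the simulated data and the number of resamples are assumptions for illustration.

```python
import numpy as np

def bootstrap_mean_diff_ci(treated, control, n_boot=1000, level=0.95, seed=0):
    """Difference in means with a percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    diff = treated.mean() - control.mean()

    boot = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently, with replacement.
        t = treated[rng.integers(0, treated.size, size=treated.size)]
        c = control[rng.integers(0, control.size, size=control.size)]
        boot[i] = t.mean() - c.mean()

    alpha = 1.0 - level
    lower, upper = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return diff, lower, upper

# Simulated example (assumed numbers, not real data): ~2% of customers buy.
rng = np.random.default_rng(2)

def simulate(n, buy_rate):
    buys = rng.random(n) < buy_rate
    return np.where(buys, rng.gamma(shape=2.0, scale=10.0, size=n), 0.0)

campaign = simulate(20_000, 0.022)
control = simulate(10_000, 0.020)

diff, lo, hi = bootstrap_mean_diff_ci(campaign, control)
print(f"mean spend difference: {diff:.4f} (95% CI {lo:.4f} to {hi:.4f})")
```

The interval is expressed in currency units per customer, so it can be compared directly against the campaign cost per customer to judge whether the effect matters in practice.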

Another problem with your approach is that the amount spent is very often not a continuous variable. For example, there may be a price list with a finite number of prices (e.g., \$9.99, \$19.99 and \$29.99, so no one ever spends \$12.75). If that is the case, comparing the frequencies of individuals who choose a particular option (or none at all) using a χ² test or something similar might be a better choice.
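If spend really does fall on a few price points, the frequency comparison could be sketched with SciPy's `chi2_contingency`. The counts below are invented for illustration (only the group totals of 100,000 and 50,000 and the three price points come from the text above).

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical counts of customers per option.
# Columns: spent nothing, $9.99, $19.99, $29.99
counts = np.array([
    [98_000, 1_000, 700, 300],   # campaign group (100,000 customers)
    [49_100,   450, 300, 150],   # control group  (50,000 customers)
])

# Chi-squared test of independence between group and chosen option.
chi2, p_value, dof, expected = chi2_contingency(counts)

print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.4f}")
```

Here `expected` holds the counts that would be expected if the choice of option were independent of the group, which is useful for seeing which options drive any difference.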
