I have two groups of people. One group were shown an ad and the other were not. I know in aggregate that the treatment group spent \$799 and the control group spent \$412, so overall it seems people responded. Both groups had 1000 people in them.
I am not sure how to test whether this difference is statistically significant. I was going to do a chi-squared test with the treatment group's observed values, using the average spend per customer in the control group (\$0.412) as the expected value.
Is this the correct way to choose an expected value?
Best Answer
Comment: There is one sense in which seeing the ad seems to have prompted a greater response. That is the response rate (rather than the dollar amount spent).
There are various tests of $H_0: p_1 = p_2$ vs. $H_a: p_1 > p_2,$ where the $p_i$ are the respective response rates.
Output from Minitab for two such tests is shown below.
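The first test uses a normal approximation of the binomial proportions, which should give reasonably accurate results for such large samples. With 50 of 1000 responding in the treatment group and 29 of 1000 in the control group, the estimated difference in response rates is 0.021 and the one-sided P-value is about 0.008. Fisher's exact test (P-value 0.011) uses a hypergeometric distribution. Both tests are significant at the 5% level.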
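As a cross-check on the two tests described above, here is a minimal Python sketch using only the standard library. It is not the Minitab output. The counts (50 and 29 responders out of 1000 each) are the ones used in this answer, the function names are hypothetical, and the unpooled standard error in the z test is an assumption about how the software computes it.

```python
from math import comb, erfc, sqrt

def z_test_one_sided(x1, n1, x2, n2):
    """Normal-approximation test of H0: p1 = p2 vs Ha: p1 > p2.

    Assumption: unpooled standard error for the difference in proportions.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2) / se
    p_value = 0.5 * erfc(z / sqrt(2))  # upper-tail standard normal probability
    return z, p_value

def fisher_one_sided(x1, n1, x2, n2):
    """Fisher's exact test, one-sided: P(X >= x1) under the hypergeometric null,
    conditioning on the total number of responders x1 + x2."""
    k = x1 + x2
    total = comb(n1 + n2, k)
    upper = min(k, n1)
    tail = sum(comb(n1, i) * comb(n2, k - i) for i in range(x1, upper + 1))
    return tail / total

# Responders: 50 of 1000 (saw the ad) vs 29 of 1000 (did not).
z, p_z = z_test_one_sided(50, 1000, 29, 1000)
p_f = fisher_one_sided(50, 1000, 29, 1000)
print(f"z = {z:.2f}, normal-approximation P-value = {p_z:.4f}")
print(f"Fisher exact P-value = {p_f:.4f}")
```

The exact test uses integer binomial coefficients throughout, so there is no rounding until the final division.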
If looking at response rates is of interest to you, you can find particulars in an elementary applied statistics text or online.
As @NickCox suggests, we would need to know the $(50+29 = 79)$ individual dollar amounts in order to explore two-sample tests for amount spent with any confidence. However, the average purchase is similar in the two groups (\$799/50 is about \$16 and \$412/29 is about \$14, so around \$15 each), so looking at response rates might tell you what you really want to know about the effect of exposure to the ad.
Note: Just as an experiment, I simulated a dataset assuming that the 50 nonzero sales in Group 1 are distributed $\mathsf{Norm}(\mu=16, \sigma=3)$ and that the 29 nonzero sales in Group 2 are distributed $\mathsf{Norm}(\mu=14, \sigma=3).$ Dollar amounts were rounded to integers:
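A dataset of this kind could be simulated along the following lines. This is a Python sketch rather than the original code; the seed is arbitrary and `simulate_group` is a hypothetical helper, so the simulated totals will not reproduce the figures quoted in this answer.

```python
import random

def simulate_group(n_total, n_buyers, mu, sigma, rng):
    """Return n_total spends: n_buyers rounded normal amounts, the rest zeros."""
    sales = [round(rng.gauss(mu, sigma)) for _ in range(n_buyers)]
    return sales + [0] * (n_total - n_buyers)

rng = random.Random(2024)  # arbitrary seed, chosen only for reproducibility

x1 = simulate_group(1000, 50, 16, 3, rng)  # Group 1: saw the ad
x2 = simulate_group(1000, 29, 14, 3, rng)  # Group 2: did not see the ad

print("Group 1 total:", sum(x1), " Group 2 total:", sum(x2))
```

Each group then holds 1000 values, almost all of them zero, which is the feature that makes the choice of two-sample test worth checking.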
A Welch two-sample t test in R gave P-value 0.0033, as follows:
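For readers without R, the Welch statistic and its Welch-Satterthwaite degrees of freedom can be sketched in Python as below. Two assumptions are made here: the data are re-simulated with an arbitrary seed, and the one-sided P-value uses a normal approximation to the t distribution (reasonable only because the degrees of freedom are in the thousands). The result will therefore differ from the value reported above.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

def simulate_group(n_total, n_buyers, mu, sigma, rng):
    """Hypothetical helper: rounded normal purchases plus zeros for non-buyers."""
    sales = [round(rng.gauss(mu, sigma)) for _ in range(n_buyers)]
    return sales + [0] * (n_total - n_buyers)

def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    v1, v2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

rng = random.Random(2024)  # arbitrary seed
x1 = simulate_group(1000, 50, 16, 3, rng)
x2 = simulate_group(1000, 29, 14, 3, rng)

t, df = welch_t(x1, x2)
# Assumption: with df this large, the standard normal stands in for the t distribution.
p_value = 1 - NormalDist().cdf(t)
print(f"t = {t:.3f}, df = {df:.1f}, approximate one-sided P-value = {p_value:.4f}")
```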
In spite of all the zeros and the additional ties that result from rounding dollars to integers, a one-sided, two-sample Wilcoxon (rank sum) test in R gave P-value 0.007, with no error messages.
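To show how the ties enter the calculation, here is a Python sketch of a one-sided rank sum test using midranks and the tie-corrected normal approximation. It is not R's `wilcox.test`: it omits the continuity correction, the helper names are hypothetical, and the data are re-simulated with an arbitrary seed, so the P-value will not match the one above.

```python
import random
from math import erfc, sqrt

def simulate_group(n_total, n_buyers, mu, sigma, rng):
    """Hypothetical helper: rounded normal purchases plus zeros for non-buyers."""
    sales = [round(rng.gauss(mu, sigma)) for _ in range(n_buyers)]
    return sales + [0] * (n_total - n_buyers)

def midranks(values):
    """Ranks with ties replaced by their average rank; also returns tie-group sizes."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes

def rank_sum_one_sided(x, y):
    """Mann-Whitney U for x and an upper-tail P-value (x tends to be larger)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks, ties = midranks(x + y)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in ties)
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))  # tie-corrected
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    return u, 0.5 * erfc(z / sqrt(2))

rng = random.Random(2024)  # arbitrary seed
x1 = simulate_group(1000, 50, 16, 3, rng)
x2 = simulate_group(1000, 29, 14, 3, rng)
u, p_value = rank_sum_one_sided(x1, x2)
print(f"U = {u:.1f}, approximate one-sided P-value = {p_value:.4f}")
```

The large block of tied zeros shrinks the variance of the statistic considerably, which is why the tie correction cannot be skipped with data like these.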
A one-sided permutation test with the pooled 2-sample t statistic as metric (but not assuming normality) gave P-value about 0.003.
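A permutation test of this kind can be sketched in Python as follows. The number of permutations (1000), the seed, and the helper names are assumptions, and the data are re-simulated, so the P-value will only be of the same general size as the one quoted above, not identical.

```python
import random
from math import sqrt

def simulate_group(n_total, n_buyers, mu, sigma, rng):
    """Hypothetical helper: rounded normal purchases plus zeros for non-buyers."""
    sales = [round(rng.gauss(mu, sigma)) for _ in range(n_buyers)]
    return sales + [0] * (n_total - n_buyers)

def pooled_t(x, y):
    """Pooled-variance two-sample t statistic, used here only as a metric."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    sp2 = (ss1 + ss2) / (n1 + n2 - 2)  # pooled variance estimate
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))

def permutation_p_value(x, y, reps, rng):
    """One-sided P-value: share of random relabelings with t >= observed t."""
    t_obs = pooled_t(x, y)
    pooled = x + y
    n1 = len(x)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reassignment of group labels
        if pooled_t(pooled[:n1], pooled[n1:]) >= t_obs:
            count += 1
    return (count + 1) / (reps + 1)  # add-one form avoids reporting exactly 0

rng = random.Random(2024)  # arbitrary seed
x1 = simulate_group(1000, 50, 16, 3, rng)
x2 = simulate_group(1000, 29, 14, 3, rng)
p_value = permutation_p_value(x1, x2, reps=1000, rng=rng)
print(f"Permutation P-value (one-sided): {p_value:.4f}")
```

The reference distribution comes from the relabelings themselves, so no normality assumption is needed even though the metric is a t statistic.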
Unless your non-zero dollar values are much different from my simulated ones, I do not expect a problem finding a valid two-sample test to compare dollar amounts.