I am running an A/B test of a webpage. I'm trying to determine, using statistical analysis, whether people spent more time (in minutes, recorded as a float) on page B (the new page) than on page A (the old page).
I have formulated the below hypotheses:
Null Hypothesis: Time spent on the new page is equal to the time spent on the old page
Alternative Hypothesis: Time spent on the new page is greater than the time spent on the old page
The characteristics of my two samples (A and B) are:
- Both samples come from independent populations
- Both samples were randomly selected
- Both samples have the same number of values (50)
- The sample standard deviation can be calculated. I don't know the population standard deviation, and I don't know the size of the population.
I believe the statistical test to use is a two sample z test for comparing means. I am looking to compute in Python, and I know the Python statsmodels library has ztest, which can be imported as
from statsmodels.stats.weightstats import ztest
For reference, the statsmodels documentation (https://www.statsmodels.org/dev/generated/statsmodels.stats.weightstats.ztest.html) lists the following parameters for ztest:
statsmodels.stats.weightstats.ztest(x1, x2=None, value=0, alternative='two-sided', usevar='pooled', ddof=1.0)
I'm unclear about the values that go in some of the parameters:
- Is x1 an array of all the appropriate values (in this case the time samples collected for the new page, based on how my null and alternative hypotheses are written)?
- Is x2 an array of all the appropriate values (in this case the time samples collected for the old page, based on how my null and alternative hypotheses are written)?
- What is value and how is it calculated? I read the documentation, and I was unclear.
I do know alternative will be "greater" based on how my alternative hypothesis is written.
Or, since I don't know the population standard deviation up front, should this be a two-sample t test? In that case, I would need to look at the Python SciPy library.
Best Answer
In addition to what you say in your first set of 'bullets', I assume that both samples x1 and x2 are from (nearly) normal populations, with equal population variances.
Although some texts and online pages seem to muddle the decision whether to use a two-sample z test or a 2-sample t test, this decision is quite easy to make:
if the common population variance $\sigma^2 = \sigma_1^2 = \sigma_2^2$ is known, then use a 2-sample z test. If $\sigma^2$ is unknown and estimated from the samples, then use a 2-sample t test.
In addition, you may use a pooled 2-sample t test only if you know that the two populations have the same variance, as assumed above. If you do not know whether the two populations have the same variance, then use a Welch 2-sample t test (which does not assume equal variances).
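As a rough Python counterpart to the distinction above, the following sketch runs both tests with scipy.stats.ttest_ind, where equal_var=True requests the pooled test and equal_var=False requests the Welch test. The arrays old_page and new_page, and the means and standard deviation used to simulate them, are made-up placeholders; they are not the fictitious R data discussed in this answer.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: 50 times (in minutes) per page.
# The means (5.0, 6.0) and the SD (2.0) are invented for illustration.
old_page = rng.normal(loc=5.0, scale=2.0, size=50)
new_page = rng.normal(loc=6.0, scale=2.0, size=50)

# Pooled two-sample t test: assumes equal population variances.
pooled = stats.ttest_ind(new_page, old_page, equal_var=True)

# Welch two-sample t test: does not assume equal population variances.
welch = stats.ttest_ind(new_page, old_page, equal_var=False)

print(f"pooled: t = {pooled.statistic:.3f}, two-sided P = {pooled.pvalue:.4f}")
print(f"Welch:  t = {welch.statistic:.3f}, two-sided P = {welch.pvalue:.4f}")
```

Because the two samples have the same size, the pooled and Welch t statistics coincide; only the degrees of freedom, and hence the P-values, differ slightly.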
Here are fictitious samples x1 and x2, both of size $n_1 = n_2 = 50,$ sampled using R, from populations with possibly different population means $\mu_1, \mu_2$ but with the same population variance $\sigma^2.$
Here are descriptions of the samples:
The sample means $\bar X_1 = 69.53$ and $\bar X_2 = 75.78$ differ. The question is whether they are different enough (relative to the sample sizes and variance) to be considered significantly different at the 5% level, in a statistical sense.
Here is output from a two-sample pooled t test in R:
Because the P-value $0.004 < 0.05 = 5\%,$ they are significantly different. [Note the parameter var.eq=T, to perform a pooled test. Unless otherwise stated, a two-tailed test of the null hypothesis that means are equal is performed.]
If we did not know that the population variances are equal, we would have done a Welch test, as below, which gives similar results for my fictitious data.
I hope you can figure out how to run pooled and Welch 2-sample t tests using Python. P-values should be almost exactly the same as my results from R.
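For the one-sided alternative in the question, a minimal sketch of the pooled test in Python might look like the following. The helper pooled_t_test is hypothetical (written here for illustration, not part of any library), and the data are simulated with made-up parameters. Its value argument is meant to mirror the value parameter of statsmodels' ztest, which the statsmodels documentation describes as the difference between the means of x1 and x2 under the null hypothesis, so value=0 corresponds to "the means are equal". The alternative keyword of scipy.stats.ttest_ind assumes SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats


def pooled_t_test(x1, x2, value=0.0):
    """Hypothetical helper: one-sided pooled two-sample t test.

    H0: mean(x1) - mean(x2) = value
    H1: mean(x1) - mean(x2) > value
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2.
    sp2 = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df
    # Standard error of the difference in sample means.
    se = np.sqrt(sp2 * (1.0 / n1 + 1.0 / n2))
    t_stat = (x1.mean() - x2.mean() - value) / se
    # Upper-tail probability, matching the "greater" alternative.
    p_value = stats.t.sf(t_stat, df)
    return t_stat, p_value, df


rng = np.random.default_rng(1)
# Simulated times in minutes; means and SD are invented for illustration.
old_page = rng.normal(5.0, 2.0, 50)
new_page = rng.normal(6.0, 2.0, 50)

# New-page sample first, so "greater" means the new page has the larger mean.
t_stat, p_value, df = pooled_t_test(new_page, old_page, value=0.0)

# Cross-check against SciPy's pooled test with a one-sided alternative.
ref = stats.ttest_ind(new_page, old_page, equal_var=True, alternative="greater")

print(f"manual: t = {t_stat:.3f}, df = {df}, one-sided P = {p_value:.4f}")
print(f"scipy:  t = {ref.statistic:.3f}, one-sided P = {ref.pvalue:.4f}")
```

Passing the new-page times as the first sample makes the one-sided alternative correspond to the new page having the larger mean. Note that statsmodels spells this alternative 'larger' rather than 'greater'.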
Note: In an actual application one would not know the exact population parameters, but here is R code used to sample my fictitious data.