You are right - two assumptions of the classic z-test about a mean are hardly ever met in practice:
- The true standard deviation $\sigma$ is known.
- The values come from a normal distribution.
In samples that are not too small, these assumptions matter little and the z-test works fine:
- We can replace the unknown $\sigma$ by the sample standard deviation, which is then a fairly precise estimate.
- Thanks to the almighty Central Limit Theorem, the test statistic (the standardized mean) is approximately normally distributed, even for quite "unnormal" observations.
But in small samples (e.g. just ten observations as in your example), we cannot use these backdoors. Fortunately, a refinement of the z-test, the equally famous t-test, takes care of the first issue: it correctly takes the additional uncertainty of the sample standard deviation (compared to a fixed, known $\sigma$) into account. It does not remove the second issue, the normality assumption.
In summary: in practice, whenever you can choose between the z-test and the t-test, take the t-test. For large sample sizes their results practically agree, so the simpler z-test would do as well.
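The small-sample difference can be checked by simulation. The following is a minimal sketch under stated assumptions, not part of the original argument: the data are normally distributed, the sample size is 10, and the null hypothesis (mean 0) is true. The function name `rejection_rates` and all parameter values are illustrative, and 2.262 is the tabulated 97.5% quantile of the t-distribution with 9 degrees of freedom.

```python
import random
import statistics

def rejection_rates(n=10, reps=20000, seed=1):
    """Estimate how often a true null (mean 0) is rejected at the nominal 5% level."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
    t_crit = 2.262  # tabulated t quantile for n - 1 = 9 df (only valid for n = 10)
    z_rejections = 0
    t_rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Standardized mean, with sigma replaced by the sample standard deviation
        stat = statistics.mean(sample) / (statistics.stdev(sample) / n ** 0.5)
        z_rejections += abs(stat) > z_crit
        t_rejections += abs(stat) > t_crit
    return z_rejections / reps, t_rejections / reps

z_rate, t_rate = rejection_rates()
print(f"z critical value: rejects in {z_rate:.1%} of samples")
print(f"t critical value: rejects in {t_rate:.1%} of samples")
```

In runs like this, the z critical value rejects a true null noticeably more often than the nominal 5%, while the t critical value stays close to 5%.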
Final warnings:
- Durations are usually not normally distributed but rather right-skewed (even in your small sample there is such a tendency).
- Not rejecting the null hypothesis does not mean that it holds.
I think you should start by asking them what they think it really means to say about a person that he or she is able to tell the difference between coca-cola and pepsi. What can such a person do that others cannot do?
Most of them will not have any such definition, and will not be able to produce one if asked. However, statistics gives that phrase a precise meaning, and that is what you can bring to your "a taste for statistics" class.
One of the points of statistics is to give an exact answer to the question: "What does it mean to say of someone that he or she is able to tell the difference between coca-cola and pepsi?"
The answer is: he or she is better than a guessing machine at classifying cups in a blind test. The guessing machine cannot tell the difference, it simply guesses all the time. The guessing machine is a useful invention for us because we know that it does not have the ability. The results of the guessing machine are useful because they show what we should expect from someone who lacks the ability that we test for.
To test whether a person is able to tell the difference between coca-cola and pepsi, one must compare his or her classifications of cups in a blind test to the classifications that a guessing machine would produce. Only if s/he is better than the guessing machine is s/he able to tell the difference.
How, then, do you determine whether one result is better than another result? What if they are almost the same?
If two persons classify a small number of cups, it's not really fair to say that one is better than the other if the results are almost the same. Perhaps the winner just happened to be lucky today, and the results would have been reversed if the competition were repeated tomorrow?
If we are to have a trustworthy result, it cannot be based on a tiny number of classifications, because then chance can decide the result. Remember, you don't have to be perfect to have the ability, you just have to be better than the guessing machine. In fact, if the number of classifications is too small, not even a person who always identifies coca-cola correctly will be able to show that s/he is better than the guessing machine. For example, if there is only one cup to classify, even the guessing machine has a 50 per cent chance of classifying it correctly. That's not good, because it means that in 50 per cent of the trials the machine does just as well as a perfect identifier, and we would falsely conclude that a good coca-cola identifier is no better than the guessing machine. Very unfair.
The more cups there are to classify, the more opportunities for the guessing machine's inability to be revealed and the more opportunities for the good coca-cola identifier to show off.
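To put numbers on this, here is a small sketch that assumes the guessing machine gets each cup right with probability 0.5, independently of the other cups, and that we use the 5 per cent cutoff introduced further down. The function name `chance_of_perfect_score` is made up for illustration.

```python
def chance_of_perfect_score(cups):
    """Probability that pure guessing (50/50 per cup) gets every cup right."""
    return 0.5 ** cups

for cups in range(1, 11):
    print(cups, f"{chance_of_perfect_score(cups):.4f}")

# Smallest number of cups for which even a perfect score is rarer than 5%
# under pure guessing (assumed cutoff)
min_cups = next(n for n in range(1, 100) if chance_of_perfect_score(n) < 0.05)
print("minimum cups:", min_cups)
```

Under these assumptions, with fewer than 5 cups even a flawless taster cannot be told apart from the machine at the 5 per cent level.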
10 cups might be a good place to start. How many right answers must a human then have to show that he or she is better than the machine?
Ask them what they would guess.
Then let them use the machine and find out how good it is, i.e. let all pupils generate a series of ten guesses, e.g. using a die or a random generator on the smartphone. To be pedagogical, you should prepare a series of ten right answers against which the guesses are evaluated.
Record all the results on the board, then write them out in sorted order. Explain that a human would have to be better than 95 per cent of those results before a statistician would acknowledge his or her ability to tell the difference between coca-cola and pepsi. Draw the line that separates the worst 95 per cent of the results from the top 5 per cent.
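If you want to preview what the board is likely to look like, the classroom exercise can be simulated. This is only a sketch under assumptions: each guess is a fair 50/50 choice, and a large number of simulated guessers stands in for the class. The names `simulate_guessers` and `share_at_least`, the seed, and the number of guessers are all illustrative.

```python
import random
from collections import Counter

def simulate_guessers(n_guessers=50000, cups=10, seed=7):
    """Return a Counter mapping 'number correct' to how many guessers scored it."""
    rng = random.Random(seed)
    # The prepared series of right answers that every guesser is scored against
    answers = [rng.choice(["coke", "pepsi"]) for _ in range(cups)]
    scores = Counter()
    for _ in range(n_guessers):
        guesses = [rng.choice(["coke", "pepsi"]) for _ in range(cups)]
        scores[sum(g == a for g, a in zip(guesses, answers))] += 1
    return scores

scores = simulate_guessers()
total = sum(scores.values())

def share_at_least(k):
    """Fraction of simulated guessers with k or more cups right."""
    return sum(count for score, count in scores.items() if score >= k) / total

for k in range(6, 11):
    print(f"{k} or more correct: {share_at_least(k):.3f}")
```

A real class is far smaller than this simulation, so the line on the board will wobble from one class to the next.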
Then, let a few pupils try classifying 10 cups. By now the pupils should know how many right answers they need to prove that they can tell the difference.
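For reference, the number the pupils should arrive at can also be computed exactly. The sketch below assumes that the guessing machine's number of correct answers follows a binomial distribution with 10 trials and success probability 0.5; `tail_probability` is an illustrative name.

```python
from math import comb

def tail_probability(k, cups=10):
    """P(guessing machine gets at least k of `cups` right), each guess 50/50."""
    return sum(comb(cups, i) for i in range(k, cups + 1)) / 2 ** cups

# Smallest score that the guessing machine reaches in at most 5% of attempts
threshold = next(k for k in range(11) if tail_probability(k) <= 0.05)
print("needed:", threshold)
print("P(at least 8 right) =", tail_probability(8))
print("P(at least 9 right) =", tail_probability(9))
```

Under these assumptions, 8 right out of 10 is not quite enough (the machine manages that in about 5.5 per cent of attempts), whereas 9 or more happens only about 1.1 per cent of the time.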
All this is not really doable in 10 minutes though.
As Procrastinator pointed out, the test statistic is not significantly large. Don't just look at the number and assume that it is large enough to reject! The chi-square statistic has 29 degrees of freedom. It has a mean of 29 and a variance of 58. So a test statistic of 24.54 is not large at all (it is even below the mean), and with the estimate so close to 0.4, this is what we would expect.
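The arithmetic behind this can be spelled out in a few lines. The sketch uses only the standard facts that a chi-square distribution with $k$ degrees of freedom has mean $k$ and variance $2k$; the standardized distance is a rough orientation aid, not an exact p-value.

```python
import math

df = 29            # degrees of freedom of the chi-square statistic
statistic = 24.54  # observed value of the test statistic

mean = df          # mean of a chi-square distribution with df degrees of freedom
variance = 2 * df  # its variance
sd = math.sqrt(variance)

# How many standard deviations the observed value lies from the mean
standardized = (statistic - mean) / sd
print(f"mean = {mean}, variance = {variance}, sd = {sd:.2f}")
print(f"standardized distance = {standardized:.2f}")
```

The observed value sits a bit more than half a standard deviation below the mean, nowhere near the upper tail where a rejection would occur.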