Here's the idea: you have a hypothesis you want to test about a given population. How do you test it? You take data from a random sample, and then you determine how likely it is that a population satisfying that hypothesis, with an assumed distribution, would produce data at least as extreme as what you observed. You decide on a cutoff in advance: if that probability is less than, say, $5$%, you reject the hypothesis. Here $5$% is the significance level, which corresponds to a confidence level of $95$%. How do you decide how likely the data is under the hypothesis? You use the assumed distribution of the data, together with any parameters of the population that you may know.
A concrete example: you want to test the claim that the average adult male weight is $170$ lbs. Suppose you know that adult weight is normally distributed, with standard deviation, say, $10$ pounds. You say: I will reject this hypothesis if data as extreme as my sample would occur with probability less than $5$% in such a population (that is, a test at the $95$% confidence level). How do you decide how likely the sample data is? You use the fact that the data is normally distributed with (population) standard deviation $10$, and you assume the mean is $170$. You then compute the $z$-value of the sample result (since this is a normally distributed variable), and a table allows you to convert it into a probability.
So, say the average of the random sample of adult male weights is $188$ lbs. Do you accept the claim that the population mean is $170$? Well, the decision comes down to: how likely is it that a normally distributed variable with mean $170$ and standard deviation $10$ would produce a sample average as far from $170$ as $188$? Since you have the necessary values for the distribution $N(170,10)$, you can answer this by finding the $z$-value, which for a sample of size $n$ is $z = (188 - 170)/(10/\sqrt{n})$. If $|z|$ is greater than the critical value (about $1.96$ for a two-sided test at the $5$% level), then the result you obtained is less likely than you are willing to accept, and you reject the claim. Otherwise, you do not reject it.
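The weight example can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not a definitive recipe: the text gives no sample size, so `n = 4` is purely hypothetical, the test is taken to be two-sided, and the function name `z_test_mean` is my own.

```python
from math import sqrt
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 with a known population sigma."""
    # Standardize the sample mean using the standard error sigma / sqrt(n).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Two-sided critical value (about 1.96 when alpha = 0.05).
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability of a result at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical

# ASSUMPTION: n = 4 is a hypothetical sample size; the text does not give one.
z, critical, p_value, reject = z_test_mean(188, 170, 10, n=4)
print(f"z = {z:.2f}, critical = {critical:.2f}, reject H0: {reject}")
```

Note how much the answer depends on `n`: with a single observation the same $188$ lbs gives $z = 1.8$, which does not exceed the two-sided $5$% critical value.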
You have a paired design. It is the same $n = 15$ students
taking the test both times. Let's call the first score for the $i$th
subject $X_i$ and the second score $Y_i.$ You want to do a
one-sample z-test of the differences $D_i = Y_i - X_i$ (second score
minus first, so that improvement shows up as a positive difference).
You don't give the individual scores, but the averages are enough,
because $\bar D = \bar Y - \bar X.$
The null hypothesis is $H_0: \mu_D = 0$ (no difference after
playing the game) and the alternative is $H_a: \mu_D > 0$ (better
scores after playing the game).
The test statistic is $$Z = \frac{\bar D - 0}{\sigma/\sqrt{n}} = 1.67.$$ The critical value at the 5% level is the value $c = 1.645$ that cuts 5% from the upper tail of the standard normal curve.
Because $Z = 1.67 > c = 1.645,$ you reject the null hypothesis
and conclude that the game might have enabled the students to
get better scores on the second test. (Or maybe they learned something
from taking the first test!)
However, $Z$ exceeds $c$ by only
a little, and the evidence is not 'strong'. If you subject the
findings to a more stringent standard and test at the 1% level,
then the new critical value is $c^\prime = 2.326,$ which cuts 1% from
the upper tail of the standard normal distribution.
According to this more stringent standard, you do not reject
the null hypothesis, because $Z = 1.67 < c^\prime.$
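If you want to check the arithmetic yourself, here is a minimal sketch in Python (standard library only). The mean difference $1.6$ and the assumed standard deviation $3.7$ are the values reported in the Minitab output quoted in this answer; the variable names are my own.

```python
from math import sqrt
from statistics import NormalDist

n = 15        # number of students (paired observations)
d_bar = 1.6   # mean of the differences, as reported by Minitab
sigma = 3.7   # assumed known standard deviation of the differences

se = sigma / sqrt(n)     # standard error of the mean difference
z = (d_bar - 0) / se     # test statistic under H0: mu_D = 0

std_normal = NormalDist()          # standard normal, mean 0 and sd 1
c_5 = std_normal.inv_cdf(0.95)     # cuts 5% from the upper tail
c_1 = std_normal.inv_cdf(0.99)     # cuts 1% from the upper tail

print(f"SE = {se:.3f}, Z = {z:.2f}")
print(f"5% level: critical value {c_5:.3f}, reject H0: {z > c_5}")
print(f"1% level: critical value {c_1:.3f}, reject H0: {z > c_1}")
```

The statistic clears the 5% cutoff only narrowly and falls well short of the 1% cutoff.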
The P-value is the probability to the right of $Z = 1.67$
under the standard normal curve. That probability is 0.047.
With the p-value, we can test at any desired level of significance.
In particular, at the 5% level, we reject because $.047 < .05 = 5\%$.
However, at the 1% level, we do not reject because $.047 > .01 = 1\%.$
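The P-value approach can be sketched the same way. This is only an illustration: it starts from the rounded statistic $Z = 1.67$, and the helper name `upper_tail_p_value` is hypothetical.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """Area to the right of z under the standard normal curve."""
    return 1 - NormalDist().cdf(z)

p_value = upper_tail_p_value(1.67)
print(f"P-value = {p_value:.3f}")

# One P-value is enough to decide at any significance level.
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01)}
for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject' if reject else 'do not reject'} H0")
```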
In case it is useful, I pasted output below (somewhat abridged) from doing this
test in Minitab statistical software:

    One-Sample Z

    Test of mu = 0 vs > 0
    The assumed standard deviation = 3.7

     N   Mean  SE Mean     Z      P
    15  1.600    0.955  1.67  0.047

For a two-tailed binomial exact test, the rejection region depends on how you allocate the $\alpha$. You suggest an equal allocation of $\alpha/2$ to each tail, so you require the maximum $x_1$ such that $\Pr[X \le x_1 \mid H_0] \le \alpha/2$, and the minimum $x_2$ such that $\Pr[X \ge x_2 \mid H_0] \le \alpha/2$. For $n = 20$, as you pointed out, $\Pr[X = 0] \approx 0.0115292$ but $\Pr[X \le 1] \approx 0.0691753 > \alpha/2$. We also have $\Pr[X \ge 8] \approx 0.0321427$, but $\Pr[X \ge 7] \approx 0.0866925 > \alpha/2$. So the rejection region for an equal-tailed allocation is $$(X = 0) \cup (X \ge 8).$$ What is interesting is that the rejection region $(X = 0) \cup (X \ge 7)$ has an overall Type I error probability of $0.0982217 < \alpha$, but it is not an equal allocation. Similarly, we could choose $(X \le 1) \cup (X \ge 9)$, with an overall Type I error probability of $0.0791571$, but again this is an unequal allocation of $\alpha$ to the tails.
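The search for $x_1$ and $x_2$ can be sketched in Python with the standard library. Two values here are assumptions, since neither is stated in this answer: $p = 0.2$ under $H_0$ (inferred from $0.8^{20} \approx 0.0115292$) and $\alpha = 0.10$ (consistent with the tail probabilities quoted above). The function names are my own, and the sketch does not handle tails that overlap when $\alpha$ is large.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def equal_tailed_region(n, p, alpha):
    """Return (x1, x2, size): reject when X <= x1 or X >= x2.

    x1 is the largest value with P(X <= x1) <= alpha/2 and x2 the
    smallest value with P(X >= x2) <= alpha/2; either is None if no
    such value exists. size is the actual Type I error probability.
    """
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]

    # Grow the lower tail from 0 upward while it stays within alpha/2.
    x1, lower = None, 0.0
    for k in range(n + 1):
        if lower + pmf[k] > alpha / 2:
            break
        lower += pmf[k]
        x1 = k

    # Grow the upper tail from n downward while it stays within alpha/2.
    x2, upper = None, 0.0
    for k in range(n, -1, -1):
        if upper + pmf[k] > alpha / 2:
            break
        upper += pmf[k]
        x2 = k

    return x1, x2, lower + upper

# ASSUMPTIONS: p = 0.2 and alpha = 0.10 are inferred, not given in the text.
x1, x2, size = equal_tailed_region(n=20, p=0.2, alpha=0.10)
print(f"Reject if X <= {x1} or X >= {x2}; actual size = {size:.7f}")
```

Because the binomial distribution is discrete, the actual size of the equal-tailed region is noticeably smaller than the nominal $\alpha$, which is why the unequal allocations are tempting.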