The answer to your question can be best appreciated by recalling the nature of the significance level $\alpha$. This is a value that we choose that reflects our tolerance for Type I error, which in this case is the outcome of incorrectly concluding the coin is biased in favor of heads when in fact it is not.
To see why any nontrivial hypothesis testing procedure must admit some nonzero Type I error, consider that even when a coin is perfectly fair, there is a small but nonzero probability that it lands heads on every one of $n$ throws: specifically, this is just $\Pr[X = n \mid p = 1/2] = 2^{-n} > 0$. Yes, a large number of trials will make this a very small chance, but it is still greater than $0$. So any hypothesis test you can design that allows some chance of rejecting the claim that the coin is fair must do so with some possibility of being wrong by random chance.
For instance, suppose you set the rejection criterion to be $X = n$, that is to say, you will claim the coin is unfair if in $n$ trials you get all heads; otherwise you say the data is inconclusive. Then the Type I error of the test is precisely $\alpha = 2^{-n}$: $$\alpha = \Pr[\text{reject } H_0 \mid H_0 \text{ true}] = \Pr[X = n \mid p = 1/2] = 2^{-n}.$$ Now for a "large" $n$, this might be an extremely strict test: if $n = 20$, the chance you would erroneously reject $H_0$ when the coin is fair is approximately $9.5 \times 10^{-7}$, or less than $1$ in a million. But our intuition suggests that this is perhaps too strict. After all, even if the coin is severely biased toward heads, say $p = 0.9$, the chance that all $20$ flips will be heads is only $$\Pr[X = 20 \mid p = 0.9] = (0.9)^{20} \approx 0.121577.$$ This quantity we just computed is called the power of the test for the case $p = 0.9$, and represents the probability of correctly rejecting the null when it is false. Having only a $12\%$ chance to do this when the coin is turning up heads $90\%$ of the time seems, well, suboptimal. But that's the price we pay for having such a tiny chance of being wrong about the coin being unfair.
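Both numbers are easy to verify directly. Here is a quick sketch in Python (the variable names are my own, not anything from the question):

```python
# Sanity-check the "all heads" test with n = 20 flips.
n = 20

# Type I error: P(X = 20 | p = 1/2), i.e. a fair coin gives all heads
alpha = 0.5 ** n
print(alpha)  # ≈ 9.5367e-07, less than 1 in a million

# Power against p = 0.9: P(X = 20 | p = 0.9), i.e. a biased coin gives all heads
power = 0.9 ** n
print(power)  # ≈ 0.1216, only about a 12% chance to catch the biased coin
```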
If we are willing to accept a higher chance of Type I error, then we can construct a more reasonable test. And this brings us back to the beginning of this answer: if we are willing to accept, say, a $5\%$ chance of wrongly saying the coin is biased toward heads when it is in fact fair, then $\alpha = 0.05$, and we can optimize our rejection condition so that when the coin is fair, we will only have at most a $5\%$ chance that the test will reject the null. To do this for $n = 20$, we need to find a value, say $x_{\text{crit}}$, for which if we observe at least $x_{\text{crit}}$ heads, we say we saw too many to reasonably maintain that the coin is fair. But this choice must guarantee that a fair coin would only get at least that many heads at most $5\%$ of the time: that is, we require
$$\Pr[\text{reject } H_0 \mid H_0 \text{ true}] = \Pr[x_{\text{crit}} \le X \le n \mid p = 1/2] \le \alpha.$$ Note that the rejection region is $x_{\text{crit}} \le X \le n$, not just $X = x_{\text{crit}}$. This is because if we are willing to conclude that the coin is unfair upon seeing $X = 17$ heads out of $n = 20$ trials, then certainly seeing $18$, $19$, or $20$ heads should also lead us to that conclusion. In other words, rejecting the claim that the coin is fair when $X = x_{\text{crit}}$ should also make us reject when $X > x_{\text{crit}}$.
So how do we calculate this value when $n = 20$ and $\alpha = 0.05$? We can construct a table:
$$\begin{array}{c|c|c}
x & \Pr[X = x \mid p = 1/2] & \Pr[x \le X \le 20 \mid p = 1/2] \\
\hline
20 & 9.5367 \times 10^{-7} & 9.5367 \times 10^{-7} \\
19 & 0.0000190735 & 0.0000200272 \\
18 & 0.000181198 & 0.000201225 \\
17 & 0.00108719 & 0.00128841 \\
16 & 0.00462055 & 0.00590897 \\
15 & 0.0147858 & 0.0206947 \\
14 & 0.0369644 & 0.0576591 \\
\end{array}$$
Notice that I started at the upper end of the range, and that the third column is just the sum of the values in the second column up to the same row. Now we can read off the value of $x_{\text{crit}} = 15$ as the smallest value in the first column that corresponds to a value in the third column that does not exceed $\alpha = 0.05$. So our rejection region for this test is $15 \le X \le 20$.
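The table lookup can also be automated. Here is a small Python sketch that enumerates the binomial pmf directly (the helper names `binom_pmf`, `upper_tail`, and `x_crit` are my own):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """P(x <= X <= n): the probability of the candidate rejection region."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

n, alpha = 20, 0.05

# Smallest cutoff whose upper-tail probability under p = 1/2 stays within alpha
x_crit = min(x for x in range(n + 1) if upper_tail(x, n, 0.5) <= alpha)
print(x_crit)                       # 15
print(upper_tail(x_crit, n, 0.5))   # ≈ 0.0207, the attained Type I error
```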
At this point, you should see how our discussion answers your question about why we don't consider individual outcomes. In your example with $n = 8$, the probability $$\Pr[X = 7 \mid p = 1/2] = 0.03125,$$ which by itself is smaller than $\alpha = 0.05$, but it is the total probability of landing in the rejection region that we require to be bounded by $\alpha$, not the probability of any single outcome in that region.
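To make this concrete for your $n = 8$ case, a few lines of Python (again, my own variable names) show how single-outcome probabilities and tail probabilities differ:

```python
from math import comb

n = 8
pmf = lambda k: comb(n, k) * 0.5**n   # P(X = k | p = 1/2)

print(pmf(7))                          # 0.03125: small on its own

# But the rejection region must include everything at least as extreme:
tail7 = sum(pmf(k) for k in range(7, n + 1))
print(tail7)                           # 0.03515625: still within alpha = 0.05
tail6 = sum(pmf(k) for k in range(6, n + 1))
print(tail6)                           # ≈ 0.1445: too large, so X = 6 cannot join the region
```

So for $n = 8$ and $\alpha = 0.05$, the region $7 \le X \le 8$ is acceptable precisely because its *total* probability under the null stays below $\alpha$.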
As for your second question, this should also be evident from our discussion. The alternative hypothesis that the coin is biased toward heads means that if $X$ counts the number of heads, then $X \ge x_{\text{crit}}$ is the form of the rejection region, because the more heads we observe, the more evidence we have that the coin is biased toward heads. However, if our statistic counted the number of tails, then the rejection region would need to be of the form $X \le x_{\text{crit}}$, because the fewer tails we see means the more heads we see.
Finally, we should point out that a funny thing happens because of the discrete nature of the binomial distribution. If our tolerance for Type I error were $3\%$ instead of $5\%$, the table above would give us the same rejection region. This reflects the fact that we cannot observe an outcome somewhere between $14$ and $15$ heads, so we cannot "spend" our Type I error tolerance exactly unless $\alpha$ happens to equal one of the cumulative probabilities in the third column of the table.
As an exercise, what is the power of the test when $p = 0.9$ for the rejection region $15 \le X \le 20$; i.e., what is $$\Pr[15 \le X \le 20 \mid p = 0.9]?$$ And for the same region, what is the power of the test when $p = 0.51$? Why are these probabilities different?
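If you want to check your answers to this exercise numerically, a sketch like the following works (the helper `power` is my own name, not standard terminology in code):

```python
from math import comb

def power(p, n=20, x_crit=15):
    """P(x_crit <= X <= n) for X ~ Binomial(n, p): the chance the test rejects."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x_crit, n + 1))

print(power(0.9))    # close to 1: a heavily biased coin almost always lands in the region
print(power(0.51))   # much smaller: a nearly fair coin rarely produces 15+ heads in 20 flips
```

The comparison between the two values is the point of the exercise: the farther the true $p$ is from the null value $1/2$, the more probability mass the alternative puts inside the rejection region.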
That $5\%$ region is commonly known as the critical region (or rejection region). Any outcome in that region lies so far from what we would expect under the null hypothesis that the probability of observing something at least that extreme falls below $\alpha$. For that reason you reject the null hypothesis.