Here the natural null-hypothesis $H_0$ is that the coin is unbiased, that is, that the probability $p$ of a head is equal to $1/2$. The most reasonable alternative hypothesis $H_1$ is that $p\ne 1/2$, though one could make a case for the one-sided alternative hypothesis $p>1/2$.
We need to choose the significance level of the test. That's up to you. Two traditional numbers are $5$% and $1$%.
Suppose that the null hypothesis holds. Then the number of heads has a binomial distribution with mean $(900)(1/2)=450$ and standard deviation $\sqrt{(900)(1/2)(1/2)}=15$.
The probability that in tossing a fair coin the number of heads differs from $450$ by $40$ or more (in either direction) is, by symmetry,
$$2\sum_{k=490}^{900} \binom{900}{k}\left(\frac{1}{2}\right)^{900}.$$
This is not practical to compute by hand, but Wolfram Alpha gives an answer of roughly $0.008419$.
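If you'd rather verify this yourself than rely on Wolfram Alpha, a minimal sketch in Python works; the use of scipy here is my assumption, not part of the original answer:

```python
from scipy.stats import binom

# Exact two-sided tail under H0: P(X >= 490) + P(X <= 410) for X ~ Binomial(900, 1/2).
p_two_sided = binom.sf(489, 900, 0.5) + binom.cdf(410, 900, 0.5)
print(p_two_sided)  # roughly 0.0084, matching the Wolfram Alpha value
```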
Thus, if the coin were unbiased, then a number of heads that differs from $450$ by $40$ or more would be pretty unlikely. It would have probability less than $1$%, so at the $1$% significance level, we reject the null hypothesis.
We can also use the normal approximation to the binomial to estimate the probability that the number of heads is $\ge 490$ or $\le 410$ under the null hypothesis $p=1/2$. The approximating normal has mean $450$ and standard deviation $15$, so it is $\ge 490$ with the same probability that a standard normal is $\ge 40/15$. From tables for the normal, this is about $0.0039$. Doubling to take the left tail into account, we get about $0.0078$, fairly close to the value given by Wolfram Alpha, and under $1$%. So if we use $1$% as our level of significance, again we reject the null hypothesis $H_0$.
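The same approximation can be checked without tables; again just a sketch, with scipy assumed:

```python
from scipy.stats import norm

# Normal approximation without continuity correction: two tails beyond z = 40/15.
print(2 * norm.sf(40 / 15))  # roughly 0.0077
```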
Comments: $1$. In the normal approximation to the binomial, we get a better approximation to the probability that the binomial is $\ge 490$ by calculating the probability that the normal is $\ge 489.5$; this is known as the continuity correction, if you want to look it up. If we use the normal approximation with the continuity correction, we find that the probability of $490$ or more or $410$ or fewer heads is about $0.008468$, quite close to the "exact" answer provided by Wolfram Alpha. Thus we can find a very accurate estimate by, as in the bad old days, using tables of the standard normal and doing the arithmetic "by hand."
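To see the correction in action, here is the same tail computed with $489.5$ in place of $490$ (a sketch in Python; scipy is my assumption, not part of the original answer):

```python
from scipy.stats import norm

# Continuity correction: approximate P(X >= 490) by P(Normal(450, 15) >= 489.5).
print(2 * norm.sf((489.5 - 450) / 15))  # roughly 0.0085, very close to the exact binomial tail
```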
$2$. Suppose that we use the somewhat less natural alternative hypothesis $p>1/2$.
If $p=1/2$, the probability of $490$ or more is about $0.00421$. Thus again at the $1$% significance level, we would reject the null hypothesis, indeed we would reject it even if we were using significance level $0.005$.
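This one-sided tail is easy to check in the same way (again a sketch, with scipy assumed):

```python
from scipy.stats import binom

# One-sided alternative p > 1/2: probability of 490 or more heads under p = 1/2.
print(binom.sf(489, 900, 0.5))  # roughly 0.0042
```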
Setting a significance level is always necessary, for it is possible for a fair coin to yield, say, $550$ or more heads in $900$ tosses; it is just ridiculously unlikely.
If the guards are independent of each other and the tosses are fair, then it doesn't matter which guard tossed which coin or how many times each guard tossed each coin: the results for each coin can be grouped together. Thus, for the coin whose data you give, the combined result is 372 heads in 467 tosses (a fairly convincingly biased coin).
Rank the coins by the ratio of the likelihood at the maximum-likelihood value of Pr(heads) to the likelihood at Pr(heads)=0.5; the owners of the coins with the 50 highest ratios are your 50 best choices of culprits.
The likelihood function you need is:
$$
L(p) \propto \binom{n}{h}p^h(1-p)^{n-h}
$$
where $p$ is Pr(heads), the parameter of interest, $n$ is the total number of tosses, and $h$ is the number of heads observed. Plug in $p=\frac{372}{467}$ to get the likelihood at the most likely value of $p$ for the coin in your question, plug in $p=0.5$, and divide the two values to get the likelihood ratio, which represents the maximal strength of the evidence for that coin being biased.
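As an illustration (not part of the original answer), here is how that calculation might look in Python, working on the log scale to avoid underflow; note that the binomial coefficient cancels in the ratio:

```python
import numpy as np
from scipy.stats import binom

h, n = 372, 467   # pooled heads and tosses for the coin in question
p_hat = h / n     # maximum-likelihood estimate of Pr(heads), about 0.797

# Log10 of the likelihood ratio L(p_hat) / L(0.5); the binomial coefficient cancels.
log10_lr = (binom.logpmf(h, n, p_hat) - binom.logpmf(h, n, 0.5)) / np.log(10)
print(log10_lr)   # on the order of 38, i.e. a likelihood ratio of roughly 1e38
```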
There is no need to do a significance test for this problem and so you do not need to combine P-values.
You can set criteria for how strong the evidence needs to be before you sentence a coin owner to death, or you can just kill the 50 against whom the evidence is strongest.
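If it helps, one possible sketch of the ranking step follows; the guard names and all counts other than 372/467 are hypothetical, and the helper simply reuses the likelihood ratio above:

```python
import numpy as np
from scipy.stats import binom

def log10_lr(h, n):
    """Log10 likelihood ratio of the best-fitting Pr(heads) against Pr(heads) = 0.5."""
    p_hat = h / n
    return (binom.logpmf(h, n, p_hat) - binom.logpmf(h, n, 0.5)) / np.log(10)

# Hypothetical pooled data per guard: {guard id: (heads, tosses)}.
coins = {"guard_001": (372, 467), "guard_002": (233, 450), "guard_003": (260, 448)}

ranked = sorted(coins, key=lambda g: log10_lr(*coins[g]), reverse=True)
suspects = ranked[:50]  # take up to the 50 owners against whom the evidence is strongest
print(ranked)
```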
Best Answer
This is a variant on a standard intro stats demonstration: for homework after the first class I have assigned my students the exercise of flipping a coin 100 times and recording the results, broadly hinting that they don't really have to flip a coin and assuring them it won't be graded. Most will eschew the physical process and just write down 100 H's and T's willy-nilly. After the results are handed in at the beginning of the next class, at a glance I can reliably identify the ones who cheated. Usually there are no runs of heads or tails longer than about 4 or 5, even though in just 100 flips we ought to see a longer run than that.
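A quick simulation (my own sketch, not part of the classroom demonstration) shows why a long run should be expected in an honest sequence of 100 flips:

```python
import numpy as np

rng = np.random.default_rng(0)

def longest_run(flips):
    """Length of the longest run of identical outcomes in a sequence of 0s and 1s."""
    best = run = 1
    for a, b in zip(flips[:-1], flips[1:]):
        run = run + 1 if a == b else 1
        best = max(best, run)
    return best

# Simulate many honest sequences of 100 fair flips and record their longest runs.
runs = np.array([longest_run(rng.integers(0, 2, 100)) for _ in range(10_000)])
print(runs.mean())         # the average longest run is typically close to 7
print((runs >= 6).mean())  # the large majority of honest sequences contain a run of 6 or more
```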
This case is subtler, but one particular analysis stands out as convincing: tabulate the successive ordered pairs of results. In a series of independent flips, each of the four possible pairs HH, HT, TH, and TT should occur equally often--which would be $(300-1)/4 = 74.75$ times each, on average.
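Tabulating the pairs takes only a couple of lines; the flip string below is randomly generated purely to show the mechanics and is not either of the series in question:

```python
from collections import Counter
import random

random.seed(1)
flips = "".join(random.choice("HT") for _ in range(300))  # stand-in for a recorded series

# Count the 299 successive ordered pairs.
pairs = Counter(a + b for a, b in zip(flips, flips[1:]))
print(pairs)  # each of HH, HT, TH, TT should appear about 74.75 times for independent flips
```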
Here are the tabulations for the two series of flips:
The first is obviously far from what we might expect. In that series, an H is more than twice as likely ($102:46$) to be followed by a T than by another H; and a T, in turn, is more than twice as likely ($102:49$) to be followed by an H. In the second series, those likelihoods are nearly $1:1,$ consistent with independent flips.
A chi-squared test works well here, because all the expected counts are far greater than the threshold of 5 often quoted as a minimum. The chi-squared statistics are 38.3 and 0.085, corresponding to p-values of less than one in a billion and 77%, respectively. In other words, a table of pairs as imbalanced as the second one is to be expected (due to the randomness), but a table as imbalanced as the first happens less than once in every billion such experiments.
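For reference, here is how such a test might be run on the Series 1 pair counts implied by the ratios quoted above (HH = 46, HT = 102, TH = 102, TT = 49); the choice of scipy, with its default Yates continuity correction for 2×2 tables, is my assumption:

```python
from scipy.stats import chi2_contingency

# Series 1 pair counts: rows = current flip (H, T), columns = next flip (H, T).
series1 = [[46, 102],
           [102, 49]]

chi2, p, dof, expected = chi2_contingency(series1)  # Yates-corrected 2x2 test by default
print(chi2, p)  # roughly 38.3, with a p-value well below one in a billion
```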
(NB: It has been pointed out in comments that the chi-squared test might not be applicable because these transitions are not independent: e.g., an HT can be followed only by a TT or TH. This is a legitimate concern. However, this form of dependence is extremely weak and has little appreciable effect on the null distribution of the chi-squared statistic for sequences as long as $300.$ In fact, the chi-squared distribution is a great approximation to the null sampling distribution even for sequences as short as $21,$ where the counts of the $21-1=20$ transitions that occur are expected to be $20/4=5$ of each type.)
If you know nothing about chi-squared tests, or even if you do but don't want to program the chi-square quantile function to compute a p-value, you can achieve a similar result. First develop a way to quantify the degree of imbalance in a $2\times 2$ table like this. (There are many ways, but all the reasonable ones are equivalent.) Then generate, say, a few hundred such tables randomly (by flipping coins--in the computer, of course!). Compare the imbalances of these two tables to the range of imbalances generated randomly. You will find the first sequence is far outside the range while the second is squarely within it.
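A sketch of such a simulation, under my own assumptions (numpy for the flips, and the uncorrected chi-squared statistic as the measure of imbalance):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def imbalance(flips):
    """Chi-squared statistic of the 2x2 table of successive pairs (no continuity correction)."""
    table = np.zeros((2, 2), dtype=int)
    for a, b in zip(flips[:-1], flips[1:]):
        table[a, b] += 1
    return chi2_contingency(table, correction=False)[0]

# Reference distribution: a few hundred honest sequences of 300 independent fair flips.
sims = np.array([imbalance(rng.integers(0, 2, 300)) for _ in range(500)])
print(np.percentile(sims, [50, 95, 99]))  # the Series 1 statistic (about 38) lies far beyond these
```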
This figure summarizes such a simulation using the chi-squared statistic as the measure of imbalance. Both panels show the same results: one on the original scale and the other on a log scale. The two dashed vertical lines in each panel show the chi-squared statistics for Series 1 (right) and Series 2 (left). The red curve is the $\chi^2(1)$ density. It fits the simulations extremely well at the right (higher values). The discrepancies for low values occur because this statistic has a discrete distribution which cannot be well approximated by any continuous distribution where it takes on small values -- but for our purposes that makes no difference at all.