Binomial Distribution – How to Determine the Real and Fake Sequence of Coin Flips

binomial distribution

This is an interesting problem I came across. I'm attempting to write a Python program to get a solution to it; however, I'm not sure how to proceed. So far, I know that I would expect the counts of heads to follow a binomial, and length of runs (of tails, heads, or both) to follow a geometric.

Below are two sequences of 300 “coin flips” (H for heads, T for
tails). One of these is a true sequence of 300 independent flips of a
fair coin. The other was generated by a person typing out H’s and T’s
and trying to seem random. Which sequence is truly composed of coin
flips?

Sequence 1:

TTHHTHTTHTTTHTTTHTTTHTTHTHHTHHTHTHHTTTHHTHTHTTHTHH
TTHTHHTHTTTHHTTHHTTHHHTHHTHTTHTHTTHHTHHHTTHTHTTTHH
TTHTHTHTHTHTTHTHTHHHTTHTHTHHTHHHTHTHTTHTTHHTHTHTHT
THHTTHTHTTHHHTHTHTHTTHTTHHTTHTHHTHHHTTHHTHTTHTHTHT
HTHTHTHHHTHTHTHTHHTHHTHTHTTHTTTHHTHTTTHTHHTHHHHTTT
HHTHTHTHTHHHTTHHTHTTTHTHHTHTHTHHTHTTHTTHTHHTHTHTTT

Sequence 2:

HTHHHTHTTHHTTTTTTTTHHHTTTHHTTTTHHTTHHHTTHTHTTTTTTH
THTTTTHHHHTHTHTTHTTTHTTHTTTTHTHHTHHHHTTTTTHHHHTHHH
TTTTHTHTTHHHHTHHHHHHHHTTHHTHHTHHHHHHHTTHTHTTTHHTTT
THTHHTTHTTHTHTHTTHHHHHTTHTTTHTHTHHTTTTHTTTTTHHTHTH
HHHTTTTHTHHHTHHTHTHTHTHHHTHTTHHHTHHHHHHTHHHTHTTTHH
HTTTHHTHTTHHTHHHTHTTHTTHTTTHHTHTHTTTTHTHTHTTHTHTHT

Both sequences have 148 heads, two
less than the expected number for a 0.5 probability of heads.

Best Answer

This is a variant on a standard intro stats demonstration: for homework after the first class I have assigned my students the exercise of flipping a coin 100 times and recording the results, broadly hinting that they don't really have to flip a coin and assuring them it won't be graded. Most will eschew the physical process and just write down 100 H's and T's willy-nilly. After the results are handed in at the beginning of the next class, at a glance I can reliably identify the ones who cheated. Usually there are no runs of heads or tails longer than about 4 or 5, even though in just 100 flips we ought to see a longer run that that.

This case is subtler, but one particular analysis stands out as convincing: tabulate the successive ordered pairs of results. In a series of independent flips, each of the four possible pairs HH, HT, TH, and TT should occur equally often--which would be $(300-1)/4 = 74.75$ times each, on average.

Here are the tabulations for the two series of flips:

   Series 1    Series 2
      H   T        H  T
  H  46 102       71 76
  T 102  49       77 75

The first is obviously far from what we might expect. In that series, an H is more than twice as likely ($102:46$) to be followed by a T than by another H; and a T, in turn, is more than twice as likely ($102:49$) to be followed by an H. In the second series, those likelihoods are nearly $1:1,$ consistent with independent flips.

A chi-squared test works well here, because all the expected counts are far greater than the threshold of 5 often quoted as a minimum. The chi-squared statistics are 38.3 and 0.085, respectively, corresponding to p-values of less than one in a billion and 77%, respectively. In other words, a table of pairs as imbalanced as the second one is to be expected (due to the randomness), but a table as imbalanced as the first happens less than one in every billion such experiments.

(NB: It has been pointed out in comments that the chi-squared test might not be applicable because these transitions are not independent: e.g., an HT can be followed only by a TT or TH. This is a legitimate concern. However, this form of dependence is extremely weak and has little appreciable effect on the null distribution of the chi-squared statistic for sequences as long as $300.$ In fact, the chi-squared distribution is a great approximation to the null sampling distribution even for sequences as short as $21,$ where the counts of the $21-1=20$ transitions that occur are expected to be $20/4=5$ of each type.)


If you know nothing about chi-squared tests, or even if you do but don't want to program the chi-square quantile function to compute a p-value, you can achieve a similar result. First develop a way to quantify the degree of imbalance in a $2\times 2$ table like this. (There are many ways, but all the reasonable ones are equivalent.) Then generate, say, a few hundred such tables randomly (by flipping coins--in the computer, of course!). Compare the imbalances of these two tables to the range of imbalances generated randomly. You will find the first sequence is far outside the range while the second is squarely within it.

Figure

This figure summarizes such a simulation using the chi-squared statistic as the measure of imbalance. Both panels show the same results: one on the original scale and the other on a log scale. The two dashed vertical lines in each panel show the chi-squared statistics for Series 1 (right) and Series 2 (left). The red curve is the $\chi^2(1)$ density. It fits the simulations extremely well at the right (higher values). The discrepancies for low values occur because this statistic has a discrete distribution which cannot be well approximated by any continuous distribution where it takes on small values -- but for our purposes that makes no difference at all.

Related Question