Solved – Could any equation have predicted the results of this simulation

distributionshistogramprobabilityrsimulation

Suppose there is a coin that has a 5% chance of landing on HEADS and a 95% chance of landing on TAILS. Based on a computer simulation, I want to find out the following :

The average number of flips before observing HEADS, TAILS, HEADS (note: not the probability, but the number of flips)

Using the R programming language, I tried to make this simulation (this simulation keeps flipping a coin until HTH and counts the number of flips until this happens – it then repeats this same process 10,000 times):

results <- list()

for (j in 1:10000) {

  response_i <- ''
  i <- 1

  while (response_i != 'HTH') {
    response_i <- c("H","T")
    response_i <- sample(response_i, 3, replace=TRUE, 
                         prob=c(0.05, 0.95))
    response_i <- paste(response_i, collapse = '')

    iteration_i = i
    if (response_i == 'HTH') {
      run_i = data.frame(response_i, iteration_i)
      results[[j]] <- run_i
      }
    i <- i + 1
  }
}

data <- do.call('rbind', results)

We can now see a histogram of this data:

hist(data$iteration_i, breaks = 500, main = "Number of Flips Before HTH")

We can also see the summary of this data:

summary(data$iteration_i)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    1.0   119.0   288.0   413.7   573.0  3346.0

My Question:

Could any "mathematical equation" have predicted the results of this simulation in advance? Could any "formula" have shown that the average number of flips to get HTH would have been 413? Can Markov Chains be used to solve this problem?
Based on the "skewed" shape of this histogram, is the "arithmetical mean" (i.e. mean = sum(x_i)/n) a "faithful" representation of the "true mean"? Looking at the above histogram, we can clearly see that you are are more likely to see HTH before 437 iterations compared to seeing HTH after 437 iterations, e.g. (on 100,000 simulations, the new average is 418):

nrow(data[which(data$iteration_i <418), ])

63184

nrow(data[which(data$iteration_i > 418), ])

36739

For such distributions, is there a better method to find out the "expectation" of this experiment?

Thanks!

Best Answer

At any given point in the game, you're $3$ or fewer "perfect flips" away from winning.

For example, suppose you've flipped the following sequence so far: $$ HTTHHHTTTTTTH $$

You haven't won yet, but you could win in two more flips if those two flips are $TH$. In other words, your last flip was $H$ so you have made "one flip" worth of progress toward your goal.

Since you mentioned Markov Chains, let's describe the "state" of the game by how much progress you have made toward the desired sequence $HTH$. At every point in the game, your progress is either $0$, $1$, or $2$--if it reaches $3$, then you have won. So we'll label the states $0$, $1$, $2$. (And if you want, you can say that there's an "absorbing state" called "state $3$".)

You start out in state $0$, of course.

You want to know the expected number of flips, from the starting point, state $0$. Let $E_i$ denote the expected number of flips, starting from state $i$.

At state $0$, what can happen? You can either flip $H$, and move to state $1$, or you flip $T$ and remain in state $0$. But either way, your "flip counter" goes up by $1$. So: $$ E_0 = p (1 + E_1) + (1-p)(1 + E_0), $$ where $p = P(H)$, or equivalently $$ E_0 = 1 + p E_1 + (1-p) E_0. $$ The "$1+$" comes from incrementing your "flip counter".

At state $1$, you want $T$, not $H$. But if you do get an $H$, at least you don't go back to the beginning--you still have an $H$ that you can build on next time. So: $$ E_1 = 1 + p E_1 + (1-p) E_2. $$

At state $2$, you either flip $H$ and win, or you flip $T$ and go all the way back to the beginning. $$ E_2 = 1 + (1-p) E_0. $$

Now solve the three linear equations for the three unknowns.

In particular you want $E_0$. I get $$ E_0 = \left( \frac{1}{p} \right) \left( \frac{1}{p} + \frac{1}{1-p} + 1 \right), $$ which for $p=1/20$ gives $E_0 = 441 + 1/19 \approx 441.0526$. (So the mean is not $413$. In my own simulations I do get results around $441$ on average, at least if I do around $10^5$ or $10^6$ trials.)

In case you are interested, our three linear equations come from the Law of Total Expectation.

This is really the same as the approach in Stephan Kolassa's answer, but it is a little more efficient because we don't need as many states. For example, there is no real difference between $TTT$ and $HTT$--either way, you're back at the beginning. So we can "collapse" those sequences together, instead of treating them as separate states.

Simulation code (two ways, sorry for using Python instead of R):

# Python 3
import random

def one_trial(p=0.05):
    # p = P(Heads)
    state = 0 # states are 0, 1, 2, representing the progress made so far
    flip_count = 0 # number of flips so far
    while True:
        flip_count += 1
        if state == 0: # empty state
            state = random.random() < p
            # 1 if H, 0 if T
        elif state == 1: # 'H'
            state += random.random() >= p
            # state 1 (H) if flip H, state 2 (HT) if flip T
        else: # state 2, 'HT'
            if random.random() < p: # HTH, game ends!
                return flip_count
            else: # HTT, back to empty state
                state = 0

def slow_trial(p=0.05):
    sequence = ''
    while sequence[-3:] != 'HTH':
        if random.random() < p:
            sequence += 'H'
        else:
            sequence += 'T'
    return len(sequence)

N = 10**5
print(sum(one_trial() for _ in range(N)) / N)
print(sum(slow_trial() for _ in range(N)) / N)

Related Solutions

Time to Hit a Pattern in Coin Toss Series – Heads and Tails Probability Analysis

Think about what happens the first time you get an H followed by a T.

Case 1: you're looking for H-T-H, and you've seen H-T for the first time. If the next toss is H, you're done. If it's T, you're back to square one: since the last two tosses were T-T you now need the full H-T-H.

Case 2: you're looking for H-T-T, and you've seen H-T for the first time. If the next toss is T, you're done. If it's H, this is clearly a setback; however, it's a minor one since you now have the H and only need -T-T. If the next toss is H, this makes your situation no worse, whereas T makes it better, and so on.

Put another way, in case 2 the first H that you see takes you 1/3 of the way, and from that point on you never have to start from scratch. This is not true in case 1, where a T-T erases all progress you've made.

Solved – A serious in-depth problem of probabilities for flipping coins

If I have understood correctly, then the problem is to find a probability distribution for the time at which the first run of $n$ or more heads ends.

Edit The probabilities can be determined accurately and quickly using matrix multiplication, and it is also possible to analytically compute the mean as $\mu_-=2^{n+1}-1$ and the variance as $\sigma^2=2^{n+2}\left(\mu-n-3\right) -\mu^2 + 5\mu$ where $\mu=\mu_-+1$, but there is probably not a simple closed form for the distribution itself. Above a certain number of coin flips the distribution is essentially a geometric distribution: it would make sense to use this form for larger $t$.

The evolution in time of the probability distribution in state space can be modelled using a transition matrix for $k=n+2$ states, where $n=$ the number of consecutive coin flips. The states are as follows:

State $H_0$, no heads
State $H_i$, $i$ heads, $1\le i\le(n-1)$
State $H_n$, $n$ or more heads
State $H_*$, $n$ or more heads followed by a tail

Once you get into state $H_*$ you can't get back to any of the other states.

The state transition probabilities to get into the states are as follows

State $H_0$: probability $\frac12$ from $H_i$, $i=0,\ldots,n-1$, i.e. including itself but not state $H_n$
State $H_i$: probability $\frac12$ from $H_{i-1}$
State $H_n$: probability $\frac12$ from $H_{n-1},H_n$, i.e. from the state with $n-1$ heads and itself
State $H_*$: probability $\frac12$ from $H_n$ and probability 1 from $H_*$ (itself)

So for example, for $n=4$, this gives the transition matrix

$$ X = \left\{ \begin{array}{c|cccccc} & H_0 & H_1 & H_2 & H_3 & H_4 & H_* \\ \hline H_0 & \frac12 & \frac12& \frac12& \frac12 &0 & 0 \\ H_1 & \frac12 &0 & 0 &0 & 0 & 0 \\ H_2 & 0 & \frac12 &0 &0 & 0 & 0 \\ H_3 & 0 & 0 &\frac12 &0 & 0 & 0 \\ H_4 & 0 & 0 &0 &\frac12 & \frac12 & 0 \\ H_* & 0 & 0 & 0&0 &\frac12 & 1 \end{array}\right\} $$

For the case $n=4$, the initial vector of probabilities $\mathbf{p}$ is $\mathbf{p}=(1,0,0,0,0,0)$. In general the initial vector has $$\mathbf{p}_i=\begin{cases}1&i=0\\0&i>0\end{cases}$$

The vector $\mathbf{p}$ is the probability distribution in space for any given time. The required cdf is a cdf in time, and is the probability of having seen at least $n$ coin flips end by time $t$. It can be written as $(X^{t+1}\mathbf{p})_k$, noting that we reach state $H_*$ 1 timestep after the last in the run of consecutive coin flips.

The required pmf in time can be written as $(X^{t+1}\mathbf{p})_k - (X^{t}\mathbf{p})_k$. However numerically this involves taking away a very small number from a much larger number ($\approx 1$) and restricts precision. Therefore in calculations it is better to set $X_{k,k}=0$ rather than 1. Then writing $X'$ for the resulting matrix $X'=X|X_{k,k}=0$, the pmf is $(X'^{t+1}\mathbf{p})_k$. This is what is implemented in the simple R program below, which works for any $n\ge 2$,

n=4
k=n+2
X=matrix(c(rep(1,n),0,0, # first row
           rep(c(1,rep(0,k)),n-2), # to half-way thru penultimate row
           1,rep(0,k),1,1,rep(0,k-1),1,0), # replace 0 by 2 for cdf
         byrow=T,nrow=k)/2
X

t=10000
pt=rep(0,t) # probability at time t
pv=c(1,rep(0,k-1)) # probability vector
for(i in 1:(t+1)) {
  #pvk=pv[k]; # if calculating via cdf
  pv = X %*% pv;
  #pt[i-1]=pv[k]-pvk # if calculating via cdf
  pt[i-1]=pv[k] # if calculating pmf
}

m=sum((1:t)*pt)
v=sum((1:t)^2*pt)-m^2
c(m, v)

par(mfrow=c(3,1))
plot(pt[1:100],type="l")
plot(pt[10:110],type="l")
plot(pt[1010:1110],type="l")

The upper plot shows the pmf between 0 and 100. The lower two plots show the pmf between 10 and 110 and also between 1010 and 1110, illustrating the self-similarity and the fact that as @Glen_b says, the distribution looks like it can be approximated by a geometric distribution after a settling down period.

enter image description here

It's possible to investigate this behaviour further using an eigenvector decomposition of $X$. Doing so shows that the for sufficiently large $t$, $p_{t+1}\approx c(n)p_t$, where $c(n)$ is the solution of the equation $2^{n+1}c^n(c-1)+1=0$. This approximation gets better with increasing $n$ and is excellent for $t$ in the range from about 30 to 50, depending on the value of $n$, as shown in the plot of log error below for computing $p_{100}$ (rainbow colours, red on the left for $n=2$). (In fact for numerical reasons, it would actually be better to use the geometric approximation for probabilities when $t$ is larger.)

enter image description here

I suspect(ed) there might be a closed form available for the distribution because the means and variances as I have calculated them as follows

$$ \begin{array}{r|rr} n & \text{Mean} & \text{Variance} \\ \hline 2 &7& 24& \\ 3 &15& 144& \\ 4 &31& 736& \\ 5 &63& 3392& \\ 6 &127& 14720& \\ 7 &255& 61696& \\ 8 &511& 253440& \\ 9 &1023& 1029120& \\ 10&2047& 4151296& \\ \end{array} $$

(I had to bump up the number up the time horizon to t=100000 to get this but the program still ran for all $n=2,\ldots,10$ in less than about 10 seconds.) The means in particular follow a very obvious pattern; the variances less so. I have solved a simpler, 3-state transition system in the past, but so far I'm having no luck with a simple analytic solution to this one. Perhaps there's some useful theory that I'm not aware of, e.g. relating to transition matrices.

Edit: after a lot of false starts I came up with a recurrence formula. Let $p_{i,t}$ be the probability of being in state $H_i$ at time $t$. Let $q_{*,t}$ be the cumulative probability of being in state $H_*$, i.e. the final state, at time $t$. NB

For any given $t$, $p_{i,t}, 0\le i\le n$ and $q_{*,t}$ are a probability distribution over space $i$ , and immediately below I use the fact that their probabilities add to 1.
$p_{*,t}$ form a probability distribution over time $t$. Later, I use this fact in deriving the means and variances.

The probability of being at the first state at time $t+1$, i.e. no heads, is given by the transition probabilities from states that can return to it from time $t$ (using the theorem of total probability). \begin{align} p_{0,t+1} &= \frac12p_{0,t} + \frac12p_{1,t} + \ldots \frac12p_{n-1,t}\\ &= \frac12\sum_{i=0}^{n-1}p_{i,t}\\ &= \frac12\left( 1-p_{n,t}-q_{*,t} \right) \end{align} But to get from state $H_0$ to $H_{n-1}$ takes $n-1$ steps, hence $p_{n-1,t+n-1} = \frac1{2^{n-1}}p_{0,t}$ and $$ p_{n-1,t+n} = \frac1{2^n}\left( 1-p_{n,t}-q_{*,t} \right)$$ Once again by the theorem of total probability the probability of being at state $H_n$ at time $t+1$ is \begin{align} p_{n,t+1} &= \frac12p_{n,t} + \frac12p_{n-1,t}\\ &= \frac12p_{n,t} + \frac1{2^{n+1}}\left( 1-p_{n,t-n}-q_{*,t-n} \right)\;\;\;(\dagger)\\ \end{align} and using the fact that $q_{*,t+1}-q_{*,t}=\frac12p_{n,t} \implies p_{n,t} = 2q_{*,t+1}-2q_{*,t}$, \begin{align} 2q_{*,t+2} - 2q_{*,t+1} &= q_{*,t+1}-q_{*,t}+\frac1{2^{n+1}}\left( 1-2q_{*,t-n+1}+q_{*,t-n} \right) \end{align} Hence, changing $t\to t+n$, $$2q_{*,t+n+2} - 3q_{*,t+n+1} +q_{*,t+n} + \frac1{2^n}q_{*,t+1} - \frac1{2^{n+1}}q_{*,t} - \frac1{2^{n+1}} = 0$$

This recurrence formula checks out for the cases $n=4$ and $n=6$. E.g. for $n=6$ a plot of this formula using t=1:994;v=2*q[t+8]-3*q[t+7]+q[t+6]+q[t+1]/2**6-q[t]/2**7-1/2**7 gives machine order accuracy.

enter image description here

Edit I can't see where to go to find a closed form from this recurrence relation. However, it is possible to get a closed form for the mean.

Starting from $(\dagger)$, and noting that $p_{*,t+1}=\frac12p_{n,t}$, \begin{align} p_{n,t+1} &= \frac12p_{n,t} + \frac1{2^{n+1}}\left( 1-p_{n,t-n}-q_{*,t-n} \right)\;\;\;(\dagger)\\ 2^{n+1}\left(2p_{*,t+n+2}-p_{*,t+n+1}\right)+2p_{*,t+1} &= 1-q_{*,t} \end{align} Taking sums from $t=0$ to $\infty$ and applying the formula for the mean $E[X] = \sum_{x=0}^\infty (1-F(x))$ and noting that $p_{*,t}$ is a probability distribution gives \begin{align} 2^{n+1}\sum_{t=0}^\infty \left(2p_{*,t+n+2}-p_{*,t+n+1}\right) + 2\sum_{t=0}^\infty p_{*,t+1} &= \sum_{t=0}^\infty(1-q_{*,t}) \\ 2^{n+1} \left(2\left(1-\frac1{2^{n+1}}\right)-\;1\;\right) + 2 &= \mu \\ 2^{n+1} &= \mu \end{align} This is the mean for reaching state $H_*$; the mean for the end of the run of heads is one less than this.

Edit A similar approach using the formula $E[X^2] = \sum_{x=0}^\infty (2x+1)(1-F(x))\;$ from this question yields the variance. \begin{align} \sum_{t=0}^\infty(2t+1)\left (2^{n+1}\left(2p_{*,t+n+2}-p_{*,t+n+1}\right) + 2 p_{*,t+1}\right) &= \sum_{t=0}^\infty(2t+1)(1-q_{*,t}) \\ 2\sum_{t=0}^\infty t\left (2^{n+1}\left(2p_{*,t+n+2}-p_{*,t+n+1}\right) + 2 p_{*,t+1}\right) + \mu &= \sigma^2 + \mu^2 \\ 2^{n+2}\left(2\left(\mu-(n+2)+\frac1{2^{n+1}}\right)-(\mu-(n+1))\right) + 4(\mu-1) &\\+ \mu &= \sigma^2 + \mu^2 \\ 2^{n+2}\left(2(\mu-(n+2))-(\mu-(n+1))\right) + 5\mu &= \sigma^2 + \mu^2 \\ 2^{n+2}\left(\mu-n-3\right) + 5\mu &= \sigma^2 + \mu^2 \\ 2^{n+2}\left(\mu-n-3\right) -\mu^2 + 5\mu &= \sigma^2 \end{align}

The means and variances can easily be generated programmatically. E.g. to check the means and variances from the table above use

n=2:10
m=c(0,2**(n+1))
v=2**(n+2)*(m[n]-n-3) + 5*m[n] - m[n]^2

Finally, I'm not sure what you wanted when you wrote

when a tails hits and breaks the streak of heads the count would start again from the next flip.

If you meant what is the probability distribution for the next time at which the first run of $n$ or more heads ends, then the crucial point is contained in this comment by @Glen_b, which is that the process begins again after one tail (c.f. the initial problem where you could get a run of $n$ or more heads immediately).

This means that, for example, the mean time to the first event is $\mu-1$, but the mean time between events is always $\mu+1$ (the variance is the same). It's also possible to use a transition matrix to investigate the long-term probabilities of being in a state after the system has "settled down". To get the appropriate transition matrix, set $X_{k,k,}=0$ and $X_{1,k}=1$ so that the system return immediately to state $H_0$ from state $H_*$. Then the scaled first eigenvector of this new matrix gives the stationary probabilities. With $n=4$ these stationary probabilities are

$$ \begin{array}{c|cccccc} & \text{probability} \\ \hline H_0 & 0.48484848 \\ H_1 & 0.24242424 \\ H_2 & 0.12121212 \\ H_3 & 0.06060606 \\ H_4 & 0.06060606 \\ H_* & 0.03030303 \end{array} $$ The expected time between states is given by the reciprocal of the probability. So the expected time between visits to $H_* = 1/0.03030303 = 33 = \mu + 1$.

Appendix: Python program used to generate exact probabilities for n = number of consecutive heads over N tosses.

import itertools, pylab

def countinlist(n, N):
    count = [0] * N
    sub = 'h'*n+'t'
    for string in itertools.imap(''.join, itertools.product('ht', repeat=N+1)):
        f = string.find(sub)
        if (f>=0):
            f = f + n -1 # don't count t, and index in count from zero 
            count[f] = count[f] +1
            # uncomment the following line to print all matches
            # print "found at", f+1, "in", string
    return count, 1/float((2**(N+1)))

n = 4
N = 24
counts, probperevent = countinlist(n,N)
probs = [count*probperevent for count in counts]

for i in range(N):
    print '{0:2d} {1:.10f}'.format(i+1,probs[i]) 
pylab.title('Probabilities of getting {0} consecutive heads in {1} tosses'.format(n, N))
pylab.xlabel('toss')
pylab.ylabel('probability')
pylab.plot(range(1,(N+1)), probs, 'o')
pylab.show()

Best Answer

Related Solutions

Time to Hit a Pattern in Coin Toss Series – Heads and Tails Probability Analysis

Solved – A serious in-depth problem of probabilities for flipping coins

Related Question