Time to Hit a Pattern in Coin Toss Series – Heads and Tails Probability Analysis

probabilityrstochastic-processesstring-count

Inspired by Peter Donnelly's talk at TED, in which he discusses how long it would take for a certain pattern to appear in a series of coin tosses, I created the following script in R. Given two patterns 'hth' and 'htt', it calculates how long it takes (i.e. how many coin tosses) on average before you hit one of these patterns.

coin <- c('h','t')

hit <- function(seq) {
    miss <- TRUE
    fail <- 3
    trp  <- sample(coin,3,replace=T)
    while (miss) {
        if (all(seq == trp)) {
            miss <- FALSE
        }
        else {
            trp <- c(trp[2],trp[3],sample(coin,1,T))
            fail <- fail + 1
        }
    }
    return(fail)
}

n <- 5000
trials <- data.frame("hth"=rep(NA,n),"htt"=rep(NA,n))

hth <- c('h','t','h')
htt <- c('h','t','t')

set.seed(4321)
for (i in 1:n) {
    trials[i,] <- c(hit(hth),hit(htt))    
}
summary(trials)

The summary statistics are as follows,

      hth             htt        
 Min.   : 3.00   Min.   : 3.000  
 1st Qu.: 4.00   1st Qu.: 5.000  
 Median : 8.00   Median : 7.000  
 Mean   :10.08   Mean   : 8.014  
 3rd Qu.:13.00   3rd Qu.:10.000  
 Max.   :70.00   Max.   :42.000 

In the talk it is explained that the average number of coin tosses would be different for the two patterns; as can be seen from my simulation. Despite watching the talk a few times I'm still not quite getting why this would be the case. I understand that 'hth' overlaps itself and intuitively I would think that you would hit 'hth' sooner than 'htt', but this is not the case. I would really appreciate it if someone could explain this to me.

Best Answer

Think about what happens the first time you get an H followed by a T.

Case 1: you're looking for H-T-H, and you've seen H-T for the first time. If the next toss is H, you're done. If it's T, you're back to square one: since the last two tosses were T-T you now need the full H-T-H.

Case 2: you're looking for H-T-T, and you've seen H-T for the first time. If the next toss is T, you're done. If it's H, this is clearly a setback; however, it's a minor one since you now have the H and only need -T-T. If the next toss is H, this makes your situation no worse, whereas T makes it better, and so on.

Put another way, in case 2 the first H that you see takes you 1/3 of the way, and from that point on you never have to start from scratch. This is not true in case 1, where a T-T erases all progress you've made.

Related Question