Inspired by Peter Donnelly's talk at TED, in which he discusses how long it would take for a certain pattern to appear in a series of coin tosses, I created the following script in R. Given two patterns 'hth' and 'htt', it calculates how long it takes (i.e. how many coin tosses) on average before you hit one of these patterns.
coin <- c('h','t')
hit <- function(seq) {
miss <- TRUE
fail <- 3
trp <- sample(coin,3,replace=T)
while (miss) {
if (all(seq == trp)) {
miss <- FALSE
}
else {
trp <- c(trp[2],trp[3],sample(coin,1,T))
fail <- fail + 1
}
}
return(fail)
}
n <- 5000
trials <- data.frame("hth"=rep(NA,n),"htt"=rep(NA,n))
hth <- c('h','t','h')
htt <- c('h','t','t')
set.seed(4321)
for (i in 1:n) {
trials[i,] <- c(hit(hth),hit(htt))
}
summary(trials)
The summary statistics are as follows,
hth htt
Min. : 3.00 Min. : 3.000
1st Qu.: 4.00 1st Qu.: 5.000
Median : 8.00 Median : 7.000
Mean :10.08 Mean : 8.014
3rd Qu.:13.00 3rd Qu.:10.000
Max. :70.00 Max. :42.000
In the talk it is explained that the average number of coin tosses would be different for the two patterns; as can be seen from my simulation. Despite watching the talk a few times I'm still not quite getting why this would be the case. I understand that 'hth' overlaps itself and intuitively I would think that you would hit 'hth' sooner than 'htt', but this is not the case. I would really appreciate it if someone could explain this to me.
Best Answer
Think about what happens the first time you get an H followed by a T.
Case 1: you're looking for H-T-H, and you've seen H-T for the first time. If the next toss is H, you're done. If it's T, you're back to square one: since the last two tosses were T-T you now need the full H-T-H.
Case 2: you're looking for H-T-T, and you've seen H-T for the first time. If the next toss is T, you're done. If it's H, this is clearly a setback; however, it's a minor one since you now have the H and only need -T-T. If the next toss is H, this makes your situation no worse, whereas T makes it better, and so on.
Put another way, in case 2 the first H that you see takes you 1/3 of the way, and from that point on you never have to start from scratch. This is not true in case 1, where a T-T erases all progress you've made.