If you "know" that the coin is fair, then we still expect the long-run proportion of heads to tend to $0.5$. This is not to say that we should expect more than $50\%$ of the next flips to be tails, but rather that the initial $500$ flips become irrelevant as $n\rightarrow\infty$. A streak of $500$ heads may seem like a lot (and practically speaking it is), but
- if $250$ of the next $500$ flips are heads then the sample proportion becomes
$$\hat p = \frac{500 + 250}{1000} = 0.75.$$
- if $250$ of the $500$ flips after that are also heads then...
$$\hat p = \frac{500+250+250}{1500} \approx 0.67$$
- if $100000$ of the next $200000$ flips are heads then...
$$\hat p = \cdots \approx 0.501.$$
This is the Law of Large Numbers.
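This convergence is easy to see in a quick simulation. Here is a minimal R sketch (the seed and the $10^6$ follow-up flips are arbitrary choices of mine): prepend a streak of $500$ heads to a long run of fair flips and track the running proportion of heads.

```r
# Start with a streak of 500 heads, then append a long run of fair flips
set.seed(1)
flips <- c(rep(1, 500), rbinom(10^6, size = 1, prob = 0.5))

# Running proportion of heads after each flip
prop <- cumsum(flips) / seq_along(flips)

prop[500]             # exactly 1 -- right after the streak
prop[length(flips)]   # close to 0.5 -- the streak has become irrelevant
```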
On the other hand...
if I were to flip a coin in real life and see $500$ heads in a row, I would start to seriously doubt that the coin is actually fair. (As an interesting side note, it is hard, perhaps impossible, to actually bias a coin flip in real life; the only realistic values of $p$ are $0$, $0.5$ and $1$. We will ignore this for the sake of an answer.)
To account for this possibility, we could use a Bayesian procedure from the outset. Rather than assume $p=1/2$, suppose we specify the prior distribution
$$p \sim \text{Beta}(\alpha, \alpha).$$
This is a symmetric distribution, which encodes my a priori belief that the coin is fair, i.e. $E(p) = \frac{1}{2}$. How strongly I believe in this notion is specified through the choice of $\alpha$, since $Var(p) = \frac{1}{8(\alpha+0.5)}$.
- $\alpha = 1$ corresponds to a uniform prior over $(0,1)$.
- $\alpha = 0.5$ is the Jeffreys prior - another popular non-informative choice.
- Choosing a large value of $\alpha$ gives more credence to the belief that $p=1/2$. In fact, setting $\alpha = \infty$ implies that $Pr(p=1/2) = 1$.
Applying Bayes' rule directly, the posterior distribution for $p$ is
$$p|y \sim \text{Beta}(\alpha+y, \alpha+n-y)$$
where $y = \text{number of heads}$ and $n = \text{number of flips}$. For instance, if you choose $\alpha = 1$ and observe $n=y=500$, the posterior distribution becomes $\text{Beta}(501, 1)$ and
$$E(p|y) = \frac{\alpha + y}{2\alpha + n} = \frac{501}{502} \approx 0.998$$
indicating that I should bet on heads for the next flip (since it is highly improbable that the coin is fair).
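As a quick numerical check, the posterior-mean formula above can be evaluated for a few choices of $\alpha$ after $500$ straight heads (a sketch; the helper function name here is mine, not from any package):

```r
# Posterior mean under a Beta(a, a) prior after y heads in n flips:
# p | y ~ Beta(a + y, a + n - y), so E(p | y) = (a + y) / (2a + n)
posterior_mean <- function(a, y, n) (a + y) / (2 * a + n)

y <- 500; n <- 500                 # 500 heads in a row
posterior_mean(0.5, y, n)          # Jeffreys prior: ~0.999
posterior_mean(1,   y, n)          # uniform prior: 501/502 ~ 0.998
posterior_mean(100, y, n)          # strong prior belief in fairness: ~0.857
```

Even a fairly strong prior belief in fairness ($\alpha = 100$) is overwhelmed by $500$ consecutive heads.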
This updating process can be applied after each flip, using the posterior distribution after $n$ flips as the prior for flip $n+1$. If it turns out that the $500$ heads was just a (astronomically) improbable event and the coin really is fair, the posterior distribution will eventually capture this (using a similar argument to the previous section).
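The flip-by-flip scheme can be sketched in a few lines of R (the flip sequence below is hypothetical): updating the Beta parameters one observation at a time lands on exactly the same posterior as a single batch update.

```r
# Flip-by-flip updating: the posterior after flip i is the prior for flip i+1
a <- 1; b <- 1                 # Beta(1, 1) prior
flips <- c(1, 1, 0, 1, 0, 0)   # hypothetical data (1 = heads, 0 = tails)
for (x in flips) {
  a <- a + x                   # a head increments the first parameter
  b <- b + (1 - x)             # a tail increments the second
}

# Identical to the batch update Beta(1 + y, 1 + n - y)
c(a, b)                        # 4 4
identical(c(a, b), c(1 + sum(flips), 1 + length(flips) - sum(flips)))
```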
Intuition for choosing $\alpha$: To help understand the role of $\alpha$ in the Bayesian procedure, we can use the following argument. The mean of the posterior distribution is equivalent to the maximum likelihood estimate of $p$ if we augment the data with $2\alpha$ "hypothetical flips", of which $\alpha$ are heads and $\alpha$ are tails. Choosing $\alpha=1$ (as we did above) means the augmented data set is $501$ heads and $1$ tail. Choosing a larger value of $\alpha$ means that more evidence is required to change our beliefs. Still, for any finite choice of $\alpha$, these "hypothetical flips" eventually become irrelevant as $n\rightarrow\infty$.
By default, `str_count` does not count overlapping occurrences of the specified pattern. The substring `1111` can overlap with itself substantially, whereas the substring `1110` cannot overlap with itself. Consequently, your calculation for the first substring is biased: you are substantially undercounting the number of times this pattern actually occurs in your simulation. Try this alternative method instead:
```r
# Flip the coin many times
set.seed(1)
n <- 10^8
FLIPS <- sample(c(0, 1), size = n, replace = TRUE)

# Count the proportion of occurrences of 1-1-1-1
PATTERN.1111 <- FLIPS[1:(n-3)] * FLIPS[2:(n-2)] * FLIPS[3:(n-1)] * FLIPS[4:n]
sum(PATTERN.1111) / n
[1] 0.06246614

# Count the proportion of occurrences of 1-1-1-0
PATTERN.1110 <- FLIPS[1:(n-3)] * FLIPS[2:(n-2)] * FLIPS[3:(n-1)] * (1 - FLIPS[4:n])
sum(PATTERN.1110) / n
[1] 0.0624983
```
With this alternative simulation (which counts overlapping occurrences of the patterns) you get proportions for the two outcomes that are roughly the same. If the coin flips are in fact independent and "fair", then each player has the same probability of winning the wager. Mathematically, the true probability of any particular run of four outcomes is $1/2^4 = 0.0625$, so that is what the above simulations are effectively estimating; the remaining small disparity is due to random simulation error.
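To see the overlap issue in isolation, here is a small base-R check (no `stringr` required, so treat it as an illustrative sketch): an ordinary regex match consumes the characters it matches, while a zero-width lookahead lets matches overlap.

```r
s <- "111110"
# Non-overlapping matches of "1111": the first match consumes positions 1-4,
# so the engine resumes at position 5 and finds nothing further
length(gregexpr("1111", s)[[1]])
# Overlapping matches via a zero-width lookahead: positions 1 and 2 both match
length(gregexpr("(?=1111)", s, perl = TRUE)[[1]])
```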
It's called the Gambler's fallacy.