I couldn't follow your derivation, but I believe the correct way to solve the problem is as follows.
First, some notation: take $x_i$ to be the expected number of flips to reach the end state ($n$ consecutive heads or $n$ consecutive tails), given that we're currently on a "run" of exactly $i$ consecutive heads. Similarly, take $y_i$ to be the expected number of flips to reach the end state, given that we're on a run of exactly $i$ consecutive tails. Clearly, $x_n = y_n = 0$: if we've already had $n$ consecutive heads or tails, we're done.
Then, we're interested in finding $x_0$ ($= y_0$). The relevant equations are (for $i < n$):
\begin{equation}
\begin{split}
x_i & = (1-p)(1 + y_1) + p(1 + x_{i+1}) \\
y_i & = p(1 + x_1) + (1-p)(1 + y_{i+1})
\end{split}
\end{equation}
Let's derive the first equation (the second is analogous). Suppose we're on a run of $i$ heads, so the expected number of flips to end the game is $x_i$. Consider the next flip. With probability $p$ it comes up heads, giving us a run of $i+1$ heads; that's the second term on the right-hand side: we move from the state with $i$ consecutive heads to the state with $i+1$ consecutive heads, at a cost of 1 flip. With probability $1-p$ it comes up tails, leaving us with a run of only 1 tail; that's the first term on the right-hand side: we move from the state with $i$ consecutive heads to the state with 1 consecutive tail, again at a cost of 1 flip.
With these equations you can solve the problem for any $n$. I tried implementing this in Mathematica, and if $p = 0$ or $p = 1$, I indeed get that $x_0 = y_0 = n$, as expected. I'm not sure if there's any closed form solution to the equations for arbitrary $n$; that's probably a question for the Math StackExchange. (Edit: see derivation below for the solution for arbitrary $n$).
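As a cross-check, the recurrence is easy to solve numerically without Mathematica. A sketch in Python (the function name is mine): it back-substitutes from $x_n = y_n = 0$, writing $x_1$ as an affine function of $y_1$ and vice versa, then solves the resulting $2\times 2$ linear system.

```python
def expected_flips(n, p):
    """Expected flips to reach n consecutive heads or tails, from the
    recurrences x_i = 1 + p*x_{i+1} + (1-p)*y_1 and
    y_i = 1 + (1-p)*y_{i+1} + p*x_1, with x_n = y_n = 0."""
    q = 1 - p
    # Back-substitute x_i = u + v*y_1, starting from x_n = 0 (u = v = 0),
    # down to x_1: each step applies x_i = 1 + p*x_{i+1} + q*y_1.
    u, v = 0.0, 0.0
    for _ in range(n - 1):
        u, v = 1 + p * u, q + p * v
    # By the same argument with p and q swapped, y_i = s + t*x_1.
    s, t = 0.0, 0.0
    for _ in range(n - 1):
        s, t = 1 + q * s, p + q * t
    # Solve x_1 = u + v*y_1 and y_1 = s + t*x_1 simultaneously.
    x1 = (u + v * s) / (1 - v * t)
    y1 = s + t * x1
    # One more step back to the start state (a run of length 0).
    return 1 + p * x1 + q * y1
```

For $p=0$ or $p=1$ this returns $n$, as noted above, and for a fair coin with $n=5$ it returns $31$.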
What I've sketched out here is a general approach for solving many types of coin flipping problems (and other problems). It essentially constructs a Markov chain that has states (e.g., 3 consecutive heads), and transition probabilities for moving between states (e.g., a probability of $1-p$ of moving from 3 consecutive heads to 1 tail). The absorbing states are the end states of the game. See, for instance, the answer to this question.
EDIT: For the particular case of $n=5$, I get (using Mathematica):
\begin{equation}
x_0 = \frac{(5 - 10p + 10p^2 - 5p^3 + p^4)(1 + p + p^2 + p^3 + p^4)}{1 - 4p + 6p^2 - 4p^3 + p^4 + 4p^5 - 6p^6 + 4p^7 - p^8}
\end{equation}
which seems to have the right limits (although it's possible I have a typo somewhere).
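If you want to sanity-check that expression yourself, it is quick to evaluate numerically; a sketch in Python (the function name is mine):

```python
def x0_closed_form(p):
    """The n = 5 closed form quoted above, evaluated directly."""
    num = (5 - 10*p + 10*p**2 - 5*p**3 + p**4) * (1 + p + p**2 + p**3 + p**4)
    den = (1 - 4*p + 6*p**2 - 4*p**3 + p**4
           + 4*p**5 - 6*p**6 + 4*p**7 - p**8)
    return num / den
```

Both $p=0$ and $p=1$ give $5$, and $p=1/2$ gives $31$, which matches the known value $2^n - 1$ for a fair coin.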
Edit #2: Following @whuber's advice, I solved the recurrence relation for general $n$. The solution is:
\begin{equation}
\begin{bmatrix}
x_0 \\ y_0
\end{bmatrix}
=
(A + B)[(I-A^{n-1})^{-1}(I-A)-B]^{-1}c + c
\end{equation}
where
\begin{equation}
A =
\begin{bmatrix}
p & 0 \\
0 & 1-p
\end{bmatrix}
\end{equation}
and
\begin{equation}
B =
\begin{bmatrix}
0 & 1-p \\
p & 0
\end{bmatrix}
\end{equation}
and
\begin{equation}
c =
\begin{bmatrix}
1 \\ 1
\end{bmatrix}
\end{equation}
and $I$ is the identity matrix.
For $n=5$, this gives the same result as above.
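Because $A$ is diagonal, the matrix expression reduces to arithmetic on $2\times 2$ matrices and can be checked without a linear-algebra library. A sketch in Python, valid for $0 < p < 1$ (where the inverses exist); the function name is mine:

```python
def expected_flips_matrix(n, p):
    """Evaluate [x_0, y_0]^T = (A+B)[(I-A^{n-1})^{-1}(I-A) - B]^{-1}c + c
    with A = diag(p, 1-p), B = [[0, 1-p], [p, 0]], c = [1, 1]^T."""
    q = 1 - p
    # (I - A^{n-1})^{-1}(I - A) is diagonal because A is diagonal.
    d1 = (1 - p) / (1 - p ** (n - 1))
    d2 = (1 - q) / (1 - q ** (n - 1))
    # M = (I - A^{n-1})^{-1}(I - A) - B, written entrywise.
    m11, m12, m21, m22 = d1, -q, -p, d2
    det = m11 * m22 - m12 * m21
    # w = M^{-1} c with c = [1, 1]^T, via the 2x2 inverse formula.
    w1 = (m22 - m12) / det
    w2 = (m11 - m21) / det
    # (A + B) has both rows equal to [p, 1-p], so x_0 = y_0, as expected.
    return p * w1 + q * w2 + 1
```

For $n=5$ and $p=1/2$ this returns $31$, agreeing with the closed form above.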
If you "know" that the coin is fair
then we still expect the long run proportion of heads to tend to $0.5$. This is not to say that we should expect more (than 50%) of the next flips to be tails, but rather that the initial $500$ flips become irrelevant as $n\rightarrow\infty$. A streak of $500$ heads may seem like a lot (and practically speaking it is), but
- if $250$ of the next $500$ flips are heads then the sample proportion becomes
$$\hat p = \frac{500 + 250}{1000} = 0.75.$$
- if $250$ of the $500$ flips after that are heads then...
$$\hat p = \frac{500+250+250}{1500} \approx 0.67$$
- if $100000$ of the next $200000$ flips are heads then...
$$\hat p = \frac{500+250+250+100000}{1500+200000} \approx 0.501.$$
This is the Law of Large Numbers.
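To see this numerically, here is a small simulation sketch in Python (the function name is mine): start from a streak of $500$ heads, then keep flipping a fair coin.

```python
import random

def proportion_after(initial_heads, extra_flips, p=0.5, seed=42):
    """Overall sample proportion of heads after starting from a streak of
    `initial_heads` heads and making `extra_flips` further flips of a
    coin with P(heads) = p."""
    rng = random.Random(seed)
    heads = initial_heads + sum(rng.random() < p for _ in range(extra_flips))
    return heads / (initial_heads + extra_flips)
```

With a fair coin, `proportion_after(500, 500)` lands near $0.75$, while `proportion_after(500, 200_000)` is already close to $0.5$: the head start is a fixed quantity, so its influence on $\hat p$ shrinks like $500/n$.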
On the other hand...
if I were to flip a coin in real life and see $500$ heads in a row, I would start to seriously doubt that the coin is actually fair. (Interesting side note: it is hard, perhaps impossible, to actually bias a coin flip in real life. The only realistic values of $p$ are $0$, $0.5$ and $1$, but we will ignore this for the sake of the answer.)
To account for this possibility, we could use a Bayesian procedure from the outset. Rather than assume $p=1/2$, suppose we specify the prior distribution
$$p \sim \text{Beta}(\alpha, \alpha).$$
This is a symmetric distribution, which encodes my a priori belief that the coin is fair, i.e. $E(p) = \frac{1}{2}$. How strongly I believe in this notion is specified through the choice of $\alpha$, since $Var(p) = \frac{1}{8(\alpha+0.5)}$.
- $\alpha = 1$ corresponds to a uniform prior over $(0,1)$.
- $\alpha = 0.5$ is the Jeffreys prior - another popular non-informative choice.
- Choosing a large value of $\alpha$ gives more credence to the belief that $p=1/2$. In fact, letting $\alpha \rightarrow \infty$ corresponds to $Pr(p=1/2) = 1$.
Applying Bayes rule directly, the posterior distribution for $p$ is
$$p|y \sim \text{Beta}(\alpha+y, \alpha+n-y)$$
where $y = \text{number of heads}$ and $n = \text{number of flips}$. For instance, if you choose $\alpha = 1$ and observe $n=y=500$, the posterior distribution becomes $\text{Beta}(501, 1)$ and
$$E(p|y) = \frac{\alpha + y}{2\alpha + n} = \frac{501}{502} \approx 0.998$$
indicating that I should bet on heads for the next flip (since it is highly improbable that the coin is fair).
This updating process can be applied after each flip, using the posterior distribution after $n$ flips as the prior for flip $n+1$. If it turns out that the $500$ heads was just a (astronomically) improbable event and the coin really is fair, the posterior distribution will eventually capture this (using a similar argument to the previous section).
Intuition for choosing $\alpha$: To help understand the role of $\alpha$ in the Bayesian procedure, we can use the following argument. The mean of the posterior distribution is equivalent to the maximum likelihood estimate of $p$, if we were to augment the data with a series of $2\alpha$ "hypothetical flips", where $\alpha$ of these flips are heads and $\alpha$ of these flips are tails. Choosing $\alpha=1$ (as we did above) means the augmented data is $501$ heads and $1$ tail. Choosing a larger value of $\alpha$ means that more evidence is required to change our beliefs. Still, for any finite choice of $\alpha$, these "hypothetical flips" will eventually become irrelevant as $n\rightarrow\infty$.
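The posterior-mean formula is simple enough to play with directly. A sketch in Python (the function name is mine):

```python
def posterior_mean(alpha, heads, flips):
    """Posterior mean of p under a Beta(alpha, alpha) prior after
    observing `heads` heads in `flips` flips: (alpha + y) / (2*alpha + n)."""
    return (alpha + heads) / (2 * alpha + flips)
```

With $\alpha = 1$, `posterior_mean(1, 500, 500)` gives $501/502 \approx 0.998$, as above; a much more skeptical prior such as $\alpha = 10000$ gives roughly $0.512$ instead, and for any fixed $\alpha$, further fair flips pull the posterior mean back toward $1/2$.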
Best Answer
Another way is by simulating a million match-offs between $X$ and $Y$ to approximate $P(X > Y) = 0.9907\pm 0.0002.$ [Simulation in R.]
Notes per @AntoniParellada's Comment:
In R, the function

sample(1:6, 100, rep=T)

simulates 100 rolls of a fair die; the sum of this simulates $X$. Also, `rbinom` is R code for simulating a binomial random variable; here it's $Y.$ The difference is $D = X - Y.$ The procedure `replicate` makes a vector `d` of a million differences. Then `(d > 0)` is a logical vector of a million `TRUE`s and `FALSE`s, the `mean` of which is its proportion of `TRUE`s--our answer. Finally, the last statement gives the margin of error of a 95% confidence interval for the proportion of `TRUE`s (using 2 instead of 1.96), as a reality check on the accuracy of the simulated answer. [With a million iterations one ordinarily expects 2 or 3 decimal places of accuracy for probabilities--sometimes more for probabilities so far from 1/2.]
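The R code itself isn't reproduced above, but the described procedure is easy to sketch in Python. The parameters of $Y$ aren't stated in these notes, so the sketch below assumes, purely for illustration, that $Y$ counts heads in $600$ fair coin flips, and it uses fewer replications than a million to keep the run fast (the function name is mine):

```python
import random

def estimate_prob(reps=5_000, seed=1):
    """Monte Carlo estimate of P(X > Y), where X is the sum of 100 rolls
    of a fair die and Y is an ASSUMED Binomial(600, 1/2) variable."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(reps):
        x = sum(rng.randint(1, 6) for _ in range(100))   # X: die-roll sum
        y = sum(rng.random() < 0.5 for _ in range(600))  # Y: binomial count
        wins += x > y                                    # one "match-off"
    return wins / reps
```

As in the notes, the margin of error of the estimate $\hat q$ is roughly $2\sqrt{\hat q(1-\hat q)/\text{reps}}$, so with fewer replications the answer is correspondingly less precise.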