I don't know if this counts as an elegant solution in your book, but I think it's cute.
Let's say the "frequency state" of a deck is the number of cards of each face value remaining. A full deck, for example, has the frequency state "4 aces, 4 twos, 4 threes...," while an empty deck has the frequency state "0 aces, 0 twos, 0 threes...." There are $5^{13}$ possible frequency states. When you draw a card from a deck, the frequency state changes in a way that depends only on the face value of the card you drew.
You can turn the set of possible frequency states into a directed graph by drawing an arrow for each way the frequency state can change when you draw a card. I'll call this graph the "draw graph." Each vertex in the draw graph has at most 13 edges leaving it, one for each type of card you could draw.
You can turn the draw graph into a weighted directed graph by assuming that cards are drawn uniformly at random, and weighting each arrow by the probability of that kind of draw. The full-deck state, for example, has 13 arrows leaving it, each with weight $\tfrac{1}{13}$. If you draw a queen, you end up in a state that still has 13 arrows leaving it, but 12 of them have weight $\tfrac{4}{51}$, and one—the arrow for drawing another queen—has weight $\tfrac{3}{51}$. The weights of the arrows leaving each state add up to one, so the draw graph is a Markov chain.
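To make this concrete, here's a small sketch of the draw graph's edges: a frequency state is a 13-tuple of counts, and each arrow removes one card and carries an exact probability weight. (The function name `successors` is just my choice for illustration.)

```python
from fractions import Fraction

def successors(state):
    """Given a frequency state (a 13-tuple of counts, one per face value),
    return the arrows leaving it as (face_drawn, next_state, weight) triples,
    with weights computed as exact fractions."""
    total = sum(state)
    arrows = []
    for face, count in enumerate(state):
        if count > 0:
            nxt = state[:face] + (count - 1,) + state[face + 1:]
            arrows.append((face, nxt, Fraction(count, total)))
    return arrows

full_deck = (4,) * 13
arrows = successors(full_deck)
print(len(arrows))                   # 13 arrows leave the full-deck state
print(sum(w for _, _, w in arrows))  # 1: the weights sum to one
```

After drawing a queen, the resulting state's arrows have weights $\tfrac{4}{51}$ for the twelve other faces and $\tfrac{3}{51}$ for the second queen, exactly as described above.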
Let's say play has passed to the dealer. We know the dealer's hand and the frequency state of the deck. Here's a cool fact: from now on, we can figure out the dealer's hand just by looking at the frequency state of the deck. That's because, when the dealer starts hitting, all the cards she draws from the deck end up in her hand. Using this fact, we can translate properties of the dealer's hand, like whether it's bust, into properties of the frequency state.
Let's record this information on the draw graph by labeling its vertices. We'll label the states where the dealer stays as "stay states," the states where the dealer is bust as "bust states," and the states where the dealer keeps hitting as "hit states." When the dealer is hitting, she's walking randomly along the draw graph, with her direction from each state chosen using the arrow weights as probabilities. She keeps walking until she reaches a stay state or a bust state.
The dealer has to eventually stay or go bust, so the process we just described is an absorbing Markov chain. Like most things in graph theory, absorbing Markov chains have a very pretty linear algebraic description in terms of the adjacency map of the transition graph. If you know how this works, I can describe very quickly how to calculate the bust probability.
Cribbing from Wikipedia, let $Q$ be the map describing transitions from hit states to hit states, and let $R$ be the map describing transitions from hit states to stay and bust states. Let $\pi$ be the vector corresponding to the initial state, and let $\Pi$ be the projection map onto the subspace spanned by the bust states. The bust probability is the sum of the entries of the vector
$$\Pi R (1 - Q)^{-1}\pi.$$
The $(1 - Q)^{-1}$ factor describes the part of the process where the dealer hits over and over, so I think it encapsulates the tricky "recursive bit" of the computation.
(Caution: my operators are the transposes of Wikipedia's, which is why the formula looks backwards.)
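To show the formula in action without building the full $5^{13}$-state chain, here's a toy absorbing chain with two made-up hit states and two absorbing states (stay and bust). The numbers are invented for illustration; columns index the "from" state and rows the "to" state, matching the transposed convention above.

```python
import numpy as np

# Toy absorbing chain: hit states h0, h1; absorbing states stay, bust.
# From h0: move to h1 with prob 1/2, stay 1/4, bust 1/4.
# From h1: stay 1/2, bust 1/2.
Q = np.array([[0.0, 0.0],      # hit -> hit transitions
              [0.5, 0.0]])
R = np.array([[0.25, 0.5],     # row 0: -> stay
              [0.25, 0.5]])    # row 1: -> bust
pi = np.array([1.0, 0.0])      # start in h0

# Pi R (1 - Q)^{-1} pi, with Pi projecting onto the bust row.
absorb = R @ np.linalg.solve(np.eye(2) - Q, pi)
bust_prob = absorb[1]
print(bust_prob)               # 0.5
```

You can check this by hand: the dealer busts immediately with probability $\tfrac14$, or moves to $h_1$ with probability $\tfrac12$ and busts from there with probability $\tfrac12$, for a total of $\tfrac12$.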
I think this question is related to a broader question that I ask myself all the time in research: what does it mean to have a "nice solution" to a problem? When I was young, I was taught that a "nice solution" is a formula for the thing you want to calculate, but that's not always true! Having a formula for something often tells you very little about it, and other descriptions are often much more useful from a practical point of view.
I'm not sure whether the description of the bust probability given above is much use, but for this problem, I suspect that a linear algebraic description of this kind will be more useful than a formula.
Without the option, this is a fair game.
If the option is priced on the cheap side of fair, we buy the option and the option favors the player. If the option is rich, we don't buy the option.
Assuming the option is cheap, we plan to exercise if our first card is in $[1,7]$ and not to exercise if it is in $[8,13]$ (numbering the cards 1-13 is simpler than translating face cards to numerals).
Your chance of getting any particular number after exercising the option is:
$P(x) = \begin{cases} \frac{7}{169} & x\in [1,7]\\ \frac{20}{169} & x\in [8,13] \end{cases}$
Your chance of winning for any given $x$ is $\frac{x-1}{13}$, your chance of losing is $\frac{13-x}{13}$, and you have a $\frac{1}{13}$ chance of a push.
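The distribution $P(x)$ can be checked by brute-force enumeration under the stated strategy: the first card is uniform on $1$-$13$, and if it lands in $[1,7]$ we exercise and redraw (independently uniform); otherwise we keep it.

```python
from fractions import Fraction

# Exact distribution of the final card value under the exercise strategy.
dist = {x: Fraction(0) for x in range(1, 14)}
for first in range(1, 14):
    if first <= 7:                      # exercise: redraw uniformly
        for redraw in range(1, 14):
            dist[redraw] += Fraction(1, 13 * 13)
    else:                               # keep the first card
        dist[first] += Fraction(1, 13)

print(dist[3], dist[10])   # 7/169 20/169
```

Low values can only be reached through a redraw ($\tfrac{7}{13}\cdot\tfrac{1}{13} = \tfrac{7}{169}$), while high values are reached either directly or through a redraw ($\tfrac{1}{13} + \tfrac{7}{169} = \tfrac{20}{169}$).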
If the option is free, $E[X] = 10\left(\sum\limits_{i=1}^{7} \frac {7(i-1)}{13^3} + \sum\limits_{i=8}^{13} \frac {20(i-1)}{13^3}\right) - 10\left(\sum\limits_{i=1}^{7} \frac {7(13-i)}{13^3} + \sum\limits_{i=8}^{13} \frac {20(13-i)}{13^3}\right) = 10\left(\frac {7\cdot21+20\cdot57 - 7\cdot 63 - 20\cdot 15}{13^3}\right) = \frac{5460}{13^3}$
The price of the option is a fixed cost, paid regardless of how the game comes out.
You should be willing to pay up to, but no more than, $\frac{5460}{13^3}\approx \$2.485$
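As a sanity check on the arithmetic, the same expected value can be computed exactly with rational arithmetic, using the distribution $P(x)$ and the win/lose probabilities above (with a \$10 stake, the source of the factor of 10):

```python
from fractions import Fraction

# P(x): 7/169 for x in [1,7], 20/169 for x in [8,13].
def p(x):
    return Fraction(7, 169) if x <= 7 else Fraction(20, 169)

# Net win probability for value x is (x-1)/13 - (13-x)/13; pushes pay 0.
ev = 10 * sum(p(x) * Fraction((x - 1) - (13 - x), 13) for x in range(1, 14))
print(ev, float(ev))   # 5460/2197, about 2.485
```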
Note that we have ignored the push. If the option is close to fair value, the push can be safely ignored; but if the option were free, our expected gain would be greater by a factor of $\frac{13}{12}$.
The probabilities $P(B)$ and $P(A)$ are indeed equal, but $A$ and $B$ are not independent. After all, there are only four aces in the deck, and we can see two of them if both the player and the dealer have blackjack. We should expect $P(A\cap B)$ to be strictly less than $P(A)\cdot P(B)$.
As for the order: we just dealt out four cards face up, two to the player and two to the dealer. Do we really care what order we laid the cards on the table? No; what matters is which two cards the player got, which two cards the dealer got, and that the cards are all different. However you condition to account for the smaller deck, the probability of a double blackjack comes out the same.
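The inequality $P(A\cap B) < P(A)\cdot P(B)$ can be verified exactly. A standard deck has 4 aces and 16 ten-valued cards, and a blackjack is an ace plus a ten-valued card, dealt in either order without replacement:

```python
from fractions import Fraction

def p_blackjack(aces, tens, deck):
    """P(ace then ten) + P(ten then ace), drawing two cards
    without replacement from a deck of the given composition."""
    return 2 * Fraction(aces, deck) * Fraction(tens, deck - 1)

p_a = p_blackjack(4, 16, 52)            # player blackjack: 32/663
p_both = p_a * p_blackjack(3, 15, 50)   # dealer blackjack from the depleted deck
print(p_both < p_a * p_a)               # True
```

By symmetry $P(B) = P(A) = \frac{32}{663}$, but once the player holds an ace and a ten, the dealer's blackjack chance drops, so the joint probability falls strictly below the product.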