[Math] Given every horse’s probability of winning a race, what is the probability that a specific horse will finish 2nd and 3rd

conditional probability, probability, statistics

This question is a follow-on from this question.

I am trying to determine the probability of each horse finishing 2nd and of each horse finishing 3rd. I have written code to calculate these probabilities by implementing the formulas provided in the above-mentioned question.

Each horse is represented by a 'HorseData' object containing variables such as the horse id (a unique number identifying the horse), the probability of winning (Pw), the probability of finishing 2nd (P2nd), and the probability of finishing 3rd (P3rd), among others. All of the HorseData objects are contained in a List called hdList.

The following code implements the formula:
$$
P(i,2) = \sum_{x \neq i} P_x \cdot \frac{P_i}{1 - P_x}
$$

// Calculate the probability of finishing 2nd for each horse
for (HorseData hdi : hdList) {
    for (HorseData hdx : hdList) {
        if (hdi.id != hdx.id) {
            // hdx wins, then hdi beats the remaining field
            term = hdx.Pw * hdi.Pw / (1 - hdx.Pw);
            hdi.addToP2nd(term);
        }
    }
}

This calculates the probability of finishing 2nd for each horse. The sum of these probabilities adds to one. All good so far.

The following code implements the formula:

$$
P(i,3) = \sum_{x \neq i} \sum_{y \neq x, i} P_x \cdot P_{y2nd} \cdot \frac{P_i}{1 - P_x - P_{y2nd}}
$$

// Calculate the probability of finishing 3rd for each horse
for (HorseData hdi : hdList) {
    for (HorseData hdx : hdList) {
        if (hdi.id != hdx.id) {
            for (HorseData hdy : hdList) {
                if ((hdx.id != hdy.id) && (hdi.id != hdy.id)) {
                    // hdx wins, hdy is 2nd (using its P2nd), then hdi is 3rd
                    term = hdx.Pw * hdy.P2nd * hdi.Pw / (1 - hdx.Pw - hdy.P2nd);
                    hdi.addToP3rd(term);
                }
            }
        }
    }
}

This calculates the probability of finishing 3rd for each horse. However, these probabilities do not sum to one.

For testing, I have a 5-horse race with Pw = 0.2 for every horse.

The code to calculate P2nd returns 0.2 for each horse, but the code to calculate P3rd returns 0.16 for each horse (whereas I think it should be 0.2).

Any assistance in reviewing the formulas and the code implementation would be appreciated.

Best Answer

The win probabilities alone do not determine the probability that horse $k$ finishes in position $j$; a model is needed. What you are using is known as Harville's method. See the paper by D. A. Harville in the Journal of the American Statistical Association, 1973. What he did: suppose the time it takes horse $i$ to run the race is $X_i$, exponentially distributed with parameter $\lambda_i$, and assume independence among horses. Let $s=\lambda_1+\dots+\lambda_n$. Then the probability that $i$ wins is $P_i=P(X_i<X_j,\ j\ne i)=\lambda_i/s$, and conditioning on each finisher in turn gives the probability of an $(i,j,k)$ trifecta as $$\frac{P_iP_jP_k}{(1-P_i)(1-(P_i+P_j))} $$
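To make the conditioning explicit (a standard step written out here, not quoted from the paper): exponential times are memoryless, so once $i$ has finished, the remaining horses are in effect racing afresh with the same rates. Horse $j$ is then next with probability $\lambda_j/(s-\lambda_i)=P_j/(1-P_i)$, and likewise for $k$ after both $i$ and $j$ have finished:

$$
P(i \text{ 1st},\ j \text{ 2nd},\ k \text{ 3rd}) = P_i \cdot \frac{P_j}{1-P_i} \cdot \frac{P_k}{1-P_i-P_j}
$$

The middle factor is the probability that $j$ is 2nd given that $i$ has won, not the overall probability that $j$ finishes 2nd.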

These are totally unrealistic assumptions, but they do give reasonable values, and for betting purposes we only need the permutations of the first 3 horses. Read this book: link. It is mainly about the authors' method of place and show betting, which requires them to compute Harville probabilities. They then adjust the Harville values based on a regression on track data. This paperback is a condensed version of their book "Beat the Racetrack". You probably do not need that one as well. They also have a few academic papers. The most relevant one is:

Hausch, D. B., Ziemba, W. T., and Rubinstein, M. "Efficiency of the Market for Racetrack Betting." Management Science, vol. 27, pp. 1435-1452, 1981.

In my younger days, I tried to follow their method at the track. It is mathematically sound but requires a decision be made with data from as close to post time as possible. These days it seems even more difficult because the money bet off-track is not added to the tote board until after you can no longer make a bet. That squeezes out the profit potential.

In your program you should get 0.2 for third place for each horse: each trifecta has probability (1/5)(1/4)(1/3), and there are 4*3 ways the other horses can finish 1st and 2nd. Your error appears to be using the probability of 2nd place when computing the probability of 3rd. What you want is to fix $k$ and sum over all $i$ and $j$ of this: $$\frac{P_iP_jP_k}{(1-P_i)(1-(P_i+P_j))} $$ with $i\ne j$, $j\ne k$, $i\ne k$. Do not substitute a 2nd-place probability here.
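As a worked check on the 5-horse test case with every $P = 0.2$, each of the 12 ordered (1st, 2nd) pairs contributes the same term, so the Harville sum for a fixed third-place horse is

$$
12 \cdot \frac{0.2 \cdot 0.2 \cdot 0.2}{(1-0.2)(1-0.4)} = 12 \cdot \frac{0.008}{0.48} = 0.2
$$

whereas the loop in the question, which plugs in the 2nd-place probability 0.2 and drops the factor $1/(1-P_x)$, gives

$$
12 \cdot \frac{0.2 \cdot 0.2 \cdot 0.2}{1-0.2-0.2} = 12 \cdot \frac{0.008}{0.6} = 0.16
$$

The two results differ by exactly the missing factor $1/(1-0.2) = 1.25$.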

I found it useful to use a 2D array for the exacta probabilities and a 3D array to store and sum the trifecta probabilities.
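A minimal sketch of that array layout, assuming the win probabilities arrive as a plain `double[]` that sums to one, with every entry positive and at least three horses. The class and method names (`HarvilleSketch`, `exacta`, `trifecta`, `placeProbabilities`) are made up for illustration and are not part of the question's `HorseData` code.

```java
import java.util.Arrays;

// Illustrative sketch only: names are hypothetical, not from the question's code.
// Assumes pw sums to 1, every entry is positive, and there are at least 3 horses.
class HarvilleSketch {

    // exacta[i][j] = P(i 1st, j 2nd) = P_i * P_j / (1 - P_i)
    static double[][] exacta(double[] pw) {
        int n = pw.length;
        double[][] ex = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    ex[i][j] = pw[i] * pw[j] / (1.0 - pw[i]);
                }
            }
        }
        return ex;
    }

    // trifecta[i][j][k] = exacta[i][j] * P_k / (1 - P_i - P_j)
    static double[][][] trifecta(double[] pw) {
        int n = pw.length;
        double[][] ex = exacta(pw);
        double[][][] tri = new double[n][n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                for (int k = 0; k < n; k++) {
                    if (k == i || k == j) continue;
                    tri[i][j][k] = ex[i][j] * pw[k] / (1.0 - pw[i] - pw[j]);
                }
            }
        }
        return tri;
    }

    // place[0][h], place[1][h], place[2][h] = P(h finishes 1st, 2nd, 3rd),
    // obtained by summing the trifecta array over the other two positions.
    static double[][] placeProbabilities(double[] pw) {
        int n = pw.length;
        double[][][] tri = trifecta(pw);
        double[][] place = new double[3][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    double t = tri[i][j][k]; // zero whenever indices repeat
                    place[0][i] += t;
                    place[1][j] += t;
                    place[2][k] += t;
                }
            }
        }
        return place;
    }

    public static void main(String[] args) {
        double[] pw = {0.2, 0.2, 0.2, 0.2, 0.2};
        double[][] place = placeProbabilities(pw);
        System.out.println("P(2nd) = " + Arrays.toString(place[1]));
        System.out.println("P(3rd) = " + Arrays.toString(place[2]));
    }
}
```

For the equal-probability test case, the 2nd and 3rd place values should all come out at 0.2 up to floating-point rounding, and for any valid input each row of `place` should sum to one. Storing the full 3D array costs memory cubic in the field size, which is harmless for a race-sized field; accumulating the sums directly in the loops would avoid it.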

There is a considerable number of academic papers on horse racing. Most are concerned with market efficiency (are the win odds accurate?) or with whether some bettors are more knowledgeable (late money), and they appear in the economics literature. I remember at least one attempt to use a model like Harville's but with normal random variables instead, which leaves ugly multivariate normal integrals to approximate.
