[Math] Bayes’ formula application to the probability of a horse winning, depending on the jockey

bayes-theoremprobability

I have a problem that I have been racking my brain to figure this out and I just don't have the background to know if I am correct or not so I am hoping that you can help me out, OK here it goes so bear with me. I am going to try and make it as simple and clear as possible. I am going to start with a horse racing example and then go into my dilemma

Horse A and Horse B are going to race against each other in a to horse race.

They have raced each other 12 times before and Horse A has won five times and horse B has won seven times. So horse A has won 41.7% of the time but….

There are two Jockeys RED and BLUE

3 out of 5 of those wins came when Blue Jockey was riding horse A. However Blue Jockey rode him only once on the days horse A lost.

Today Blue Jockey is riding horse A so we get a 60% chance if we just use that Blue jockey 3 out of five fact. But I know we need to consider the total amount of races as well

So far I have this then

P(horse A wins given blue jockey rides) = .60 X .417 /.333

because 60% of 3 of the 5 wins were when Blue was riding. 41.7% win over all and .333 cause Blue Jockey has ridden the horse 4 times out of twelve races.

Right?

Now here is where my question with changes

Two horses A & B but five jockeys. so lets just use some quick made up stats for the sake of time.

Horse A has a 66% chance to win but when Jockey Black
rides the horse A, horse A wins 70.5% of the time.

now the colors of the jockeys are put in a jar to be pulled out to see who is going to ride horse A. the following are the chances in % of color being pulled out

Red 19%
Black 21%
Green 13%
Blue 37%
Pink 10%

Would my formula to see the actual percentage that horse A wins be:

 Horse A wins given Black rides = .705 X .66/.21

which then gives me 2.21 which is WHAT?? what does that mean?

AND

if it were a different rider lets say pink and for the sake of simplicity same win% would the denominator then be .10

This is causing me to lose sleep , just kidding(kinda) 🙂

Best Answer

i suspect you are asking how to combine data in a coherent way to predict outcomes. i will suggest an approach to do so: we need a statistical model for observed data. For example, and following the horse racing application, suppose we observe the results of many horse races. In each race we assume two horses, horse A vs horse B, to keep it simple. Any of 5 jockeys can ride the horses. Maybe jockey #1 tends to ride horse A more often. Maybe jockey #2 is an unskilled jockey, etc. That will all come out in the data analysis. A possible model is the logistic regression:

$$ {\hat p}=\frac {1}{1+exp[-(\beta_0+\beta_1A_1+...+\beta_5A_5+\beta_6B_1+...+\beta_{10}B_{5})] }$$

where $\beta_0,...,\beta_{10}$ are values that will be estimated when fitting the model to a data set.

For each race we record the winning horse and the jockey of each horse. We do that by specifying:

$A_k=1$ if jockey $k$ rode horse A and $=0$ if not.

$B_k=1$ if jockey $k$ rode horse B and $=0$ if not.

If we code the data so that horse A wins & B loses is $=1$, and B wins & A loses is $=0$ then the result of the model is to predict the probability that A wins given jockey 3 on A and 1 on B. That would be the expression

$$ {\hat p}=\frac {1}{1+exp[-(\beta_0+\beta_3+\beta_6) ]}$$

Note that the $\beta's$ may be negative. In real horse race data, the impact of a jockey on the outcome is limited so all the A's and B's will come out near 0. But if they had a big effect, the coefficients would indicate that. The value of $\beta_0$ measures the ability of the horses apart from the jockeys. If, on the other hand, you are looking to fit your example into a theorem of Baye's type problem, you might let us know and start here:

http://en.wikipedia.org/wiki/Bayes%27_theorem

Related Question