[Math] Statistics: How to measure how accurately probabilities are reported

probability, statistics

If you roll a six-sided die many times and count how often the number 1 shows up, you'd expect it to show up about 1/6 of the time.

Now if you roll this die 1000 times and the number 1 shows up 600 times, you'd know something is amiss (maybe somebody tampered with the die).

I'm designing an AI for a computer game, and part of this AI has to estimate the probability of some binary (true / false) event occurring. After it has calculated the probability, it gets feedback on the actual outcome of the event.

The catch is the program's prediction may be wrong — for example, it might falsely calculate an event to have a 1/6 chance of occurring, while the actual probability of it occurring is 0.6.

How can I determine to what extent the probability engine is accurate? For instance, for an event that should occur with probability 0.6, an engine reporting the probability as 0.7 should be rated much more accurate than one reporting it as 0.1.

Here's a sample of my data. If the engine is accurate, 90% of the entries with the value 0.90 should be in the success category, and only 10% of them in the failure category.
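To make that check concrete, here is a minimal sketch that groups events by the reported probability and compares it with the observed success rate. The function name `calibration_table` and the sample values are made up for illustration; they are not the data from the question.

```python
from collections import defaultdict


def calibration_table(predictions, outcomes):
    """Map each reported probability to (successes, total, observed rate)."""
    counts = defaultdict(lambda: [0, 0])  # reported p -> [successes, total]
    for p, hit in zip(predictions, outcomes):
        if hit:
            counts[p][0] += 1
        counts[p][1] += 1
    return {p: (s, n, s / n) for p, (s, n) in sorted(counts.items())}


# Hypothetical data: ten events reported at 0.90, four reported at 0.50.
predictions = [0.9] * 10 + [0.5] * 4
outcomes = [True] * 9 + [False] + [True, False, True, False]

for p, (s, n, rate) in calibration_table(predictions, outcomes).items():
    print(f"reported {p:.2f}: {s}/{n} successes, observed rate {rate:.2f}")
```

If the engine is well calibrated, the observed rate in each row should be close to the reported probability, up to sampling noise when a row has few entries.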

Best Answer

One possible statistic that you could use to check this: if the probability of success for event $i$ is reported as $p_i$, let $Y_i = 1 - p_i$ in case of success and $Y_i = -p_i$ in case of failure, and let $S = \sum_{i=1}^n Y_i$ (where there are $n$ events). If the reported probabilities are accurate, each $Y_i$ has mean $p_i(1-p_i) - (1-p_i)p_i = 0$ and variance $p_i(1-p_i)$. So, assuming the probabilities are accurate and the events are independent, $S$ has mean $0$ and standard deviation $\sigma = \sqrt{\sum_{i=1}^n p_i (1-p_i)}$; if $n$ is large and at least some positive fraction of the $p_i$ are not too close to $0$ or $1$, the distribution of $S$ should be close to a normal distribution with that mean and standard deviation. The null hypothesis (that the probabilities are accurate) would be rejected at the 5% significance level (two-sided) if $|S| > 1.96 \sigma$.
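A minimal Python sketch of this statistic, assuming the reported probabilities and the outcomes are available as two parallel lists. The function name `calibration_z_test` and the example inputs are hypothetical, not taken from the question's data.

```python
import math


def calibration_z_test(predictions, outcomes, critical=1.96):
    """Return (S, sigma, S/sigma, reject) for the statistic described above.

    predictions: reported success probabilities p_i
    outcomes:    booleans, True for success
    critical:    1.96 corresponds to a two-sided test at the 5% level
    """
    # Y_i = 1 - p_i on success, -p_i on failure; S is their sum.
    s = sum((1.0 - p) if hit else -p for p, hit in zip(predictions, outcomes))
    # Under the null hypothesis, Var(S) = sum of p_i * (1 - p_i).
    sigma = math.sqrt(sum(p * (1.0 - p) for p in predictions))
    # Note: sigma is 0 if every p_i is exactly 0 or 1; this sketch does not handle that case.
    z = s / sigma
    return s, sigma, z, abs(z) > critical


# Hypothetical example: 100 events all reported at 0.5.
fair = calibration_z_test([0.5] * 100, [True] * 50 + [False] * 50)
skewed = calibration_z_test([0.5] * 100, [True] * 100)
print(fair)    # S = 0, so no evidence against the reported probabilities
print(skewed)  # S = 50 with sigma = 5, far outside 1.96 sigma
```

The normal approximation is only as good as the conditions stated above (many independent events, with enough of the $p_i$ away from $0$ and $1$).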

For your data I get $S = -5.46$ and $\sigma = 3.122627099$, so $S/\sigma = -1.748527707$. That's not enough to reject the null hypothesis.
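For a sense of how far from the threshold this is, the ratio $S/\sigma$ can be converted to a two-sided p-value under the normal approximation. This is a small sketch that uses only the two numbers reported above; the helper name `two_sided_p_value` is made up for illustration.

```python
import math


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


S = -5.46
sigma = 3.122627099
z = S / sigma
p = two_sided_p_value(z)
print(f"z = {z:.4f}, two-sided p-value = {p:.4f}")
```

The resulting p-value lies between 0.05 and 0.10, which is consistent with $|S/\sigma|$ falling short of $1.96$.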
