Solved – How to evaluate quality of probability estimator for Bernoulli experiments

estimation, probability

Given a set of Bernoulli experiments, each with a different and unknown probability $p_i$ and an outcome $x_i$, and an estimator that gives a prediction of the probability of the event for each experiment, I want to measure the prediction quality of the estimator.

Example: I have a stack of $n$ "unfair" coins, each with a different probability $p_i$ for heads and $1-p_i$ for tails. The probabilities are unknown and I can flip each coin only once.
Assume there is a "coin flipping expert" who can take a close look at each coin before it is flipped and estimate its probability, based on form, size, width, regularity and so on. After the expert makes his prediction, the coin is flipped and the result is noted.

After all coins are flipped, I want to measure how good the expert was, for example on a scale from 0 to 1, where 1 means perfect prediction and 0 means pure randomness. I would also be interested in the bias/variance of the predictor.

Best Answer

You can quantify the quality of the estimator by calculating the total surprisal of all of the coin flips.

Suppose that your expert makes predictions $q_i$ for each coin. Then, given the indicator variables $x_i$ for the coins coming up heads, the total surprisal is:

\begin{align} \sum_i\left[ -x_i\log q_i - (1-x_i)\log (1-q_i)\right]. \end{align}
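Here is a minimal sketch of how that sum could be computed from the expert's predictions and the observed flips; the function name, the array arguments `x` and `q`, the `eps` guard against $\log 0$, and the example numbers are all my own:

```python
import numpy as np

def total_surprisal(x, q, eps=1e-12):
    """Total surprisal (log loss) of the observed flips x under predictions q.

    x : array of 0/1 outcomes (1 = heads)
    q : array of predicted probabilities of heads
    eps guards against log(0) when a prediction is exactly 0 or 1.
    """
    x = np.asarray(x, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    return float(np.sum(-x * np.log(q) - (1 - x) * np.log(1 - q)))

# Made-up example: three coins, the expert's predictions, and the observed flips.
print(total_surprisal(x=[1, 0, 1], q=[0.9, 0.2, 0.6]))
```

Lower values mean the outcomes were less surprising under the expert's predictions.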

The expected value of the surprisal given the true values $\{p_i\}$ is the cross-entropy: \begin{align} \sum_i \left[-p_i\log q_i -(1-p_i)\log (1-q_i)\right]. \end{align} It is nonnegative, and achieves its minimum value (the entropy of $\{p_i\}$) if and only if $p_i = q_i$ for all $i$.
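In practice the $p_i$ are unknown, but in a simulation where you pick them yourself you can check this claim directly. A sketch under that assumption (function name and numbers are mine):

```python
import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Expected surprisal (cross-entropy) of predictions q when the true head probabilities are p."""
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    return float(np.sum(-p * np.log(q) - (1 - p) * np.log(1 - q)))

p_true = [0.9, 0.2, 0.6]
print(cross_entropy(p_true, q=[0.7, 0.4, 0.5]))  # larger than ...
print(cross_entropy(p_true, q=p_true))           # ... the entropy of {p_i}, attained at q = p
```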

If you subtract the entropy from the cross-entropy, you get the relative entropy (whose minimum value is zero). If you then take $e^{-x}$ of that, you get a number in $[0, 1]$, as you wanted, with a reasonable probabilistic interpretation.
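A sketch of that final score, again assuming the true $p_i$ are available (e.g. in a simulation); the function name, the `eps` guard, and the example values are my own:

```python
import numpy as np

def exp_neg_kl_score(p, q, eps=1e-12):
    """exp(-relative entropy) between true probabilities p and predictions q.

    Returns 1.0 when q matches p exactly and tends toward 0 as the predictions worsen.
    """
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    q = np.clip(np.asarray(q, dtype=float), eps, 1 - eps)
    # Sum of per-coin Bernoulli KL divergences = cross-entropy minus entropy.
    kl = np.sum(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))
    return float(np.exp(-kl))

p_true = [0.9, 0.2, 0.6]
print(exp_neg_kl_score(p_true, q=p_true))           # 1.0 (perfect predictions)
print(exp_neg_kl_score(p_true, q=[0.7, 0.4, 0.5]))  # strictly between 0 and 1
```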
