Solved – Performance metric for algorithm predicting probability of low probability events

logistic, machine-learning

I am writing a program to predict the click-through rates of online ads. Two important notes about this problem:

  • click-through rates are very small (around 0.1%)

  • click-through rates depend on several parameters (such as the size of the ad, the country in which the ad is shown, whether the user has seen this ad before, etc.)

As I employ statistical/machine learning techniques, I would like to measure the performance of my techniques on historical data. I cannot use metrics like precision or accuracy, since I predict a probability rather than the occurrence or non-occurrence of an event.

In such a case, how should I measure the performance of my algorithm?

Best Answer

Mean squared error, as suggested by Lakret, will certainly work. However, I'd like to propose a method that captures the uncertainty of the click-through rates of the ads (which are not known exactly, but only estimated from historical data).
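For concreteness, here is a minimal Python sketch of the mean-squared-error baseline (the numbers are hypothetical; the "observed" rates are the maximum-likelihood estimates clicks/impressions from historical data):

```python
# Mean squared error between predicted and historically observed click-through rates.
# All rates below are hypothetical illustration values.
predicted = [0.0012, 0.0008, 0.0030]  # model's predicted click-through rates
observed = [0.0010, 0.0011, 0.0025]   # clicks / impressions per ad

mse = sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
print(mse)
```

Because the targets are probabilities, this is essentially the Brier score of the predictions.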

Let's say we have an ad in our validation set with 10000 impressions and 10 clicks, i.e. the maximum likelihood estimate of the click-through rate $p$ is $0.001$. Furthermore, we predicted a click-through rate of $\hat{p}$ for this ad.

Now, instead of comparing the predicted $\hat{p}$ with $p$, we check whether $\hat{p}$ lies in the confidence interval of $p$. Using the Beta distribution, i.e. the Bayesian approach to interval estimation (the resulting intervals are then called credible intervals), we get in R:

alpha <- 0.05
# 95% credible interval from the posterior Beta(clicks + 1, impressions - clicks + 1)
qbeta(c(alpha/2, 1 - alpha/2), 10 + 1, 10000 - 10 + 1)
# which results in [1] 0.000549185 0.001838080

For other methods of calculating binomial confidence intervals, see e.g. the R package confint.
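The same credible interval can be reproduced outside R; here is a Python sketch using `scipy.stats.beta`, with the same counts as the example above:

```python
from scipy.stats import beta

alpha = 0.05
clicks, impressions = 10, 10000

# Posterior Beta(clicks + 1, impressions - clicks + 1) under a uniform prior,
# matching the qbeta call in the R snippet.
lower = beta.ppf(alpha / 2, clicks + 1, impressions - clicks + 1)
upper = beta.ppf(1 - alpha / 2, clicks + 1, impressions - clicks + 1)
print(lower, upper)  # roughly 0.000549 and 0.001838
```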

Now, the error for the prediction of a single ad is ...

  • 0, if $\hat{p}$ is in the confidence interval of $p$
  • 1, otherwise

Starting from here, one can compute binomial metrics like precision, or simply the average error across multiple click-through-rate predictions. In a more sophisticated approach, one could define the error as the distance to the nearest confidence-interval bound (when $\hat{p}$ falls outside the interval) to make the error less discrete.
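Both error variants can be sketched in a few lines of Python (the ads and interval bounds below are hypothetical illustration values):

```python
# Each ad: (predicted rate, lower interval bound, upper interval bound).
# Hypothetical values; the bounds would come from the Beta quantiles above.
ads = [
    (0.0010, 0.00055, 0.00184),  # inside the interval
    (0.0030, 0.00055, 0.00184),  # above the interval
]

# Discrete 0/1 error: 0 if the prediction lies in the interval, 1 otherwise.
zero_one = [0 if lo <= p <= hi else 1 for p, lo, hi in ads]
avg_error = sum(zero_one) / len(ads)

# Smoother variant: distance to the nearest interval bound when outside, else 0.
distances = [max(lo - p, p - hi, 0.0) for p, lo, hi in ads]
print(avg_error, distances)
```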
