Solved – Converting (normalizing) very small likelihood values to probability

arithmetic, c, likelihood, normalization, probability

I am writing an algorithm where, given a model, I compute likelihoods for a list of datasets and then need to normalize the likelihoods to probabilities. So something like [0.00043, 0.00004, 0.00321] would be converted to roughly [0.12, 0.01, 0.87].

My problem is that the log-likelihoods I am working with are very large negative numbers (for instance, values in log space like -269647.432, -231444.981, etc.), so the likelihoods themselves are vanishingly small. In my C++ code, when I try to add two of them (by exponentiating first), I get "Inf". I tried adding them in log space (summation/subtraction of logs), but stumbled upon the same problem.
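For reference, here is a minimal snippet showing what goes wrong for me when I exponentiate directly (variable names are just for illustration): the exponentials underflow to zero, and anything computed from them afterwards comes out as 0, Inf, or NaN.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Log-likelihoods like the ones above.
    double logL1 = -269647.432, logL2 = -231444.981;

    // exp() of such large negative numbers underflows to 0.0 in double precision.
    double L1 = std::exp(logL1);   // 0.0
    double L2 = std::exp(logL2);   // 0.0

    double sum = L1 + L2;          // 0.0
    std::printf("L1=%g L2=%g L1/sum=%g\n", L1, L2, L1 / sum);  // prints 0 0 nan
    return 0;
}
```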

Can anybody share his/her expert opinion on this?

Best Answer

Subtract the maximum logarithm from all logs. Throw away all results that are so negative they will underflow the exponential. (Their likelihoods are, for all practical purposes, zero.)

Indeed, if you want a relative precision of $\epsilon$ (such as $\epsilon = 10^{-d}$ for $d$ digits of precision) and you have $n$ likelihoods, throw away any result less than the logarithm of $\epsilon/n$. Then proceed as usual to exponentiate the resulting values and divide each one by the sum of all the exponentials.
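For concreteness, here is a minimal C++ sketch of this prescription, assuming natural logarithms and a non-empty input; the function name, the use of std::vector, and the default epsilon are illustrative choices, not part of the recipe itself.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize log-likelihoods to probabilities, given a desired relative
// precision epsilon (e.g. 1e-16 for roughly IEEE-double precision).
std::vector<double> normalize_log_likelihoods(const std::vector<double>& logs,
                                              double epsilon = 1e-16) {
    const std::size_t n = logs.size();
    const double max_log = *std::max_element(logs.begin(), logs.end());
    const double cutoff  = std::log(epsilon) - std::log(static_cast<double>(n));

    std::vector<double> alpha(n);
    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double shifted = logs[i] - max_log;            // lambda_i - lambda_n, always <= 0
        alpha[i] = (shifted >= cutoff) ? std::exp(shifted)   // safe: argument lies in [cutoff, 0]
                                       : 0.0;                // would underflow anyway; treat as zero
        total += alpha[i];                                   // total >= 1, since the max term is exp(0) = 1
    }
    for (std::size_t i = 0; i < n; ++i)
        alpha[i] /= total;                                   // normalized likelihoods
    return alpha;
}
```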

For those who like formulas, let the logarithms be $\lambda_1, \lambda_2, \ldots, \lambda_n$ with $\lambda_n = \max(\lambda_i)$. For logarithms to the base $b\gt 1$, define

$$\alpha_i = \begin{cases} b^{\lambda_i - \lambda_n}, & \lambda_i - \lambda_n \ge \log(\epsilon)-\log(n) \\ 0, & \text{otherwise}. \end{cases}$$

The normalized likelihoods equal $\alpha_i / \sum_{j=1}^n \alpha_j$, $i = 1, 2, \ldots, n.$ This works because replacing all of the otherwise underflowing $\alpha_i$ by zero introduces a total error of at most $(n-1)\epsilon/n\lt \epsilon$ in the sum. Moreover, because $\alpha_n=b^{\lambda_n-\lambda_n}=b^0=1$ and all $\alpha_i$ are non-negative, the denominator $A = \sum_j \alpha_j \ge 1$, whence the total relative error due to the zero-replacement rule is strictly smaller than $\left((n-1)\epsilon/n \right) / A \lt \epsilon$, as desired.

To avoid too much rounding error, compute the sum starting with the smallest values of the $\alpha_i$. This will be done automatically when the $\lambda_i$ are first sorted in increasing order. This is a consideration only for very large $n$.
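If $n$ is large enough for this to matter, one way to do it (a sketch, not the only option) is to sort a copy of the shifted logs and accumulate in increasing order; the normalization step is unchanged.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sum the exponentials of the shifted logs (lambda_i - lambda_n), smallest
// contributions first, skipping anything below the cutoff log(epsilon/n).
double stable_denominator(std::vector<double> shifted, double cutoff) {
    std::sort(shifted.begin(), shifted.end());   // increasing order
    double total = 0.0;
    for (double s : shifted)
        if (s >= cutoff)
            total += std::exp(s);                // smallest terms added first
    return total;                                // >= 1, since the largest term is exp(0) = 1
}
```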

BTW, this prescription assumes the base of the logs is greater than $1$. For bases $b$ less than $1$, first negate all the logs and proceed as if the base were $1/b$.


Example

Let there be three values with logarithms (natural logs, say) equal to $-269647.432,$ $-231444.981,$ and $-231444.699.$ The last is the largest; subtracting it from each value gives $-38202.733,$ $-0.282,$ and $0.$

Suppose you would like precision comparable to IEEE doubles (about 16 decimal digits), so that $\epsilon=10^{-16}$ and $n=3$. (You can't actually achieve this precision, because $-0.282$ is given only to three significant figures, but that's ok: we're only throwing away values that are guaranteed not to affect the better of the precision you want and the precision you actually have.) Compute $\log(\epsilon/n) = \log(10^{-16}) - \log(3) = -37.93997.$ The first of the three differences, $-38202.733,$ is less than this, so throw it away, leaving just $-0.282$ and $0.$ Exponentiating them gives $\exp(-0.282) = 0.754$ and $\exp(0)=1$ (of course). The normalized values are, in order: $0$ for the one you threw away, $0.754 / (1 + 0.754) = 0.430$, and $1/(1+0.754)=0.570$.
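As a check, a small standalone C++ program (hypothetical, using the same natural-log values and $\epsilon = 10^{-16}$) reproduces these numbers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    std::vector<double> logs = {-269647.432, -231444.981, -231444.699};
    const double epsilon = 1e-16;
    const double max_log = *std::max_element(logs.begin(), logs.end());   // -231444.699
    const double cutoff  = std::log(epsilon) - std::log(3.0);             // about -37.93997

    std::vector<double> alpha;
    double total = 0.0;
    for (double l : logs) {
        double s = l - max_log;                    // -38202.733, -0.282, 0
        double a = (s >= cutoff) ? std::exp(s) : 0.0;
        alpha.push_back(a);
        total += a;                                // 1 + 0.754...
    }
    for (double a : alpha)
        std::printf("%.3f\n", a / total);          // prints 0.000, 0.430, 0.570
    return 0;
}
```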