Likelihood – Is Negative Log Likelihood Calculated in Log Space or Exponential Space?

Tags: likelihood, logarithm, metric, probability

I have a question about calculating the negative log-likelihood (NLL) of a machine learning model over a dataset. It seems simple, but I cannot find a solid answer or explanation online.

Is the NLL calculated as an average in log space?

$$
NLL = -\frac{1}{N} \sum_{i=1}^N \log p(y_i | x_i)
$$

or should it be done in exponential space?

$$
NLL = -\log \Big(\frac{1}{N} \sum_{i=1}^N p(y_i | x_i) \Big)
$$
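For concreteness, here is a small numerical sketch (with made-up probabilities) showing that the two expressions generally give different values:

```python
import numpy as np

# hypothetical per-observation probabilities p(y_i | x_i)
p = np.array([0.9, 0.8, 0.05, 0.7])

nll_log_space = -np.mean(np.log(p))  # first formula: average in log space
nll_exp_space = -np.log(np.mean(p))  # second formula: average probabilities, then log

print(nll_log_space)  # ~0.9202
print(nll_exp_space)  # ~0.4902
```

By Jensen's inequality the log-space average is always at least as large as the log of the average, so the two only coincide when all probabilities are equal.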

Best Answer

Let's start from the beginning. The likelihood is defined as the joint probability of observing the data; in your notation, the probability of a single observation is $p(y_i | x_i)$:

$$ L = \prod_{i=1}^N p(y_i | x_i) $$

whence the negative log-likelihood is $$ -\log(L) = -\sum_{i=1}^N \log\left( p(y_i | x_i)\right) $$

and the choice to work with $$\text{NLL}=-\frac{\log(L)}{N} = -\frac{1}{N}\sum_{i=1}^N \log\left( p(y_i | x_i)\right)$$

is an estimate of the cross-entropy between the empirical distribution of the data and the model's distribution, i.e. the expected negative log-probability under the model, averaged over the data. So the first expression is the NLL. The second expression is not: it takes the negative log of the *average* likelihood, which averages probabilities before the log and is a different quantity.
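There is also a computational reason to average in log space: for even moderately large $N$, the raw product of probabilities underflows to zero in floating point, while the sum of logs stays finite. A minimal sketch with simulated probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
# 1000 hypothetical per-observation probabilities in (0.1, 0.9)
p = rng.uniform(0.1, 0.9, size=1000)

raw_product = np.prod(p)       # underflows to 0.0 in float64
sum_of_logs = np.log(p).sum()  # finite and well-behaved

print(raw_product, sum_of_logs)
```

This is why likelihood computations are essentially always carried out as sums of logs rather than products of probabilities.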

Re-scaling by $\frac{1}{N}$ does not change the result of the optimization procedure, since multiplying the objective by any positive scalar changes its value but not the location of its optima.
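A quick sketch of that invariance, using a hypothetical Bernoulli dataset and a grid search over the parameter: the summed NLL and the $\frac{1}{N}$-scaled NLL are minimized at the same point (the MLE, here $7/10 = 0.7$).

```python
import numpy as np

# hypothetical data: 7 successes out of 10 Bernoulli trials
y = np.array([1, 1, 1, 0, 1, 0, 1, 1, 0, 1])
grid = np.linspace(0.01, 0.99, 99)  # candidate parameter values

def summed_nll(p):
    # -log L for a Bernoulli model with success probability p
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

total = np.array([summed_nll(p) for p in grid])
mean = total / len(y)  # the 1/N-scaled objective

# both objectives are minimized at the same parameter value
print(grid[np.argmin(total)], grid[np.argmin(mean)])
```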
