Why can the Kullback-Leibler information function be negative

optimization, probability, real-analysis, statistics

Let $\{ f(x, \theta) ; \theta \in \Theta \}$ be a set of parametric density functions, $\Theta \subset \mathbb{R}$.

Let $N_\phi$ be a neighborhood of a point $\phi$ in $\Theta$. The Kullback-Leibler information function for discriminating between $f(X; \theta)$ and $f(X; \phi'), \phi' \in N_{\phi}$ is

$$I(\theta, N_{\phi}) = E_{\theta}\left[ \inf_{\phi' \in N_{\phi}} \log \frac{f(X;\theta)}{f(X; \phi')} \right]$$
I am told this quantity can even be negative (unlike the Kullback-Leibler divergence). What would be an example of this occurring?

Best Answer

I think your definition is: $$ I(\theta, N_{\phi}) = \int_{x : f(x,\theta)>0} f(x,\theta)\inf_{v\in N_{\phi}}\left[ \log\left(\frac{f(x,\theta)}{f(x,v)}\right)\right]dx$$ If we remove the infimum, we indeed get something nonnegative (the usual Kullback-Leibler divergence). With the infimum included, the integrand can only be smaller pointwise, so the result can be smaller still, and hence can be negative.

Just take the family of exponential PDFs $f(x,\theta) = \theta e^{-\theta x}$ for $x\geq 0$. Take $\theta = \phi = 1$ and $N_{1} = (0, \infty)$. For each fixed $x>0$, the map $v \mapsto \log\left(\frac{e^{-x}}{v e^{-vx}}\right) = (v-1)x - \log(v)$ is minimized at $v = 1/x$, so $$ \inf_{v \in N_{1}} \log\left(\frac{e^{-x}}{v e^{-v x}}\right) = \log(xe^{1-x}) = (1-x) + \log(x) $$ So we get a negative value for $I(1,N_1)$: \begin{align} I(1, N_1) &= \int_0^{\infty}e^{-x} [(1-x) + \log(x)]dx\\ &= \int_0^{\infty} e^{-x}\log(x)dx \\ &\approx -0.577216 \end{align} Here the $(1-x)$ term drops out because $\int_0^{\infty}e^{-x}(1-x)dx = 1 - 1 = 0$, and the remaining integral equals $-\gamma$, the negative of the Euler-Mascheroni constant.
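
For a quick numerical sanity check, here is a short Python sketch (just an illustration; SciPy's `quad` and the split at $x=1$ are implementation choices, not part of the argument) that integrates the closed-form integrand and compares it with $-\gamma$:

```python
import numpy as np
from scipy.integrate import quad

# Integrand e^{-x} * [(1 - x) + log(x)] from the example above.
integrand = lambda x: np.exp(-x) * ((1.0 - x) + np.log(x))

# Split at x = 1 so quad handles the log singularity at 0 and the infinite tail separately.
value = quad(integrand, 0.0, 1.0)[0] + quad(integrand, 1.0, np.inf)[0]

print(value)            # about -0.577216
print(-np.euler_gamma)  # exact value: -gamma (Euler-Mascheroni constant)
```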


On the other hand, if we remove the infimum we get $$ \int_{0}^{\infty} e^{-x} \log\left(\frac{e^{-x}}{ve^{-vx}}\right)dx = v-1-\log(v) \geq 0 \quad \forall v>0$$ Even though this is nonnegative for every $v>0$, it requires the same $v$ to be used throughout the integral. The definition of $I(\theta, N_{\phi})$, by contrast, allows a different minimizing value $v_x$ to be used for each $x$ that we meet as we integrate.
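
To see the contrast numerically, here is a small Monte Carlo sketch in Python (the sample size and seed are arbitrary choices): averaging the log-ratio with one common $v$ for all samples reproduces $v-1-\log(v)\geq 0$, while plugging in the pointwise minimizer $v_x = 1/x$ from the example above drives the average down to about $-0.577$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)   # X ~ Exp(1), i.e. f(x, 1) = e^{-x}

def log_ratio(x, v):
    # log( f(x, 1) / f(x, v) ) for the exponential family f(x, v) = v * e^{-v x}
    return (v - 1.0) * x - np.log(v)

# One common v for every sample: sample mean approximates v - 1 - log(v) >= 0.
for v in (0.5, 1.0, 2.0):
    print(v, log_ratio(x, v).mean())

# Per-sample minimizer v_x = 1/x: the average is now about -0.577, i.e. negative.
v_x = 1.0 / x
print(log_ratio(x, v_x).mean())
```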