Solved – kullback leibler divergence between two nested logistic regression models

generalized linear modelkullback-leiblerlogisticreferencesregression

I have two logistic nested models
$\log\dfrac{p_i}{1-p_i}=\beta_{0}+\beta_1 x_i$
and
$\log\dfrac{p_i}{1-p_i}=\beta_{0}$

How can I construct the kullback leibler divergence between two nested logistic regression models?
Can I use the estimates of $\beta_0$ and $\beta_1$ to calculate KL divergence between these two logistic models?

Is there any reference to the kullback leibler divergence between two GLM models?

Best Answer

Logistic regression is a form of binomial regression, so this will reduce to the KL divergence between two binomials. Since the probabilities depend on the covariate $x_i$, this will give a value depending on $i$, maybe you then are interested in the sum or in the average. I will not address that aspect, just look at the value for one $i$. I will use the notation and intuition from Intuition on the Kullback-Leibler (KL) Divergence It is natural to think about the intercept-only model as the null hypothesis, so that model will play the role of $Q$ in $$ \DeclareMathOperator{\KL}{KL} \KL(P || Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \; dx $$ where we for the binomial case will replace the integral with a sum over the two values $x=0,1$. Write $$ p=p_i = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}},\quad q=q_i =\frac{e^{\gamma_0}}{1+e^{\gamma_0}} $$ where we use $q$ for the probability of the intercept-only model. I wrote the intercept differently in the two models because when estimating the two models on the same data we will not get the same intercept.

Then we only have to calculate the Kullback-Leibler divergence between the two binomial distributions, which is $$ KL(p || q) = (1-p)\log\frac{1-p}{1-q} + p \log\frac{p}{q} $$ As to your more general question "Is there any reference to the kullback leibler divergence between two GLM models? " GLM's are constructed from exponential families, like above logistic regression from binomial distributions, an exponential family. So like in this answer your question reduces to Kullback-Leibler divergence in exponential families. A reference to such results can be found in my answer at Quantify Difference/Distance between Lognormal distributions

Related Solutions

Solved – Kullback–Leibler divergence between two gamma distributions

The KL divergence is a difference of integrals of the form

$$\begin{aligned} I(a,b,c,d)&=\int_0^{\infty} \log\left(\frac{e^{-x/a}x^{b-1}}{a^b\Gamma(b)}\right) \frac{e^{-x/c}x^{d-1}}{c^d \Gamma(d)}\, \mathrm dx \\ &=-\frac{1}{a}\int_0^\infty \frac{x^d e^{-x/c}}{c^d\Gamma(d)}\, \mathrm dx - \log(a^b\Gamma(b))\int_0^\infty \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &\quad+ (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\, \mathrm dx\\ &=-\frac{cd}{a} - \log(a^b\Gamma(b)) + (b-1)\int_0^\infty \log(x) \frac{e^{-x/c}x^{d-1}}{c^d\Gamma(d)}\,\mathrm dx \end{aligned}$$

We just have to deal with the right hand integral, which is obtained by observing

$$\eqalign{ \frac{\partial}{\partial d}\Gamma(d) =& \frac{\partial}{\partial d}\int_0^{\infty}e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx\\ =& \frac{\partial}{\partial d} \int_0^\infty e^{-x/c} \frac{(x/c)^{d-1}}{c}\, \mathrm dx\\ =&\int_0^\infty e^{-x/c}\frac{x^{d-1}}{c^d} \log\frac{x}{c} \, \mathrm dx\\ =&\int_0^{\infty}\log(x)e^{-x/c}\frac{x^{d-1}}{c^d}\, \mathrm dx - \log(c)\Gamma(d). }$$

Whence

$$\frac{b-1}{\Gamma(d)}\int_0^{\infty} \log(x)e^{-x/c}(x/c)^{d-1}\, \mathrm dx = (b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

Plugging into the preceding yields

$$I(a,b,c,d)=\frac{-cd}{a} -\log(a^b\Gamma(b))+(b-1)\frac{\Gamma'(d)}{\Gamma(d)} + (b-1)\log(c).$$

The KL divergence between $\Gamma(c,d)$ and $\Gamma(a,b)$ equals $I(c,d,c,d) - I(a,b,c,d)$, which is straightforward to assemble.

Implementation Details

Gamma functions grow rapidly, so to avoid overflow don't compute Gamma and take its logarithm: instead use the log-Gamma function that will be found in any statistical computing platform (including Excel, for that matter).

The ratio $\Gamma^\prime(d)/\Gamma(d)$ is the logarithmic derivative of $\Gamma,$ generally called $\psi,$ the digamma function. If it's not available to you, there are relatively simple ways to approximate it, as described in the Wikipedia article.

Here, to illustrate, is a direct R implementation of the formula in terms of $I$. This does not exploit an opportunity to simplify the result algebraically, which would make it a little more efficient (by eliminating a redundant calculation of $\psi$).

#
# `b` and `d` are Gamma shape parameters and
# `a` and `c` are scale parameters.
# (All, therefore, must be positive.)
#
KL.gamma <- function(a,b,c,d) {
  i <- function(a,b,c,d)
    - c * d / a - b * log(a) - lgamma(b) + (b-1)*(psigamma(d) + log(c))
  i(c,d,c,d) - i(a,b,c,d)
}
print(KL.gamma(1/114186.3, 202, 1/119237.3, 195), digits=12)

Solved – Smallest Kullback-Leibler divergence

If you express the Kullback–Leibler divergence when $p_2$ is a normal pdf on $\mathbb R^d$, \begin{align} D&(p_1||p_2) =\int_{\mathbb R^d} p_1 \log p_1 \text{d}\lambda - \int_{\mathbb R^d} p_1 \log p_2 \text{d}\lambda\\ &= \int_{\mathbb R^d} p_1 \log p_1 \text{d}\lambda - \dfrac{1}{2} \int_{\mathbb R^d} p_1 \left\{-(x-\mu)^T \Sigma^{-1} (x-\mu) - \log |\Sigma| -d \log 2\pi \right\} \text{d}\lambda \\ &= \int_{\mathbb R^d} p_1 \log p_1 \text{d}\lambda + \dfrac{1}{2} \left\{ \log |\Sigma| + d \log 2\pi + \mathbb{E}_1 \left[ (x-\mu)^T \Sigma^{-1} (x-\mu) \right] \right\} \end{align} Now $$ \mathbb{E}_1 \left[ (x-\mu)^T \Sigma^{-1} (x-\mu) \right]= \mathbb{E}_1 \left[ (x-\mathbb{E}_1[x] )^T \Sigma^{-1} (x-\mathbb{E}_1[x]) \right]$$ $$\qquad\qquad\qquad + (\mathbb{E}_1[x]-\mu)^T \Sigma^{-1} (\mathbb{E}_1[x]-\mu) $$ so the minimum in $\mu$ is indeed reached for $\mu=\mathbb{E}_1[x]$.

Minimising $$ \log |\Sigma| + \mathbb{E}_1 \left[ (x-\mathbb{E}_1[x] )^T \Sigma^{-1} (x-\mathbb{E}_1[x]) \right] = $$ $$ \log |\Sigma| + \mathbb{E}_1 \left[ \text{trace} \left\{ (x-\mathbb{E}_1[x] )^T \Sigma^{-1} (x-\mathbb{E}_1[x]) \right\}\right] = \qquad\qquad\qquad $$ $$ \log |\Sigma| + \mathbb{E}_1 \left[ \text{trace} \left\{ \Sigma^{-1} (x-\mathbb{E}_1[x]) (x-\mathbb{E}_1[x] )^T \right\}\right] = $$ $$ \log |\Sigma| + \text{trace} \left\{ \Sigma^{-1} \mathbb{E}_1 \left[ (x-\mathbb{E}_1[x]) (x-\mathbb{E}_1[x] )^T \right] \right\}= $$ $$ \qquad\qquad \log |\Sigma| + \text{trace} \left\{ \Sigma^{-1} \Sigma_1 \right\} $$ leads to a minimum in $\Sigma$ for $\Sigma=\Sigma_1$.

Best Answer

Related Solutions

Solved – Kullback–Leibler divergence between two gamma distributions

Implementation Details

Solved – Smallest Kullback-Leibler divergence

Related Question