Solved – Kullback-Leibler divergence between two nested logistic regression models

generalized linear model, kullback-leibler, logistic, references, regression

I have two nested logistic models:
$\log\dfrac{p_i}{1-p_i}=\beta_{0}+\beta_1 x_i$
and
$\log\dfrac{p_i}{1-p_i}=\beta_{0}$

How can I construct the Kullback-Leibler divergence between these two nested logistic regression models?
Can I use the estimates of $\beta_0$ and $\beta_1$ to calculate the KL divergence between these two logistic models?

Is there any reference on the Kullback-Leibler divergence between two GLMs?

Best Answer

Logistic regression is a form of binomial regression, so this reduces to the KL divergence between two binomial distributions. Since the probabilities depend on the covariate $x_i$, the divergence will take a value depending on $i$; you may then be interested in the sum or the average over observations. I will not address that aspect here, only the value for a single $i$. I will use the notation and intuition from Intuition on the Kullback-Leibler (KL) Divergence.

It is natural to think of the intercept-only model as the null hypothesis, so that model plays the role of $Q$ in
$$ \DeclareMathOperator{\KL}{KL} \KL(P || Q) = \int_{-\infty}^\infty p(x) \log \frac{p(x)}{q(x)} \; dx, $$
where, for the binomial case, we replace the integral with a sum over the two values $x=0,1$. Write
$$ p=p_i = \frac{e^{\beta_0+\beta_1 x_i}}{1+e^{\beta_0+\beta_1 x_i}},\quad q=q_i =\frac{e^{\gamma_0}}{1+e^{\gamma_0}}, $$
where $q$ denotes the probability under the intercept-only model. I write the intercept differently in the two models because, when estimating both models on the same data, we will not get the same intercept.
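To make concrete how the fitted estimates enter $p_i$ and $q_i$, here is a minimal Python sketch (not part of the original answer) that fits the two nested models with statsmodels and extracts the per-observation probabilities; the simulated data and variable names are purely illustrative assumptions.

```python
# Sketch: fit the full and intercept-only logistic models, get p_i and q_i.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)                      # covariate x_i (simulated here)
true_p = 1 / (1 + np.exp(-(-0.5 + 1.2 * x)))  # data-generating probabilities
y = rng.binomial(1, true_p)                   # binary response

# Full model: logit(p_i) = beta_0 + beta_1 * x_i
X_full = sm.add_constant(x)
fit_full = sm.Logit(y, X_full).fit(disp=0)
p = fit_full.predict(X_full)                  # p_i, varies with x_i

# Intercept-only model: logit(q_i) = gamma_0
X_null = np.ones((len(y), 1))
fit_null = sm.Logit(y, X_null).fit(disp=0)
q = fit_null.predict(X_null)                  # q_i, constant over i
```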

Then we only have to calculate the Kullback-Leibler divergence between the two binomial distributions, which is
$$ \KL(p || q) = (1-p)\log\frac{1-p}{1-q} + p \log\frac{p}{q}. $$

As to your more general question, "Is there any reference on the Kullback-Leibler divergence between two GLMs?": GLMs are constructed from exponential families, just as logistic regression above is built from binomial distributions, an exponential family. So, as in this answer, your question reduces to the Kullback-Leibler divergence within exponential families. A reference to such results can be found in my answer at Quantify Difference/Distance between Lognormal distributions.
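As a small self-contained sketch (an assumption-laden illustration, not the answer's own code), the formula above can be evaluated per observation and averaged; the arrays `p` and `q` stand in for the fitted probabilities of the two models from the previous sketch.

```python
# Bernoulli KL divergence KL(p || q) = (1-p) log((1-p)/(1-q)) + p log(p/q).
import numpy as np

def kl_bernoulli(p, q):
    """Pointwise KL(P || Q) for Bernoulli(p) against Bernoulli(q)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return (1 - p) * np.log((1 - p) / (1 - q)) + p * np.log(p / q)

# Illustrative values; in practice pass the fitted p_i and q_i.
p = np.array([0.2, 0.5, 0.8])
q = np.full(3, 0.5)

kl_i = kl_bernoulli(p, q)    # one divergence value per observation i
print(kl_i, kl_i.mean())     # per-i values and their average
```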