Solved – Conditional distribution for Exponential family

conditional-probability, exponential-family, machine-learning, predictive-models

We have a random variable $X$ that belongs to the exponential family with p.d.f.

$$ P_X(x \mid \boldsymbol\theta) = h(x) \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x) - A(\boldsymbol\theta) \right) $$

where $\boldsymbol\theta = \left(\theta_1, \theta_2, \cdots, \theta_s \right)^T$ is the parameter vector, $\mathbf{T}(x) = \left(T_1(x), T_2(x), \cdots, T_s(x) \right)^T$ is the joint sufficient statistic, and $A(\boldsymbol\theta) = \log \int_{\mathcal{X}} h(x) \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x) \right) dx$ is the log-partition function.

(The notation follows the Wikipedia page on the exponential family of distributions.)
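For a concrete instance of this notation, the Bernoulli distribution with success probability $p$ fits this form:

$$ P_X(x \mid p) = p^x (1-p)^{1-x} = \exp\left( x \log\tfrac{p}{1-p} + \log(1-p) \right), \qquad x \in \{0, 1\}, $$

so $h(x) = 1$, $\mathbf{T}(x) = x$, $\eta(p) = \log\frac{p}{1-p}$, and $A(p) = -\log(1-p) = \log\left(1 + e^{\eta(p)}\right)$.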

Now suppose each data point is given a label, so that the joint distribution is over pairs $(x, y) \in \mathcal{X} \times \mathcal{Y}$.

EDIT
The sufficient statistic for this joint distribution is $\mathbf{T}(x, y)$.

I am unable to derive the following expression for the exponential form of the conditional distribution of the labels given the data (ignoring the reference measure $h(x)$):

$$ P(y \mid x; \boldsymbol\theta) = \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) - A(\boldsymbol\theta \mid x) \right) $$

with $A(\boldsymbol\theta \mid x) = \log \int_{\mathcal{Y}} \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) \right) dy$.
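For concreteness: when $\mathcal{Y}$ is a finite label set, $A(\boldsymbol\theta \mid x)$ is just a log-sum-exp over the labels, and $P(y \mid x; \boldsymbol\theta)$ is a softmax over the scores $\eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y)$. Here is a minimal sketch with a made-up statistic `T` (not anything from the paper):

```python
import numpy as np

def conditional_log_prob(eta, T, x, labels):
    """log P(y | x; theta) for every y in a finite label set.

    eta is the natural-parameter vector eta(theta); T(x, y) returns the
    joint sufficient statistic as a vector of the same length.
    """
    scores = np.array([eta @ T(x, y) for y in labels])  # eta . T(x, y)
    A_cond = np.logaddexp.reduce(scores)  # A(theta | x) = log sum_y exp(score)
    return scores - A_cond                # eta . T(x, y) - A(theta | x)

# Hypothetical statistic T(x, y) = y * x with binary labels
eta = np.array([0.5, -1.0])
T = lambda x, y: y * np.asarray(x, dtype=float)
log_p = conditional_log_prob(eta, T, x=[1.0, 2.0], labels=[0, 1])
print(np.exp(log_p).sum())  # probabilities sum to 1.0
```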

This expression is used in a paper on missing variables that I am trying to implement. I have tried writing out the conditional in terms of the joint distribution but did not get a clean decomposition of terms.

Is there a standard proof or text that derives the conditional distribution for an exponential-family model? Any hints or references would be great. Thanks.

Best Answer

I have answered my own question. It turned out to be a fairly direct application of Bayes' rule, though only after making a somewhat arbitrary assumption. My question was not very clear, mostly due to my own tenuous understanding at the time.

However, this result is used quite often in the machine-learning literature on integrating out missing variables, so I am including the proof in case others find it helpful when they encounter the result.

Start from the joint density in exponential-family form:

$$ P(x, y \mid \boldsymbol\theta) = h(x) \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) - A(\boldsymbol\theta) \right) $$

By Bayes' rule,

$$ P(y \mid x, \boldsymbol\theta) = \frac{P(x \mid y, \boldsymbol\theta)\, P(y \mid \boldsymbol\theta)}{\int_{\mathcal{Y}} P(x \mid y', \boldsymbol\theta)\, P(y' \mid \boldsymbol\theta)\, dy'} = \frac{P(x, y \mid \boldsymbol\theta)}{\int_{\mathcal{Y}} P(x, y' \mid \boldsymbol\theta)\, dy'} = \frac{h(x) \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) - A(\boldsymbol\theta) \right)}{\int_{\mathcal{Y}} h(x) \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y') - A(\boldsymbol\theta) \right) dy'} $$

The base reference measure $h(x)$ is assumed to be a function of $x$ alone (not of $y$), so it cancels between the numerator and denominator in the last step above, as does the constant factor $\exp(-A(\boldsymbol\theta))$, giving

$$ P(y \mid x, \boldsymbol\theta) = \frac{\exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) \right)}{\int_{\mathcal{Y}} \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y') \right) dy'} = \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) - \log \int_{\mathcal{Y}} \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y') \right) dy' \right) = \exp\left( \eta(\boldsymbol\theta) \cdot \mathbf{T}(x, y) - A(\boldsymbol\theta \mid x) \right) $$
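As a quick numerical sanity check of the cancellation (a sketch with discrete $x$ and $y$ and an arbitrary made-up statistic, not anything from the paper): computing $P(y \mid x)$ once directly from the joint table and once from the derived formula gives the same answer, and neither $h(x)$ nor $A(\boldsymbol\theta)$ matters:

```python
import numpy as np

rng = np.random.default_rng(0)
xs, ys = np.arange(3), np.arange(4)
eta = rng.normal(size=2)
h = rng.uniform(1.0, 2.0, size=len(xs))             # base measure, depends on x only
T = lambda x, y: np.array([x * y, y], dtype=float)  # arbitrary joint statistic

# Joint P(x, y) proportional to h(x) exp(eta . T(x, y));
# normalizing the table plays the role of exp(-A(theta)).
joint = np.array([[h[x] * np.exp(eta @ T(x, y)) for y in ys] for x in xs])
joint /= joint.sum()

# Route 1: Bayes' rule, P(y | x) = P(x, y) / sum over y' of P(x, y')
cond_bayes = joint / joint.sum(axis=1, keepdims=True)

# Route 2: the derived formula, exp(eta . T(x, y) - A(theta | x))
scores = np.array([[eta @ T(x, y) for y in ys] for x in xs])
cond_formula = np.exp(scores - np.logaddexp.reduce(scores, axis=1, keepdims=True))

print(np.allclose(cond_bayes, cond_formula))  # True: h(x) and A(theta) cancel
```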
