Exponential family regularity conditions

assumptions, exponential-family

Given a generic pmf or pdf $f(x;\theta)$, where $\theta$ is a vector-valued parameter, it can be reparametrized into the exponential family form $f(x;\theta)=a(\theta)g(x)\exp\left[\sum_{i=1}^{k} b_i(\theta)R_i(x)\right]$.

I know that this holds only under certain regularity conditions. What are they?

Best Answer

Consider a measurable space $(\Omega, \mathcal F)$. Let $\mathcal P = \{P_\theta : \theta \in \Theta\}$ be a collection of measures on this measurable space.

Suppose that there exists a $\sigma$-finite measure $\nu$ that dominates each measure in $\mathcal P$. This means that the Radon-Nikodym derivative $\frac{dP_\theta}{d\nu}$ is defined.

$\mathcal P$ is an exponential family if and only if

$$\frac{dP_\theta}{d\nu}(\omega) = f_\theta(\omega) = h(\omega) \exp \left(\eta(\theta)^T T(\omega) - \xi (\theta)\right)$$

where $T$ is a random $p$-vector for fixed $p$, $\eta : \Theta \rightarrow \mathbb R^p$, $h$ is a non-negative Borel function on $(\Omega, \mathcal F)$, and $$\xi(\theta)=\log \left( \int_\Omega \exp (\eta(\theta)^T T(\omega))h(\omega) d\nu(\omega)\right).$$

If you can put a pdf or pmf into this form (or an equivalent one like the form you wrote, where $a(\theta) = e^{-\xi(\theta)}$, $g = h$, $b_i = \eta_i$, $R_i = T_i$ and $k = p$), then the family is an exponential family.

For example, if $f_\theta(k) = {n \choose k} \theta^k (1-\theta)^{n-k}$ we can write this as $f_\theta(k) = {n \choose k}\exp(k \ \textrm{logit}(\theta) + n \log(1-\theta))$ so this is an exponential family. Note how we are tacitly using the fact that the counting measure dominates the corresponding collection of measures.
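As a quick numerical sanity check of the binomial rewrite, here is a minimal sketch using only the Python standard library. The function names and the values `n = 10`, `theta = 0.3` are my own illustrative choices. It reads off $h(k) = {n \choose k}$, $\eta(\theta) = \textrm{logit}(\theta)$, $T(k) = k$ and $\xi(\theta) = -n \log(1-\theta)$, and also recomputes $\xi$ from its definition as a sum against the counting measure.

```python
import math

def binom_pmf(k, n, theta):
    """Binomial pmf in its usual form."""
    return math.comb(n, k) * theta**k * (1 - theta) ** (n - k)

def binom_expfam(k, n, theta):
    """Same pmf written as h(k) * exp(eta(theta) * T(k) - xi(theta))."""
    h = math.comb(n, k)                    # h(k): does not involve theta
    eta = math.log(theta / (1 - theta))    # natural parameter, logit(theta)
    T = k                                  # sufficient statistic
    xi = -n * math.log(1 - theta)          # log-normalizer
    return h * math.exp(eta * T - xi)

def log_normalizer(n, theta):
    """xi(theta) from its definition: log of sum_k h(k) * exp(eta * k)."""
    eta = math.log(theta / (1 - theta))
    return math.log(sum(math.comb(n, k) * math.exp(eta * k) for k in range(n + 1)))

n, theta = 10, 0.3  # illustrative values, not from the question
max_gap = max(abs(binom_pmf(k, n, theta) - binom_expfam(k, n, theta))
              for k in range(n + 1))
xi_gap = abs(log_normalizer(n, theta) - (-n * math.log(1 - theta)))
print(max_gap, xi_gap)  # both should be at floating-point noise level
```

The second check works because $\sum_k {n \choose k} e^{\eta k} = (1 + e^{\eta})^n = (1-\theta)^{-n}$.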

We could do the same with the Gaussian distribution where now the Lebesgue measure is the dominating measure.
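To make the Gaussian case concrete, here is a hedged sketch of one standard way to write $\mathcal N(\mu, \sigma^2)$ in the form above: $h(x) = 1/\sqrt{2\pi}$, $T(x) = (x, x^2)$, $\eta(\theta) = (\mu/\sigma^2, -1/(2\sigma^2))$ and $\xi(\theta) = \mu^2/(2\sigma^2) + \log\sigma$. The function names and test values are my own, and other equivalent choices of $h$, $\eta$ and $T$ exist.

```python
import math

def normal_pdf(x, mu, sigma):
    """Usual N(mu, sigma^2) density with respect to Lebesgue measure."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def normal_expfam(x, mu, sigma):
    """Same density written as h(x) * exp(eta(theta)^T T(x) - xi(theta))."""
    h = 1 / math.sqrt(2 * math.pi)                   # free of the parameters
    eta = (mu / sigma**2, -1 / (2 * sigma**2))       # natural parameters
    T = (x, x**2)                                    # sufficient statistics
    xi = mu**2 / (2 * sigma**2) + math.log(sigma)    # log-normalizer
    return h * math.exp(eta[0] * T[0] + eta[1] * T[1] - xi)

mu, sigma = 1.0, 2.0  # illustrative values
max_rel_gap = max(
    abs(normal_pdf(x, mu, sigma) - normal_expfam(x, mu, sigma)) / normal_pdf(x, mu, sigma)
    for x in [-3.0, -1.0, 0.0, 0.5, 2.0, 4.0]
)
print(max_rel_gap)  # should be at floating-point noise level
```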

Now consider the shifted Laplace distribution $f_\theta(x) = \frac{1}{2\sigma}\exp \left(\frac{-|x-\mu|}{\sigma} \right)$ where $\theta = (\mu, \sigma)$. This isn't a rigorous proof but it turns out that this can't be written in the form I gave. That's because we can't separate $x$ from our parameters due to the $|x-\mu|$. If $\mu$ is a known constant (i.e. not a parameter) then this becomes an exponential family with $T(x) = |x - \mu|$.
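One way to see the difference numerically (an illustration on a finite grid, not a proof): in an exponential family $\log f_\theta(x) = \log h(x) + \eta(\theta)^T T(x) - \xi(\theta)$, so as $\theta$ varies the functions $x \mapsto \log f_\theta(x)$ all lie in the span of at most $p + 2$ fixed functions ($\log h$, the $p$ components of $T$, and the constants). The sketch below tabulates log-densities on a grid for several parameter values and computes the rank of the resulting matrix. The grid, the parameter values, the tolerance and the helper `matrix_rank` are all my own assumptions.

```python
import math

def matrix_rank(rows, tol=1e-8):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column adds no new direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

def log_normal(x, mu, sigma):
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma**2)

def log_laplace(x, mu, sigma):
    return -math.log(2 * sigma) - abs(x - mu) / sigma

xs = [-5 + 0.5 * j for j in range(21)]     # grid of x values on [-5, 5]
mus = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]    # illustrative locations
sigmas = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]    # illustrative scales

def log_density_matrix(logpdf, thetas):
    """One row per parameter value, one column per grid point."""
    return [[logpdf(x, mu, sigma) for x in xs] for mu, sigma in thetas]

# Gaussian: rows are combinations of x^2, x and 1, so the rank stays small.
rank_gauss = matrix_rank(log_density_matrix(log_normal, list(zip(mus, sigmas))))
# Laplace with mu fixed at 0: rows are combinations of |x| and 1.
rank_laplace_fixed_mu = matrix_rank(log_density_matrix(log_laplace, [(0.0, s) for s in sigmas]))
# Laplace with mu varying: each mu adds a kink at a new location.
rank_laplace = matrix_rank(log_density_matrix(log_laplace, [(m, 1.0) for m in mus]))
print(rank_gauss, rank_laplace_fixed_mu, rank_laplace)
```

With $\mu$ free, every new value of $\mu$ puts a kink of $|x - \mu|$ at a new location, so the rank keeps growing as more values of $\mu$ are added, and no fixed finite-dimensional $T$ can account for all of them.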

Similarly, if the support of the distribution depends on a parameter, the family is not an exponential family: in the form above $\exp(\cdot) > 0$, so the set where $f_\theta > 0$ is fixed by $h$ and cannot change with $\theta$. For example, consider $f_\theta(x) = \frac{1}{\theta} \mathbb I(0 < x < \theta )$, i.e. $X \sim \mathcal U(0, \theta)$. We can't separate $x$ from $\theta$ in the indicator function, so it can't be written in the desired form. Again, this is a sketch rather than a rigorous proof; you can probably find a full proof online.
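A tiny sketch of the uniform example, with my own illustrative values for $\theta$ and $x$: it shows one point $x$ where the density is zero under one parameter value and positive under another, which a density of the form $h(x)\exp(\cdot)$ cannot do.

```python
def uniform_pdf(x, theta):
    """Density of U(0, theta) with respect to Lebesgue measure."""
    return 1.0 / theta if 0 < x < theta else 0.0

x = 1.5  # illustrative point: outside (0, 1) but inside (0, 2)
f_small = uniform_pdf(x, 1.0)
f_large = uniform_pdf(x, 2.0)

# In an exponential family f_theta(x) = h(x) * exp(...) with exp(...) > 0,
# so f_theta(x) > 0 would have to hold for all theta or for none.
same_support = (f_small > 0) == (f_large > 0)
print(f_small, f_large, same_support)  # prints: 0.0 0.5 False
```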
