Solved – Role of base measure in exponential family

An exponential family distribution $p$ in the canonical form can be written as

$p(x|\theta) = h(x)\exp(\theta^\top T(x) - A(\theta))$

where $A(\theta)$ is the log partition function, $T(x)$ is the sufficient statistic, and $h(x)$ is the base measure (according to this Wikipedia page). For simplicity, let us consider a one-dimensional $x$.

What are the restrictions on the form of $h(x)$? My intuition tells me that $h(x)$ cannot be arbitrary, because otherwise we could set $T(x)=0$ and leave all the "work" to $h(x)$.

For example, I understand that a Student's t is not in the exponential family. Let $t(x)$ be the density of a t distribution. If we set $T(x)=0$ and $h(x) = t(x)$, then

$A(\theta) = \log \int h(x)\exp(\theta^\top 0) \,dx = \log 1 = 0$,

and $p(x|\theta) = h(x) = t(x)$, which would imply that the t-distribution is in the exponential family.

What did I miss here?

Best Answer

Since $p(x|\theta)$ must be a valid density (non-negative and integrating to 1), $h(x)$ must be non-negative, but that's the only restriction (according to page 111 in this book).

However, I think the question highlights a common confusion (at least one that I've had before). There isn't just one exponential family of distributions. Rather, there are many exponential families, as mentioned in the Exponential family Wikipedia article:

exponential families are in a sense very natural sets of distributions to consider.

The choice of the functions $h$ and $T$ specifies the exponential family (i.e. the model), and the parameter vector $\theta$ picks out a particular member (i.e. a distribution) of that family.
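To make the family-versus-member distinction concrete, here is a minimal numerical sketch. It is not taken from the Wikipedia article or the book: the standard normal base $h$ and $T(x) = x$ are my own choices for illustration, and the helper names are hypothetical. Changing $\theta$ moves between members of the same family, whereas changing $h$ or $T$ would define a different family.

```python
import math

def integrate_real_line(g, n=20000):
    """Midpoint rule for the integral of g over the real line, using x = tan(u)."""
    du = math.pi / n
    total = 0.0
    for i in range(n):
        u = -math.pi / 2 + (i + 0.5) * du
        # dx = sec^2(u) du under the substitution x = tan(u)
        total += g(math.tan(u)) / math.cos(u) ** 2 * du
    return total

def log_partition(log_h, T, theta):
    """A(theta) = log of the integral of h(x) * exp(theta * T(x))."""
    # Work with log h so the two exponents are combined before calling exp.
    return math.log(integrate_real_line(
        lambda x: math.exp(log_h(x) + theta * T(x))))

def density(x, log_h, T, theta):
    """p(x | theta) = h(x) * exp(theta * T(x) - A(theta))."""
    return math.exp(log_h(x) + theta * T(x) - log_partition(log_h, T, theta))

# Illustrative family (an assumption, not from the question):
# standard normal base measure and T(x) = x.
def log_h_gauss(x):
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def T_identity(x):
    return x

theta = 1.5
A = log_partition(log_h_gauss, T_identity, theta)
p = density(0.3, log_h_gauss, T_identity, theta)
print(A, p)
```

For this particular choice the family is $\{N(\theta, 1)\}$ and $A(\theta) = \theta^2/2$, so `A` should come out close to $1.125$.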

Indeed, if you choose some fixed degrees of freedom for the t-distribution (let's say $\nu = 3$), you could, as you say, let $T(x) = 0$ and $h(x) = t(x|\nu=3)$ which, following the formula from the Student's t-distribution Wikipedia article, should give $$ h(x) = \frac{1} {\sqrt{3\pi}\,\Gamma(\frac{3}{2})} \left(1+\frac{x^2}{3} \right)^{-2}\!.$$
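As a quick sanity check of that formula, the sketch below integrates $h$ numerically and confirms that the normalizer is 1, so that $A(\theta) = \log 1 = 0$ when $T(x) = 0$. It uses only the Python standard library; the function names are my own, and the $x = \tan u$ substitution is just one convenient way to handle the heavy tails.

```python
import math

def h(x):
    """Student's t density with nu = 3, written as in the formula above."""
    c = 1.0 / (math.sqrt(3 * math.pi) * math.gamma(1.5))
    return c * (1 + x * x / 3) ** -2

def integrate_real_line(g, n=20000):
    """Midpoint rule for the integral of g over the real line, using x = tan(u)."""
    du = math.pi / n
    total = 0.0
    for i in range(n):
        u = -math.pi / 2 + (i + 0.5) * du
        # dx = sec^2(u) du under the substitution x = tan(u)
        total += g(math.tan(u)) / math.cos(u) ** 2 * du
    return total

# With T(x) = 0 the exponential factor is exp(0) = 1 for every theta,
# so the normalizer is just the integral of h.
Z = integrate_real_line(h)
A = math.log(Z)
print(Z, A)
```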

However, this doesn't give you the family of t-distributions, i.e. the set of functions $\{t(\cdot|\nu) : \nu > 0\}$. With $T(x) = 0$, every value of $\theta$ yields the same density, so the "family" contains exactly one distribution, $t(\cdot|\nu=3)$. With this base distribution, you could construct a more interesting exponential family by using a more interesting sufficient statistic $T(x)$, but you would not be able to design $T$ so that the parameter $\theta$ corresponds to the $\nu$ parameter of the t-distribution.
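Here is one illustration of that last point; it is an example, not a proof. Take the $\nu = 3$ base measure and the hypothetical choice $T(x) = \log(1 + x^2/3)$, which is my assumption and not something from the question. The family is then $p(x|\theta) \propto (1 + x^2/3)^{\theta - 2}$, normalizable for $\theta < 3/2$. At $\theta = -1$ the exponent matches that of a t-distribution with $\nu = 5$, but the $x^2/3$ inside the bracket stays fixed, so the member is a rescaled $t_5$ rather than $t(\cdot|\nu=5)$ itself.

```python
import math

def integrate_real_line(g, n=20000):
    """Midpoint rule for the integral of g over the real line, using x = tan(u)."""
    du = math.pi / n
    total = 0.0
    for i in range(n):
        u = -math.pi / 2 + (i + 0.5) * du
        total += g(math.tan(u)) / math.cos(u) ** 2 * du
    return total

def log_h(x):
    """Log of the t density with nu = 3 (the base measure from the answer)."""
    return (-math.log(math.sqrt(3 * math.pi) * math.gamma(1.5))
            - 2 * math.log1p(x * x / 3))

def T(x):
    """Hypothetical sufficient statistic, chosen only for illustration."""
    return math.log1p(x * x / 3)

def log_partition(theta):
    """A(theta), finite only for theta < 3/2 with this T."""
    return math.log(integrate_real_line(
        lambda x: math.exp(log_h(x) + theta * T(x))))

def density(x, theta):
    return math.exp(log_h(x) + theta * T(x) - log_partition(theta))

def t_pdf(x, nu):
    """Standard Student's t density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

theta = -1.0
A_num = log_partition(theta)
# Closed form from the standard integral of (1 + x^2/3)^(-a) over the real line.
A_closed = math.lgamma(1.5 - theta) - math.lgamma(1.5) - math.lgamma(2 - theta)

s = math.sqrt(3 / 5)                  # scale that maps t_5 onto this member
gap_to_t5 = abs(density(0.0, theta) - t_pdf(0.0, 5))
gap_to_scaled_t5 = abs(density(0.7, theta) - t_pdf(0.7 / s, 5) / s)
print(A_num, A_closed, gap_to_t5, gap_to_scaled_t5)
```

The first gap is clearly non-zero while the second is at the level of numerical error, which is consistent with the claim that varying $\theta$ in this family does not sweep out the standard t-distributions.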
