Exponential Family – Understanding Why All Distributions Aren’t Included

distributions, exponential-family, mathematical-statistics

I am reading the book:

Bishop, Pattern Recognition and Machine Learning (2006)

which defines the exponential family as distributions of the form (Eq. 2.194):
$$
p(\mathbf x|\boldsymbol \eta) = h(\mathbf x) g(\boldsymbol \eta) \exp \{\boldsymbol \eta^\mathrm T \mathbf u(\mathbf x)\}
$$

But I see no restrictions placed on $h(\mathbf x)$ or $\mathbf u(\mathbf x)$. Doesn't this mean that any distribution can be put in this form, by appropriate choice of $h(\mathbf x)$ and $\mathbf u(\mathbf x)$ (in fact only one of them has to be chosen properly!)? So how come the exponential family does not include all probability distributions? What am I missing?

Finally, a more particular question that I am interested in is this: Is the Bernoulli distribution in the exponential family? Wikipedia claims it is, but since I am obviously confused about something here, I would like to see why.

Best Answer

First, note that there is a terminology problem in your title: "the exponential family" seems to imply that there is only one exponential family. You should say "an exponential family"; there are many exponential families.

Well, one consequence of your definition: $$p(\mathbf x|\boldsymbol \eta) = h(\mathbf x) g(\boldsymbol \eta) \exp \{\boldsymbol \eta^\mathrm T \mathbf u(\mathbf x)\}$$ is that the support of the distributions in the family indexed by the parameter $\boldsymbol \eta$ does not depend on $\boldsymbol \eta$. (The support of a probability distribution is the closure of the smallest set with probability one, or in other words, where the distribution lives.) So it is enough to give a counterexample of a distribution family whose support depends on the parameter. The easiest example is the following family of uniform distributions: $\text{U}(0, \eta), \quad \eta > 0$. (The other answer by @Chaconne gives a more sophisticated counterexample.)
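To spell out why the uniform family fails, here is a short sketch of the argument. It assumes only the form of the definition above, with $g(\eta) > 0$ (which normalization requires); the specific points $x = 1.5$, $\eta = 1$ and $\eta = 2$ are chosen purely for illustration. Suppose the density of $\text{U}(0, \eta)$ could be written in that form:
$$
\frac{1}{\eta}\, \mathbf 1\{0 < x < \eta\} = h(x)\, g(\eta) \exp\{\eta\, u(x)\}.
$$
Because $g(\eta) \exp\{\eta\, u(x)\} > 0$ for every $\eta$, the right-hand side vanishes exactly where $h(x) = 0$, and that set cannot depend on $\eta$. Now evaluate both sides at $x = 1.5$:
$$
\eta = 2: \quad \tfrac{1}{2} = h(1.5)\, g(2) \exp\{2\, u(1.5)\} \;\Rightarrow\; h(1.5) > 0,
\qquad
\eta = 1: \quad 0 = h(1.5)\, g(1) \exp\{u(1.5)\} \;\Rightarrow\; h(1.5) = 0.
$$
The two conclusions contradict each other, so no choice of $h$ and $u$ works for this family.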

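Another, unrelated reason why not all distributions belong to an exponential family is that, in an exponential family, the sufficient statistic $\mathbf u(\mathbf x)$ always has a moment generating function (at least when $\boldsymbol \eta$ lies in the interior of the parameter space). Indeed, from the definition, $E[\exp\{\mathbf t^\mathrm T \mathbf u(\mathbf x)\}] = g(\boldsymbol \eta) / g(\boldsymbol \eta + \mathbf t)$, which is finite whenever $\boldsymbol \eta + \mathbf t$ is also a valid parameter value. Not all distributions have an mgf.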
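As for the Bernoulli question: yes, it fits the form above. Writing $p(x|\mu) = \mu^x (1-\mu)^{1-x} = (1-\mu) \exp\{x \ln\frac{\mu}{1-\mu}\}$ for $x \in \{0, 1\}$, one can take $\eta = \ln\frac{\mu}{1-\mu}$, $u(x) = x$, $h(x) = 1$ and $g(\eta) = \sigma(-\eta)$, where $\sigma$ is the logistic sigmoid. The following is a minimal numerical check of that identification, not a definitive implementation; the function names are made up for illustration.

```python
import math


def sigmoid(a):
    """Logistic sigmoid: 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))


def bernoulli_pmf(x, mu):
    """Standard Bernoulli pmf mu^x (1 - mu)^(1 - x) for x in {0, 1}."""
    return mu**x * (1.0 - mu) ** (1 - x)


def bernoulli_expfam(x, eta):
    """Same pmf written as h(x) g(eta) exp(eta * u(x)).

    Assumed identification: h(x) = 1, u(x) = x, g(eta) = sigmoid(-eta).
    """
    h = 1.0
    g = sigmoid(-eta)
    u = x
    return h * g * math.exp(eta * u)


# Compare the two forms for a few illustrative values of mu.
for mu in (0.1, 0.5, 0.9):
    eta = math.log(mu / (1.0 - mu))  # natural parameter (log-odds)
    for x in (0, 1):
        assert math.isclose(bernoulli_pmf(x, mu), bernoulli_expfam(x, eta))
```

Note that the support $\{0, 1\}$ is the same for every $\mu \in (0, 1)$, consistent with the support argument above.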