Is the Generalized Dirichlet distribution an exponential family? If so, what is its log-normalizer, sufficient statistics, and carrier measure?
Exponential-Family Analysis – Is the Generalized Dirichlet Distribution Part of Exponential Family?
Tags: exponential-family, generalized-dirichlet-distribution
Related Solutions
Exponential Family – Understanding Observed vs. Expected Sufficient Statistics in Exponential Family
This is a common assertion about the exponential family, but in my opinion it is usually stated in a way that may confuse the less experienced reader. Taken at face value, it could be interpreted as saying: "if our random variable follows a distribution in the exponential family, then if we take a sample and insert it into the sufficient statistic, we will obtain the true expected value of the statistic". If only it were so... Moreover, it does not take into account the size of the sample, which may cause further confusion.
A density in the exponential family has the form
$$f_X(x) = h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)} \tag{1}$$
where $T(x)$ is the sufficient statistic.
Since this is a density, it has to integrate to unity, so ($S_x$ is the support of $X$)
$$\int_{S_x} h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}dx =1 \tag{2}$$
Eq. $(2)$ holds for all $\theta$ so we can differentiate both sides with respect to it:
$$\frac {\partial}{\partial \theta} \int_{S_x} h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}dx =\frac {\partial (1)}{\partial \theta} =0 \tag{3}$$
Interchanging the order of differentiation and integration, we obtain
$$\int_{S_x} \frac {\partial}{\partial \theta} \left(h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}\right)dx =0 \tag{4}$$
Carrying out the differentiation we have
$$\frac {\partial}{\partial \theta} \left(h(x)e^{\eta(\theta) T(x)}e^{-A(\theta)}\right) = f_X(x)\big[T(x)\eta'(\theta) - A'(\theta)\big] \tag{5}$$
Inserting $(5)$ into $(4)$ we get
$$\int_{S_x} f_X(x)\big[T(x)\eta'(\theta) - A'(\theta)\big]dx =0 $$
$$\Rightarrow \eta'(\theta)E[T(X)] - A'(\theta) = 0 \Rightarrow E[T(X)] = \frac {A'(\theta)}{\eta'(\theta)} \tag{6}$$
Now observe: the left-hand side of $(6)$ is a real number, so the right-hand side must also be a real number, not a function. Therefore it must be evaluated at a specific $\theta$, and it should be the "true" $\theta$; otherwise the left-hand side would not be the true expected value of $T(X)$. To emphasize this, we denote the true value by $\theta_0$ and rewrite $(6)$ as
$$E_{\theta_0}[T(X)] = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\theta_0} \tag{6a}$$
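As a concrete check (my own example, using the Poisson family, which is not discussed above): writing $f_X(x) = \frac{1}{x!}e^{x\log\lambda}e^{-\lambda}$ gives $h(x) = 1/x!$, $\eta(\lambda) = \log\lambda$, $T(x) = x$, $A(\lambda) = \lambda$, and $(6\text{a})$ yields

$$E_{\lambda_0}[T(X)] = \frac{A'(\lambda)}{\eta'(\lambda)}\Big |_{\lambda=\lambda_0} = \frac{1}{1/\lambda}\Big |_{\lambda=\lambda_0} = \lambda_0$$

which is indeed the familiar Poisson mean.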
We turn now to maximum likelihood estimation. The log-likelihood for a sample of size $n$ is
$$L(\theta \mid \mathbf x) = \sum_{i=1}^n\ln h(x_i) +\eta(\theta)\sum_{i=1}^nT(x_i) -nA(\theta)$$
Setting its derivative with respect to $\theta$ equal to $0$ we obtain the MLE
$$\hat \theta(x) : \frac 1n\sum_{i=1}^nT(x_i) = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\hat \theta(x)} \tag {7}$$
Compare $(7)$ with $(6a)$. The right-hand sides are not equal, since we cannot argue that the MLE has hit upon the true value; so neither are the left-hand sides. But remember that eq. $(2)$ holds for all $\theta$, and so for $\hat \theta$ also. So the steps in eqs. $(3)$–$(6)$ can be taken with respect to $\hat \theta$, and we can write eq. $(6a)$ for $\hat \theta$:
$$E_{\hat\theta(x)}[T(X)] = \frac {A'(\theta)}{\eta'(\theta)}\Big |_{\theta =\hat\theta(x)} \tag{6b}$$
which, combined with $(7)$, leads us to the valid relation
$$ E_{\hat\theta(x)}[T(X)] = \frac 1n\sum_{i=1}^nT(x_i)$$
which is what the assertion under examination really says: the expected value of the sufficient statistic under the MLE for the unknown parameter (in other words, the first raw moment of the distribution obtained if we use $\hat \theta(x)$ in place of $\theta$) equals (and is not just approximated by) the average of the sufficient statistic as calculated from the sample $\mathbf x$.
Moreover, only if the sample size is $n=1$ could we accurately say "the expected value of the sufficient statistic under the MLE equals the sufficient statistic".
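To see this concretely, here is a small numerical sketch of my own (not from the answer above), using the exponential distribution with $h(x)=1$, $\eta(\theta)=-\theta$, $T(x)=x$, $A(\theta)=-\log\theta$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_0 = 2.5                               # true parameter
x = rng.exponential(scale=1 / theta_0, size=50)

# f(x) = exp(-theta * x + log theta): eta(theta) = -theta, A(theta) = -log(theta).
# Eq. (7): mean(T(x)) = A'(theta)/eta'(theta) = (-1/theta)/(-1) = 1/theta,
# so the MLE is theta_hat = 1 / mean(x).
theta_hat = 1 / x.mean()

lhs = 1 / theta_hat                         # E_{theta_hat}[T(X)] = 1/theta_hat
rhs = x.mean()                              # (1/n) sum_i T(x_i)
assert np.isclose(lhs, rhs)                 # equal by construction, not approximated

# By contrast, E_{theta_0}[T(X)] = 1/theta_0 = 0.4 will generally differ from rhs.
print(lhs, 1 / theta_0)
```

The equality holds exactly (up to floating point), whereas the expectation under the *true* $\theta_0$ only agrees with the sample average asymptotically.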
I have answered my own question. It turned out to be a rather straightforward application of Bayes' rule, once a somewhat arbitrary assumption is made. My question was not very clear, mostly due to my own tenuous understanding at the time.
However, this result is used quite a lot in machine learning literature involving integrating out missing variables. I am including the proof in case others find it helpful when seeing the result.
Suppose the complete-data model is in exponential-family form:
$$ P(x, y\mid\boldsymbol \theta) = h(x) \exp\left(\eta({\boldsymbol \theta}) \cdot T(x, y) - A({\boldsymbol \theta}) \right) $$
By Bayes Rule,
$$ P(y\mid x, \theta) = \frac{P(x\mid y, \theta)\, P(y\mid\theta)}{ \int_{y'} P(x\mid y', \theta)\, P(y'\mid\theta)\, dy'} = \frac{P(x, y\mid \theta)}{ \int_{y'} P(x, y'\mid \theta)\, dy'} = \frac{h(x) \exp (\eta (\theta) \cdot T(x,y) - A(\theta))}{ \int_{y'} h(x) \exp (\eta (\theta) \cdot T(x,y') - A(\theta))\, dy'} $$
We assume the base (carrier) measure $h(x)$ is a function of $x$ only, so that it cancels between numerator and denominator in the last step above, giving
$$ \frac{\exp ( \eta(\theta)\cdot T(x,y))}{\int_{y'} \exp ( \eta(\theta)\cdot T(x,y'))\, dy'} = \exp \left( \eta(\theta)\cdot T(x,y) - \log\int_{y'} \exp ( \eta(\theta)\cdot T(x,y'))\, dy' \right) = \exp \left( \eta(\theta)\cdot T(x,y) - A(\theta\mid x) \right) $$
so the conditional is itself an exponential family, with the same sufficient statistics and a new log-normalizer $A(\theta\mid x)$.
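A toy numerical illustration of my own (the model and its sufficient statistic are invented for the example): a joint exponential family over $x \in \mathbb R$ and binary $y$, where the conditional over $y$ is normalized by the log-sum-exp $A(\theta\mid x)$.

```python
import numpy as np

# Toy joint family: P(x, y | eta) proportional to exp(eta . T(x, y)),
# with T(x, y) = (x*y, y) and y in {0, 1}. (Hypothetical model for illustration.)
eta = np.array([0.8, -0.3])

def T(x, y):
    return np.array([x * y, y])

def log_conditional(y, x):
    # A(theta | x) = log sum_{y'} exp(eta . T(x, y')): the conditional's
    # log-normalizer, obtained by marginalizing y out of the joint.
    A_given_x = np.logaddexp(eta @ T(x, 0), eta @ T(x, 1))
    return eta @ T(x, y) - A_given_x

x = 1.7
probs = np.exp([log_conditional(0, x), log_conditional(1, x)])
assert np.isclose(probs.sum(), 1.0)   # a valid distribution over y
```

For discrete $y$ the integral over $y'$ becomes a sum, and `np.logaddexp` computes $A(\theta\mid x)$ stably.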
Best Answer
Yes, it is an exponential family.
It has sufficient statistics: $\{\log(x_i)\}_i$ and $\{\log(1 - \sum_{j \le i}x_j)\}_i$.
Its carrier measure is trivial: $h(x) = 1$, i.e., zero on the log scale.
Its log-normalizer is $\sum_i \log B(\alpha_i, \beta_i)$, where $B$ is the Beta function, and the natural parameters corresponding to the above sufficient statistics are $\{\alpha_i-1\}_i$ and $\{\gamma_i\}_i$ with $\gamma_j=\beta_j-\alpha_{j+1}-\beta_{j+1}$ for $j < k$ and $\gamma_k=\beta_k-1$ for the last index $k$.
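A sketch verifying this decomposition numerically (function names are mine; I assume the Connor–Mosimann parameterization of the generalized Dirichlet, with $\gamma_k = \beta_k - 1$ for the last index): the density written as $\langle \eta, T(x)\rangle - A$ should agree with the textbook density.

```python
import numpy as np
from math import lgamma

def betaln(a, b):
    """log B(a, b) via log-gamma (avoids a SciPy dependency)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def gd_natural_params(alpha, beta):
    """Natural parameters (alpha_i - 1, gamma_i): gamma_i = beta_i - alpha_{i+1}
    - beta_{i+1} for i < k, and gamma_k = beta_k - 1 (Connor-Mosimann)."""
    k = len(alpha)
    gamma = [beta[i] - alpha[i + 1] - beta[i + 1] for i in range(k - 1)]
    gamma.append(beta[-1] - 1)
    return np.concatenate([np.asarray(alpha) - 1, gamma])

def gd_logpdf_expfam(x, alpha, beta):
    """log-density as <eta, T(x)> - A, with A = sum_i log B(alpha_i, beta_i)."""
    x = np.asarray(x, dtype=float)
    # Sufficient statistics: log(x_i) and log(1 - sum_{j<=i} x_j).
    T = np.concatenate([np.log(x), np.log(1 - np.cumsum(x))])
    A = sum(betaln(a, b) for a, b in zip(alpha, beta))
    return gd_natural_params(alpha, beta) @ T - A

def gd_logpdf_direct(x, alpha, beta):
    """Textbook Connor-Mosimann density, for cross-checking."""
    x = np.asarray(x, dtype=float)
    k = len(x)
    tails = 1 - np.cumsum(x)
    gamma = gd_natural_params(alpha, beta)[k:]
    return sum((alpha[i] - 1) * np.log(x[i]) + gamma[i] * np.log(tails[i])
               - betaln(alpha[i], beta[i]) for i in range(k))

alpha = [2.0, 1.5, 3.0]
beta = [4.0, 2.5, 1.2]
x = [0.2, 0.3, 0.1]
assert np.isclose(gd_logpdf_expfam(x, alpha, beta),
                  gd_logpdf_direct(x, alpha, beta))
```

The two computations agree (up to floating point), confirming that the stated sufficient statistics, natural parameters, and log-normalizer reproduce the density.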
After some work, I was able to implement the distribution here.