[Math] Exponential family distribution and sufficient statistic.

Tags: convergence-divergence, estimation-theory, exponential-distribution, probability-theory, statistics

The exponential family of distributions is defined by a pdf of the form:

$$ f_X(x;\theta) = c(\theta)\, g(x) \exp\Big[\sum_{j=1}^l G_j(\theta) T_j(x)\Big]$$

where $\theta \in \Theta$, $c(\theta)>0$ and the $G_j(\theta)$ are arbitrary functions of $\theta$, and $g(x)>0$ and the $T_j(x)$ are arbitrary functions of $x$.
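For concreteness, here is my own term-by-term matching of the Poisson pmf (the example used later in this question) to this first definition, with $l = 1$:

```latex
f(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}
  = \underbrace{e^{-\lambda}}_{c(\lambda)}
    \cdot \underbrace{\tfrac{1}{x!}}_{g(x)}
    \cdot \exp\Big[\underbrace{\log\lambda}_{G_1(\lambda)}
                   \cdot \underbrace{x\vphantom{\lambda}}_{T_1(x)}\Big]
```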

This seems very complicated to an untrained eye, and honestly, I don't think I understand it.

That is the reason why I searched the mighty internet and found a simplified form:

$$ f(x) = \exp\Big[\frac{x\theta-b(\theta)}{a(\Phi)} + c(x,\Phi)\Big]$$

This is much more user-friendly for beginners.

However, there is an extension to the first exponential family pdf definition: by applying the factorization theorem to the joint pdf $f_{\mathbf{X}}(\mathbf{x};\theta)$, one obtains the sufficient statistic:

$$ T= \Big(\sum_{i=1}^n T_1(X_i),\dots,\sum_{i=1}^n T_l(X_i)\Big)$$

$T$ is a sufficient statistic for $G_1(\theta),\dots,G_l(\theta)$.
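This can be sanity-checked numerically for the Poisson case: since all dependence on the parameter enters through $T$, two samples with the same $\sum x_i$ have a log-likelihood difference that does not depend on $\lambda$. A minimal sketch (the samples and the $\lambda$ grid are arbitrary choices of mine):

```python
import math

def poisson_loglik(data, lam):
    # Log-likelihood of an i.i.d. Poisson(lam) sample
    return sum(x * math.log(lam) - lam - math.log(math.factorial(x)) for x in data)

# Two samples of equal size with the same sufficient statistic T = sum(x_i) = 10
a = [1, 2, 3, 4]
b = [0, 0, 5, 5]

# All lambda-dependence enters through T, so the log-likelihood difference
# between the two samples is the same constant for every lambda
diffs = [poisson_loglik(a, lam) - poisson_loglik(b, lam) for lam in (0.5, 1.0, 2.0, 7.3)]
print(diffs)  # four (numerically) identical values
```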

How can the sufficient statistic be obtained from the simplified version of the exponential family form?

How can the mean and variance be calculated from the first definition of the exponential family form?

EXAMPLE:

Prove that the Poisson distribution belongs to the exponential family.

1) [image: derivation not reproduced here]

How can E[X] and Var[X] be calculated here?

2)

$$ f(x)= \frac{\lambda^xe^{-\lambda}}{x!}$$

and by taking the log and then the exponential on both sides one gets:

$$ f(x)= \exp\Big[x \log(\lambda) - \lambda - \log(x!)\Big]$$

Matching this expression to the simplified form of the exponential family, we get:

$\theta = log(\lambda)$

$\lambda = exp(\theta)$

$b(\theta)=e^{\theta}$

$a(\Phi) = 1$ (the dispersion term is always 1 for one-parameter distributions)

$c(x, \Phi)= -log(x!)$

Then one can easily get:

$$ E[X]= b'(\theta) = \lambda$$
$$ Var[X] = a(\Phi)\,b''(\theta)= \lambda$$
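These two identities can be checked numerically: differentiate $b(\theta) = e^{\theta}$ by finite differences and compare against the mean and variance computed directly from the Poisson pmf. A small sketch (the value $\lambda = 3$, the step size, and the truncation at 100 terms are my own choices):

```python
import math

lam = 3.0
theta = math.log(lam)  # natural parameter: theta = log(lambda)

def b(t):
    # cumulant function b(theta) = e^theta for the Poisson family
    return math.exp(t)

# Central finite differences for b'(theta) and b''(theta)
h = 1e-5
b1 = (b(theta + h) - b(theta - h)) / (2 * h)
b2 = (b(theta + h) - 2 * b(theta) + b(theta - h)) / h**2

# Mean and variance computed directly from the Poisson pmf (truncated series)
pmf = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(100)]
mean = sum(k * p for k, p in enumerate(pmf))
var = sum(k**2 * p for k, p in enumerate(pmf)) - mean**2

print(b1, mean)  # both close to lambda = 3
print(b2, var)   # both close to lambda = 3
```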

How can the sufficient statistic be determined from this simplified form?

Best Answer

First question

Inspecting the definition of the exponential family $$ f_x(x;\theta) = c(\theta) g(x) e^{ \sum_{j=1}^l G_j(\theta) T_j(x) }, $$ one can say the following:

  1. $T$ is a sufficient statistic. Conditioning on $T$, the conditional distribution is $g(x)$ (up to a normalization constant), which is independent of the parameter $\theta$. This is the definition of sufficiency. In fact, for the exponential family it is independent of $T$.

  2. The term $e^{ \sum_{j=1}^l G_j(\theta) T_j(x) }$ determines the marginal distribution of $T$, via the choice of $G_j$'s.

  3. $c(\theta)$ is a normalization constant so the density integrates to $1$.
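Point 2 can be illustrated for the Poisson case: with $T = \sum_{i=1}^n X_i$ and i.i.d. Poisson($\lambda$) data, the marginal distribution of $T$ is Poisson($n\lambda$). A simulation sketch (the inverse-transform sampler, seed, and sample sizes are my own choices):

```python
import math
import random

def poisson_sample(lam, rng):
    # Inverse-transform (Knuth-style) Poisson sampler; adequate for small lam
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

rng = random.Random(42)
n, lam, reps = 5, 0.8, 20000

# T = sum of T_1(X_i) = sum of the X_i themselves for the Poisson family
T_samples = [sum(poisson_sample(lam, rng) for _ in range(n)) for _ in range(reps)]

emp_mean = sum(T_samples) / reps
emp_p0 = T_samples.count(0) / reps
print(emp_mean)                     # should be close to n * lam = 4
print(emp_p0, math.exp(-n * lam))   # P(T = 0) for Poisson(4) is e^{-4}
```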

Second question

As the $G_j$'s are arbitrary, subject to measurability requirements etc., there is no general formula for computing moments. For the Poisson distribution, the first moment is simply $$ e^{-\lambda} \sum_{k = 0}^{\infty} k \frac{\lambda^k}{k!} = \Big( e^{-\lambda} \sum_{k = 1}^{\infty} \frac{\lambda^{k-1} }{(k-1)!}\Big) \cdot \lambda = \lambda. $$
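This series computation is easy to verify numerically, and the second-moment analogue $E[X^2] = \lambda + \lambda^2$ gives $Var[X] = \lambda$ the same way. A truncated-series sketch (the value $\lambda = 2.5$ and the cutoff at 60 terms are arbitrary):

```python
import math

lam = 2.5
K = 60  # truncation point; the neglected tail is negligible for lam = 2.5

# First moment: e^{-lam} * sum_k k * lam^k / k!  ->  lam
first_moment = math.exp(-lam) * sum(k * lam**k / math.factorial(k) for k in range(K))

# Second moment: e^{-lam} * sum_k k^2 * lam^k / k!  ->  lam + lam^2
second_moment = math.exp(-lam) * sum(k**2 * lam**k / math.factorial(k) for k in range(K))

print(first_moment)                     # close to lam
print(second_moment - first_moment**2)  # variance, close to lam
```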

The second moment is similar.
