Estimation – Unbiased Estimator (UMVUE) and Exponential Family Distribution

cramer-rao, estimation, exponential-family, mathematical-statistics, self-study

I was reading my textbook on the Cramér–Rao lower bound (CRLB). Here is a theorem:

For some $\tau=\tau(\theta)$, there exists an unbiased estimator $\hat{\tau}$ of $\tau$ such that $Var(\hat{\tau})$ attains the CRLB if and only if the distribution belongs to an exponential family ($X\sim f(x; \theta)=\exp(a(x)b(\theta)+c(x)+d(\theta))$).

I know how to prove that "if the distribution belongs to an exponential family, then there exists an unbiased estimator $\hat{\tau}$ of $\tau$ such that $Var(\hat{\tau})$ attains the CRLB".
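For concreteness, here is a standard instance of the direction I can prove (using the familiar Poisson model as an illustration, not an example from the book): for $X\sim\operatorname{Poisson}(\theta),$ $$f(x;\theta) = \exp[\,x\ln\theta - \ln x! - \theta\,], \qquad \partial_\theta \ln f(x;\theta) = \frac{x}{\theta} - 1,$$ which is of the stated exponential-family form with $a(x)=x,$ $b(\theta)=\ln\theta,$ $c(x)=-\ln x!,$ $d(\theta)=-\theta.$ The Fisher information is $I(\theta) = \operatorname{Var}[X/\theta] = 1/\theta,$ and for $\tau(\theta)=\theta$ the unbiased estimator $\hat\tau = X$ has $\operatorname{Var}(\hat\tau) = \theta = [\tau^\prime(\theta)]^2/I(\theta),$ attaining the CRLB.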

But how does one prove the other direction?

If for some $\tau=\tau(\theta)$ there exists an unbiased estimator $\hat{\tau}$ of $\tau$ such that $Var(\hat{\tau})$ attains the CRLB, then the distribution belongs to an exponential family.

Best Answer

It was shown by R. A. Wijsman in $\rm[I]$ that the only-if direction holds true. But there is more to it.

Nevertheless, at the outset, let us sketch how Wijsman approached the problem. Let the sample space be $(\mathscr X, \boldsymbol{\mathscr A}, \mu)$ and the parameter space be $(\Theta, \boldsymbol{\mathscr B}, \lambda).$ He sets up the regularity conditions that $\Theta$ is an open interval, that $p_\theta(x)$ is $\boldsymbol{\mathscr A}$-measurable for each $\theta$ and a continuously differentiable function of $\theta$ for every $x\in \mathscr X,$ and, in addition, that $0 < \operatorname{Var}[\partial_\theta \ln p_\theta(X)]<\infty$ and differentiation under the integral sign is valid for all $\theta.$

The result: under the regularity conditions, equality in the CRLB holds for all $\theta$ (if and) only if the density of the corresponding family is, almost everywhere $\mu$ for all $\theta,$ $$p_\theta(x) = c(\theta)h(x)\exp[q(\theta)t(x)],$$ where $c(\theta), ~h(x) >0;$ $c(\theta), q(\theta)$ are continuously differentiable; and $q(\theta)$ is a strictly monotonic function of $\theta.$
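For instance (my illustration, not Wijsman's), the exponential density on $(0,\infty)$ has exactly this form: $$p_\theta(x) = \theta e^{-\theta x} = c(\theta)h(x)\exp[q(\theta)t(x)], \qquad c(\theta)=\theta,~ h(x)=1,~ q(\theta)=-\theta,~ t(x)=x,$$ with $q(\theta)=-\theta$ strictly monotone, as the theorem requires.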

The derivation is straightforward:

$\bullet$ Observe that $p_\theta(x)$ and $\partial_\theta\ln p_\theta(x)$ are measurable functions on $(\mathscr{X\times \Theta}, ~\boldsymbol{\mathscr{A}}\times \boldsymbol{\mathscr B}),$ since they are Carathéodory functions (measurable in $x$ for each $\theta,$ continuous in $\theta$ for each $x$).

$\bullet$ Since the equality holds, there exist $c_1(\theta), ~c_2(\theta)$ with $c_1^2 + c_2^2 > 0$ such that, almost everywhere $\mu,$ $$c_1(\theta)\partial_\theta \ln p_\theta(x) + c_2(\theta)[t(x)-m(\theta)] = 0,\tag 1\label 1$$ where $t(x)$ is the unbiased estimator and $m(\theta) := \mathbb E_\theta[t(X)].$
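To see where $\eqref 1$ comes from (a standard argument, sketched here for completeness): the CRLB rests on the Cauchy–Schwarz inequality $$\operatorname{Cov}^2\big(t(X), \partial_\theta\ln p_\theta(X)\big) \le \operatorname{Var}[t(X)]\operatorname{Var}[\partial_\theta\ln p_\theta(X)],$$ and equality in Cauchy–Schwarz holds precisely when the two centered variables are almost surely linearly dependent. Since $\mathbb E_\theta[\partial_\theta\ln p_\theta(X)] = 0$ and $\mathbb E_\theta[t(X)] = m(\theta),$ that linear dependence is exactly $\eqref 1.$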

$\bullet$ If $c_1(\theta) = 0,$ then $\eqref 1$ forces $t(x) = m(\theta)$ almost everywhere, leading to the degenerate case of $m(\theta)$ being constant; so $c_1 \ne 0.$ Dividing $\eqref 1$ by $c_1(\theta)$ and writing $a(\theta) := -c_2(\theta)/c_1(\theta),$ $b(\theta) := m(\theta)c_2(\theta)/c_1(\theta),$

$$\partial_\theta \ln p_\theta(x) = a(\theta)t(x) + b(\theta).\tag 2\label 2$$

Also $a(\theta)\ne 0,$ for otherwise $\eqref 2$ would give $\operatorname{Var}[\partial_\theta \ln p_\theta(X)] = 0,$ violating the regularity condition.

$\bullet$ Define $g(\theta, \theta_1) :=\int\partial_\theta \ln p_\theta ~p_{\theta_1}\mathrm d\mu.$ Choose $\theta_1,\theta_2$ with $m(\theta_1)\ne m(\theta_2);$ it is easy to see from $\eqref 2$ that $a(\theta) = [g(\theta,\theta_1)-g(\theta,\theta_2)]/[m(\theta_1)-m(\theta_2)].$ Thus $a(\theta)$ is $\boldsymbol{\mathscr B}$-measurable, since each $g(\theta, \theta_i)$ is $\boldsymbol{\mathscr B}$-measurable.
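Indeed, integrating $\eqref 2$ against $p_{\theta_i}$ gives $$g(\theta,\theta_i) = a(\theta)\int t\, p_{\theta_i}\,\mathrm d\mu + b(\theta)\int p_{\theta_i}\,\mathrm d\mu = a(\theta)m(\theta_i) + b(\theta),$$ and subtracting the two identities eliminates $b(\theta),$ yielding the displayed formula for $a(\theta).$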

$\bullet$ Consider the $\mathscr X$-section $N^x := \{\theta : (x,\theta)\in N\}$ of the set $N:=\{(x,\theta)\in\mathscr{X\times \Theta}: \partial_\theta \ln p_\theta(x) \ne a(\theta)t(x) + b(\theta)\}.$ From $\eqref 1$ (and Fubini), almost every $\mathscr X$-section is of measure zero. Therefore, for $\theta \notin N^x,$ $\partial_\theta \ln p_\theta(x) = a(\theta)t(x) + b(\theta).$

$\bullet$ Arbitrarily choose $\theta_0$ and set $h(x):= p_{\theta_0}(x);$ it follows that $$\ln p_\theta(x) = \ln h(x) + \int_{\theta_0}^\theta [a(\nu)t(x) + b(\nu)]~\mathrm d\nu.$$ Choose $x_1, x_2$ with $t(x_1)\ne t(x_2).$ It follows that $$\ln p_{\theta}(x_1) - \ln p_{\theta}(x_2) = \ln h(x_1) - \ln h(x_2) + [t(x_1)-t(x_2)]\underbrace{\int_{\theta_0}^\theta a(\nu)~\mathrm d\nu}_{=: q(\theta) }.\tag 3\label 3$$ $q(\theta)$ is finite, as is evident from $\eqref 3;$ denote $c(\theta):= \exp\left[\int_{\theta_0}^\theta b(\nu)~\mathrm d\nu\right],$ so that $p_\theta(x) = c(\theta)h(x)\exp[q(\theta)t(x)].$ Both $q(\theta)$ and $c(\theta)$ turn out to be continuously differentiable, as shown next.
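As a sanity check with the Poisson example above (again my illustration): $\partial_\theta\ln p_\theta(x) = x/\theta - 1$ gives $a(\theta) = 1/\theta,$ $b(\theta) = -1,$ $t(x) = x;$ choosing $\theta_0 = 1,$ $$q(\theta) = \int_1^\theta\frac{\mathrm d\nu}{\nu} = \ln\theta,\qquad c(\theta) = \exp\left[\int_1^\theta(-1)\,\mathrm d\nu\right] = e^{1-\theta},\qquad h(x) = p_1(x) = \frac{e^{-1}}{x!},$$ and indeed $c(\theta)h(x)e^{q(\theta)t(x)} = e^{1-\theta}\cdot\dfrac{e^{-1}}{x!}\cdot\theta^x = \dfrac{e^{-\theta}\theta^x}{x!}.$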

$\bullet$ It follows from the definition of $g$ and the density function that $$g(\theta, \theta_1) = \mathrm d_\theta q(\theta)\,m(\theta_1)+ \mathrm d_\theta \ln c(\theta);$$ thus $g(\theta,\theta_1)$ is continuous in $\theta,$ whence $a(\theta)$ is also continuous. Now $a(\theta) = \mathrm d_\theta q(\theta)$ and $a(\theta)\ne 0;$ since $a$ is continuous and nonvanishing on the interval $\Theta,$ it is of one sign (either $a > 0$ throughout or $a < 0$ throughout), so $q$ is strictly monotone.

So, all seems good and well in place. Except that, as noted at the outset, there is more to say.

Joshi shows in $\rm [II]$ that a more general family of distributions attains equality in the CRLB, and that imposing additional constraints leads to the exponential family of distributions.

The derivation is again not intricate, but what is conspicuous is how he tweaked the regularity conditions to deduce the more general result: $\partial_\theta \ln p_\theta(x)$ is $(\boldsymbol{\mathscr A}\times \boldsymbol{\mathscr B})$-measurable on $M:= \{(x,\theta)\in \mathscr X\times \Theta: \partial_\theta \ln p_\theta(x)~\textrm{exists}\},$ which is itself $(\boldsymbol{\mathscr A}\times \boldsymbol{\mathscr B})$-measurable, and differentiation under the integral sign is valid for almost every $\theta.$ The result: provided these regularity conditions are satisfied, equality is attained only if, almost everywhere $\mu$ for all $\theta,$

$$p_\theta(x) = c(\theta)h(x)\exp[q(\theta)t(x)]\exp[S(\theta, x)],$$ where $h(x)> 0;$ $c^\prime(\theta)$ and $q^\prime(\theta)$ are finite with $q^\prime(\theta) \ne 0;$ and, almost everywhere $\lambda,$ $\partial_\theta S(\theta, x) =0.$
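To get some intuition for why the extra factor $\exp[S(\theta,x)]$ cannot be dropped for free (a schematic illustration of mine, not Joshi's construction): a $\theta$-dependence of Cantor-function type, say $$S(\theta, x) = \phi(\theta)\,u(x), \qquad \phi = \text{the Cantor function},$$ satisfies $\partial_\theta S(\theta,x) = 0$ almost everywhere $\lambda$ and yet is not constant in $\theta.$ A derivative vanishing almost everywhere pins down a function only under absolute continuity, which is precisely the extra condition discussed below.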

A brief sketch:

$\bullet$ From $\eqref 2,$ by the Denjoy–Luzin–Saks theorem, there exist functions $q(\theta)$ and $f(\theta)$ with $q^\prime(\theta) = a(\theta)$ and $f^\prime(\theta) = b(\theta)$ almost everywhere $\lambda.$

$\bullet$ Again from $\eqref 2,$ almost everywhere $\lambda$ for each $\theta,$ write, for an arbitrarily chosen $\theta_0,$ $$\ln p_\theta(x) = q(\theta)t(x) + f(\theta) + \underbrace{T(\theta_0, x) + S(\theta, x)}_{=: T(\theta, x)};\tag 4\label 4$$ take $h(x):= \exp[T(\theta_0, x)]$ and $c(\theta):=\exp[f(\theta)],$ so that $c(\theta)$ and $h(x)$ are both strictly positive. Also, from $\eqref 2$ and $\eqref 4,$ $\partial_\theta S(\theta, x)= 0$ almost everywhere $\lambda.$

$\bullet$ As $M_1:= \{(x,\theta)\in \mathscr X\times \Theta: |\partial_\theta \ln p_\theta(x)|<\infty\}$ is $(\boldsymbol{\mathscr A}\times \boldsymbol{\mathscr B})$-measurable, $\partial_\theta \ln p_\theta(x)$ is finite almost everywhere $\lambda,$ whence $q^\prime(\theta)$ and $c^\prime(\theta)$ exist almost everywhere $\lambda.$ Also, $q^\prime(\theta)\ne 0,$ since $a(\theta)\ne 0$ almost everywhere $\lambda.$

When the further condition of absolute continuity of $\theta\mapsto\ln p_\theta(x)$ is imposed for each $x,$ the family of distributions reduces to the exponential family. The proof only involves the property of absolute continuity.
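Roughly (a one-line sketch; the precise bookkeeping is in $\rm[II]$): for an absolutely continuous function, the almost-everywhere derivative recovers the function, so $$S(\theta, x) - S(\theta_0, x) = \int_{\theta_0}^\theta \partial_\nu S(\nu, x)\,\mathrm d\nu = 0$$ since the integrand vanishes almost everywhere $\lambda;$ hence $\exp[S(\theta,x)]$ depends on $x$ alone and is absorbed into $h(x),$ recovering the Wijsman form $p_\theta(x) = c(\theta)h(x)\exp[q(\theta)t(x)].$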

What is remarkable, and the actual meat of Joshi's result, is that merely imposing the above condition does not make the result coincide with Wijsman's: there remain differences in the regularity conditions. As he notes, $q^\prime(\theta)$ and $c^\prime(\theta)$ need not be continuous in $\theta.$

So, answering the question at hand in black-and-white terms would not be fruitful: you should carefully determine beforehand what the regularity assumptions are.


References:

$\rm [I]$ R. A. Wijsman, On the Attainment of the Cramér–Rao Lower Bound, The Annals of Statistics, Vol. 1, No. 3 (May 1973), pp. 538–542.

$\rm [II]$ V. M. Joshi, On the Attainment of the Cramér–Rao Lower Bound, The Annals of Statistics, Vol. 4, No. 5 (Sep. 1976), pp. 998–1002.