[Math] Complete Sufficient Statistic for the two-parameter exponential distribution

exponential-distribution, statistical-inference, statistics, sufficient-statistics

I am trying to show that $\left(X_{(1)}, \sum_{i=1}^{n}(X_i-X_{(1)})\right)$ is jointly complete sufficient for $(a,b)$, where $X_1,\ldots,X_n\stackrel{\text{i.i.d.}}{\sim}\mathsf{Exp}(a,b)$.

I know the joint pdf is
$$\prod_{i=1}^{n}\frac{1}{b}e^{-(x_i-a)/b}\chi_{>a}(x_i)=\frac{1}{b^{n}}e^{-\sum_{i=1}^{n}(x_i-a)/b}\chi_{>a}(x_{(1)})$$

By adding a zero in the form of $nX_{(1)}-nX_{(1)}$, i.e. using the identity

$$\sum_{i=1}^{n}(x_i-a)=\sum_{i=1}^{n}(x_i-x_{(1)})+n(x_{(1)}-a),$$

the above can be rearranged to

$$\exp\left(-\frac{1}{b}\sum_{i=1}^{n}(x_i-x_{(1)})-\frac{n}{b}x_{(1)}+\frac{na}{b}-n\log b\right)\chi_{>a}(x_{(1)})$$

I know that $T(X)=\left(X_{(1)}, \sum_{i=1}^{n}(X_i-X_{(1)})\right)$ should be a complete sufficient statistic, but I am having trouble getting rid of $\chi_{>a}(x_{(1)})$ to put the density into proper exponential family form, i.e. with $h(x)=\chi_{>a}(x_{(1)})$ depending only on the data (as written, it also involves $a$). Any help?

Best Answer

The joint pdf of $X_1,\ldots,X_n$, where $X_i\stackrel{\text{i.i.d.}}\sim \mathsf{Exp}(a,b)$, is

\begin{align} f_{(a,b)}(x_1,\ldots,x_n)&=\frac1{b^n}e^{-\sum_{i=1}^n (x_i-a)/b}1_{x_{(1)}>a} \\&=\frac{e^{na/b}}{b^n}e^{-\sum_{i=1}^n x_i/b}1_{x_{(1)}>a}\quad,\,(a,b)\in \mathbb R\times \mathbb R^+ \end{align}

By the Factorization theorem, $(X_{(1)},\sum\limits_{i=1}^n X_i)$, or equivalently $(X_{(1)},\sum\limits_{i=1}^n (X_i-X_{(1)}))=(T_1,T_2)$ (say), is sufficient for $(a,b)$. In fact it can be shown (as done here) that $T_1\sim \mathsf{Exp}\left(a,\frac bn\right)$ and $\frac{2}{b}T_2\sim \chi^2_{2n-2}$, with $T_1$ independent of $T_2$.
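These distributional facts are easy to sanity-check numerically. Below is a minimal simulation sketch (the values of $a$, $b$, $n$ and the use of SciPy's Kolmogorov–Smirnov test are my illustrative choices, not part of the original answer):

```python
import numpy as np
from scipy import stats

# Minimal simulation check of the claimed sampling distributions.
# a, b, n, and the number of replications are illustrative choices.
rng = np.random.default_rng(0)
a, b, n, reps = 2.0, 3.0, 10, 100_000

# X_i ~ Exp(a, b): shifted exponential with location a and scale b
x = a + rng.exponential(scale=b, size=(reps, n))
t1 = x.min(axis=1)               # T1 = X_(1)
t2 = x.sum(axis=1) - n * t1      # T2 = sum_i (X_i - X_(1))

# T1 ~ Exp(a, b/n): compare against the shifted exponential CDF
print(stats.kstest(t1, stats.expon(loc=a, scale=b / n).cdf))

# (2/b) * T2 ~ chi^2 with 2n - 2 degrees of freedom
print(stats.kstest(2 * t2 / b, stats.chi2(2 * n - 2).cdf))

# T1 and T2 should be (approximately) uncorrelated, consistent with independence
print(np.corrcoef(t1, t2)[0, 1])
```

Large p-values from both KS tests and a near-zero correlation are consistent with the stated distributions and independence.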

To show $(T_1,T_2)$ is complete, start from $$E_{(a,b)}[g(T_1,T_2)]=0\quad,\,\forall\,(a,b)$$ for some measurable function $g$.

That is, $$\iint g(x,y)f_{T_1}(x)f_{T_2}(y)\,dx\,dy=0\quad,\,\forall\,(a,b)$$

For fixed $b$ and by Fubini's theorem, this is equivalent to

$$\int \underbrace{\int g(x,y)f_{T_2}(y)\,dy}_{E_b[g(x,T_2)]}\, f_{T_1}(x)\,dx=0\quad,\,\forall\,a$$

Or, writing out $f_{T_1}(x)=\frac{n}{b}e^{-n(x-a)/b}\,1_{x>a}$ and dropping the nonzero factor $\frac{n}{b}e^{na/b}$, which does not involve $x$, $$\int_a^\infty E_b[g(x,T_2)]e^{-nx/b}\,dx=0\quad,\,\forall\,a \tag{1}$$

Since $b$ is known in $(1)$, comparing with this setup where $T_1=X_{(1)}$ is complete for $a$ (a direct argument is sketched below), we get

$$E_b[g(x,T_2)]=0\quad,\text{a.e.}$$
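To spell out that step: $(1)$ says the integral over every half-line $(a,\infty)$ vanishes, hence so does the integral over every bounded interval, which by a standard Lebesgue argument forces the integrand to vanish almost everywhere:

$$\int_{a_1}^{a_2} E_b[g(x,T_2)]\,e^{-nx/b}\,dx=0\quad\forall\,a_1<a_2\ \implies\ E_b[g(x,T_2)]\,e^{-nx/b}=0\quad\text{a.e.}$$

Since $e^{-nx/b}>0$ everywhere, the first factor must vanish for almost every $x$.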

As the pdf of $T_2$ is a member of the exponential family, $E_b[g(x,T_2)]$ is a continuous function of $b$ for any fixed $x$. So for almost all $x$, we have $$E_b[g(x,T_2)]=0\quad,\,\forall\,b \tag{2}$$

Moreover, since $T_2$ is a complete statistic for $b$ (there is no $a$ here), equation $(2)$ implies $$g(x,y)=0\quad,\text{a.e.}$$

Reference:

For details regarding this proof, see Lehmann & Casella, *Theory of Point Estimation* (2nd ed.), p. 43.


Edit in response to OP:

We have $E_b[g(x,T_2)]=\int g(x,y)f_{T_2}(y)\,dy$, where the pdf $f_{T_2}$ of $T_2$ depends on $b$. So for fixed $x$, $E_b[g(x,T_2)]$ is a function of $b$ alone; that this function is continuous follows from the form of $f_{T_2}(\cdot)$, a member of a regular exponential family.
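Concretely, $\frac{2}{b}T_2\sim\chi^2_{2n-2}$ is the same as $T_2\sim\mathsf{Gamma}(n-1,b)$, so

$$E_b[g(x,T_2)]=\int_0^\infty g(x,y)\,\frac{y^{n-2}e^{-y/b}}{\Gamma(n-1)\,b^{n-1}}\,dy,$$

which is continuous (in fact analytic) in $b$, by the standard result that an integral against an exponential family density is analytic in the interior of the natural parameter space.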

From the completeness of $T_1$ for each fixed $b$ (and $b$ here is arbitrary), $E_b[g(x,T_2)]=0$ holds for almost all $x$, where the exceptional Lebesgue-null set of $x$ values may depend on $b$. Taking a countable dense set of $b$ values, the union of these null sets is still null; for $x$ outside it, $E_b[g(x,T_2)]=0$ holds on a dense set of $b$ values, and hence, due to continuity, for all $b$, as a consequence of this result.
