Is the minimal sufficient statistic for the bivariate normal complete?

statistical-inference, statistics

Let $\mathbf{Z}_1, \mathbf{Z}_2, \ldots, \mathbf{Z}_n$ be an iid random sample of size $n$, where each $\mathbf{Z}_i = (X_i,Y_i)^T$ has the bivariate normal distribution $\mathcal{N}_2(\mathbf{0},\Sigma)$ with

$$ \Sigma = \begin{pmatrix}1&\rho\\ \rho &1\end{pmatrix} \qquad \text{and} \qquad \mathbf{0}=(0,0)^T$$

I want to find a minimal sufficient statistic for $\rho$. Here, I denote $\underline{Z}=(\mathbf{Z}_1,\mathbf{Z}_2,\ldots, \mathbf{Z}_n)$.

By Theorem 5.3 in this e-book, I compute the ratio of the likelihood functions for two samples $\underline{z}$ and $\underline{w}$. Since $f_\mathbf{Z}(\mathbf{z}\mid\rho)=(2\pi)^{-1}|\Sigma|^{-1/2}\exp\left(-\frac{1}{2}\mathbf{z}^T\Sigma^{-1}\mathbf{z}\right)$, we have
\begin{align*}
f_\underline{Z}(\underline{z}\mid\rho) &= \prod_{i=1}^n f_\mathbf{Z}(\mathbf{z}_i\mid\rho)=\prod_{i=1}^n (2\pi)^{-1}|\Sigma|^{-1/2}\exp\left(-\frac{\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i}{2}\right)\\
&=(2\pi)^{-n}|\Sigma|^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i\right)
\end{align*}

Thus,
\begin{align*}
\frac{f_\underline{Z}(\underline{z}\mid\rho)}{f_\underline{Z}(\underline{w}\mid\rho)}&=\frac{(2\pi)^{-n}|\Sigma|^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i\right)}{(2\pi)^{-n}|\Sigma|^{-n/2}\exp\left(-\frac{1}{2}\sum_{i=1}^n\mathbf{w}_i^T\Sigma^{-1}\mathbf{w}_i\right)}\\
&=\exp\left(-\frac{1}{2}\left(\sum_{i=1}^n\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i- \sum_{i=1}^n\mathbf{w}_i^T\Sigma^{-1}\mathbf{w}_i\right)\right)
\end{align*}

The ratio does not depend on $\rho$ if and only if, for every $\rho\in(-1,1)$,
\begin{equation}
\sum_{i=1}^n\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i- \sum_{i=1}^n\mathbf{w}_i^T\Sigma^{-1}\mathbf{w}_i=0 \tag{$\dagger$}
\end{equation}

To make $(\dagger)$ explicit, note that $\Sigma^{-1}=\frac{1}{1-\rho^2}\begin{pmatrix}1&-\rho\\ -\rho&1\end{pmatrix}$, so that
\begin{align*}
\sum_{i=1}^n\mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i&=\frac{1}{1-\rho^2}\sum_{i=1}^n \left\{x_i^2-2\rho x_iy_i + y_i^2\right\}\\
&= \frac{1}{1-\rho^2}\left(\sum_{i=1}^n x_i^2-2\rho \sum_{i=1}^n x_iy_i + \sum_{i=1}^ny_i^2\right)
\end{align*}
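(As a quick numerical sanity check of this expansion, not part of the argument: the NumPy sketch below compares the matrix form $\sum_i \mathbf{z}_i^T\Sigma^{-1}\mathbf{z}_i$ with the expanded expression; the seed, sample size, and value of $\rho$ are arbitrary illustrative choices.)

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n = 0.6, 5                      # arbitrary illustrative values

Sigma = np.array([[1.0, rho], [rho, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

# n draws of z_i = (x_i, y_i) from N_2(0, Sigma)
Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
x, y = Z[:, 0], Z[:, 1]

# Matrix form: sum_i z_i^T Sigma^{-1} z_i
quad_matrix = np.einsum('ij,jk,ik->', Z, Sigma_inv, Z)

# Expanded form: (sum x_i^2 - 2*rho*sum x_i*y_i + sum y_i^2) / (1 - rho^2)
quad_expanded = (np.sum(x**2) - 2 * rho * np.sum(x * y) + np.sum(y**2)) / (1 - rho**2)

print(quad_matrix, quad_expanded)    # the two values agree up to rounding
```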

Denote $S_X=\sum_{i=1}^n X_i^2$, $S_Y=\sum_{i=1}^n Y_i^2$, and $S_{XY}=\sum_{i=1}^n X_iY_i$.

Here, I define $T(\underline{Z})=(S_X, S_Y, S_{XY})$. Then, if $T(\underline{z}) = T(\underline{w})$, condition $(\dagger)$ is satisfied for every $\rho$. This proves that $T$ is a minimal sufficient statistic for $\rho$.

The second task is checking whether $T$ is a complete statistic… but I can't find a way to prove whether it is complete or not. The definition of a complete statistic that I have takes just any function $g$ and doesn't take into account the "measurable function" requirement that appears in the definition of a complete statistic on Wikipedia.

Best Answer

The joint pdf of $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ is

$$f_{\rho}(\boldsymbol x,\boldsymbol y)=\frac1{\left(2\pi\sqrt{1-\rho^2}\right)^n}\exp\left\{-\frac1{2(1-\rho^2)}\sum_{i=1}^n (x_i^2+y_i^2)+\frac{\rho}{1-\rho^2}\sum_{i=1}^n x_iy_i\right\}\,,\quad\rho\in (-1,1)$$

This shows that a minimal sufficient statistic for $\rho$ is $$T=T(\boldsymbol X,\boldsymbol Y)=\left(\sum_{i=1}^n (X_i^2+Y_i^2),\sum_{i=1}^n X_iY_i\right)$$
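For concreteness, here is a minimal sketch (assuming NumPy; the function names are my own, not standard) that computes this two-dimensional statistic together with the three-dimensional $(S_X, S_Y, S_{XY})$ from the question for one simulated sample; note that the former is a function of the latter.

```python
import numpy as np

def stat_question(x, y):
    # (S_X, S_Y, S_XY) as defined in the question
    return np.sum(x**2), np.sum(y**2), np.sum(x * y)

def stat_answer(x, y):
    # (sum_i (X_i^2 + Y_i^2), sum_i X_i Y_i) -- a function of the triple above
    return np.sum(x**2 + y**2), np.sum(x * y)

rng = np.random.default_rng(1)
rho, n = 0.3, 50                     # arbitrary illustrative values
Z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
x, y = Z[:, 0], Z[:, 1]

print(stat_question(x, y))
print(stat_answer(x, y))
```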

Consider the function $$g(T)=\sum_{i=1}^n (X_i^2+Y_i^2),$$ the first coordinate of $T$.

Then, since $E_\rho[X_i^2]=E_\rho[Y_i^2]=1$ for every $\rho$, $$E_{\rho}\left[g(T)-2n\right]=0\quad\forall\,\rho\in (-1,1)$$

But of course $g(T)-2n$ is not almost surely equal to $0$ (indeed $P_\rho\left(g(T)=2n\right)=0$ for every $\rho$), which shows that $T$ is not complete.
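A small Monte Carlo check of this (a sketch with arbitrary choices of sample size, number of replications, and seed): for several values of $\rho$, the simulated mean of $g(T)$ stays near $2n$ while its variance is clearly positive, so $g(T)-2n$ has mean zero without being degenerate at zero.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 10, 100_000                # arbitrary illustrative values

for rho in (-0.8, 0.0, 0.5, 0.9):
    Sigma = np.array([[1.0, rho], [rho, 1.0]])
    # reps independent samples, each of size n, from N_2(0, Sigma)
    Z = rng.multivariate_normal([0.0, 0.0], Sigma, size=(reps, n))
    g = np.sum(Z[..., 0]**2 + Z[..., 1]**2, axis=1)   # g(T) for each sample
    print(rho, g.mean(), g.var())    # mean stays near 2n = 20 for every rho; variance > 0
```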

The standard bivariate normal distribution belongs to a curved exponential family, where it is often the case that a complete statistic does not exist.
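Concretely, writing the exponent of the joint pdf above in natural-parameter form, the natural parameter is
$$\eta(\rho)=\left(-\frac{1}{2(1-\rho^2)},\ \frac{\rho}{1-\rho^2}\right),\qquad \rho\in(-1,1),$$
which traces a one-dimensional curve inside a two-dimensional natural-parameter space. The family is therefore not of full rank, and the usual full-rank exponential-family theorem guaranteeing completeness does not apply.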
