We know that if an estimator is unbiased for $\theta$ and its variance tends to $0$ as $n\to\infty$, then it is a consistent estimator of $\theta$. But this is a sufficient, not a necessary, condition. I am looking for an example of an estimator which is consistent but whose variance does not tend to $0$ as $n\to\infty$. Any suggestions?
Solved – Counterexample for the sufficient condition required for consistency
consistency, mathematical-statistics, probability, unbiased-estimator, variance
Related Solutions
Solved – Understanding the relationship between a ‘sufficient statistic’ and an ‘unbiased estimator’
Sufficiency is an essential if rare property: if $S(X)$ is sufficient for model $f_\theta$, considering $S(X)$ for estimation of $\theta$ is sufficient, meaning you need nothing else from $X$. In other words, the collection of estimators based on $S(X)$ cannot be improved by other estimators [in the sense of any convex loss function].
Unfortunately, a sufficient statistic of fixed dimension essentially exists only in exponential families. That is, if the density is of the form$$f_\theta(x)=\exp\{ \sigma(x)\cdot \Phi(\theta) - \psi(\theta)\}h(x)$$ then for a sample $(x_1,\ldots,x_n)$, $$S_n(x_1,\ldots,x_n)=\sum_{i=1}^n \sigma(x_i)$$is sufficient. Outside exponential families (with support independent of $\theta$), there is no sufficient statistic $S_n(X_1,\ldots,X_n)\in\mathbb{R}^d$ with fixed dimension $d$ [fixed in $n$] (this is the Pitman-Koopman-Darmois theorem).
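As a quick numerical illustration (a sketch, using the $N(\theta,1)$ exponential family, where $\sigma(x)=x$ and $S_n=\sum_i x_i$): two different samples sharing the same value of $S_n$ have a log-likelihood difference that is free of $\theta$, i.e. the data enter the likelihood only through $S_n$.

```python
import numpy as np

def normal_loglik(x, theta):
    # log-likelihood of an i.i.d. N(theta, 1) sample, up to a constant
    return -0.5 * np.sum((x - theta) ** 2)

# two different samples with the same sufficient statistic S_n = sum(x_i) = 3
x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([2.5, 0.5, 0.0])

# the log-likelihood difference between the two samples is the same
# for every theta, because the theta-dependent term involves only S_n
thetas = np.linspace(-3.0, 3.0, 13)
diffs = [normal_loglik(x1, t) - normal_loglik(x2, t) for t in thetas]
```

The constancy of `diffs` over `thetas` is exactly the factorization property; for a sample with a different value of $S_n$, the difference would vary with $\theta$.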
On the other hand, unbiasedness is a local property that does not lead to optimality per se. Optimality does depend on the loss function used for comparing estimators and restricting the class of estimators to unbiased estimators can only increase the optimal error.
For instance, in the estimation of a multivariate normal mean vector, when $x\sim\mathcal{N}_d(\mu,I_d)$ with $d\ge 3$, under squared error loss, admissible estimators, i.e. estimators that cannot be beaten uniformly over $\mathbb{R}^d$, are all biased. In other words, an unbiased estimator $\delta_0$ of $\mu$ is inadmissible: there always exists a biased estimator $\delta_1$ (and in fact an infinite number of biased estimators) such that$$\mathbb{E}[||\mu-\delta_1(X)||^2]\le\mathbb{E}[||\mu-\delta_0(X)||^2]\quad\text{for all }\mu,$$with strict inequality for some $\mu$.
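The classic such $\delta_1$ is the James-Stein shrinkage estimator. The sketch below (the true mean $\mu=(1,\ldots,1)$ and $d=10$ are arbitrary illustrative choices) compares its empirical squared-error risk with that of the unbiased $\delta_0(X)=X$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, reps = 10, 50000
mu = np.full(d, 1.0)                      # arbitrary true mean (illustrative)

X = rng.normal(mu, 1.0, size=(reps, d))   # X ~ N_d(mu, I_d), one row per replication

# unbiased estimator delta_0(X) = X: its risk equals d
risk_unbiased = np.mean(np.sum((X - mu) ** 2, axis=1))

# James-Stein (biased) estimator: delta_1(X) = (1 - (d-2)/||X||^2) X
shrink = 1.0 - (d - 2) / np.sum(X ** 2, axis=1)
risk_js = np.mean(np.sum((shrink[:, None] * X - mu) ** 2, axis=1))
```

Here `risk_js` comes out strictly below `risk_unbiased` (which is close to $d=10$), and the domination in fact holds for every $\mu$ when $d\ge 3$.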
For further information, I suggest reading Lehmann and Casella on this topic.
Let us show that there can be a UMVUE which is not a sufficient statistic.
First of all, if the estimator $T$ takes (say) the value $0$ on all samples, then clearly $T$ is a UMVUE of $0$, which can be considered a (constant) function of $\theta$. On the other hand, this estimator $T$ is clearly not sufficient in general.
It is a bit harder to find a UMVUE $Y$ of the "entire" unknown parameter $\theta$ (rather than a UMVUE of a function of it) such that $Y$ is not sufficient for $\theta$. E.g., suppose the "data" are given just by one normal r.v. $X\sim N(\tau,1)$, where $\tau\in\mathbb{R}$ is unknown. Clearly, $X$ is sufficient and complete for $\tau$.
Let $Y=1$ if $X\ge0$ and $Y=0$ if $X<0$, and let
$\theta:=\mathsf{E}_\tau Y=\mathsf{P}_\tau(X\ge0)=\Phi(\tau)$; as usual, we denote by $\Phi$ and $\varphi$, respectively, the cdf and pdf of $N(0,1)$.
So, the estimator $Y$ is unbiased for $\theta=\Phi(\tau)$ and is a function of the complete sufficient statistic $X$. Hence,
$Y$ is a UMVUE of $\theta=\Phi(\tau)$.
On the other hand, the function $\Phi$ is continuous and strictly increasing on $\mathbb{R}$, from $0$ to $1$. So, the correspondence $\mathbb{R}\ni\tau=\Phi^{-1}(\theta)\leftrightarrow\theta=\Phi(\tau)\in(0,1)$ is a bijection. That is, we can re-parametrize the problem, from $\tau$ to $\theta$, in a one-to-one manner. Thus, $Y$ is a UMVUE of $\theta$, not just for the "old" parameter $\tau$, but for the "new" parameter $\theta\in(0,1)$ as well. However, $Y$ is not sufficient for $\tau$ and therefore not sufficient for $\theta$. Indeed, \begin{multline*} \mathsf{P}_\tau(X<-1|Y=0)=\mathsf{P}_\tau(X<-1|X<0)=\frac{\mathsf{P}_\tau(X<-1)}{\mathsf{P}_\tau(X<0)} \\ =\frac{\Phi(-\tau-1)}{\Phi(-\tau)} \sim\frac{\varphi(-\tau-1)/(\tau+1)}{\varphi(-\tau)/\tau}\sim\frac{\varphi(-\tau-1)}{\varphi(-\tau)}=e^{-\tau-1/2} \end{multline*} as $\tau\to\infty$; here we used the known asymptotic equivalence $\Phi(-\tau)\sim\varphi(-\tau)/\tau$ as $\tau\to\infty$, which follows by l'Hôpital's rule. So, $\mathsf{P}_\tau(X<-1|Y=0)$ depends on $\tau$ and hence on $\theta$, which shows that $Y$ is not sufficient for $\theta$ (whereas $Y$ is a UMVUE for $\theta$).
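The dependence of $\mathsf{P}_\tau(X<-1\mid Y=0)=\Phi(-\tau-1)/\Phi(-\tau)$ on $\tau$ is easy to check numerically; a small sketch using exact normal cdf values (via `erfc`, which stays accurate in the tails):

```python
from math import erfc, sqrt, exp

def Phi(z):
    # standard normal cdf, computed via erfc for tail accuracy
    return 0.5 * erfc(-z / sqrt(2.0))

def cond_prob(tau):
    # P_tau(X < -1 | Y = 0) = Phi(-tau - 1) / Phi(-tau) for X ~ N(tau, 1)
    return Phi(-tau - 1.0) / Phi(-tau)

probs = [cond_prob(tau) for tau in (0.0, 1.0, 2.0, 3.0)]
# the conditional probability varies with tau (so Y is not sufficient),
# and for large tau it approaches the asymptotic value exp(-tau - 1/2)
```

For instance, `cond_prob(0.0)` $=\Phi(-1)/\Phi(0)\approx 0.317$ while `cond_prob(2.0)` $\approx 0.059$, so the conditional law of $X$ given $Y$ genuinely changes with $\tau$.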
Best Answer
Consider a serially correlated, covariance-stationary stochastic process $\{y_t\},\;\; t=1,...,n$, with mean $\mu$ and autocovariances $\{\gamma_j\},\;\; \gamma_j\equiv \operatorname{Cov}(y_t,y_{t-j})$. Assume that $\lim_{j\rightarrow \infty}\gamma_j= 0$ (this bounds the "strength" of the autocorrelation as two realizations of the process move further and further apart in time). Then we have that
$$\bar y_n = \frac 1n\sum_{t=1}^ny_t\rightarrow_{m.s} \mu,\;\; \text{as}\; n\rightarrow \infty$$
i.e. the sample mean converges in mean square to the true mean of the process, and therefore it also converges in probability: so it is a consistent estimator of $\mu$.
The variance of $\bar y_n$ can be found to be
$$\operatorname{Var}(\bar y_n) = \frac 1n \gamma_0+\frac 2n \sum_{j=1}^{n-1}\left(1-\frac {j}{n}\right)\gamma_j$$
which goes to zero as $n$ goes to infinity, by a Cesàro-mean argument, since $\gamma_j\rightarrow 0$.
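To see the rate concretely, here is a sketch that evaluates the variance formula for a hypothetical AR(1)-type autocovariance sequence $\gamma_j=\rho^j\gamma_0$ (the choices $\rho=0.8$, $\gamma_0=1$ are arbitrary):

```python
import numpy as np

def var_sample_mean(gammas, n):
    # Var(ybar_n) = gamma_0/n + (2/n) * sum_{j=1}^{n-1} (1 - j/n) * gamma_j
    j = np.arange(1, n)
    return gammas[0] / n + (2.0 / n) * np.sum((1.0 - j / n) * gammas[j])

# hypothetical AR(1)-style autocovariances: gamma_j = rho**j, rho = 0.8
rho = 0.8
gammas = rho ** np.arange(10000)

variances = {n: var_sample_mean(gammas, n) for n in (10, 100, 1000)}
# n * Var(ybar_n) approaches the long-run variance
# gamma_0 + 2 * sum_{j>=1} gamma_j = 1 + 2 * (0.8/0.2) = 9
```

The computed variances shrink roughly like $9/n$, confirming that $\operatorname{Var}(\bar y_n)\rightarrow 0$ despite the serial correlation.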
Now, making use of Cardinal's comment, let's further randomize our estimator of the mean by considering the estimator
$$\tilde \mu_n = \bar y_n + z_n$$
where $\{z_t\}$ is a sequence of independent random variables, also independent of the $y_t$'s, with $z_t$ taking the value $at$ (the parameter $a>0$ is to be chosen by us) with probability $1/t^2$, the value $-at$ with probability $1/t^2$, and the value zero otherwise. So $z_t$ has expected value and variance
$$E(z_t) = at\frac 1{t^2} -at\frac 1{t^2} + 0\cdot \left (1-\frac 2{t^2}\right)= 0,\;\;\operatorname{Var}(z_t) = 2a^2$$
The expected value and the variance of the estimator are therefore
$$E(\tilde \mu_n) = \mu,\;\;\operatorname{Var}(\tilde \mu_n) = \operatorname{Var}(\bar y_n) + 2a^2$$
Consider the distribution of $|z_n|$: it takes the value $0$ with probability $1-2/n^2$ and the value $an$ with probability $2/n^2$. So, for any $\epsilon>0$ and all $n$ large enough that $an>\epsilon$,
$$P\left(|z_n| <\epsilon\right) = 1-\frac 2{n^2} \;\Rightarrow\; \lim_{n\rightarrow \infty}P\left(|z_n| < \epsilon\right) = 1$$
which means that $z_n$ converges in probability to $0$ (while its variance remains finite). Therefore
$$\operatorname{plim}\tilde \mu_n = \operatorname{plim}\bar y_n+\operatorname{plim} z_n = \mu$$
so this randomized estimator of the mean value of the $y$-stochastic process remains consistent. But its variance does not go to zero as $n$ goes to infinity (it tends to $2a^2>0$), nor does it go to infinity.
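A short simulation sketch shows both features at once, the shrinking mass of $\tilde\mu_n$ away from $\mu$ and its non-vanishing variance. For simplicity it takes the underlying process to be i.i.d. $N(\mu,1)$ (so $\gamma_j=0$ for $j\ge1$, which satisfies the assumption above) with $a=1$; both are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)

def tilde_mu_samples(n, mu=0.0, a=1.0, reps=500000):
    # ybar_n ~ N(mu, 1/n): the exact sampling distribution of the mean
    # of n i.i.d. N(mu, 1) draws
    ybar = rng.normal(mu, 1.0 / np.sqrt(n), size=reps)
    # z_n: +a*n w.p. 1/n^2, -a*n w.p. 1/n^2, and 0 otherwise
    u = rng.random(reps)
    z = np.where(u < 1.0 / n**2, a * n,
        np.where(u < 2.0 / n**2, -a * n, 0.0))
    return ybar + z

est10, est50 = tilde_mu_samples(10), tilde_mu_samples(50)

# consistency: mass outside a fixed band around mu shrinks with n ...
miss10 = np.mean(np.abs(est10) > 0.5)
miss50 = np.mean(np.abs(est50) > 0.5)
# ... while the variance stays close to Var(ybar_n) + 2*a**2, about 2
var10, var50 = est10.var(), est50.var()
```

The fraction of replications falling outside the band collapses as $n$ grows, while the empirical variance stays near $2a^2=2$ at every $n$: the rare, huge values $\pm an$ keep the variance alive without threatening consistency.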
Closing, why all the apparently useless elaboration with an autocorrelated stochastic process? Because Cardinal qualified his example by calling it "absurd", like "just to show that mathematically, we can have a consistent estimator with non-zero and finite variance".
I wanted to hint that it isn't necessarily a curiosity, at least in spirit. There are times in real life when new, man-made processes begin, processes that have to do with how we organize our lives and activities. While we usually have designed them, and can say a lot about them, they may still be so complex that they are reasonably treated as stochastic (the illusion of complete control over such processes, or of complete a priori knowledge of their evolution, is just that, an illusion; such processes may represent new ways to trade or produce, or new rights-and-obligations structures between humans). Being new, we also lack enough accumulated realizations of them to do reliable statistical inference on how they will evolve. Ad hoc and perhaps "suboptimal" corrections are then an actual phenomenon: for example, we may have a process whose present we strongly believe depends on its past (hence the autocorrelated stochastic process), but we don't yet know how (hence the ad hoc randomization, while we wait for data to accumulate in order to estimate the covariances). Perhaps a statistician would find a better way to deal with this kind of severe uncertainty, but many entities have to function in an uncertain environment without the benefit of such scientific services.
Estimators that converge in probability to a random variable do exist: the case of "spurious regression" comes to mind, where if we attempt to regress two independent random walks (i.e. non-stationary stochastic processes) on each other by using ordinary least squares estimation, the OLS estimator will converge to a random variable.
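A sketch of that phenomenon (the sample sizes and seed are arbitrary; the regression is run without an intercept for simplicity): the spread of the OLS slope across replications does not shrink as $n$ grows, i.e. the estimator is not settling on any constant.

```python
import numpy as np

rng = np.random.default_rng(7)

def ols_slopes(n, reps=500):
    # two independent driftless random walks (non-stationary processes)
    x = rng.normal(size=(reps, n)).cumsum(axis=1)
    y = rng.normal(size=(reps, n)).cumsum(axis=1)
    # OLS slope without intercept, per replication: sum(x*y) / sum(x*x)
    return np.sum(x * y, axis=1) / np.sum(x * x, axis=1)

spread_small = ols_slopes(200).std()
spread_large = ols_slopes(2000).std()
# unlike a consistent estimator, the cross-replication spread of the
# slope stays of the same order as n increases tenfold
```

With i.i.d. (stationary) data the spread would shrink like $1/\sqrt n$; here it stays roughly constant, which is the "spurious regression" signature of convergence to a random variable.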
But an estimator that converges in probability to a random variable with non-zero variance is not consistent: consistency is defined as convergence in probability of the estimator to a constant, which, by definition, has zero variance.