Statistical Estimation – Estimating Theta Based on Censored Data with Uniform Distribution

censoring, estimators, inference, maximum likelihood, uniform distribution

Suppose $(X_i)_{1\le i\le n}$ are i.i.d. $\text{Uniform}(0,\theta)$ random variables, where $\theta \ge 1$ is unknown. Instead of $X_i$ we observe the censored values $Y_i=\min(X_i,1)$, and I wish to estimate $\theta$ based on the data $(Y_i)_{1\le i\le n}$.

I think the likelihood takes the form

\begin{align}
L(\theta\mid \boldsymbol y)&=\prod_{i:x_i<1} \left(\frac1{\theta}\mathbf1_{0<y_i<\theta}\right)\prod_{i:x_i\ge 1}\left(1-\frac1{\theta}\right)
\\&=\theta^{-r}(1-\theta^{-1})^{n-r}\mathbf1_{0<y_{(n)}<\theta}\,,
\end{align}

where $r=\sum_{i=1}^n \mathbf1_{x_i<1}=\sum_{i=1}^n \mathbf1_{y_i<1}$ is the number of observations less than $1$.

If this is the correct likelihood, then a sufficient statistic for $\theta$ seems to be $T=(R,Y_{(n)})$. Or is the sufficient statistic simply $R$? And is the sufficient statistic complete? This would help to answer whether there is a UMVUE of $\theta$.

On the other hand, what is the MLE of $\theta$? If I ignore the indicator $\mathbf1_{0<y_{(n)}<\theta}$ in the likelihood and assume $\theta\ne 1$, then differentiation leads to the stationary point $\hat\theta =\frac{n}{r}$, but I am not sure whether this is a valid answer.
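To spell out the differentiation I have in mind (with the indicator dropped):

\begin{align}
\ell(\theta) &= \log L(\theta\mid \boldsymbol y) = -r\log\theta + (n-r)\log\left(1-\theta^{-1}\right),
\\ \ell'(\theta) &= -\frac{r}{\theta} + \frac{n-r}{\theta^2-\theta} = 0
\quad\Longleftrightarrow\quad r(\theta-1) = n-r
\quad\Longleftrightarrow\quad \theta = \frac{n}{r}\,.
\end{align}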

Best Answer

Some more details than the other answers. The probability distribution of $Y$ will be a mixture distribution with two components, one continuous and one discrete. To find the distribution of $Y$, write $$ \DeclareMathOperator{\P}{\mathbb{P}} \P(Y\in A) = \P(Y\in A \mid Y<1)\P(Y<1)+\P(Y\in A\mid Y=1)\P(Y=1)\\ =\frac1\theta\cdot\int_{A\cap [0,1]} \; dy + \frac{\theta-1}{\theta}\cdot \mathbb{1}(1\in A), $$ leading directly to the likelihood $$ L_Y(\theta) = \left(\frac{\theta-1}{\theta}\right)^{n-r}\cdot \left( \frac1\theta\right)^r, $$ showing that $R$ alone is a sufficient statistic. Now the usual procedure leads to the maximum likelihood estimator $$ \hat{\theta}_{ML}= 1+ \frac{n-r}r = \frac{n}{r}, $$ which also seems intuitively reasonable: $r/n$ is the natural estimate of $\P(Y<1)=1/\theta$.
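As a quick sanity check (not part of the derivation), here is a minimal simulation sketch; the true value $\theta=2.5$, the seed, and the sample size are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

theta = 2.5            # hypothetical true value, theta >= 1
n = 100_000            # illustrative sample size

x = rng.uniform(0.0, theta, size=n)   # latent X_i ~ Uniform(0, theta)
y = np.minimum(x, 1.0)                # observed Y_i = min(X_i, 1)
r = int(np.sum(y < 1.0))              # number of uncensored observations

# MLE: 1 + (n - r)/r = n/r; undefined in the (rare) event r == 0
theta_hat = 1 + (n - r) / r
print(theta_hat)                      # typically close to 2.5
```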

Some more details

Defining the likelihood for a mixture distribution like this might be new to many, so some details might help. But first, this is discussed on this site earlier at Maximum likelihood function for mixed type distribution and Weighted normal errors regression with censoring. We will need some concepts from measure theory. Let $\mu^*$ be the measure given by $$ \mu^*(A)= \mu(A) + \mathbb{1}\{ 1\in A\}, $$ where $\mu$ is Lebesgue measure and the second term is an atom at $1$. Now the distribution of $Y$ can be written as a density (in the sense of the Radon–Nikodym theorem) with respect to $\mu^*$. This looks like $$ \P(Y\in A) =\int_A f(y) \; \mu^*(dy) =\int_A f(y) \; \mu(dy) + \int_A f(y)\; d\delta_1(y), $$ where $\delta_1$ is the atom at $1$, so that $\int_A f(y)\; d\delta_1(y) = f(1)\,\mathbb{1}\{1\in A\}$. The Radon–Nikodym density $f$ can be written $$ f(y)= \frac1\theta \mathbb{1}\{0\le y < 1\} + \frac{\theta-1}\theta\cdot \mathbb{1}\{y=1\}. $$ Note that the first term in the density contributes only to the integral with respect to $\mu$, and the second term only to the integral with respect to $\delta_1$.
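As a quick consistency check, $f$ integrates to $1$ against $\mu^*$: $$ \int f \, d\mu^* = \int_0^1 \frac1\theta \, \mu(dy) + f(1) = \frac1\theta + \frac{\theta-1}{\theta} = 1. $$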

So, defining the likelihood using the RN-derivative $f$, we get $$ L_Y(\theta)=\prod_{i=1}^n f(y_i)=\prod_{i=1}^n \left\{ \frac1\theta \mathbb{1}\{0\le y_i < 1\} + \frac{\theta-1}\theta\cdot \mathbb{1}\{y_i=1\} \right\}, $$ and since exactly one of the two indicators is nonzero for each $i$, simplifying gives the likelihood above.
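A short numerical sketch confirms that this likelihood is maximized at $\hat\theta=n/r$; the counts $n=100$, $r=40$ and the search interval are illustrative:

```python
import numpy as np
from scipy.optimize import minimize_scalar

n, r = 100, 40   # illustrative counts: r uncensored out of n

def neg_loglik(theta):
    # negative log of L_Y(theta) = (1/theta)^r * ((theta-1)/theta)^(n-r)
    return r * np.log(theta) - (n - r) * np.log((theta - 1.0) / theta)

res = minimize_scalar(neg_loglik, bounds=(1.0 + 1e-9, 50.0), method="bounded")
print(res.x, n / r)   # numerical maximizer vs. closed form n/r = 2.5
```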