I'm having some problems with the following assignment:
Let $X_1, X_2, \ldots, X_n$ be samples from an exponential distribution with parameter $\lambda$, and let $c_1, c_2, \ldots, c_n$ be a sequence of positive numbers. Define
\begin{align*}
Y_i = \min(X_i, c_i) \quad \text{and} \quad \Delta_i = \textbf{1} \{ Y_i = X_i \}.
\end{align*}
Determine the likelihood of the observed data $(Y_1, \Delta_1), (Y_2, \Delta_2), \ldots, (Y_n, \Delta_n)$.
My main problems are that I do not know how to calculate the pdf of the $Y_i$, and that, even if I knew those pdfs, I'm not sure how to use the $\Delta_i$ correctly.
What I've done so far:
I've first followed this post for calculating the pdf of $Y_i$: How to find the pdf of [min(RV.1,RV.2)]/RV.2
So, first notice that $Y_i$ is a random variable supported on $[ 0, c_i]$.
For $0 \leq y < c_i$ we see:
\begin{align*}
P(Y_i \leq y) = P(\min(X_i, c_i) \leq y) = P(X_i \leq y) = 1 - e^{-\lambda y}
\end{align*}
For $y = c_i$ we see:
\begin{align*}
P(Y_i = y) = P(\min(X_i, c_i) = y) = P(X_i > y) = e^{- \lambda \; c_i}
\end{align*}
Let $F_{Y_i}$ denote the distribution function of $Y_i$, then we see:
\begin{align*}
F_{Y_i}(y) &= 1 - e^{-\lambda y} &0 \leq y < c_i \\
F_{Y_i}(y) &= 1 &y = c_i
\end{align*}
and
\begin{align*}
f_{Y_i}(y) = \lambda e^{- \lambda y}, \quad 0 \leq y < c_i
\end{align*}
which obviously does not integrate to 1.
Disregarding that for the moment, it seems logical to me that the likelihood function $\mathcal{L}(\lambda)$ then becomes
\begin{align*}
\mathcal{L}(\lambda) = \prod_{i = 1}^n \left( \Delta_i \lambda e^{- \lambda Y_i} + (1 - \Delta_i) e^{- \lambda c_i} \right).
\end{align*}
Logging this statement we get
\begin{align*}
\log \mathcal{L}(\lambda) = \sum_{i = 1}^n \log\left( \Delta_i \lambda e^{- \lambda Y_i} + (1 - \Delta_i) e^{- \lambda c_i} \right)
\end{align*}
and calculating the derivative gives us
\begin{align*}
\frac{ \mathrm{d} \log \mathcal{L}(\lambda)}{ \mathrm{d} \lambda} = \sum_{i = 1}^n \frac{\Delta_i e^{\lambda c_i}( \lambda Y_i - 1) - c_i (\Delta_i - 1)e^{\lambda Y_i} }{(\Delta_i - 1) e^{\lambda Y_i} - \Delta_i \lambda e^{\lambda c_i}}.
\end{align*}
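To double-check the algebra, here is a quick numerical sanity check (illustrative Python with made-up numbers): each log term is either $\log \lambda - \lambda y_i$ (when $\Delta_i = 1$) or $-\lambda c_i$ (when $\Delta_i = 0$), so the score can be compared against a central finite difference of $\log \mathcal{L}$.

```python
# Sanity check with hypothetical values: compare the analytic score with a
# central finite difference of the log-likelihood.  Each log term is either
#   log(lam) - lam*y_i   (Delta_i = 1)   or   -lam*c_i   (Delta_i = 0),
# so its derivative is Delta_i*(1/lam - y_i) - (1 - Delta_i)*c_i.
import math

y     = [0.3, 1.2, 0.9, 2.0]   # observed Y_i (made-up numbers)
delta = [1,   1,   0,   1  ]   # Delta_i = 1{Y_i = X_i}
c     = [1.5, 2.5, 0.9, 2.2]   # censoring constants; y_i = c_i when delta_i = 0

def loglik(lam):
    return sum(d * (math.log(lam) - lam * yi) + (1 - d) * (-lam * ci)
               for yi, d, ci in zip(y, delta, c))

def score(lam):
    return sum(d * (1 / lam - yi) - (1 - d) * ci
               for yi, d, ci in zip(y, delta, c))

lam, h = 0.7, 1e-6
numeric = (loglik(lam + h) - loglik(lam - h)) / (2 * h)
print(abs(numeric - score(lam)) < 1e-6)  # True
```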
But as far as I'm concerned, this brings us nowhere.
If someone could point out where I went wrong, I'd really appreciate it. Thanks in advance for any replies!
Best Answer
The approach is correct. Contrary to the title of the question, the $c$'s are designated as constants. So $Y_i$ has the distribution of $X_i$ but with a ceiling, and the probability mass above the ceiling is allocated to the point $c_i$. So the distribution function is indeed
$$\begin{align*} F_{Y_i}(y) &= 1 - e^{-\lambda y} &0 \leq y < c_i \\ F_{Y_i}(y) &= 1 &y = c_i \end{align*}$$
The density likewise comes in two parts, a continuous piece on $[0, c_i)$ and a point mass at $c_i$, i.e.
$$\begin{align*} f_{Y_i}(y) &= \lambda e^{-\lambda y} &0 \leq y < c_i \\ P(Y_i = c_i) &= 1 - \lim_{y \uparrow c_i} F_{Y_i}(y) = e^{- \lambda c_i} & \end{align*}$$
which "integrates to unity" alright since (skipping formalities)
$$\int_{S_{Y_i}}f_{Y_i}(y)dy = \int_0^{c_i}\lambda e^{-\lambda y}dy + e^{- \lambda c_i} = - e^{-\lambda y} \Big |_0^{c_i}+ e^{- \lambda c_i} = -e^{- \lambda c_i} +1 +e^{- \lambda c_i} =1$$
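The same bookkeeping can be checked numerically (illustrative Python, made-up parameter values): integrate the continuous part over $[0, c_i)$ and add the point mass at $c_i$.

```python
# Numerical check (illustrative values) that the mixed distribution has total
# mass 1: integrate the continuous part lam*exp(-lam*y) over [0, c) with a
# trapezoid rule and add the point mass exp(-lam*c) at y = c.
import math

lam, c = 0.8, 1.7   # hypothetical parameter and censoring constant
n = 100_000
h = c / n
grid = [lam * math.exp(-lam * (k * h)) for k in range(n + 1)]
continuous_part = h * (sum(grid) - 0.5 * (grid[0] + grid[-1]))
total = continuous_part + math.exp(-lam * c)
print(round(total, 6))  # 1.0
```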
Note that at $c_i$ the point mass $e^{-\lambda c_i}$ differs from the limit $\lambda e^{-\lambda c_i}$ of the continuous part, except when $\lambda = 1$.
As long as the $X_i$'s are assumed independent, your joint likelihood function is correct (just make the $y$'s lower-case), and it is a standard case of a likelihood function "regulated" by an indicator function. Note that there may be a small conceptual hurdle: if we observe $y_i = c_i$, does this mean that $y_i = x_i$ or not? Since $X_i$ is continuous and the probability of it acquiring a specific value is zero, we treat the case $Y_i = c_i$ as implying that $Y_i \neq X_i$. Your indicator function is equivalent to $\Delta_i = \textbf{1} \{ y_i <c_i \}$, totally deterministic, given the sample.
If you want to proceed with estimation, you are presumed to know the $c$-constants (otherwise, you do not have enough data to estimate them). You also have a sample of $y_i$'s. Given these, you can create the indicator series and set the score to zero. In fact, since each factor of the likelihood is either $\lambda e^{-\lambda y_i}$ (when $\Delta_i = 1$) or $e^{-\lambda c_i}$ (when $\Delta_i = 0$, in which case $y_i = c_i$), the log-likelihood collapses to $\left(\sum_i \Delta_i\right) \log \lambda - \lambda \sum_i y_i$, and the analytical solution you were perhaps hoping for does exist: $\hat{\lambda} = \sum_i \Delta_i \big/ \sum_i y_i$.
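To make the estimation concrete, here is a small simulation sketch (all parameter values made up): censored exponential data are generated, the log-likelihood is maximized with a plain golden-section search, and the maximizer is compared against $\sum_i \Delta_i \big/ \sum_i y_i$, the root of the score equation.

```python
# Simulation sketch (parameter values are made up): draw X_i ~ Exp(lam_true),
# censor at known constants c_i, then maximize the log-likelihood numerically.
import math
import random

random.seed(1)
lam_true, n = 1.3, 5_000
c = [random.uniform(0.5, 2.0) for _ in range(n)]      # known censoring constants
x = [random.expovariate(lam_true) for _ in range(n)]
y = [min(xi, ci) for xi, ci in zip(x, c)]
delta = [1 if xi < ci else 0 for xi, ci in zip(x, c)]

def loglik(lam):
    # each term is delta_i*log(lam) - lam*y_i  (recall y_i = c_i when delta_i = 0)
    return sum(d * math.log(lam) - lam * yi for d, yi in zip(delta, y))

# golden-section search for the maximizer on [1e-3, 10]
phi = (math.sqrt(5) - 1) / 2
a, b = 1e-3, 10.0
for _ in range(200):
    m1, m2 = b - phi * (b - a), a + phi * (b - a)
    if loglik(m1) < loglik(m2):
        a = m1
    else:
        b = m2
lam_numeric = (a + b) / 2
lam_closed = sum(delta) / sum(y)                      # root of the score equation
print(abs(lam_numeric - lam_closed) < 1e-5)  # True
```

The search is deliberately library-free; any one-dimensional optimizer would do, since the log-likelihood is concave in $\lambda$.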
To verify that your gradient is correct, assume that you obtain a sample of $y_i$'s in which $y_i<c_i,\; \forall i$. Then $\Delta_i =1,\; \forall i$ and the gradient becomes
$$ \frac{ \mathrm{d} \log \mathcal{L}(\lambda)}{ \mathrm{d} \lambda} = -\sum_{i = 1}^n \frac{( \lambda y_i - 1)}{\lambda } = 0 \Rightarrow \frac 1 {\lambda} = \frac 1n \sum_{i = 1}^n y_i $$
which is as it should be, since, if all $y_i$-realizations are below the ceiling, the $Y_i$'s are treated as proper exponential variables themselves, and the ceilings, being non-binding, do not affect the estimation.
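This special case is easy to confirm by simulation (again with made-up values): with ceilings far above every draw, nothing is censored and the first-order condition reduces to the reciprocal of the sample mean.

```python
# Check of the uncensored special case (made-up values): when every ceiling
# c_i is far above all the draws, no observation is censored, all Delta_i = 1,
# and the estimator from the first-order condition is 1 / (sample mean of y).
import random

random.seed(7)
lam_true, n = 2.0, 10_000
big_c = 1e9                                    # non-binding ceiling
x = [random.expovariate(lam_true) for _ in range(n)]
y = [min(xi, big_c) for xi in x]               # identical to x here
delta = [1 if xi < big_c else 0 for xi in x]

assert all(d == 1 for d in delta)              # nothing was censored
lam_hat = sum(delta) / sum(y)                  # reduces to n / sum(y) = 1/mean
print(abs(lam_hat - lam_true) < 0.1)
```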