In your example, it seems the density $f_{X,Y}(x,y)$ is zero when $x\leqslant0$ or when $y\leqslant0$; hence, to compute $\mathbb P(X+Y\leqslant1)$, the domain of integration is defined by the inequalities
$$
x\geqslant0,\qquad y\geqslant0,\qquad x+y\leqslant1.
$$
This is indeed equivalent to
$$
0\leqslant x\leqslant1,\qquad 0\leqslant y\leqslant 1-x,
$$
but I know no systematic way of deducing the latter from the former, except drawing a rough sketch of the domain of interest.
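If one wants a numerical sanity check of such a reduction, here is a sketch. The specific density is purely illustrative (it is not given in the question): I assume $X$ and $Y$ are independent standard exponentials, so $f_{X,Y}(x,y)=e^{-x-y}$ on the positive quadrant.

```python
from math import e, exp
from scipy.integrate import dblquad

# Illustrative (assumed) density: X, Y independent standard exponentials,
# f(x, y) = exp(-x - y) for x, y >= 0 and zero otherwise.
# Integrate over 0 <= x <= 1 and, for each x, 0 <= y <= 1 - x.
prob, _ = dblquad(lambda y, x: exp(-x - y), 0, 1, lambda x: 0.0, lambda x: 1 - x)

print(prob)  # closed form for this example: 1 - 2/e, about 0.2642
```

The inner bound `1 - x` is exactly the second form of the domain; the quadrature result agrees with the closed-form value $1 - 2e^{-1}$ for this example.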
The approach is correct. Contrary to the title of the question, the $c$'s are designated as constants. So $Y_i$ has the distribution of $X_i$ but with a ceiling, and each time we hit the ceiling, the probability is allocated to $c_i$. So the distribution function is indeed
$$
F_{Y_i}(y) = \begin{cases} 1 - e^{-\lambda y} & 0 \leq y < c_i \\ 1 & y \geq c_i \end{cases}
$$
The distribution is therefore mixed: a density on $[0, c_i)$ plus a point mass at the ceiling, i.e.
$$
f_{Y_i}(y) = \lambda e^{-\lambda y}, \quad 0 \leq y < c_i, \qquad \Pr(Y_i = c_i) = 1 - F_{Y_i}(c_i^-) = e^{-\lambda c_i}
$$
which "integrates to unity" alright since (skipping formalities)
$$\int_{S_{Y_i}}f_{Y_i}(y)dy = \int_0^{c_i}\lambda e^{-\lambda y}dy + e^{- \lambda c_i} = - e^{-\lambda y} \Big |_0^{c_i}+ e^{- \lambda c_i} = -e^{- \lambda c_i} +1 +e^{- \lambda c_i} =1$$
Note that there is a jump at the ceiling: the continuous part tends to $\lambda e^{-\lambda c_i}$ as $y \to c_i^-$, while the point mass equals $e^{-\lambda c_i}$, and the two coincide only if $\lambda = 1$.
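As a quick simulation check (the values $\lambda = 0.5$ and $c_i \equiv 2$ are made up for illustration, not taken from the question), one can generate $Y_i = \min(X_i, c_i)$ and confirm both the point mass at the ceiling and the continuous part below it:

```python
import random
from math import exp

random.seed(1)
lam, c = 0.5, 2.0  # illustrative values for lambda and the (common) ceiling
n = 200_000

ys = [min(random.expovariate(lam), c) for _ in range(n)]

# Point mass at the ceiling: P(Y = c) = exp(-lam * c)
p_ceiling = sum(y == c for y in ys) / n
print(p_ceiling, exp(-lam * c))

# Continuous part below the ceiling: P(Y <= t) = 1 - exp(-lam * t) for t < c
t = 1.0
p_below = sum(y <= t for y in ys) / n
print(p_below, 1 - exp(-lam * t))
```

Both empirical frequencies land within Monte Carlo error of the theoretical values, which is the "integrates to unity" bookkeeping in action.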
As long as the $X_i$'s are assumed independent, your joint likelihood function is correct (just make the $y$'s lower-case), and it is a standard case of a likelihood function "regulated" by an indicator function. Note that there may be a small conceptual hurdle: if we observe $y_i = c_i$, does this mean that $y_i = x_i$ or not? Since $X_i$ is continuous and the probability that it takes any specific value is zero, we treat the case $Y_i = c_i$ as implying that $Y_i \neq X_i$. Your indicator function is equivalent to $\Delta_i = \textbf{1} \{ y_i <c_i \}$, which is fully deterministic given the sample.
If you want to proceed with estimation, you are presumed to know the $c$-constants (otherwise, you do not have enough data to estimate them). You also have a sample of $y_i$'s. Given these, you can construct the indicator series. Then the MLE for $\lambda$ can be obtained by numerically maximizing the log-likelihood. What, were you hoping to obtain an analytical solution?
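A minimal sketch of such a numerical maximization (the true rate, the sample size, and the distribution of the known ceilings are all made-up illustrative choices):

```python
import random
from math import log
from scipy.optimize import minimize_scalar

random.seed(2)
lam_true, n = 1.5, 5_000                             # made-up true rate and sample size
c = [random.uniform(0.5, 2.0) for _ in range(n)]     # known ceilings (illustrative)
y = [min(random.expovariate(lam_true), ci) for ci in c]
d = [int(yi < ci) for yi, ci in zip(y, c)]           # the indicators Delta_i

def neg_loglik(lam):
    # log-likelihood contribution: Delta_i*(log lam - lam*y_i) + (1 - Delta_i)*(-lam*c_i)
    return -sum(di * (log(lam) - lam * yi) + (1 - di) * (-lam * ci)
                for di, yi, ci in zip(d, y, c))

res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print(res.x)  # close to lam_true
```

The censored observations enter only through the survival term $e^{-\lambda c_i}$, exactly as in the likelihood above.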
To verify that your gradient is correct, assume that you obtain a sample of $y_i$'s in which $y_i<c_i,\; \forall i$. Then $\Delta_i =1,\; \forall i$ and the gradient becomes
$$
\frac{ \mathrm{d} \log \mathcal{L}(\lambda)}{ \mathrm{d} \lambda} = -\sum_{i = 1}^n \frac{( \lambda y_i - 1)}{\lambda } = 0 \Rightarrow \frac 1 {\lambda} = \frac 1n \sum_{i = 1}^n y_i $$
which is as it should be, since, if all $y_i$-realizations are below the ceiling, then the $Y_i$'s are treated as proper exponential variables themselves, and the ceilings, being non-binding, do not affect the estimation.
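This special case is easy to simulate: pick a ceiling so large that it (almost surely) never binds, and check that $1/\hat\lambda$ equals the sample mean (all numbers here are made up for illustration):

```python
import random

random.seed(3)
lam_true, n = 2.0, 100_000   # made-up rate and sample size
c = 50.0                     # ceiling so high it never binds in practice

y = [min(random.expovariate(lam_true), c) for _ in range(n)]
assert all(yi < c for yi in y)   # so Delta_i = 1 for every i

# The gradient equation then gives 1/lambda_hat = sample mean of the y_i's
lam_hat = n / sum(y)
print(lam_hat)  # close to lam_true
```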
Best Answer
There is no way to be sure what distribution gives rise to your data. First, there is no assurance that your data fit any 'named' distribution. Second, even if you guess the correct parametric distribution family, you still have to use the data to estimate the parameters. Here are several approaches that might be useful.
First, you might see whether a member of the beta family of distributions is a reasonable fit to your data. These distributions have support $(0, 1)$ and two shape parameters $\alpha > 0$ and $\beta > 0.$ Roughly speaking, these determine the shape of the density curve near 0 and near 1, respectively. It is possible to estimate $\alpha$ and $\beta$ from data. One way is to pick parameters that match the sample mean and variance (method of moments estimation). See the Wikipedia article on 'beta distribution' for particulars.
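A sketch of that method-of-moments computation in Python (the formulas are the standard ones, $\hat\alpha = \bar x\,s$ and $\hat\beta = (1-\bar x)\,s$ with $s = \bar x(1-\bar x)/\hat v - 1$; the simulated $Beta(2, 5)$ data are only for illustration):

```python
import random

random.seed(4)

def beta_mom(xs):
    """Method-of-moments estimates (alpha_hat, beta_hat) for data in (0, 1)."""
    n = len(xs)
    m = sum(xs) / n                          # sample mean
    v = sum((x - m) ** 2 for x in xs) / n    # sample variance
    s = m * (1 - m) / v - 1                  # estimates alpha + beta
    return m * s, (1 - m) * s

# Illustrative check on simulated Beta(2, 5) data
xs = [random.betavariate(2, 5) for _ in range(50_000)]
a_hat, b_hat = beta_mom(xs)
print(a_hat, b_hat)  # roughly (2, 5)
```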
Below is a histogram of a sample of 1000 observations simulated according to the distribution $Beta(\alpha = 1, \beta = 10)$, along with the density function (solid blue curve) of that particular distribution. (I used R statistical software; R is available free of charge at www.r-project.org for Windows, Mac and Linux.)

It takes a lot of data to get really close estimates of the parameters. For example, here the sample mean is 0.3320 while the population mean is 0.3333. I will let you check how closely the sample and population variances match.
A second method is to use a density estimator of your data. (See Wikipedia on 'density estimation' or google 'KDE' for 'kernel density estimator'.) The last line of code puts the dotted red density estimator onto the plot above. The function density(x) produces $(x,y)$ coordinates, which may be of use if a digitized approximation to the density is needed.

By sorting the data, it seems to me that you are making an empirical distribution function (ECDF). ECDFs tend to match theoretical CDFs better than density estimators match histograms, partly because information is lost when data are sorted into bins to make a histogram. For a continuous distribution, you could try taking differences over small intervals of an ECDF to approximate the PDF, but I think density estimation is easier to use.
Below is the ECDF of the data generated above (heavy black 'stairstep', increasing by $1/n$ at each sorted datapoint), along with the population CDF (thin blue curve). Black tick marks at the horizontal axis show the locations of individual observations.
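The ECDF and a kernel density estimate can also be mirrored in Python (the `bisect`-based ECDF and `gaussian_kde` are my own illustrative choices, standing in for R's stairstep plot and `density()`):

```python
import random
from bisect import bisect_right
from scipy.stats import gaussian_kde

random.seed(5)
xs = sorted(random.betavariate(1, 10) for _ in range(1_000))

def ecdf(t):
    """Empirical CDF: fraction of observations <= t (xs must be sorted)."""
    return bisect_right(xs, t) / len(xs)

kde = gaussian_kde(xs)  # kernel density estimate of the PDF

# Compare with the Beta(1, 10) truth, whose CDF is F(t) = 1 - (1 - t)^10
t = 0.1
print(ecdf(t), 1 - (1 - t) ** 10)
print(kde.integrate_box_1d(-1, 2))  # the KDE integrates to (about) 1
```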
You do not say why you want to know the PDF for your data. By googling some of the terminology I have used here, you may be able to find a solution that matches your goals better than anything in my example.