The question was answered with a link in a comment, so for completeness let me reproduce the argument from that link here.
We assume a statistical model for data $X$ parameterized by a parameter $\theta$ (which can be a scalar, a vector, or something more general). Let $L(\theta)$ be the likelihood function and let $\hat{\theta}$, the value of $\theta$ maximizing it, be the maximum likelihood estimator (mle). We assume this maximizer exists and is unique. We want the mle of $g(\theta)$, a function of $\theta$. First assume that $g$ is one-to-one. Then we can write
$$
L(\theta) = L(g^{-1}(g(\theta)))
$$
The right-hand side, viewed as a function of $\eta = g(\theta)$, is maximized exactly when $g^{-1}(\eta) = \hat{\theta}$, so
$$
\hat{\theta} = g^{-1}(\widehat{g(\theta)}),
$$
or equivalently
$$
g(\hat{\theta}) = \widehat{g(\theta)}.
$$
If $g$ is many-to-one, the induced (profile) likelihood of $\eta = g(\theta)$ is $L^*(\eta) = \sup_{\theta:\, g(\theta)=\eta} L(\theta)$. Since $\hat{\theta}$ maximizes $L(\theta)$, the value $\eta = g(\hat{\theta})$ attains the overall maximum of $L^*$, so $g(\hat{\theta})$ is still the mle of $g(\theta)$. (Argument paraphrased from the link in the comment above.)
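The invariance property can be checked numerically. Below is a minimal sketch (with an assumed Bernoulli model, a made-up true parameter, and grid search standing in for a proper optimizer): maximizing the likelihood over $p$ and then transforming gives the same answer, up to grid resolution, as maximizing directly over the odds $g(p) = p/(1-p)$.

```python
import math
import random

random.seed(0)
# Hypothetical illustration: a Bernoulli(0.3) sample
data = [1 if random.random() < 0.3 else 0 for _ in range(1000)]
s, n = sum(data), len(data)

def loglik_p(p):
    # log-likelihood in the p parameterization
    return s * math.log(p) + (n - s) * math.log(1 - p)

def loglik_odds(o):
    # the same likelihood reparameterized by the odds o = p / (1 - p)
    return loglik_p(o / (1 + o))

# crude grid search in each parameterization
p_hat = max((k / 10000 for k in range(1, 10000)), key=loglik_p)
odds_hat = max((k / 1000 for k in range(1, 5000)), key=loglik_odds)

# invariance: g(p_hat) agrees with the directly maximized odds
print(p_hat / (1 - p_hat), odds_hat)
```

The two printed values coincide up to the grid step, illustrating $g(\hat{\theta}) = \widehat{g(\theta)}$ for a one-to-one $g$.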
This isn't really a stats question but one about basic properties of calculus and algebra.
It may help to consider a simpler problem, to avoid any confusion about the issue:
$$\frac{\partial{}}{\partial{p_i}} \sum_{j = 1}^n p_j^2$$
Think of the summation written out:
$$
\frac{\partial{}}{\partial{p_i}} (p_1^2 + p_2^2 + ... + p_{i-1}^2 + p_i^2 + p_{i+1}^2 + ... + p_n^2)
$$
Take the derivative term by term:
$$
= \frac{\partial{p_1^2}}{\partial{p_i}} + \frac{\partial{p_2^2}}{\partial{p_i}} + ... + \frac{\partial{p_{i-1}^2}}{\partial{p_i}} + \frac{\partial{p_{i}^2}}{\partial{p_i}} + \frac{\partial{p_{i+1}^2}}{\partial{p_i}} + ... + \frac{\partial{p_{n}^2}}{\partial{p_i}}
$$
Now take those derivatives (leaving the $i^{\rm{th}}$ term unevaluated for the moment):
$$
= 0 + 0 + ... + 0 + \frac{\partial{p_{i}^2}}{\partial{p_i}} + 0 + ... + 0
$$
and we now see why the summation disappears: there is only one nonzero term:
$$
= \frac{\partial{p_{i}^2}}{\partial{p_i}} = 2p_i
$$
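The claim that only the $i^{\rm{th}}$ term survives is easy to verify with a finite-difference check (a minimal sketch with made-up values for $p$ and the index $i$):

```python
# Finite-difference check that d/dp_i of sum_j p_j^2 equals 2 * p_i
p = [0.1, 0.4, 0.3, 0.2]   # arbitrary illustrative values
i, h = 1, 1e-6             # which coordinate to perturb, and step size

def f(v):
    return sum(x * x for x in v)

bumped = p[:]
bumped[i] += h
numeric = (f(bumped) - f(p)) / h   # numerical partial derivative
analytic = 2 * p[i]                # the formula derived above
print(numeric, analytic)
```

Perturbing $p_i$ leaves every other term of the sum unchanged, which is exactly why they drop out of the derivative.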
Your question is the same but with a different, slightly more complicated function.
Regarding the original problem:
$$
l(\mathbf{p}, \mathbf{y}) = \sum_{i = 1}^n \left( \ln(p_i) + y_i \ln(1 - p_i) \right)
$$
is fine, but as soon as you took derivatives, you went astray.
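Applying the same term-by-term reasoning to this log-likelihood, only the $i^{\rm{th}}$ term depends on $p_i$, giving $\partial l / \partial p_i = 1/p_i - y_i/(1 - p_i)$. A minimal numerical sketch (with made-up values for $p$ and $y$) confirms this:

```python
import math

# l = sum_i [ ln(p_i) + y_i * ln(1 - p_i) ]; only the i-th term
# depends on p_i, so dl/dp_i = 1/p_i - y_i/(1 - p_i)
p = [0.2, 0.5, 0.7]   # arbitrary illustrative values
y = [3, 1, 0]
i, h = 0, 1e-7

def l(pv):
    return sum(math.log(pi) + yi * math.log(1 - pi) for pi, yi in zip(pv, y))

bumped = p[:]
bumped[i] += h
numeric = (l(bumped) - l(p)) / h           # numerical partial derivative
analytic = 1 / p[i] - y[i] / (1 - p[i])    # term-by-term formula
print(numeric, analytic)
```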
Most families of distributions $f_\theta$ have a fixed support, $$\text{supp}(f_\theta)=\{x\in\mathcal{X};\ f_\theta(x)>0\},$$ like the Normal or Binomial distributions, but some have a parameter-dependent support, like the uniforms $\text{U}(0,\theta)$ or $\text{U}(-\theta,\theta)$. For such families it is important to keep the support constraint in the likelihood, because it imposes hard constraints on $\theta$, which often allow $\theta$ to be estimated more sharply. If one considers a sample $(x_1,\ldots,x_n)$ from the $\text{U}(-\theta,\theta)$ distribution, the likelihood is $$\prod_{i=1}^n \frac{1}{2\theta}\times\mathbb{I}_{(-\theta,\theta)}(x_i) =\frac{1}{(2\theta)^n}\times\prod_{i=1}^n \mathbb{I}_{-\theta\le x_i\le\theta}=\frac{1}{(2\theta)^n}\times\mathbb{I}_{\theta\ge\max_i |x_i|}.$$ The indicator function at the end, $$\mathbb{I}_{\theta\ge\max_i |x_i|},$$ is essential: it restricts the set of possible $\theta$'s to $[\max_i |x_i|,\infty)$ rather than $(0,\infty)$. It thus determines the new parameter set given the sample.
Once the modified parameter set is determined, the likelihood function is defined on that set, and differentiation applies only to the smooth part, not to the indicator function. For instance, in the Uniform example, $$\frac{\partial}{\partial\theta}\frac{1}{(2\theta)^n}=-\frac{n}{2^n\theta^{n+1}},$$ which is always negative, so the likelihood is decreasing in $\theta$ and the optimal value of $\theta$ is the smallest one in the modified parameter set, namely $\max_i |x_i|$.
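A minimal numerical sketch of this example (with an assumed true $\theta$ and sample size chosen for illustration): the likelihood vanishes for $\theta$ below $\max_i|x_i|$ and is decreasing above it, so its maximum sits exactly at the boundary of the modified parameter set.

```python
import random

random.seed(1)
theta_true = 2.0   # made-up true parameter for the illustration
x = [random.uniform(-theta_true, theta_true) for _ in range(200)]
m = max(abs(v) for v in x)   # the constraint max_i |x_i|

def lik(theta):
    # indicator term: the likelihood is zero unless theta >= max|x_i|
    if theta < m:
        return 0.0
    return (1 / (2 * theta)) ** len(x)

# evaluate on a grid starting at the boundary m
thetas = [m + k * 0.01 for k in range(100)]
vals = [lik(t) for t in thetas]
print(m, vals[0] == max(vals))   # maximum attained at theta = m
```

Note that no derivative is ever set to zero here: the MLE comes from the boundary imposed by the indicator, exactly as the argument above describes.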