Note that your second expression is just a special case of the first, with $n=1$. Hence it is sufficient to analyse the first assertion for general $n\geq 1$ and then see what happens in the case $n=1$.
If you only look at a single observation (i.e. $X_1$) instead of all observations (i.e. $X_1,\dots,X_n$), you are obviously discarding a lot of information that could be used to estimate the unknown quantity more precisely.
Suppose you are given an i.i.d. sample $X_1,\dots,X_n$, $n\geq 1$, drawn from a $\mathrm{Poisson}(\lambda)$ distribution with $\lambda$ unknown. Their joint density is:
\begin{align*}
P(X_1=k_1, \dots, X_n=k_n)& = P(X_1=k_1) \cdot P(X_2=k_2) \cdot \ldots \cdot P(X_n=k_n)\\
& =\prod_{i=1}^n\frac{\lambda^{k_i}}{{k_i}!} \exp(-\lambda)
\end{align*}
which depends on the unknown parameter $\lambda$.
The idea of maximum likelihood is to view the joint density as a function of the unknown parameter $\lambda$ and to maximize this function over all admissible values of $\lambda$.
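To make this concrete, here is a minimal numerical sketch (made-up data, numpy only) that evaluates the joint log-likelihood on a grid of candidate $\lambda$ values and picks the maximizer:

```python
import numpy as np

x = np.array([2, 0, 3, 1, 4, 2, 1])   # hypothetical Poisson counts
grid = np.linspace(0.01, 10, 2000)    # candidate values of lambda
# log of the joint pmf, dropping the sum of log(k_i!) terms, which do not depend on lambda
loglik = x.sum() * np.log(grid) - len(x) * grid
lam_hat = grid[np.argmax(loglik)]
print(lam_hat, x.mean())              # numerical maximizer vs. the sample mean
```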
To better understand why we should use the joint density and not the "marginal" density of a single observation, we have to take a look at the result.
It is well known that the maximum likelihood estimator in the current case is
$\widehat{\lambda}_n = \frac{\sum_{i=1}^nX_i}{n}.$
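For completeness, here is the short computation behind that claim (take logarithms and differentiate):
\begin{align*}
\ell(\lambda) & = \log P(X_1=k_1,\dots,X_n=k_n) = \sum_{i=1}^n k_i\log\lambda - n\lambda - \sum_{i=1}^n \log(k_i!),\\
\ell'(\lambda) & = \frac{\sum_{i=1}^n k_i}{\lambda} - n = 0 \quad\Longleftrightarrow\quad \lambda = \frac{\sum_{i=1}^n k_i}{n},
\end{align*}
and $\ell''(\lambda) = -\sum_{i=1}^n k_i/\lambda^2 \le 0$, so this stationary point is a maximum.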
But note that we have (since $X_1,\dots, X_n$ are i.i.d.):
$$E(\widehat{\lambda}_n) = \lambda$$ as well as $$Var(\widehat{\lambda}_n) = \frac{\lambda}{n}.$$
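Both facts follow from linearity of expectation and independence, since each $X_i$ has mean and variance $\lambda$:
$$E(\widehat{\lambda}_n) = \frac1n\sum_{i=1}^n E(X_i) = \frac{n\lambda}{n} = \lambda, \qquad Var(\widehat{\lambda}_n) = \frac1{n^2}\sum_{i=1}^n Var(X_i) = \frac{n\lambda}{n^2} = \frac{\lambda}{n}.$$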
From this it is clear that $\widehat{\lambda}_n$ is an unbiased estimator of $\lambda$ for every $n$ (since $E(\widehat{\lambda}_n)=\lambda$ regardless of $n$), while the variance of the estimator decreases with the sample size.
Hence using all $n$ observations from the sample, rather than only a single one (i.e. $n=1$), leads to a "better"/more precise estimator! (This tells you: don't maximize your second assertion, since you can do better by maximizing the first one.)
It turns out that throwing away information (looking at $n=1$ instead of $n>1$) is not a good idea. This is very often the case in statistics.
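A small simulation sketch (with an arbitrarily chosen "true" $\lambda = 3$) makes this variance reduction visible:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true, reps = 3.0, 10_000          # arbitrary true parameter, number of replications

# MLE based on a single observation (n = 1) vs. based on n = 50 observations
est_n1  = rng.poisson(lam_true, size=(reps, 1)).mean(axis=1)
est_n50 = rng.poisson(lam_true, size=(reps, 50)).mean(axis=1)

print(est_n1.mean(),  est_n1.var())   # roughly 3 and 3     (lambda / 1)
print(est_n50.mean(), est_n50.var())  # roughly 3 and 0.06  (lambda / 50)
```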
\begin{align}
& \text{For } \mu \le \min\{x_1,\ldots,x_n\} \text{ and } \alpha>0, \text{ we have} \\[10pt]
L(\mu,\alpha) & = \frac 1 {\Gamma(\alpha)^n} \left( \prod_{i=1}^n (x_i-\mu) \right)^{\alpha-1} \!\!\! \exp \left( -\sum_{i=1}^n (x_i-\mu) \right), \\[10pt]
\ell(\mu,\alpha) & = \log L(\mu,\alpha) = -n\log\Gamma(\alpha) + (\alpha-1) \sum_{i=1}^n \log(x_i-\mu) - \sum_{i=1}^n (x_i-\mu).
\end{align}
You gave us $\alpha<1.$
That implies $\alpha-1<0,$ so that $\ell(\mu,\alpha)$ is an increasing function of $\mu$ until $\mu$ gets as big as $\min\{x_1,\ldots,x_n\}.$
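One way to see this is to differentiate with respect to $\mu$ (for $\mu$ strictly below the minimum):
$$
\frac{\partial\ell}{\partial\mu} = -(\alpha-1)\sum_{i=1}^n \frac{1}{x_i-\mu} + n = n + (1-\alpha)\sum_{i=1}^n \frac{1}{x_i-\mu} > 0,
$$
since every $x_i-\mu$ is positive there and $1-\alpha>0$.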
Therefore $\widehat\mu = \min\{x_1,\ldots,x_n\}.$ If we didn't have the constraint that $\alpha<1,$ then this would be more complicated.
This value of $\widehat\mu$ does not depend on $\alpha$ as long as $\alpha$ remains in that interval. Therefore we can just plug in $\min$ for $\mu$ and then seek the value of $\alpha\in(0,1)$ that maximizes $\ell(\min,\alpha).$
Now we have
$$
\ell(\min,\alpha) = -n\log\Gamma(\alpha) + (\alpha-1)A + \big( \text{constant} \big)
$$
where "constant" means not depending on $\alpha.$
$$
\frac {\partial\ell}{\partial\alpha} = -n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} + A.
$$
Etc.
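To finish numerically: setting that derivative to zero means solving $\Gamma'(\alpha)/\Gamma(\alpha) = A/n$ (a digamma equation) for $\alpha\in(0,1)$. A minimal sketch with a root finder, assuming $n$ and $A$ have already been computed from the data (the values below are made up):

```python
from scipy.special import digamma
from scipy.optimize import brentq

n, A = 5, -6.0                            # made-up placeholder values
score = lambda a: -n * digamma(a) + A     # d ell / d alpha at mu = min

# digamma increases from -inf at 0+ to about -0.5772 at 1, so the score decreases in alpha;
# a root in (0, 1) exists exactly when A / n < digamma(1)
if A / n < digamma(1.0):
    alpha_hat = brentq(score, 1e-8, 1.0)
    print(alpha_hat)
else:
    print("no stationary point in (0, 1)")
```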
Check the derivative of the log-likelihood. It's true that the derivative $\ell'(\theta)$ equals zero at $\hat\theta:=\frac{x_1+x_2}2$. Compute the second derivative to establish that $\hat\theta$ leads to a local maximum.
However, you'll find that setting the derivative of the log-likelihood to zero yields additional solutions for $\theta$ exactly when $|\Delta|>1$. In that case it's not obvious which of these solutions is in fact the maximizer.
ADDED: For $\ell'(\theta)$ I get a numerator of $$[1+(\theta-x_1)(\theta-x_2)](\theta-x_1 + \theta-x_2).$$ The quantity in square brackets on the left is a quadratic in $\theta$. Under what conditions does this quadratic, when set to zero, have a real solution? (Hint: check the discriminant $b^2-4ac$.)
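In case it helps to check your work: the bracketed factor expands to the quadratic
$$
1+(\theta-x_1)(\theta-x_2) = \theta^2 - (x_1+x_2)\theta + (x_1x_2+1),
$$
whose discriminant is
$$
(x_1+x_2)^2 - 4(x_1x_2+1) = (x_1-x_2)^2 - 4,
$$
so it has real roots exactly when $|x_1-x_2|>2$; the other factor vanishes only at $\theta=\tfrac{x_1+x_2}{2}$.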