The likelihood function can be written as
$$
L(\theta)=\frac{1}{\theta^n}\mathbf{1}_{\theta\geq c},
$$
where $c=\max\{x_1,\ldots,x_n\}$. Therefore, $\theta\mapsto L(\theta)$ is not differentiable on the whole of $(0,\infty)$, and hence we cannot simply solve $L'(\theta)=0$ to locate maxima and minima. (Maxima and minima of a function $f$ have to be sought among values of $x$ with either $f'(x)=0$ or $f'(x)$ undefined.)
Note, however, that $L$ is differentiable on $(0,\infty)\setminus\{c\}$, that $L(\theta)=0$ for $\theta\in (0,c)$, and, by looking at $L'(\theta)$ on $(c,\infty)$, that $L$ is decreasing on $(c,\infty)$. Since
$$
L(c)=\frac{1}{c^n}>\frac{1}{\theta^n}=L(\theta),\quad \text{for all }\;\theta>c
$$ we see that $L(c)$ is the global maximum.
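If you want to see this maximization concretely, here is a minimal numerical sketch (the data are made up for illustration): it evaluates $L(\theta)=\theta^{-n}\mathbf{1}_{\theta\ge c}$ on a grid and confirms that the argmax sits at $c=\max\{x_1,\ldots,x_n\}$, up to grid resolution.

```python
import numpy as np

# Hypothetical sample assumed to come from Uniform(0, theta); any positive data works.
x = np.array([2.0, 7.0, 4.0])
n = len(x)
c = x.max()  # the sample maximum

# Evaluate L(theta) = theta^(-n) * 1{theta >= c} on a grid of candidate thetas.
thetas = np.linspace(0.1, 20.0, 2000)
L = np.where(thetas >= c, thetas ** (-n), 0.0)

print("sample maximum c:", c)
print("grid argmax of L:", thetas[np.argmax(L)])  # close to c, up to grid spacing
```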
The density for each observation is $\displaystyle f_{X_i}(x) = \begin{cases} 1/2a & \text{if } -a\le x\le a, \\ 0 & \text{if } x<-a \text{ or } x>a. \end{cases}$
You didn't say your observations were independent, but I will assume that was intended. The joint density is therefore
$$
f_{X_1,\ldots,X_n} (x_1,\ldots,x_n) = \begin{cases} 1/(2a)^n & \text{if for every } i\in\{1,\ldots,n\} \text{ we have } -a\le x_i \le a, \\ 0 & \text{otherwise}. \end{cases}
$$
The condition that for every $i\in\{1,\ldots,n\}$ we have $-a\le x_i\le a$ is the same as $\min\{x_1,\ldots,x_n\} \ge -a$ and $\max\{x_1,\ldots,x_n\}\le a.$ The condition on the $\min$ is the same as $-\min\{x_1,\ldots,x_n\} \le a.$ So we need $a\ge\max\{x_1,\ldots,x_n\}$ and $a\ge -\min\{x_1,\ldots,x_n\}.$ I leave it as an exercise to show that
$$
\Big( a\ge \max\{x_1,\ldots,x_n\} \text{ and } a\ge -\min\{x_1,\ldots,x_n\} \Big) \text{ if and only if } a \ge \max\{|x_1|,\ldots,|x_n|\}.
$$
Therefore the likelihood function is
$$
L(a) = \begin{cases} 1/(2a)^n & \text{if } a \ge \max\{|x_1|,\ldots,|x_n|\}, \\ 0 & \text{otherwise.} \end{cases}
$$
Now notice that $L(a)$ increases as $a$ decreases, until $a$ gets down to $\max\{|x_1|,\ldots,|x_n|\}.$ Therefore that maximum is the MLE.
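The same check works numerically for this case too; a minimal sketch (again with made-up data) evaluating $L(a)=(2a)^{-n}\mathbf{1}_{a\ge \max_i|x_i|}$ on a grid puts the argmax at $\max\{|x_1|,\ldots,|x_n|\}$.

```python
import numpy as np

# Hypothetical sample assumed to come from Uniform(-a, a); values may be negative.
x = np.array([-3.0, 1.5, 2.0, -0.5])
n = len(x)
m = np.abs(x).max()  # max{|x_1|, ..., |x_n|}

# L(a) = (2a)^(-n) * 1{a >= m}, evaluated on a grid of candidate values of a.
a_grid = np.linspace(0.1, 10.0, 2000)
L = np.where(a_grid >= m, (2.0 * a_grid) ** (-n), 0.0)

print("max |x_i|       :", m)
print("grid argmax of L:", a_grid[np.argmax(L)])  # close to m, up to grid spacing
```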
Best Answer
You asked for an intuitive explanation. So let's think about what happens when we take a sample $x_1, x_2, \ldots, x_n$ from a uniform $[0,\theta]$ random variable, where $\theta$ is a fixed but unknown parameter. From the data, we would like to compute a statistic (a function of the data) that estimates the value of the parameter; moreover, we want this estimator $\hat \theta$ to maximize the likelihood of having observed the particular sample that we obtained.
Clearly, $\theta$ cannot be smaller than the largest observation we made (or else we could not have observed it!). But in order to "maximize the likelihood," we also note that $\hat\theta$ should not be unnecessarily large. For instance, if we observed $x_1 = 2, x_2 = 7, x_3 = 4$, we could pick $\hat\theta = 100$, but intuitively this is not a good estimate: the chance that we observed $2, 7, 4$ given that the true value of the parameter is $100$ is quite small. It is much more likely that, given the information we have, the true value of the parameter is close to $7$. In fact, we can show mathematically that the likelihood of the sample for any $\theta > \max_i x_i$ is smaller than for $\theta = \max_i x_i$.
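To put numbers on that comparison (using the sample $2, 7, 4$ from the example): for a uniform $[0,\theta]$ sample of size $n$, the likelihood is $\theta^{-n}$ whenever $\theta\ge\max_i x_i$, so the value at the sample maximum dominates any larger candidate.

```python
# Likelihood of x = (2, 7, 4) under Uniform[0, theta], valid for theta >= max(x) = 7.
n = 3
L = lambda theta: theta ** (-n)

print(L(7))    # 1/343 ~ 0.0029   (theta at the sample maximum)
print(L(100))  # 1/10**6 = 1e-06  (a needlessly large theta)
```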
We can also characterize $\hat\theta = \max_i x_i$ in terms of sufficiency: notice that once we observe the largest value, the other values give no additional information about the true value of $\theta$. So in our example, once we saw the maximum of $7$, the observations $x_1 = 2$ and $x_3 = 4$ are irrelevant for the purposes of estimating $\theta$, because we already know that $\theta \ge 7$.
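One way to make this sufficiency remark precise (a formalization not spelled out above, via the Fisher–Neyman factorization theorem) is to note that the joint density factors through the maximum alone:
$$
f_{X_1,\ldots,X_n}(x_1,\ldots,x_n;\theta)
= \underbrace{\theta^{-n}\,\mathbf{1}_{\max_i x_i \le \theta}}_{g(\max_i x_i,\,\theta)}
\cdot
\underbrace{\mathbf{1}_{\min_i x_i \ge 0}}_{h(x_1,\ldots,x_n)},
$$
so $\max_i x_i$ is a sufficient statistic for $\theta$.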
Of course, we must formalize all of this with the appropriate mathematics, but this is essentially the underlying intuition involved.