Working with extrema requires care, but it doesn't have to be difficult. The crucial question, found near the middle of the post, is
... we need to show that $\frac{n+1}{n}\,\hat\theta$ is unbiased.
Earlier you obtained
$$\hat\theta = \max(-y_1, y_n/2) = \max\{-\min\{y_i\}, \max\{y_i\}/2\}.$$
Although that looks messy, the calculations become elementary when you consider the cumulative distribution function $F$. To get started with this, note that $0\le \hat\theta \le \theta$. Let $t$ be a number in this range. By definition,
$$\eqalign{
F(t) &= \Pr(\hat\theta\le t) \\&= \Pr(-y_1 \le t\text{ and }y_n/2 \le t) \\
&= \Pr(-t \le y_1 \le y_2 \le \cdots \le y_n \le 2t).
}$$
This is the chance that all $n$ values lie between $-t$ and $2t$, an interval of length $3t$. Because the distribution is uniform on an interval of length $3\theta$ (namely $[-\theta, 2\theta]$), the probability that any specific $y_i$ lies in $[-t, 2t]$ is the ratio of the lengths:
$$\Pr(y_i \in [-t, 2t]) = \frac{3t}{3\theta} = \frac{t}{\theta}.$$
Because the $y_i$ are independent, these probabilities multiply, giving
$$F(t) = \left(\frac{t}{\theta}\right)^n.$$
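If you would like to check this formula by simulation, here is a minimal sketch (using NumPy; the values of $\theta$, $n$, and the number of replications are arbitrary illustrative choices, not part of the original problem):

```python
import numpy as np

# Monte Carlo check of F(t) = (t/theta)^n for theta_hat = max(-min y, max y / 2),
# with y_1, ..., y_n i.i.d. Uniform(-theta, 2*theta).
# theta, n, and the number of replications are arbitrary illustrative choices.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 200_000

y = rng.uniform(-theta, 2 * theta, size=(reps, n))
theta_hat = np.maximum(-y.min(axis=1), y.max(axis=1) / 2)

for t in (0.5, 1.0, 1.5):
    print(f"t = {t}: empirical {np.mean(theta_hat <= t):.4f}, "
          f"(t/theta)^n {(t / theta) ** n:.4f}")
```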
The expectation can immediately be found by integrating the survival function $1-F$ over the interval of possible values for $\hat\theta$, $[0, \theta]$, using $y=t/\theta$ for the variable:
$$\mathbb{E}(\hat\theta) = \int_0^\theta \left(1 - \left(\frac{t}{\theta}\right)^n\right)dt = \int_0^1 (1-y^n)\theta dy = \frac{n}{n+1}\theta.$$
(This formula for the expectation is derived from the usual integral via integration by parts. Details are given at the end of https://stats.stackexchange.com/a/105464.)
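For readers who want that step spelled out, the integration by parts is short: with $u = t$ and $dv = dF(t)$,
$$\mathbb{E}(\hat\theta) = \int_0^\theta t\,dF(t) = \Big[t\,F(t)\Big]_0^\theta - \int_0^\theta F(t)\,dt = \theta - \int_0^\theta F(t)\,dt = \int_0^\theta \big(1 - F(t)\big)\,dt.$$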
Rescaling by $(n+1)/n$ gives
$$\mathbb{E}\left(\frac{n+1}{n}\,{\hat \theta}\right) = \theta,$$
QED.
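A quick Monte Carlo check of that conclusion (again a sketch with NumPy; $\theta$, $n$, and the replication count are arbitrary illustrative choices):

```python
import numpy as np

# Monte Carlo check that (n+1)/n * theta_hat has expectation theta.
# theta, n, and the replication count are arbitrary illustrative choices.
rng = np.random.default_rng(1)
theta, n, reps = 3.0, 4, 500_000

y = rng.uniform(-theta, 2 * theta, size=(reps, n))
theta_hat = np.maximum(-y.min(axis=1), y.max(axis=1) / 2)

print("mean of theta_hat            :", theta_hat.mean())               # ~ n/(n+1)*theta = 2.4
print("mean of (n+1)/n * theta_hat  :", (n + 1) / n * theta_hat.mean()) # ~ theta = 3.0
```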
The p.d.f. for a single $x_i$ is given as
$$
f(x| \theta) = \begin{cases}
\frac{1}{\theta} & & \text{if } 0 \leq x \leq \theta \\
0 & & \text{otherwise}
\end{cases}
$$
Let's call $\vec{x} = (x_1, ..., x_n)$.
The $n$ observations are i.i.d., so the likelihood of observing the $n$-vector $\vec{x}$ is the product of the individual densities. Ignoring the issue of support for the moment, note that this product can be written simply as a power:
$$
f(\vec{x}| \theta) = \prod_{i=1}^n \frac{1}{\theta} = \frac{1}{\theta^n} = \theta^{-n}
$$
Next, we turn our attention to the support of this function. If any single component $x_i$ falls outside its interval of support $[0, \theta]$, its factor in the product is $0$, so the whole product is zero. Therefore $f(\vec{x}\mid\theta)$ is nonzero only when every component lies in $[0, \theta]$.
$$
f(\vec{x}| \theta) = \begin{cases}
\theta^{-n} & & \text{if } \forall i, \ 0 \leq x_i \leq \theta \\
0 & & \text{otherwise}
\end{cases}
$$
By definition, this is also our likelihood:
$$
\mathcal{L}(\theta; \vec{x}) = f(\vec{x}| \theta) = \begin{cases}
\theta^{-n} & & \text{if } \forall i, \ 0 \leq x_i \leq \theta \\
0 & & \text{otherwise}
\end{cases}
$$
The MLE problem is to maximize $\mathcal{L}$ with respect to $\theta$. Because $\theta > 0$ (given in the title of the problem), $\theta^{-n} > 0$, so the value $0$ is never the maximum. Thus, this is a constrained optimization problem:
$$
\hat{\theta} = \text{argmax}_\theta \,\, \theta^{-n} \text{ s.t. } \forall i \,\, 0 \leq x_i \leq \theta
$$
This is easy to solve as a special case, so we don't need to talk about the simplex method but can present a more elementary argument. Since $\theta \mapsto \theta^{-n}$ is strictly decreasing, maximizing the likelihood means choosing the smallest feasible $\theta$. Let $t = \max\{x_1,\ldots,x_n\}$, so the feasible region is $\{\theta : \theta \geq t\}$. Suppose we have a candidate solution $\theta_1 = t + \epsilon$ with $\epsilon > 0$, and let $\theta_2 = t + \epsilon/2$. Clearly both $\theta_1$ and $\theta_2$ are feasible, and $\theta_2 < \theta_1 \implies \theta_2^{-n} > \theta_1^{-n}$, so $\theta_1$ does not attain the maximum. We conclude that the maximum cannot be attained at any point strictly greater than $t$. Yet $t$ itself is in the feasible region, so it must be the maximizer. Therefore,
$$\hat{\theta} = \max\,\{x_1,\ldots, x_n\}$$
is the maximum likelihood estimator.
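If you want a numerical sanity check, here is a small sketch (NumPy; the true $\theta$, the sample size, and the grid are arbitrary illustrative choices) that evaluates $\mathcal L$ on a grid and confirms the argmax sits at $\max\{x_i\}$:

```python
import numpy as np

# Numerical sketch: evaluate L(theta) = theta^(-n) * 1{all 0 <= x_i <= theta}
# on a grid and confirm it is maximized at theta = max(x).
# The true theta and the sample size below are arbitrary illustrative choices.
rng = np.random.default_rng(0)
theta_true, n = 5.0, 10
x = rng.uniform(0, theta_true, size=n)

def likelihood(theta, x):
    """L(theta; x) = theta^(-n) if all 0 <= x_i <= theta, else 0."""
    theta = np.asarray(theta, dtype=float)
    ok = (x.min() >= 0) & (theta >= x.max())
    return np.where(ok, theta ** (-len(x)), 0.0)

grid = np.linspace(0.01, 2 * theta_true, 100_000)
print("argmax over grid:", grid[np.argmax(likelihood(grid, x))])
print("max(x)          :", x.max())
```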
Note that if any observed $x_i$ is less than 0, then $\mathcal{L}$ is a constant 0 and the optimization problem has no unique solution.
Best Answer
Here is a homework problem from Mark Schervish's Theory of Statistics that addresses a similar question:
When only the center $\theta$ is unknown, the likelihood is constant over the set $$\mathfrak O = \bigcap_{i=1}^n \{\theta; d(x_i,\theta)\le r\}$$ which is therefore a (set-valued) "sufficient statistic". There is, however, no sufficient statistic in the classical sense, and any value in $\mathfrak O$ is an MLE.
The problem can be considered from a different perspective, namely as a location model, since $$Z_1,\ldots,Z_n\sim\mathcal U(\mathcal B(\theta,r))$$ is the translation by $\theta$ of the [ancillary] sample $$U_1,\ldots,U_n\sim\mathcal U(\mathcal B(0,r)).$$ Therefore one could consider the best equivariant estimator¹ associated with this problem (and squared error loss), $$\hat\theta^\text P = \dfrac{\int \theta\,\mathbb I_{\max_i \vert\theta-z_i\vert<r}\,\text d\theta}{\int \mathbb I_{\max_i \vert\theta-z_i\vert<r}\,\text d\theta},$$ as established by Pitman (1933).
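As a concrete illustration, in one dimension the ball $\mathcal B(\theta, r)$ is just the interval $(\theta - r, \theta + r)$, and the two integrals can be approximated on a grid. The following sketch (NumPy; $\theta$, $r$, $n$, and the grid resolution are arbitrary choices) suggests that in this special case the Pitman estimator reduces to the midrange of the sample, since the indicator is positive exactly on $(\max_i z_i - r,\ \min_i z_i + r)$:

```python
import numpy as np

# 1-D sketch of the Pitman (best equivariant) estimator above: the "ball"
# B(theta, r) is the interval (theta - r, theta + r).  Both integrals are
# approximated by Riemann sums on a uniform grid (the grid spacing cancels
# in the ratio).  theta_true, r, n, and the grid size are arbitrary choices.
rng = np.random.default_rng(0)
theta_true, r, n = 1.0, 2.0, 8
z = rng.uniform(theta_true - r, theta_true + r, size=n)

grid = np.linspace(z.min() - r, z.max() + r, 200_001)
inside = np.max(np.abs(grid[:, None] - z[None, :]), axis=1) < r  # indicator
pitman = (grid * inside).sum() / inside.sum()

print("numerical Pitman estimate:", pitman)
print("midrange (min+max)/2     :", (z.min() + z.max()) / 2)
```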
Both of these observations indicate that the Pitman estimator is minimum variance only in the weak sense that there is no other estimator with a strictly smaller maximal MSE.
¹See Theory of Point Estimation, by Lehmann and Casella, for a textbook entry on equivariance and best equivariant estimators.