The probability that the maximum $Z$ of $n$ independent
$U(0,1)$ random variables $X_1, \ldots, X_n$ is no larger than $z$, for $0 < z < 1$, is simply
$\prod_{i=1}^n P\{X_i \leq z\} = z^n$, and so the density is $nz^{n-1}\mathbf 1_{0 < z < 1}$.
From this, it is easy to verify that
$$E[Z] = \int_0^1 z\cdot nz^{n-1}\,\mathrm dz = \left.\frac{n}{n+1}z^{n+1}\right|_0^1 = \frac{n}{n+1}.$$
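As a quick sanity check, a few lines of Python reproduce this value by simulation (the choice $n = 5$ is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000                 # n is arbitrary; reps controls Monte Carlo error

# Each row is one sample of n independent U(0,1) variables; Z is the row-wise maximum.
Z = rng.uniform(size=(reps, n)).max(axis=1)

print(Z.mean())       # simulated E[Z], roughly 0.833
print(n / (n + 1))    # exact value n/(n+1) = 0.8333...
```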
If you want
to do it the hard way by first computing $E[Z\mid W]$ (where $W$ is
the minimum of the $n$ random variables), then we need to find
the conditional distribution of $Z$ given $W = w$. Let us start by
finding the joint distribution of $(Z,W)$. For $0 < w < z < 1$,
we have that
$$P\{z< Z < z+\Delta z, w < W < w+\Delta w\}
\approx f_{Z,W}(z,w)\cdot\Delta z \Delta w.$$
Any one of the $n$ variables $X_i$ can be $Z$, any one of the remaining $n-1$
can be $W$, and the remaining $n-2$ random variables must lie in the interval
$[w,z]$. Thus we have
$$\begin{align}
f_{Z,W}(z,w) &= n(n-1)(z-w)^{n-2}, \quad 0 < w < z < 1,\\
f_W(w) &= \int_w^1 n(n-1)(z-w)^{n-2}\,\mathrm dz\\
&= n (z-w)^{n-1}\bigr|_w^1\\
&= n\left(1-w\right)^{n-1},\\
f_{Z\mid W}(z\mid W=w) &= \frac{n-1}{(1-w)^{n-1}}(z-w)^{n-2}, \quad w < z < 1,\\
E[(Z -W) \mid W =w]
&= \int_w^1 (z-w)\cdot \frac{n-1}{(1-w)^{n-1}}(z-w)^{n-2}\,\mathrm dz\\
&= \frac{n-1}{n}(1-w),\\
E[Z\mid W = w] &= w + \frac{n-1}{n}(1-w),
\end{align}$$
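If you want to see the conditional-mean formula numerically, here is a crude Monte Carlo check: keep only the samples whose minimum falls in a narrow window around a chosen $w$ and average the maximum over those (the values of $n$, $w$, and the window width below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, w, eps = 5, 1_000_000, 0.2, 0.005   # arbitrary choices

X = rng.uniform(size=(reps, n))
W = X.min(axis=1)
Z = X.max(axis=1)

# Crude conditioning: keep samples whose minimum lands near w.
mask = np.abs(W - w) < eps
print(Z[mask].mean())       # simulated E[Z | W ≈ w]
print((w + n - 1) / n)      # formula (w + n - 1)/n = 0.84 for n = 5, w = 0.2
```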
so that we arrive at $E[Z\mid W] = \frac{W + n-1}{n}$. We finally get
to use the law of iterated expectation, which yields
$$E[Z] = E[E[Z\mid W]] = E\left[\frac{W + n-1}{n}\right]
= \frac{\frac{1}{n+1} + n-1}{n} = \frac{n}{n+1}$$
if you remember that $E[W] = \frac{1}{n+1}$. If not, work it
out from the density of $W$ given above (hint: it is a Beta density).
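If you want it spelled out: the density above is the $\mathrm{Beta}(1,n)$ density, and the Beta integral gives
$$E[W] = \int_0^1 w\cdot n(1-w)^{n-1}\,\mathrm dw = n\,B(2,n) = \frac{n\,\Gamma(2)\Gamma(n)}{\Gamma(n+2)} = \frac{1}{n+1}.$$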
Moral: Don't wrap your left arm twice around your head in order to scratch
your right ear, and don't try to find the expected value of the maximum
via the law of iterated expectation and conditioning on the value of the minimum.
The p.d.f. of a single observation $x_i$ is given as
$$
f(x| \theta) = \begin{cases}
\frac{1}{\theta} & \text{if } 0 \leq x \leq \theta \\
0 & \text{otherwise}
\end{cases}
$$
Write $\vec{x} = (x_1, \ldots, x_n)$ for the observed sample.
The $n$ observations are i.i.d., so the likelihood of observing $\vec{x}$ is the product of the component-wise densities. Ignoring the issue of support for the moment, note that this product can be written simply as a power:
$$
f(\vec{x}| \theta) = \prod_{i=1}^n \frac{1}{\theta} = \frac{1}{\theta^n} = \theta^{-n}
$$
Next, we turn our attention to the support of this function. If any single component is outside its interval of support $[0, \theta]$, then it contributes a factor of $0$, so the whole product is zero. Therefore $f(\vec{x}| \theta)$ is nonzero only when all components lie inside $[0, \theta]$:
$$
f(\vec{x}| \theta) = \begin{cases}
\theta^{-n} & \text{if } \forall i, \ 0 \leq x_i \leq \theta \\
0 & \text{otherwise}
\end{cases}
$$
By definition, this is also our likelihood:
$$
\mathcal{L}(\theta; \vec{x}) = f(\vec{x}| \theta) = \begin{cases}
\theta^{-n} & \text{if } \forall i, \ 0 \leq x_i \leq \theta \\
0 & \text{otherwise}
\end{cases}
$$
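As an aside, this piecewise definition translates directly into code; the sketch below is my own, and the function name and arguments are arbitrary:

```python
import numpy as np

def likelihood(theta, x):
    """Likelihood L(theta; x) for i.i.d. Uniform(0, theta) observations x."""
    x = np.asarray(x)
    # theta^{-n} when every observation lies in [0, theta], otherwise 0.
    if theta > 0 and np.all((0 <= x) & (x <= theta)):
        return theta ** (-len(x))
    return 0.0
```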
The MLE problem is to maximize $\mathcal{L}$ with respect to $\theta$. Because $\theta > 0$ (given in the title of the problem), we have $\theta^{-n} > 0$, so the zero branch is never the maximum. Thus this is a constrained optimization problem:
$$
\hat{\theta} = \text{argmax}_\theta \,\, \theta^{-n} \text{ s.t. } \forall i \,\, 0 \leq x_i \leq \theta
$$
This is easy to solve as a special case, so we don't need to talk about the simplex method but can present a more elementary argument. Let $t = \max \{x_1,\ldots,x_n\}$; the constraint is equivalent to $\theta \geq t$. Suppose we have a candidate solution $\theta_1 = t + \epsilon$ with $\epsilon > 0$, and let $\theta_2 = t + \epsilon/2$. Both $\theta_1$ and $\theta_2$ are feasible, and $\theta_2 < \theta_1 \implies \theta_2^{-n} > \theta_1^{-n}$, so $\theta_1$ does not maximize the likelihood. We conclude that the maximum cannot be attained at any point strictly greater than $t$. Yet $t$ itself is in the feasible region, so it must be the maximizer. Therefore,
$$\hat{\theta} = \max \,\, \{x_1,\ldots, x_n\}$$
is the maximum likelihood estimator.
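A quick numerical illustration of this boundary argument (with a made-up sample): the likelihood $\theta^{-n}$ shrinks as $\theta$ grows past the sample maximum, so the feasible boundary point does best.

```python
x = [0.3, 1.7, 0.9, 1.2]       # a made-up sample
n = len(x)
theta_hat = max(x)             # candidate MLE: the sample maximum, here 1.7

# Any theta below max(x) is infeasible (likelihood 0); above it, theta**(-n) only decreases.
for theta in (theta_hat, theta_hat + 0.1, theta_hat + 1.0):
    print(theta, theta ** (-n))
```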
Note that if any observed $x_i$ is less than 0, then $\mathcal{L}$ is a constant 0 and the optimization problem has no unique solution.
Best Answer
I think this should help:
Start with a 2D "ball", i.e. a disk; the first post below shows how to generate points uniformly distributed within a circle.
http://blogs.sas.com/content/iml/2016/03/30/generate-uniform-2d-ball.html
This can then be extended to a 3D "ball", i.e. a solid sphere, and on to higher dimensions.
http://www.statsblogs.com/2016/04/06/generate-points-uniformly-inside-a-d-dimensional-ball/
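If you just want working code, the standard construction (which, as far as I recall, is also what those posts use) is to take a uniformly random direction from a normalized standard normal vector and scale it by $U^{1/d}$ so the radius has the right distribution. A minimal NumPy sketch:

```python
import numpy as np

def uniform_ball(num_points, d, rng=None):
    """Draw num_points uniformly from the unit ball in R^d."""
    rng = np.random.default_rng() if rng is None else rng
    # Uniform direction: a standard normal vector, normalized to unit length.
    g = rng.standard_normal(size=(num_points, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # Radius: U**(1/d), so that P(R <= r) = r**d, the volume fraction of a ball of radius r.
    radii = rng.uniform(size=(num_points, 1)) ** (1 / d)
    return directions * radii

points = uniform_ball(1000, d=2)                            # 1000 points in the unit disk
print(points.shape, np.linalg.norm(points, axis=1).max())   # all norms are <= 1
```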