Finding MLEs for density functions whose support depends on the parameter is usually not straightforward, in the sense that differentiating will not help.
Denote by $\textbf{1}_{[a,b]}(x)$ the indicator function of the interval $[a,b]$, that is,
$$\textbf{1}_{[a,b]}(x)= \begin{cases} 1 \mbox{ if } x\in [a,b] \\ 0 \mbox{ if not.}\end{cases}$$
The density function of a uniform random variable $U(0,\theta)$ is given by
$$f_{\theta}(x) = \frac{1}{\theta} \textbf{1}_{[0,\theta]}(x).$$
Given a sample of $n$ random observations $Y_1,\dots, Y_n$ (it is convenient to write observed values with lowercase letters and random variables with uppercase letters, so when I write capital $Y$'s it means these are computations done before we collect data), the likelihood function is given by:
$$L(\theta|Y_1,\dots, Y_n)= \prod_{i=1}^n f_{\theta}(Y_i) = \frac{1}{\theta^n}\prod_{i=1}^n \textbf{1}_{[0,\theta]}(Y_i).$$
Now look at the product of indicator functions. We should try to write it as a function of $\theta$ instead of as a function of all the $Y_i$'s, in order to see exactly where $L$ is nonzero. The product is nonzero only if $0\le Y_i\le \theta$ for all $i=1,\dots,n$; in other words, only if $0\le \min_i Y_i \le \max_i Y_i \le \theta$. So the likelihood is nonzero only for $\theta \ge \max_{i=1,\dots,n} Y_i$. That is,
$$L(\theta|Y_1,\dots, Y_n)= \frac{1}{\theta^n} \textbf{1}_{[\max_{i=1,\dots,n} Y_i, \infty)}(\theta).$$
Now, the function $1/\theta^n$ is strictly decreasing on $(0,\infty)$, so the maximum of $L$ must be attained at the left endpoint of the interval where it is nonzero. In conclusion, the MLE of $\theta$ is the maximum of the sample of $Y_i$'s, which is quite intuitive:
$$\hat{\theta}_{MLE} = \max_{i=1,\dots,n} Y_i.$$
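As a quick numerical sketch (Python with NumPy; the true value $\theta = 5$ and the seed are made-up choices), the sample maximum recovers $\theta$ from below as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 5.0  # assumed "true" parameter for the simulation

for n in (10, 100, 10_000):
    y = rng.uniform(0.0, theta, size=n)  # sample Y_1, ..., Y_n ~ U(0, theta)
    theta_hat = y.max()                  # MLE: the sample maximum
    print(n, theta_hat)
```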
Now try to reproduce the same computations and ideas for a uniform distribution depending on two parameters: $f_{\theta_1, \theta_2} (x) = \frac{1}{\theta_2- \theta_1}$ for $\theta_1 < x < \theta_2$. Intuition says that if you have a sample $Y_1,\dots, Y_n$, then the MLEs of $\theta_1$ and $\theta_2$ should be:
$$ \hat{\theta}_1 = \min_{i=1,\dots,n} Y_i \mbox{ and } \hat{\theta}_2 = \max_{i=1,\dots,n} Y_i.$$
Observe that these estimators are expressed with capital letters, so they are random: each time you collect a new sample you get different estimates. Hence, when you collect a sample of observations $y_1,\dots, y_n$ (lowercase letters), the estimates are:
$$ \hat{\theta}_1 = \min_{i=1,\dots,n} y_i \mbox{ and } \hat{\theta}_2 = \max_{i=1,\dots,n} y_i.$$
(Just substitute the values you got in the exercise)
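A small sketch of this case as well (Python; the values $\theta_1 = 2$, $\theta_2 = 7$ are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta2 = 2.0, 7.0              # assumed true parameters
y = rng.uniform(theta1, theta2, 1000)  # observed sample y_1, ..., y_n

theta1_hat, theta2_hat = y.min(), y.max()  # MLEs: sample minimum and maximum
print(theta1_hat, theta2_hat)
```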
I hope this helped! ;)
First, rewrite the density with the new parametrization:
$$f(y|\theta)=\frac{ky^{k-1}}{\theta}e^{-\frac{y^k}{\theta}}$$
Calculate the likelihood
$$L(\theta)\propto \theta^{-n}e^{-\frac{\Sigma_i y_i^k}{\theta}}$$
Proceeding with the calculation, you find that the score function (the derivative of the log-likelihood with respect to $\theta$) is
$$l^*=-\frac{n}{\theta}+\frac{1}{\theta^2}\Sigma_i y_i^k$$
Setting the score to zero and solving for $\theta$ thus gives
$$T=\hat{\theta}_{ML}=\frac{\Sigma_i y_i^k}{n}$$
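A minimal sketch of the estimator (Python; the shape $k$ and true $\theta$ are made-up values, and the data are simulated by drawing $W\sim\text{Exp}$ with mean $\theta$ and setting $Y=W^{1/k}$, which matches the density above):

```python
import numpy as np

rng = np.random.default_rng(2)
k, theta, n = 1.5, 4.0, 5000              # assumed shape k and true theta

w = rng.exponential(scale=theta, size=n)  # W = Y^k ~ Exp with mean theta
y = w ** (1.0 / k)                        # simulated sample from f(y | theta)

T = np.sum(y ** k) / n                    # MLE: T = (sum_i y_i^k) / n
print(T)                                  # should be close to theta
```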
To show that $\mathbb{E}[T]=\theta$ let's rewrite the score function in the following way
$$l^*=-\frac{n}{\theta}+\frac{nT}{\theta^2}$$
Now, simply remembering that (the first Bartlett identity)
$$\mathbb{E}[l^*]=0$$
you get
$$\frac{n}{\theta}=\frac{n\mathbb{E}[T]}{\theta^2}$$
that is also
$$\mathbb{E}[T]=\theta$$
To calculate its variance, use the second Bartlett identity, that is,
$$\mathbb{E}[l^{**}]=-\mathbb{E}[(l^*)^2]$$
This identity leads to
$$\mathbb{V}\Bigg[\frac{nT}{\theta^2}-\frac{n}{\theta}\Bigg]=-\mathbb{E}\Bigg[\frac{n}{\theta^2}-\frac{2nT}{\theta^3}\Bigg]$$
that is
$$\frac{n^2}{\theta^4}\mathbb{V}[T]=\frac{n}{\theta^2}$$
$$\mathbb{V}[T]=\frac{\theta^2}{n}$$
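A Monte Carlo sketch (Python, with the same made-up $\theta$; here $T$ only depends on the $Y_i^k$, which are exponential with mean $\theta$) checking both $\mathbb{E}[T]=\theta$ and $\mathbb{V}[T]=\theta^2/n$ over repeated samples:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 4.0, 50, 20_000  # assumed values; n kept small so the variance is visible

# draw `reps` samples of size n and compute T for each
w = rng.exponential(scale=theta, size=(reps, n))  # W = Y^k ~ Exp with mean theta
T = w.mean(axis=1)                                # T = (sum_i Y_i^k) / n

print(T.mean(), theta)            # empirical mean of T vs theta
print(T.var(), theta**2 / n)      # empirical variance of T vs theta^2 / n
```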
Alternative method to calculate the expectation and variance of $T$:
Simply transforming
$$W=Y^k$$
you get that $W\sim Exp\Big(\frac{1}{\theta}\Big)$ (exponential with rate $\frac{1}{\theta}$, i.e. mean $\theta$), and thus
$$T\sim Gamma\Big(n;\frac{n}{\theta}\Big)$$
thus immediately you get
$$\mathbb{E}[T]=\frac{n}{\frac{n}{\theta}}=\theta$$
$$\mathbb{V}[T]=\frac{n}{\Big(\frac{n}{\theta}\Big)^2}=\frac{\theta^2}{n}$$
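A sketch of this route (Python with SciPy; same made-up values) compares the simulated distribution of $T$ to a Gamma with shape $n$ and rate $n/\theta$, which corresponds to scale $\theta/n$ in SciPy's parametrization:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
theta, n, reps = 4.0, 50, 20_000

w = rng.exponential(scale=theta, size=(reps, n))  # W = Y^k ~ Exp with mean theta
T = w.mean(axis=1)                                # T = (sum_i W_i) / n

# Gamma with shape n and rate n/theta  <=>  scale = theta / n
gamma_ref = stats.gamma(a=n, scale=theta / n)
print(stats.kstest(T, gamma_ref.cdf))  # large p-value -> consistent with Gamma(n, n/theta)
```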
Best Answer
Your question can be rephrased somewhat as: 'does the expected value of the derivative of the log-likelihood always point towards the correct value?' (If it does not, you can turn it into a counterexample of your hypothesis by flipping the sign of $\theta$ if necessary.)
This won't be true in general; you could, for instance, come up with a distribution like
$$ \sin(x + \theta)^2 / (1+x^2) $$
which is periodic in $\theta$. Clearly the derivative at $\theta + 2\pi$ must be equal to the one at $\theta$, and clearly $E_\theta[S(\theta_1, X)] = E_{\theta+2\pi}[S(\theta_1, X)]$, so both cannot point to the 'correct' value at the same time.
However, a probability distribution where several values of $\theta$ are equivalent is clearly not the usual case, so we need to require some kind of 'unimodality'. To see what kind we need, it is instructive to pull the derivative outside of the expectation:
$$ \begin{align} \int f(z;\theta) \frac{\partial \log f(z;\theta_1)}{\partial \theta_1} \,\mathrm{d}z &=\frac{\partial}{\partial \theta_1} \int f(z;\theta) \log f(z;\theta_1) \,\mathrm{d}z \end{align} $$
so now we are looking at the (negative) derivative of the cross entropy with respect to $\theta_1$ (which is also the derivative of the Kullback-Leibler divergence, since the two differ only by a term that does not depend on $\theta_1$); the cross entropy is a measure of how close the distribution $f(z;\theta_1)$ is to the 'true' distribution $f(z;\theta)$. It is now clear why its derivative usually points the right way, since we would generally expect the model to get better as the parameters get closer to their actual values.
Anyway, from this we can extract a sufficient, but maybe not necessary, condition, which is for the probability distribution to be log-concave in the parameter (i.e. $\log f(z;\theta_1)$ is concave w.r.t. $\theta_1$); in that case its expected value
$$ \int f(z;\theta) \log f(z;\theta_1) \,\mathrm{d}z $$
is also concave in $\theta_1$, which in particular means that its derivative is monotonically non-increasing and is $0$ at $\theta_1 = \theta$; this is enough to conclude that $E_{\theta}[S(\theta_1, X)]$ points towards $\theta$.
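As a numerical illustration (Python with SciPy; the choice of the exponential distribution with rate $\theta_1$, whose log-density $\log\theta_1 - \theta_1 x$ is concave in $\theta_1$, and the value $\theta = 2$ are my own assumptions), the expected score is positive for $\theta_1 < \theta$, zero at $\theta_1 = \theta$, and negative for $\theta_1 > \theta$, i.e. it points towards $\theta$:

```python
import numpy as np
from scipy.integrate import quad

theta = 2.0  # assumed "true" rate parameter

def expected_score(theta1):
    # E_theta[ d/d theta1 log f(X; theta1) ] for the exponential (rate) family:
    # log f(x; theta1) = log(theta1) - theta1 * x, so the score is 1/theta1 - x
    integrand = lambda x: theta * np.exp(-theta * x) * (1.0 / theta1 - x)
    return quad(integrand, 0.0, np.inf)[0]

for theta1 in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(theta1, expected_score(theta1))  # positive below theta=2, ~0 at 2, negative above
```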
The exponential and normal distributions are both log-concave w.r.t. all their parameters, but keep in mind that most distributions are called log-concave when they are log-concave w.r.t. the value (here $z$), not the parameters (here $\theta_1$).