(I am going to answer this self-study question in full, because the OP appears to be too far from any path that could be steered in the right direction by simple hints.)
It can be shown that the MLE for $\theta$ is the minimum-order statistic,
$$\hat \theta_n = X_{1:n}$$
In fact, one can derive that $\hat \theta_n$ itself follows a Pareto distribution with scale parameter $\theta$ and, since the original shape parameter here is $1$, shape parameter $n$, i.e. with distribution function
$$F(\hat \theta_n) = 1- \left(\frac {\hat \theta_n}{\theta}\right)^{-n}$$
So in this case we have available the finite sample distribution of the ML estimator.
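Since the exact finite-sample distribution is available, it is easy to check by simulation. Here is a quick NumPy sketch (the values of $\theta$, $n$, and the test point are arbitrary choices of mine, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 2.0, 5, 200_000

# Pareto(scale=theta, shape=1) via inverse CDF: F(x) = 1 - theta/x, so X = theta/U
u = rng.random((reps, n))
x = theta / u
mle = x.min(axis=1)  # theta_hat = X_{1:n}

# Empirical CDF of the minimum at a test point vs. 1 - (t/theta)^{-n}
t = 2.5
empirical = (mle <= t).mean()
theoretical = 1 - (t / theta) ** (-n)
print(empirical, theoretical)  # should agree up to Monte Carlo error
```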
Still, the OP is asked to consider asymptotics for the function $\sqrt {n} (\hat \theta _n - \theta)$.
Subtracting $\theta$ does not make the distribution centered at zero or anything like that, since we always have
$$\hat \theta_n = X_{1:n} \geq \theta,$$
because every $X_i$ has support $[\theta, \infty)$.
By this observation alone we know that even if $\sqrt {n} (\hat \theta _n - \theta)$ has a limiting distribution, it won't be the normal.
Now, $(\hat \theta_n - \theta)$ has an exact Lomax distribution, with the same parameters as the distribution of $\hat \theta_n$. The variance of this Lomax distribution equals the variance of the corresponding Pareto distribution, so in our case
$$\text{Var}[(\hat \theta_n - \theta)] = \frac {\theta^2 n}{(n-1)^2(n-2)}$$
Note that the denominator has leading term $n^3$ while the numerator is linear in $n$, so the variance behaves like $\theta^2/n^2$. To stabilize this variance as $n$ goes to infinity, so that it neither explodes nor goes to zero, one needs to multiply the variable by $n$, not $\sqrt {n}$:
$$\text{Var}[n(\hat \theta_n - \theta)] = \frac {\theta^2 n^3}{(n-1)^2(n-2)} \to \theta^2$$
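A quick numeric sketch of the two scalings, just plugging arbitrary values into the exact variance formulas above:

```python
theta = 2.0

def var_sqrt_n_scaling(n):
    # Var[sqrt(n)(theta_hat - theta)] = theta^2 n^2 / ((n-1)^2 (n-2))
    return theta**2 * n**2 / ((n - 1) ** 2 * (n - 2))

def var_n_scaling(n):
    # Var[n(theta_hat - theta)] = theta^2 n^3 / ((n-1)^2 (n-2))
    return theta**2 * n**3 / ((n - 1) ** 2 * (n - 2))

for n in (10, 100, 10_000, 1_000_000):
    print(n, var_sqrt_n_scaling(n), var_n_scaling(n))
# the sqrt(n) scaling drains the variance to 0, the n scaling stabilizes it at theta^2 = 4
```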
So $\sqrt {n} (\hat \theta _n - \theta)$ converges to the constant zero: the estimator converges faster than the usual $\sqrt{n}$-rate, and multiplying by $\sqrt {n}$ "is not enough" to maintain a non-degenerate distribution; we need to multiply by $n$. Standardized by its variance (which goes to zero), the quantity explodes instead.
Can we obtain the limiting distribution of $Z_n = n(\hat \theta_n - \theta)$?
We have for the distribution function
$$F_{Z_n}(z) = \text{Prob}(Z_n \leq z) = \text{Prob}(n(\hat \theta_n - \theta) \leq z) = \text{Prob}\left(\hat \theta_n \leq \theta + \frac {z}{n} \right)$$
$$ \implies F_{Z_n}(z) = 1- \left(\frac {\theta + \frac {z}{n}}{\theta}\right)^{-n} = 1- \left(1+ \frac {(z/\theta)}{n}\right)^{-n}$$
$$\implies \lim_{n \to \infty}F_{Z_n}(z) = 1 - \exp\{-z/\theta\}$$
which is the CDF of the Exponential distribution, with expected value $\theta$ and variance $\theta^2$.
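This exponential limit can also be eyeballed by simulation; a minimal sketch with arbitrary parameter values (the Exponential limit has mean $\theta$ and variance $\theta^2$):

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 200, 50_000

x = theta / rng.random((reps, n))   # Pareto(scale=theta, shape=1) draws
z = n * (x.min(axis=1) - theta)     # Z_n = n(theta_hat - theta)

print(z.mean(), z.var())  # close to theta = 2 and theta^2 = 4
```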
If $Y \sim \mathcal N(X\beta, \sigma^2 I)$ then the log likelihood is
$$
l(\beta, \sigma^2|y) = -\frac n2 \log (2\pi) - \frac n2 \log(\sigma^2) - \frac 1{2\sigma^2}||y-X\beta||^2
$$
where the predictors are assumed non-stochastic and of full rank. From this we find that
$$
\frac{\partial l}{\partial \sigma^2} = 0 \implies \hat \sigma^2 = \frac 1n ||Y-X\hat \beta||^2.
$$
We want to know when the MLE $\hat \sigma^2$ is equal to the sample variance of the residuals
$$
\tilde \sigma^2 = \frac{1}{n}\sum_{i=1}^n (e_i - \bar e)^2
$$
where $e = Y - \hat Y$ are the residuals. We know
$$
n\tilde \sigma^2 = e^Te - n\bar e^2
$$
while
$$
n\hat \sigma^2 = e^Te
$$
so this tells us the two are equal exactly when $\bar e = 0$, which holds whenever the constant vector $\mathbf 1$ is in the column space of $X$ (e.g. whenever the model includes an intercept). If that is not the case, then the two won't be exactly equal in general.
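A small numerical illustration (NumPy, with a made-up design): with an intercept column the two estimates coincide, while in a regression through the origin the residual mean is typically nonzero:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # true model has an intercept

def residuals(X, y):
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta_hat

# With an intercept column, residuals sum to zero, so the two estimates agree
e = residuals(np.column_stack([np.ones(n), x1]), y)
mle = (e @ e) / n
samp_var = np.mean((e - e.mean()) ** 2)
print(np.isclose(mle, samp_var))  # True

# Regression through the origin: the constant is not in the column space
e2 = residuals(x1[:, None], y)
print(e2.mean())  # typically nonzero, so the two estimates differ
```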
I'm leaving the rest of my answer here but as I understand OP's question better I don't think it applies.
Note
$$
||Y - X\hat \beta||^2 = (Y - HY)^T(Y - HY) = Y^T(I-H)Y
$$
where $H = X(X^TX)^{-1}X^T$. This means that we have a Gaussian quadratic form, so
$$
Var\left(Y^T (I-H)Y\right) = 2\sigma^4 \text{tr}(I-H) + 4\sigma^2 \beta^T X^T(I-H)X\beta.
$$
$X^T(I-H)X = X^TX - X^TX(X^TX)^{-1}X^TX = 0$ and $\text{tr}(I-H) = n-p$ so we have
$$
Var(\hat \sigma^2) = \frac{2\sigma^4(n-p)}{n^2}.
$$
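The two matrix identities used here, $X^T(I-H)X = 0$ and $\text{tr}(I-H) = n-p$, are easy to confirm numerically on a random design:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4
X = rng.normal(size=(n, p))                  # full rank with probability 1
H = X @ np.linalg.inv(X.T @ X) @ X.T         # hat matrix
M = np.eye(n) - H                            # annihilator matrix

print(np.allclose(X.T @ M @ X, 0))           # (I - H) kills the column space of X
print(np.isclose(np.trace(M), n - p))        # tr(I - H) = n - p
```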
The standard estimate of $\sigma^2$ is probably $s^2 := \frac{1}{n-p}||Y - X\hat \beta||^2$ (which is unbiased, as we can see by computing $E\left(Y^T(I-H)Y\right) = \sigma^2(n-p)$), so
$$
Var(s^2) = \frac{2\sigma^4}{n-p}.
$$
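Both variance formulas can be checked by Monte Carlo; a sketch with arbitrary choices of $n$, $p$, $\beta$, and $\sigma^2$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2, reps = 30, 3, 1.5, 100_000

X = rng.normal(size=(n, p))                  # fixed, full-rank design
H = X @ np.linalg.inv(X.T @ X) @ X.T
beta = np.array([1.0, -0.5, 2.0])

Y = X @ beta + np.sqrt(sigma2) * rng.normal(size=(reps, n))
resid = Y @ (np.eye(n) - H)                  # (I - H) is symmetric
ss = (resid ** 2).sum(axis=1)                # ||Y - X beta_hat||^2, one per replication

print(ss.var() / n**2, 2 * sigma2**2 * (n - p) / n**2)   # Var of the MLE
print(ss.var() / (n - p)**2, 2 * sigma2**2 / (n - p))    # Var of the unbiased estimator
```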
I'm not entirely sure what more you're looking for, since technically what you asked for was the variance of the residuals, which is
$$
Var(e) = Var\left((I-H)Y\right) =\sigma^2 (I-H)
$$
but I don't think that's what you mean. Or if that is what you mean, then we can directly compare this to $Var(\varepsilon) = \sigma^2 I$ and the difference comes down to $\sigma^2 H$.
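For completeness, the residual covariance $\sigma^2(I-H)$ is also easy to verify by simulation (arbitrary small design; $\beta$ drops out because $(I-H)X = 0$):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma2, reps = 8, 2, 1.0, 200_000
X = rng.normal(size=(n, p))
H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H

eps = np.sqrt(sigma2) * rng.normal(size=(reps, n))
e = eps @ M                                 # residuals: (I - H)Y = (I - H)eps
emp_cov = np.cov(e, rowvar=False)
print(np.abs(emp_cov - sigma2 * M).max())   # small: empirical cov matches sigma^2 (I - H)
```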
The method you mention in (2) is similar to how the variance is usually estimated: through the observed information. The problem with the Fisher information is that it depends on an unknown parameter, $\theta$. Using the plug-in estimator defined in the link I specified will allow you to compute an estimate of the Fisher information. The second Wikipedia quote you mention is equivalent to (2).