While your initial formulas and computations look correct, I am getting a slightly different result for the CRLB of unbiased estimators for $\theta^2$. It can be a little tricky to take derivatives with respect to quantities that do not appear explicitly in the likelihood, so let me show you a shortcut.
The basic ingredient of the CRLB is the Fisher information of course. Assume then that we have the Fisher Information for a parameter $\theta$ and we wish to derive the Fisher Information for a function of $\theta$, say $g(\theta)$. In your notation, we wish to compute
$$ I \left( g\left(\theta \right) \right) = E_{\theta}\left\{ \left[ \frac{\partial}{\partial g(\theta)} \log f\left(\mathbf{x};\theta\right) \right]^2 \right\} $$
would you agree? But notice what happens when we apply the chain rule along with the definition of the derivative of the inverse function,
\begin{align} I \left( g\left(\theta \right) \right) = E_{\theta}\left\{ \left[ \frac{\partial}{\partial g(\theta)} \log f\left(\mathbf{x};\theta\right) \right]^2 \right\} & = E_{\theta} \left\{ \left[ \frac{\partial}{\partial \theta} \log f\left(\mathbf{x};\theta\right) \frac{\partial \theta}{\partial g\left( \theta \right)} \right]^2 \right\} \\ & = E_{\theta} \left\{ \left[ \frac{\partial}{\partial \theta} \log f\left(\mathbf{x};\theta\right) \frac{1}{ g ^{\prime}\left( \theta \right)} \right]^2 \right\} \\ & = \frac{I(\theta)}{\left[g ^{\prime}\left( \theta \right) \right]^2} \end{align}
which simplifies matters. With this definition, if we are looking for the Information for $\theta^2$, then since $g^{\prime} (\theta) = 2\theta$ and $I(\theta) = \frac{1}{\theta^2}$, we see that
$$I\left(\theta^2 \right) = \frac{1}{\theta^2}\, \frac{1}{4\theta^2} = \frac{1}{4\theta^4} $$
Multiply by $n$ and take the reciprocal of this to arrive at your bound, $\frac{4 \theta^4}{n}$.
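If you want to convince yourself numerically, here is a minimal Monte Carlo sketch. It assumes, purely for concreteness, that the model is $X\sim$ Exponential with mean $\theta$ (one model for which $I(\theta)=1/\theta^2$); the model in your question may differ, but the identity being checked does not.

```python
import numpy as np

# Monte Carlo sanity check of I(g(theta)) = I(theta) / [g'(theta)]^2 for g(theta) = theta^2.
# Assumption (not stated above): X ~ Exponential with mean theta, so that
# log f(x; theta) = -log(theta) - x/theta, the score w.r.t. theta is (x - theta)/theta^2,
# and I(theta) = 1/theta^2.
rng = np.random.default_rng(0)
theta = 1.7
x = rng.exponential(scale=theta, size=2_000_000)

score_theta = (x - theta) / theta**2      # d/d(theta) log f(x; theta)
score_theta2 = score_theta / (2 * theta)  # chain rule: d(theta)/d(theta^2) = 1/(2 theta)

print(np.mean(score_theta2**2))           # Monte Carlo estimate of I(theta^2)
print(1 / (4 * theta**4))                 # closed form 1/(4 theta^4)
```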
There are several instances of (2), namely the case where the variance of a UMVU estimator exceeds the Cramer-Rao lower bound. Here are some common examples:
- Estimation of $e^{-\theta}$ when $X_1,\ldots,X_n$ are i.i.d $\mathsf{Poisson}(\theta)$:
Consider the case $n=1$ separately. Here we are to estimate the parametric function $e^{-\theta}=\delta$ (say) based on $X\sim\mathsf{Poisson}(\theta) $.
Suppose $T(X)$ is unbiased for $\delta$.
Therefore, $$E_{\theta}[T(X)]=\delta\quad,\forall\,\theta$$
Or, $$\sum_{j=0}^\infty T(j)\frac{\delta(\ln (\frac{1}{\delta}))^j}{j!}=\delta\quad,\forall\,\theta$$
That is, $$T(0)\delta+T(1)\delta\cdot\ln\left(\frac{1}{\delta}\right)+\cdots=\delta\quad,\forall\,\theta$$
Comparing coefficients of the powers of $\ln\left(\frac{1}{\delta}\right)$ on both sides forces $T(0)=1$ and $T(j)=0$ for all $j\ge 1$. So we have the unique unbiased estimator (hence also the UMVUE) of $\delta(\theta)$:
$$T(X)=\begin{cases}1&,\text{ if }X=0 \\ 0&,\text{ otherwise }\end{cases}$$
Clearly,
\begin{align}
\operatorname{Var}_{\theta}(T(X))&=P_{\theta}(X=0)(1-P_{\theta}(X=0))
\\&=e^{-\theta}(1-e^{-\theta})
\end{align}
The Cramer-Rao bound for $\delta$ is $$\text{CRLB}(\delta)=\frac{\left(\frac{d}{d\theta}\delta(\theta)\right)^2}{I(\theta)}\,,$$
where $I(\theta)=E_{\theta}\left[\left(\frac{\partial}{\partial\theta}\ln f_{\theta}(X)\right)^2\right]=\frac1{\theta}$ is the Fisher information, $f_{\theta}$ being the pmf of $X$.
Since $\frac{d}{d\theta}\delta(\theta)=-e^{-\theta}$, this reduces to $$\text{CRLB}(\delta)=\theta e^{-2\theta}$$
Now take the ratio of variance of $T$ and the Cramer-Rao bound:
\begin{align}
\frac{\operatorname{Var}_{\theta}(T(X))}{\text{CRLB}(\delta)}&=\frac{e^{-\theta}(1-e^{-\theta})}{\theta e^{-2\theta}}
\\&=\frac{e^{\theta}-1}{\theta}
\\&=\frac{1}{\theta}\left[\left(1+\theta+\frac{\theta^2}{2}+\cdots\right)-1\right]
\\&=1+\frac{\theta}{2}+\cdots
\\&>1
\end{align}
Exactly the same calculation shows that the conclusion continues to hold for a sample of $n>1$ observations. In that case the UMVUE of $\delta$ is $\left(1-\frac1n\right)^{\sum_{i=1}^n X_i}$, with variance $e^{-2\theta}\left(e^{\theta/n}-1\right)$.
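A quick Monte Carlo check of the $n>1$ case, purely as a sketch (the particular $\theta$, $n$, and seed below are arbitrary choices of mine):

```python
import numpy as np

# Simulate the UMVUE (1 - 1/n)^(sum X_i) of e^{-theta} for Poisson(theta) data and
# compare its variance with the closed form e^{-2 theta}(e^{theta/n} - 1) and with the
# n-observation Cramer-Rao bound theta * e^{-2 theta} / n.
rng = np.random.default_rng(0)
theta, n, reps = 1.3, 5, 1_000_000

samples = rng.poisson(theta, size=(reps, n))
umvue = (1 - 1 / n) ** samples.sum(axis=1)

print(umvue.var())                               # simulated variance of the UMVUE
print(np.exp(-2 * theta) * np.expm1(theta / n))  # closed form e^{-2 theta}(e^{theta/n} - 1)
print(theta * np.exp(-2 * theta) / n)            # Cramer-Rao lower bound
```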
- Estimation of $\theta$ when $X_1,\ldots,X_n$ ( $n>1$) are i.i.d $\mathsf{Exp}$ with mean $1/\theta$:
Here UMVUE of $\theta$ is $\hat\theta=\frac{n-1}{\sum_{i=1}^n X_i}$, as shown here.
Using the Gamma distribution of $\sum\limits_{i=1}^n X_i$, a straightforward calculation shows $$\operatorname{Var}_{\theta}(\hat\theta)=\frac{\theta^2}{n-2}>\frac{\theta^2}{n}=\text{CRLB}(\theta)\quad,\,n>2$$
Since several distributions can be transformed to this exponential distribution, this example in fact generates many more examples.
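As a sanity check of this example too (again only a sketch, with arbitrary $\theta$, $n$, and seed):

```python
import numpy as np

# Simulate the UMVUE (n-1)/sum(X_i) of theta for X_i i.i.d. Exp with mean 1/theta and
# compare its variance with theta^2/(n-2) and with the bound theta^2/n.
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 10, 1_000_000

samples = rng.exponential(scale=1 / theta, size=(reps, n))
umvue = (n - 1) / samples.sum(axis=1)

print(umvue.var())         # simulated variance of the UMVUE
print(theta**2 / (n - 2))  # exact variance theta^2 / (n - 2)
print(theta**2 / n)        # Cramer-Rao lower bound theta^2 / n
```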
- Estimation of $\theta^2$ when $X_1,\ldots,X_n$ are i.i.d $N(\theta,1)$:
The UMVUE of $\theta^2$ is $\overline X^2-\frac1n$, where $\overline X$ is the sample mean. Among other drawbacks, this estimator can be shown not to attain the lower bound. See page 4 of this note for details.
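Here too a small simulation makes the gap visible (a sketch only; the formula $4\theta^2/n + 2/n^2$ used below is the standard expression for $\operatorname{Var}(\overline X^2)$ when $\overline X\sim N(\theta,1/n)$):

```python
import numpy as np

# Simulate the UMVUE Xbar^2 - 1/n of theta^2 for X_i i.i.d. N(theta, 1) and compare
# its variance with 4*theta^2/n + 2/n^2 and with the bound 4*theta^2/n
# (per-observation Fisher information is 1, and g'(theta) = 2*theta).
rng = np.random.default_rng(0)
theta, n, reps = 1.5, 8, 1_000_000

samples = rng.normal(loc=theta, scale=1.0, size=(reps, n))
umvue = samples.mean(axis=1) ** 2 - 1 / n

print(umvue.mean())                 # should be close to theta^2
print(umvue.var())                  # simulated variance of the UMVUE
print(4 * theta**2 / n + 2 / n**2)  # exact variance of Xbar^2 - 1/n
print(4 * theta**2 / n)             # Cramer-Rao lower bound for theta^2
```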
Best Answer
It's difficult to identify the correct level of rigor for an answer. I added the "regularity" condition to your question, since there are unbiased estimators that beat the Cramer-Rao bound.
Regular exponential families have score functions for their parameters that take this linear form. So the notation is not arbitrary; it arises from estimating the "usual" kinds of quantities, which produce reasonable outcomes.
As you know, maximizing a functional (like a likelihood or log-likelihood) involves finding a root of its derivative when the functional is smooth and the derivative is continuous. For regular exponential families, the linear form means the solution is obtainable in closed form.
When the score has that form, its expectation is 0 and its variance is the information matrix. It was a revelation to me to think of the score as a random variable, but indeed it is a function of $X$. Applying the Cauchy–Schwarz inequality to the covariance between an estimator and the score gives the Cramer-Rao bound on the estimator's variance; writing a biased estimator as an unbiased estimator of its own expectation plus the bias shows how the bound extends to the biased case.
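For reference, here is a minimal one-parameter sketch of that Cauchy–Schwarz argument, under the usual regularity conditions, for an estimator $T$ with $E_{\theta}[T]=\psi(\theta)$:
$$\psi'(\theta)=\frac{\partial}{\partial\theta}E_{\theta}[T]=\operatorname{Cov}_{\theta}\!\left(T,\,\frac{\partial}{\partial\theta}\log f(\mathbf{x};\theta)\right),$$
since the score has mean zero. By Cauchy–Schwarz,
$$\left[\psi'(\theta)\right]^2\le \operatorname{Var}_{\theta}(T)\, I(\theta) \quad\Longrightarrow\quad \operatorname{Var}_{\theta}(T)\ge\frac{\left[\psi'(\theta)\right]^2}{I(\theta)}.$$
For an unbiased estimator of $\theta$ we have $\psi'(\theta)=1$; if $T$ has bias $b(\theta)$, then $\psi'(\theta)=1+b'(\theta)$, which is exactly how the bias enters the bound.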