Define
$$
I(\lambda)\equiv E\left(\left[\frac{\partial\log l(\boldsymbol{X};\lambda)}{\partial\lambda}\right]^2\right)
$$
where $l(\boldsymbol{X};\lambda)$ denotes the joint likelihood:
$$
l(\boldsymbol{X};\lambda)=\prod_i\frac{1}{\lambda}\exp(-X_i/\lambda)=\frac{1}{\lambda^n}\exp\left(-\sum_iX_i/\lambda\right)\implies\log l(\boldsymbol{X};\lambda)=-\frac{S_n}{\lambda}-n\log(\lambda).
$$
Here $S_n\equiv\sum_iX_i$. With this, you can compute:
$$
\left[\frac{\partial\log l(\boldsymbol{X};\lambda)}{\partial\lambda}\right]^2=\left(\frac{S_n}{\lambda^2}-\frac{n}{\lambda}\right)^2=\frac{1}{\lambda^4}(S_n^2-2\lambda n S_n+\lambda^2n^2).
$$
Because of independent sampling,
$$
\begin{aligned}
E\left[\left(\sum_iX_i\right)^2\right]&=n E(X_i^2)+(n^2-n)E(X_i)^2=n(2\lambda^2)+(n^2-n)\lambda^2=(n^2+n)\lambda^2,\\
E\left(\sum_iX_i\right)&=n E(X_i)=n\lambda.
\end{aligned}
$$
It follows that
$$
E\left(\left[\frac{\partial\log l(\boldsymbol{X};\lambda)}{\partial\lambda}\right]^2\right)=\frac{1}{\lambda^2}(n^2+n-2n^2+n^2)=\frac{n}{\lambda^2}\cdot
$$
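As a quick numerical sanity check of $I(\lambda)=n/\lambda^2$, here is a minimal Monte Carlo sketch in Python (the values of `lam`, `n`, and `reps` are arbitrary illustration choices, not part of the problem):

```python
# Monte Carlo sanity check that E[score^2] = n / lam^2 for the joint likelihood.
# A minimal sketch; lam, n, and reps are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 10, 200_000

# reps independent samples of size n from an exponential with mean lam
X = rng.exponential(scale=lam, size=(reps, n))
S = X.sum(axis=1)

# Score of the joint log-likelihood: d/d(lam)[-S/lam - n*log(lam)] = S/lam^2 - n/lam
score = S / lam**2 - n / lam

print("Monte Carlo E[score^2]:", np.mean(score**2))  # approx 2.5
print("Theoretical n/lam^2:   ", n / lam**2)          # 10 / 2^2 = 2.5
```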
This $I$ that we have computed is the Fisher information for $\lambda$ in the joint likelihood $l(\boldsymbol{x};\lambda)$. Now the Cramér–Rao lower bound (a.k.a. the Fréchet–Darmois–Cramér–Rao lower bound) for estimating $g(\lambda)=\lambda^2$ is given by:
$$
\frac{[g'(\lambda)]^2}{I(\lambda)}=\boxed{\frac{4\lambda^4}{n}}.
$$
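As a cross-check of this bound, here is a short symbolic sketch with `sympy` (the variable names are arbitrary; it recomputes the per-observation Fisher information and then the bound):

```python
# Symbolic cross-check of the Cramer-Rao bound 4*lam^4/n (a sketch using sympy).
import sympy as sp

lam, x, n = sp.symbols("lam x n", positive=True)

# Exponential density with mean lam and its per-observation Fisher information
f = sp.exp(-x / lam) / lam
score = sp.simplify(sp.diff(sp.log(f), lam))
info_1 = sp.simplify(sp.integrate(score**2 * f, (x, 0, sp.oo)))  # 1/lam**2

# CRLB for g(lam) = lam^2 with joint information n * info_1
g = lam**2
crlb = sp.simplify(sp.diff(g, lam)**2 / (n * info_1))
print(info_1, crlb)  # 1/lam**2, 4*lam**4/n
```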
This completes (a). For (b), note that $E(X_i^2)=2\lambda^2$ so $k=\frac{1}{2n}$ makes $W=k\sum_iX_i^2$ unbiased for $\theta=\lambda^2$. We compute:
$$
E(W^2)=k^2E\left[\left(\sum_iX_i^2\right)^2\right]=k^2(nE[X_i^4]+(n^2-n)E(X_i^2)^2)=\lambda^4\left(1+\frac{5}{n}\right)\cdot
$$
This implies
$$
\text{Var}(W)=E(W^2)-E(W)^2=\frac{5\lambda^4}{n}>\frac{4\lambda^4}{n}.
$$
The last inequality shows that $W$ does not attain the Cramér–Rao bound, i.e. it is inefficient for $\lambda^2$.
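A minimal Monte Carlo sketch of this gap (the parameter values are arbitrary, not part of the problem):

```python
# Monte Carlo check that Var(W) = 5*lam^4/n exceeds the CRLB 4*lam^4/n.
# A minimal sketch; lam, n, and reps are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 1.5, 20, 200_000

X = rng.exponential(scale=lam, size=(reps, n))
W = (X**2).sum(axis=1) / (2 * n)   # unbiased estimator of lam^2

print("E[W] vs lam^2:       ", W.mean(), lam**2)
print("Var(W) vs 5*lam^4/n: ", W.var(), 5 * lam**4 / n)
print("CRLB 4*lam^4/n:      ", 4 * lam**4 / n)
```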
Simplifications:
- (a) The Fisher information for $\lambda$ for the joint likelihood is $n$ times the Fisher information for $\lambda$ for the individual likelihood. The latter is easier to compute.
- (b) In fact, a general result implies that only affine transformations of $\lambda$ can be estimated efficiently. Because $\lambda\mapsto\lambda^2$ is not affine, you can conclude without any computation that $W$ with $k=1/(2n)$ is unbiased but inefficient for $\lambda^2$.
Edit: included the computation of the Cramér–Rao bound.
Note that $Y \mid X \sim \mathcal{N}(\theta X,\,\sigma_\eta^2)$, so $\frac{\partial \ln p(y\mid x)}{\partial \theta} = \frac{x(y-\theta x)}{\sigma_\eta^2}$.
By the "chain rule" for Fisher information (the marginal distribution of $X$ does not depend on $\theta$), we have
$ I_{(X,Y)}(\theta) = I_{Y\mid X}(\theta) = \mathbb{E}_{\theta,X} \left[ \left( \frac{\partial \ln p(Y\mid X)}{\partial \theta} \right)^2 \right] = \mathbb{E}_{\theta,X}\left[\frac{X^2}{\sigma_\eta^2}\right] = \frac{\sigma_x^2}{\sigma_\eta^2} $
Also, for the asymptotics you will not need to evaluate the density if you proceed as follows.
First, separate the terms (using $y_n = \theta x_n + \eta_n$):
$\frac{\sum_n x_n y_n}{\sum_n x_n^2} = \theta \frac{\sum_n x_n^2}{\sum_n x_n^2} + \frac{\sum_n x_n \eta_n}{\sum_n x_n^2} = \theta + \frac{\sum_n x_n \eta_n}{\sum_n x_n^2} $
Now check the asymptotics:
$ \sqrt{n}(\theta + \frac{\sum_n x_n \eta_n}{\sum_n x_n^2} - \theta) = \sqrt{n} ( \frac{\sum_n x_n \eta_n}{\sum_n x_n^2}) = \sqrt{n} ( \frac{\frac{\sum_n x_n \eta_n}{n}}{\frac{\sum_n x_n^2}{n}}) = \sqrt{n} \frac{\overline{X \eta}}{\overline{X^2}}$
Now, by the law of large numbers,
$ \overline{X^2} \overset{P}{\rightarrow} \sigma_x^2 $
And, by the central limit theorem (and the independence of $X$,$\eta$):
$\sqrt{n} \overline{X \eta} \overset{D}{\rightarrow} \mathcal{N}(0,\sigma_x^2 \sigma_\eta^2)$
Hence, by Slutsky's theorem/the continuous mapping theorem, you get that:
$ \sqrt{n} \frac{\overline{X \eta}}{\overline{X^2}} \overset{D}{\rightarrow} \mathcal{N}(0,\frac{\sigma_\eta^2}{\sigma_x^2})$
which is the inverse of the Fisher information, as required.
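A minimal simulation sketch of this limit (it assumes $X\sim\mathcal N(0,\sigma_x^2)$ and $\eta\sim\mathcal N(0,\sigma_\eta^2)$ independent, with arbitrary parameter values):

```python
# Simulation sketch of sqrt(n)*(theta_hat - theta) -> N(0, sigma_eta^2 / sigma_x^2).
# Assumes X ~ N(0, sigma_x^2) and eta ~ N(0, sigma_eta^2) independent; values arbitrary.
import numpy as np

rng = np.random.default_rng(2)
theta, sigma_x, sigma_eta = 0.7, 2.0, 1.0
n, reps = 500, 20_000

X = rng.normal(0.0, sigma_x, size=(reps, n))
eta = rng.normal(0.0, sigma_eta, size=(reps, n))
Y = theta * X + eta

theta_hat = (X * Y).sum(axis=1) / (X**2).sum(axis=1)  # least-squares estimator
Z = np.sqrt(n) * (theta_hat - theta)

print("Empirical variance of sqrt(n)*(theta_hat - theta):", Z.var())  # approx 0.25
print("sigma_eta^2 / sigma_x^2 (inverse Fisher info):     ", sigma_eta**2 / sigma_x**2)
```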
Best Answer
The identification of the Fisher information with an inverse variance is exact only when you are dealing with a normally distributed value. In that case the log-likelihood function is parabolic, the Fisher information equals its curvature at the MLE, and it turns out mathematically that this curvature is the inverse of the variance of the associated normal random variable.
This is what guides the intuition surrounding Fisher information, even though it will only hold approximately for non-normal variables (although, subject to some technical conditions, it will usually be asymptotically true). Also, its inverse serves as a lower bound on the variance of unbiased estimators (the Cramér–Rao bound); see here.
Demonstration of the relationship between $I$ and $\sigma$ for the Gaussian likelihood of the mean
Let $f(x;\theta):= f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Take the logarithm of this:
$$\log(f) = -\log\sqrt{2\pi} - \log\sigma -\frac{(x-\mu)^2}{2\sigma^2}$$
We are looking at the likelihood for the mean ($\mu$) given a sample of data ($\mathbf{x}$), so we treat the above as a function of $\mu$. For a sample $\mathbf{x}$ of size $n$, the log-likelihood for $\mu$ is
$$L(\mu|\mathbf{x},\sigma) =-n\log\sqrt{2\pi} - n\log\sigma -\frac{1}{2\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu)^2$$
This function is quadratic in $\mu$, so its second derivative $\frac{d^2L}{d\mu^2}$ is a constant. Specifically:
$$\frac{dL}{d\mu} =\frac{1}{\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu) \implies \frac{d^2L}{d\mu^2} = \frac{-n}{\sigma^2} = \textrm{constant} $$
Therefore,
$$-E_{\mu}\left(\frac{-n}{\sigma^2}\right) = \frac{n}{\sigma^2} = I(\mu)$$
which is the Fisher information about $\mu$. Moreover:
$$ se(\hat \mu) = \frac{\sigma}{\sqrt{n}} \implies I(\mu) = \frac{1}{se(\hat \mu)^2} = \frac{1}{\sigma^2_{\hat \mu}}$$
Therefore, the Fisher information is the inverse of the variance of the MLE.
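A minimal numerical sketch of this identity (arbitrary parameter values, with $\sigma$ treated as known):

```python
# Numerical sketch of I(mu) = n/sigma^2 = 1/Var(mu_hat) for a Gaussian mean.
# Assumes sigma is known; mu, sigma, n, and reps are arbitrary illustration values.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 3.0, 2.0, 50, 100_000

X = rng.normal(mu, sigma, size=(reps, n))
mu_hat = X.mean(axis=1)            # MLE of mu (the sample mean)

fisher_info = n / sigma**2         # -d^2 L / d mu^2, constant in mu
print("Fisher information n/sigma^2:", fisher_info)         # 12.5
print("1 / Var(mu_hat) (simulated): ", 1.0 / mu_hat.var())  # approx 12.5
```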