Comparing the question with the actual proof in the referenced book, some subtle but important aspects have been left out of the former:
1) This part of the proof concerns the existence of a solution to the likelihood equation $\frac{\partial l(\theta)}{\partial \theta}=0$ that converges to the true parameter, not the "consistency of the MLE estimator".
2) The probability of $S_n$ tends to $1$. Then, by necessity, a $\hat \theta \in (\theta_0 -a, \theta_0 +a)$ will exist for the $\mathbf X$ that form the elements of $S_n$.
Then the proof states that as a consequence,
$$S_n \subset \{ \mathbf{X}: | \hat{\theta}_{n} ( \mathbf{X} ) -\theta_{0} | < a \} \cap \{ \mathbf{X}: l^{\prime} ( \hat{\theta}_{n} ( \mathbf{X} ) ) =0 \}$$
What does the RHS-set intersection describe?
It describes the data sets for which a) $\hat \theta$ is a solution to the likelihood equation (2nd set) and
b) $\hat \theta$ is less than $a$ away from the true parameter $\theta_0$ (1st set). And it asserts that the data sets in $S_n$ form a subset of the data sets in this intersection.
And indeed they do, since $\hat \theta$ may satisfy the two conditions (being a solution to the likelihood equation and lying less than $a$ away from the true parameter) for a collection of data sets larger than the one that forms $S_n$, the latter being characterized by a condition on the value of the likelihood at the true parameter (a condition unrelated to $\hat \theta$).
The proof then goes on to show that these facts imply that $\hat \theta$ lies less than $a$ away from the true parameter with probability tending to one, and then that, if $\hat \theta$ is unique and so coincides with the MLE, the latter is consistent.
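To see this mechanism concretely, here is a minimal simulation sketch (my own illustration, not part of the book's proof), using the Normal$(\theta,1)$ model, where the likelihood equation $l'(\theta)=\sum_i(x_i-\theta)=0$ has the unique root $\bar x$; the values of `theta0`, `a`, and `reps` are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, a, reps = 2.0, 0.1, 1_000

for n in (10, 100, 1_000, 10_000):
    x = rng.normal(theta0, 1.0, size=(reps, n))
    theta_hat = x.mean(axis=1)  # unique root of l'(theta) = sum(x_i - theta)
    coverage = np.mean(np.abs(theta_hat - theta0) < a)
    print(f"n={n:6d}  P(|theta_hat - theta0| < a) ~ {coverage:.3f}")
```

The estimated probability climbs toward $1$ as $n$ grows, just as the theorem asserts.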
In a reply concerning interpreting probability densities I have argued for the merits of including the differential terms ($dx$ and $dy$ in this case) in the PDF, so let's write
$$f(x,y; \theta,\lambda) = \frac{1}{\lambda x \sqrt{2\pi}} \exp\left(-\frac{(y-x\theta)^2}{2x^2} - \frac{x}{\lambda}\right)\,dx\,dy.$$
In (d), attention is focused on $\hat{\theta}$, which appears to be a multiple of $\sum_i y_i/x_i$, where each $(x_i,y_i)$ is an independent realization of the bivariate random variable described by $f$. To tackle this, let's consider the distribution of $Z=Y/X$; later we will have to sum a sequence of independent versions of this variable in order to determine the distribution of $\hat\theta$.
In contemplating the change of variable $(X,Y)\to (X,Z) = (X,Y/X)$, we recognize that the positivity of $X$ assures this induces an order-preserving, one-to-one correspondence between $Y$ and $Z$ for each $X$. Therefore all we need to do is change the variable in the integrand, writing $y=x z$:
$$f(x,x z; \theta,\lambda) = \frac{1}{\lambda x \sqrt{2\pi}} \exp\left(-\frac{(x z-x\theta)^2}{2x^2} - \frac{x}{\lambda}\right)\,dx\,d(x z).$$
From the product rule $d(x z) = z dx + x dz$, and understanding the combination $dx\, d(x z)$ in the sense of differential forms $dx \wedge d(x z)$, we mechanically obtain
$$dx\, d(x z) = dx \wedge d(x z) = dx \wedge (z dx + x dz) = z dx \wedge dx + x dx \wedge dz = x\,dx\,dz.$$
(This is the easy way to obtain the Jacobian determinant of the transformation.)
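If you prefer to see the Jacobian computed symbolically, here is a small check with sympy (my own addition; the variable names are arbitrary):

```python
import sympy as sp

x, z = sp.symbols("x z", positive=True)
y = x * z                                  # the substitution y = x*z
J = sp.Matrix([x, y]).jacobian([x, z])     # d(x, y) / d(x, z)
print(J)        # Matrix([[1, 0], [z, x]])
print(J.det())  # x, matching  dx d(xz) = x dx dz
```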
Therefore the distribution of $(X,Z)$ has PDF
$$\eqalign{
g(x,z;\theta,\lambda) = f(x, x z; \theta, \lambda) &= \frac{1}{\lambda x \sqrt{2\pi}} \exp\left(-\frac{(x z-x\theta)^2}{2x^2} - \frac{x}{\lambda}\right)\,x dx\,dz \\
&= \frac{1}{\lambda} \exp\left( - \frac{x}{\lambda}\right)\frac{1}{\sqrt{2\pi}}\,dx\ \exp\left(-\frac{( z-\theta)^2}{2}\right) \,dz.
}$$
Because this has separated into a product of a PDF for $X$ and a PDF for $Z$, we see without any further effort that (1) $X$ and $Z$ are independent and (2) $Z$ has a Normal$(\theta,1)$ distribution.
Finding the joint distribution of $(\hat{\lambda}, \hat\theta)$, which are easily expressed in terms of $X$ and $Z$, is now straightforward.
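A short simulation can confirm the factorization (my own illustration, with arbitrary values for $\lambda$ and $\theta$): the joint density factors as $X$ Exponential with mean $\lambda$ and $Y \mid X = x \sim$ Normal$(x\theta, x^2)$, so we can sample accordingly and check that $Z=Y/X$ looks Normal$(\theta,1)$ and uncorrelated with $X$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, theta, n = 3.0, 0.5, 200_000

x = rng.exponential(scale=lam, size=n)   # X ~ Exponential, E[X] = lambda
y = rng.normal(loc=x * theta, scale=x)   # Y | X = x ~ Normal(x*theta, x^2)
z = y / x

print("mean(Z) ~ theta:", z.mean())
print("sd(Z)   ~ 1    :", z.std())
print("corr(X, Z) ~ 0 :", np.corrcoef(x, z)[0, 1])
```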
Best Answer
The log-likelihood function is $$l(\theta)=\sum_{i=1}^n \log f(x_i;\theta) \tag{1}$$ Since $\hat{\theta}$ maximizes the log-likelihood function, we know that $l'(\hat{\theta})=0$.
Next we do a Taylor expansion of $l'(\hat{\theta})$ around $\theta_0$, with a mean-value form of the remainder in which $\theta^*$ lies between $\hat{\theta}$ and $\theta_0$:
$$l'(\hat{\theta})=l'(\theta_0)+\frac{l''(\theta_0)}{1!}(\hat{\theta}-\theta_0)+\frac{l'''(\theta^*)}{2!}(\hat{\theta}-\theta_0)^2$$
Since $l'(\hat{\theta})=0$, some rearrangement gives
$$-l''(\theta_0)(\hat{\theta}-\theta_0)-\frac{l'''(\theta^*)}{2}(\hat{\theta}-\theta_0)^2=l'(\theta_0)$$ $$(\hat{\theta}-\theta_0)=\frac{l'(\theta_0)}{-l''(\theta_0)-\frac{l'''(\theta^*)}{2}(\hat{\theta}-\theta_0)}$$
Multiplying both sides by $\sqrt{n}$, we get
$$\sqrt{n}(\hat{\theta}-\theta_0)=\frac{\frac{1}{\sqrt{n}}l'(\theta_0)}{-\frac{1}{n}l''(\theta_0)-\frac{l'''(\theta^*)}{2n}(\hat{\theta}-\theta_0)} \tag{2}$$
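As a quick sanity check of $(2)$ (my own illustration, not part of the original answer): in the Normal$(\theta,1)$ model, $l''(\theta)=-n$ and $l'''\equiv 0$, so $(2)$ reduces to an exact identity that can be verified numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n = 1.0, 500
x = rng.normal(theta0, 1.0, size=n)

theta_hat = x.mean()                            # MLE: root of l'(theta) = sum(x - theta)
lhs = np.sqrt(n) * (theta_hat - theta0)
rhs = (np.sum(x - theta0) / np.sqrt(n)) / 1.0   # -l''(theta0)/n = 1, l''' = 0
print(lhs, rhs)                                 # agree up to floating-point error
```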
Next we need to show that $\frac{1}{\sqrt{n}} l'(\theta_0)$ is asymptotically $N(0,I(\theta_0))$.
From $(1)$ we get
$$l'(\theta_0)=\sum_{i=1}^n\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0}$$
Multiplying both sides by $\frac{1}{\sqrt{n}}$:
$$\frac{1}{\sqrt{n}}l'(\theta_0)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0} \tag{3}$$
Now we apply the CLT to the right-hand side of $(3)$, treating each $\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0}$ as an i.i.d. random variable (the score of the $i$-th observation).
We can show that $E\left(\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0}\right)=0$ by the following procedure:
$$1=\int_{-\infty}^{\infty}f(x;\theta_0)\,dx$$ Take the derivative of both sides (interchanging differentiation and integration):
$$0=\int_{-\infty}^{\infty}\frac{\partial f(x;\theta_0)}{\partial \theta_0}\,dx=\int_{-\infty}^{\infty}\frac{1}{f(x;\theta_0)}\frac{\partial f(x;\theta_0)}{\partial \theta_0}\,f(x;\theta_0)\,dx=\int_{-\infty}^{\infty}\frac{\partial \log f(x;\theta_0)}{\partial \theta_0}\,f(x;\theta_0)\,dx,$$
which shows that $E\left(\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0}\right)=0$.
We can also show that the variance of $\frac{\partial \log f(x_i;\theta_0)}{\partial \theta_0}$ is $I(\theta_0)$, the Fisher information (since the mean is $0$, the variance is just the second moment of the score).
Therefore, by the CLT, $$\frac{1}{\sqrt{n}}l'(\theta_0)\xrightarrow{d} N(0,I(\theta_0))$$
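A Monte Carlo sketch of this step (my own illustration, using the Exponential model $f(x;\theta)=\theta e^{-\theta x}$, where the score is $1/\theta - x$ and $I(\theta)=1/\theta^2$):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 2.0, 1_000, 5_000
I0 = 1.0 / theta0**2                      # Fisher information of Exp(rate)

x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
scores = 1.0 / theta0 - x                 # d/dtheta log f(x; theta) at theta0
print("E[score]   ~ 0:", scores.mean())
print("Var[score] ~ I:", scores.var(), "vs", I0)

stat = scores.sum(axis=1) / np.sqrt(n)    # (1/sqrt(n)) l'(theta0)
print("sd of stat ~ sqrt(I):", stat.std(), "vs", np.sqrt(I0))
```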
We can also show that $-\frac{1}{n}l''(\theta_0)\xrightarrow{p} I(\theta_0)$ by the law of large numbers. I will not give the detailed derivation here.
The remainder term $-\frac{l'''(\theta^*)}{2n}(\hat{\theta}-\theta_0)$ in $(2)$ converges to $0$ in probability (given the consistency of $\hat{\theta}$ and suitable regularity conditions on $l'''$), so we may ignore it.
Now we wrap up $(2)$: by Slutsky's theorem,
$$\sqrt{n}(\hat{\theta}-\theta_0) \xrightarrow{d} \frac{N(0,I(\theta_0))}{I(\theta_0)}=N\left(0,\frac{1}{I(\theta_0)}\right)$$
By some rearrangement, you can see that $\hat{\theta}$ is also approximately normally distributed for large $n$: $\hat{\theta}\approx N\left(\theta_0,\frac{1}{nI(\theta_0)}\right)$.
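An end-to-end check of this conclusion (again my own illustration, with the Exponential rate model, where $\hat{\theta}=1/\bar{x}$ and $1/\sqrt{I(\theta_0)}=\theta_0$):

```python
import numpy as np

rng = np.random.default_rng(4)
theta0, n, reps = 2.0, 1_000, 10_000

x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
theta_hat = 1.0 / x.mean(axis=1)          # MLE of the rate parameter
stat = np.sqrt(n) * (theta_hat - theta0)
print("mean ~ 0:", stat.mean())
print("sd ~ 1/sqrt(I) = theta0:", stat.std(), "vs", theta0)
```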