We first consider the relation:
$$n\delta{\lambda} = d\delta{\theta}\cos{\theta}$$
Its content is that the $n^{th}$ order maximum of a wavelength $\lambda + \delta{\lambda}$ is displaced from the corresponding maximum for a wavelength $\lambda$ by the angle $\delta{\theta}$, related to $\delta{\lambda}$ by the above equation.
Now we can ask the question, "for what (minimum) value of $\delta{\lambda}$ can we clearly distinguish between the $n^{th}$ order maxima of $\lambda$ and $\lambda + \delta{\lambda}$?" The answer is that we can certainly do this (using the Rayleigh criterion) when the angular half-width $\delta{\phi}$ of the $n^{th}$ order maximum of light of wavelength $\lambda$, i.e. the angle from the peak to the first minimum on either side, is no greater than the separation of the maxima, $\delta{\theta}$, i.e. when
$$\delta{\phi} \le \delta{\theta}$$
or, the minimum value of $\delta{\lambda}$ that can be just resolved is one for which
$$\delta{\phi} = \delta{\theta}$$
Now, what about the spread $\delta{\phi}$ of the $n^{th}$ order maximum? When considering the grating as a series of a large number of slits $N \gg 1$ with effective (projected) separation $d\cos{\theta}$, you can see that a minimum occurs at an angle for which the contribution from the slit at position $m \le \frac{N}{2}$ is out of phase with that from the slit at position $m + \frac{N}{2}$, so that each of these pairs has a net zero contribution (note that we can always consider $N$ to be even, when it is large, by neglecting the contribution from one slit if necessary). Therefore, with a diffraction grating of width $W = Nd$, we see that the required criterion is that slits at a projected separation of $\frac{W\cos{\theta}}{2}$ are out of phase, i.e. that (for the first minimum on either side of the maximum)
$$\frac{2\pi}{\lambda} \frac{W\cos{\theta}}{2} \delta{\phi} = \pi$$
$$\delta{\phi} = \frac{\lambda}{W\cos{\theta}}$$
Note that a similar argument can be used for diffraction from a single, continuous wide slit (which is comparable to this case as both deal with a large number of point sources).
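The pairwise cancellation argument can be checked numerically by summing one unit phasor per slit. The following is only a sketch: the grating parameters (`wavelength`, `d`, `N`, `theta`) are hypothetical values picked for illustration, and the helper `grating_amplitude` is not from any library.

```python
import cmath
import math

def grating_amplitude(num_slits, phase_step):
    """Sum of unit phasors, one per slit, with a constant phase step between neighbours."""
    return sum(cmath.exp(1j * m * phase_step) for m in range(num_slits))

# Hypothetical grating, chosen only for illustration.
wavelength = 500e-9          # metres
d = 2e-6                     # slit spacing, metres
N = 1000                     # number of slits
theta = math.radians(30.0)   # n = 2 maximum, since 2 * wavelength = d * sin(30 deg)
W = N * d                    # grating width

# Angular half-width from the text: delta_phi = lambda / (W cos(theta)).
delta_phi = wavelength / (W * math.cos(theta))

# Extra phase between neighbouring slits when the viewing angle moves by delta_phi.
# The in-phase multiple of 2*pi at the maximum itself is dropped; it does not change the sum.
phase_step = (2 * math.pi / wavelength) * d * math.cos(theta) * delta_phi

peak = abs(grating_amplitude(N, 0.0))                # at the maximum: all slits in phase
first_min = abs(grating_amplitude(N, phase_step))    # at delta_phi: pairs cancel
halfway = abs(grating_amplitude(N, phase_step / 2))  # half-way out: still a large amplitude

print(f"peak = {peak:.1f}, halfway = {halfway:.1f}, first minimum = {first_min:.2e}")
```

The phase step at $\delta{\phi}$ works out to $2\pi/N$, so slits $m$ and $m + \frac{N}{2}$ differ in phase by exactly $\pi$, and the summed amplitude drops to zero up to rounding error.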
Thus, we now have, on equating $\delta{\phi}$ and $\delta{\theta}$,
$$\delta{\lambda} = \frac{\lambda d}{nW} = \frac{\lambda}{nN} \implies \frac{\lambda}{\delta{\lambda}} = nN$$
This result is independent of your methods of observation (aperture or otherwise) so long as you take care to observe all parallel rays inclined at an angle $\theta$, focused at a point.
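As a quick numerical illustration of $\frac{\lambda}{\delta{\lambda}} = nN$, here is a minimal sketch that estimates how many illuminated slits are needed to just resolve two nearby lines. The function names are made up for this example, and the wavelengths are the approximate values usually quoted for the sodium D doublet (about 589.0 nm and 589.6 nm).

```python
import math

def resolving_power(order, num_slits):
    """Resolving power lambda / delta_lambda = n * N of a grating."""
    return order * num_slits

def min_slits_to_resolve(lambda_1, lambda_2, order):
    """Smallest N for which n * N reaches lambda / delta_lambda (Rayleigh criterion)."""
    mean = 0.5 * (lambda_1 + lambda_2)
    delta = abs(lambda_2 - lambda_1)
    return math.ceil(mean / (delta * order))

# Approximate sodium D lines in nanometres (any consistent unit works).
d2_line, d1_line = 589.0, 589.6

for n in (1, 2):
    slits = min_slits_to_resolve(d2_line, d1_line, n)
    print(f"order {n}: need at least {slits} slits, resolving power {resolving_power(n, slits)}")
```

With these numbers the first order needs roughly a thousand slits, and the second order about half as many, which is the $1/n$ scaling in the formula above.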
Imaginary wavevectors are possible and, as ptomato's answer correctly points out, betoken evanescence. I'd like to add a few words to his answer that might help clear up your confusion.
Imaginary wavenumbers always betoken evanescence. Sometimes the vague term "nearfield" is used to connote something not propagating. Evanescence is NOT dissipative; this is in contrast with the situation where the refractive index, rather than the wavevector, has an imaginary part owing to ohmic material losses. Evanescence represents non-propagating stores of energy, shuttling back and forth between neighboring parts of the structure in question. Energy shuttles rather than being transported, and the situation is very like the reactive power shuttling that happens in a parallel resonant LC tank circuit: the reactive currents circulating inside the tank to move this energy can be huge, but the current trickling in and out of the tank system is tiny, theoretically nought. Indeed, evanescent regions are like little "water tanks": when the incident field first arrives, they fleetingly "drain" energy from the propagating field whilst they "fill up". Once they are topped up, steady state energy shuttling begins: we have a little "water tank" of stored energy confined to a narrow region in the $z$ direction.
A prototype evanescence situation that is very like the one you are dealing with is the phenomenon of total internal reflexion and the associated Goos-Hänchen shift. Here, instead of seeing a short period grating, the field sees a different refractive index that is too low to support propagating waves at the field's frequency, but the situation and its consequences are almost the same. So, intuitively, in both situations, the field is turned back, and the solution of Maxwell's equations with the right boundary conditions shows this turning back in detail. A layer of shuttling energy is set up just beyond the totally internally reflecting interface or, in your case, the grating, leading to the plot in ptomato's answer.
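To make the imaginary wavevector concrete for total internal reflexion, here is a small sketch using the standard plane-wave relation $k_z^2 = (n_2 k_0)^2 - (n_1 k_0 \sin\theta_i)^2$ for the normal component of the wavevector in the second medium. The indices, angle and wavelength are hypothetical illustration values (roughly glass to air with red light), and `normal_wavenumber` is just a name chosen here.

```python
import cmath
import math

def normal_wavenumber(n1, n2, theta_inc_deg, vacuum_wavelength):
    """Normal component k_z of the wavevector in medium 2 (complex in general).

    The tangential component k_x is conserved across the interface, so
    k_z**2 = (n2*k0)**2 - k_x**2. A negative value gives an imaginary k_z,
    i.e. a field that decays away from the interface instead of propagating.
    """
    k0 = 2 * math.pi / vacuum_wavelength
    kx = n1 * k0 * math.sin(math.radians(theta_inc_deg))
    return cmath.sqrt(complex((n2 * k0) ** 2 - kx ** 2))

# Hypothetical example: glass (n1 = 1.5) to air (n2 = 1.0), 633 nm light.
below_critical = normal_wavenumber(1.5, 1.0, 30.0, 633e-9)  # real k_z: transmitted wave propagates
above_critical = normal_wavenumber(1.5, 1.0, 45.0, 633e-9)  # imaginary k_z: evanescent

decay_length = 1.0 / above_critical.imag  # 1/e decay length of the field amplitude
print(f"k_z below critical angle: {below_critical:.3e}")
print(f"k_z above critical angle: {above_critical:.3e}")
print(f"evanescent 1/e length: {decay_length * 1e9:.0f} nm")
```

The field beyond the interface then varies as $e^{-\kappa z}$ with $\kappa = \operatorname{Im} k_z$, so the stored energy sits within a fraction of a wavelength of the boundary; nothing in this lossless model is absorbed.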
I work out the evanescent mechanisms in detail for total internal reflexion in this answer here and give a vector extension of this calculation here. Beware: in the second one, I use imaginary refractive indices, but this is only a device to make the Fresnel equations work - it is simply a way of talking about imaginary wavevectors.
Also see the detailed description in §1.5 "Reflexion and Refraction of a Plane Wave" in the seventh edition of Born and Wolf, "Principles of Optics".
As for momentum: this is not relevant here, as the reaction force from the grating supplies the change in momentum, but I think your title doesn't summarize your true question as well as it might.
Best Answer
As a first observation, there is no maximum order. There is, however, a maximum propagating order, for which $\sin \theta_s = 1$. Higher orders will not propagate; instead they decay exponentially with distance from the grating.
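In this limiting case the maximum propagating order travels at 90$^{\circ}$, i.e. parallel to the grating. It has $m=d/\lambda$ for perpendicular incidence or $m=2d/\lambda$ for maximally oblique incidence; when these ratios are not integers, the highest propagating order is the largest integer below them.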
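The counting of propagating orders can be sketched in a few lines. This assumes the grating equation in the form $\sin\theta_s = \sin\theta_i + m\lambda/d$ (the sign convention is an assumption, since the answer does not state one), and the function names and numbers are hypothetical.

```python
import math

def sin_diffracted(m, d, wavelength, sin_incidence=0.0):
    """sin(theta_s) for order m, assuming sin(theta_s) = sin(theta_i) + m * wavelength / d."""
    return sin_incidence + m * wavelength / d

def max_propagating_order(d, wavelength, sin_incidence=0.0):
    """Largest positive integer m for which sin(theta_s) <= 1, i.e. the order still propagates."""
    return math.floor((1.0 - sin_incidence) * d / wavelength)

# Hypothetical grating: period 1700 nm, wavelength 500 nm (any consistent unit works).
d, wavelength = 1700.0, 500.0

normal = max_propagating_order(d, wavelength, sin_incidence=0.0)     # bounded by d / wavelength
grazing = max_propagating_order(d, wavelength, sin_incidence=-1.0)   # bounded by 2 d / wavelength
print(f"normal incidence: m_max = {normal}, grazing incidence: m_max = {grazing}")

# The next order up would need sin(theta_s) > 1, so it is evanescent rather than propagating.
print(f"sin(theta_s) for m = {normal + 1}: {sin_diffracted(normal + 1, d, wavelength):.3f}")
```

Here $d/\lambda = 3.4$, so three orders propagate at normal incidence and six at grazing incidence; the fourth order at normal incidence would require $\sin\theta_s \approx 1.18$, which is the signature of an imaginary normal wavevector component.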