Sure, you can generalize the mean free path to a different number of dimensions. But first, let's understand the derivation in 3D.
A particle will collide with any other particle that it comes within a distance $d$ of. So if it moves a length $\ell$, it will collide if there is another particle in a cylindrical volume $\pi d^2 \ell$. Call this the volume swept out by the particle.
Imagine a region of volume $V$ filled with gas, and let $\bar{v}$ be the mean speed of the particles of that gas in the rest frame of the entire volume. Then, in the reference frame of one of the gas particles, the other particles have an average speed of $\sqrt{2}\bar{v}$. (See this derivation of the factor of $\sqrt{2}$, and note that it is $\bar{v}$ rather than $v_\text{rms}$ that appears; see this question for details on that.)
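If you want to convince yourself of the $\sqrt{2}$ numerically, here is a small Monte Carlo sketch (my own check, not the linked derivation). It assumes Maxwell-Boltzmann velocities, so each Cartesian component is an independent Gaussian, and compares the mean relative speed of two particles with the mean speed of one. The dimension is a parameter, since the same factor turns up in 2D.

```python
import math
import random


def mean_speed_ratio(dim, samples=100_000, seed=0):
    """Estimate <|v1 - v2|> / <|v|> for two independent Maxwell-Boltzmann particles.

    Each velocity component is drawn from a standard normal distribution, i.e.
    units where sqrt(k_B T / m) = 1; the ratio does not depend on that choice.
    """
    rng = random.Random(seed)
    speed_sum = 0.0
    rel_speed_sum = 0.0
    for _ in range(samples):
        v1 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        v2 = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        speed_sum += math.hypot(*v1)
        rel_speed_sum += math.hypot(*[a - b for a, b in zip(v1, v2)])
    return rel_speed_sum / speed_sum


for dim in (2, 3):
    print(dim, round(mean_speed_ratio(dim), 3), round(math.sqrt(2), 3))
```

Both ratios should come out close to $\sqrt{2} \approx 1.414$; the spread of the Gaussian cancels out of the ratio.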
In the reference frame of that one particle, the total volume swept out by all the other particles in a time $\Delta t$ is $N\pi d^2 \sqrt{2}\bar{v}\Delta t$, where $N$ is the number of particles.
The probability that the chosen at-rest particle experiences an interaction during $\Delta t$ is equal to the fraction of the total volume ($V$) swept out, namely
$$P(\text{int.}) = \frac{N\pi d^2 \sqrt{2}\bar{v}\Delta t}{V}$$
To calculate the mean free time $\tau$, I technically should find the probability distribution of interaction times and compute its mean. But to make the calculation simple, I'll take advantage of a handy coincidence (which I am not justifying here): $\tau$ happens to be equal to the time after which this probability would reach 1 if it increased at a fixed rate over time. So I can replace $\Delta t$ with $\tau$ and $P(\text{int.})$ with $1$, and I get
$$\tau = \frac{V}{N\pi d^2 \sqrt{2}\bar{v}}$$
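The "coincidence" can at least be checked numerically. This sketch (not a proof) treats collisions as independent: in every short step $dt$ the particle collides with probability $\text{rate}\cdot dt$, where "rate" stands for $N\pi d^2\sqrt{2}\bar{v}/V$. The value of the rate and the step size are arbitrary choices for illustration. The average waiting time comes out as $1/\text{rate}$, which is the $\tau$ above.

```python
import random


def mean_first_collision_time(rate, dt=1e-3, trials=5_000, seed=1):
    """Average time until the first collision when, in each short step dt,
    a collision happens independently with probability rate * dt."""
    rng = random.Random(seed)
    p = rate * dt  # must be << 1 for the step approximation to be good
    total = 0.0
    for _ in range(trials):
        steps = 1
        while rng.random() >= p:  # no collision in this step
            steps += 1
        total += steps * dt
    return total / trials


rate = 10.0  # arbitrary stand-in for N * pi * d**2 * sqrt(2) * vbar / V
print(mean_first_collision_time(rate), 1 / rate)
```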
The mean free path is then given by
$$\lambda = \bar{v}\tau = \frac{V}{\sqrt{2}N\pi d^2}$$
If you assume the particles follow the ideal gas law, you can replace $\frac{V}{N} = \frac{k_B T}{p}$ and recover the formula from Wikipedia:
$$\lambda = \frac{k_B T}{\sqrt{2}\pi d^2 p}$$
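Plugging in numbers is a useful sanity check. In this sketch the function names are my own, and the collision diameter of 0.37 nm is an assumed, roughly molecular-scale value rather than a tabulated one. It also confirms that the $V/N$ form and the ideal-gas form agree.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)


def mean_free_path_3d(volume, n_particles, d):
    """lambda = V / (sqrt(2) * N * pi * d**2), with d the collision diameter."""
    return volume / (math.sqrt(2) * n_particles * math.pi * d**2)


def mean_free_path_ideal_gas(T, p, d):
    """The same formula with V / N = k_B T / p substituted."""
    return K_B * T / (math.sqrt(2) * math.pi * d**2 * p)


# Illustrative inputs only: assumed d = 0.37 nm at 0 degC and 1 atm
T, p, d = 273.15, 101_325.0, 3.7e-10
lam = mean_free_path_ideal_gas(T, p, d)
N = 1e22
V = N * K_B * T / p  # volume occupied by N particles at (T, p)
print(lam, mean_free_path_3d(V, N, d))
```

With these inputs the mean free path is roughly 60 nm.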
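To modify this for 2D, we only need to change the first step (the swept volume) and follow the same argument from there.
Instead of a particle sweeping out a volume $\pi d^2\ell$, it sweeps out an area $2d\ell$: a strip of length $\ell$ and width $2d$, since it collides with any particle whose centre passes within $d$ of its own on either side.
The relative speed is unchanged: the other particles still move at an average of $\sqrt{2}\bar{v}$ relative to the chosen one, because the relative velocity of two independent Maxwell-Boltzmann particles has the same distribution as a single particle's velocity scaled by $\sqrt{2}$, in any number of dimensions.
The total area swept out by all $N$ particles in a time $\Delta t$ is then $N \cdot 2d \cdot \sqrt{2}\bar{v}\Delta t = Nd\sqrt{8}\bar{v}\Delta t$.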
The probability is equal to the fraction of the total area,
$$P(\text{int.}) = \frac{Nd \sqrt{8}\bar{v}\Delta t}{A}$$
The mean free time is again the time after which this probability would reach 1,
$$\tau = \frac{A}{Nd\sqrt{8}\bar{v}}$$
and the mean free path is
$$\lambda = \bar{v}\tau = \frac{A}{\sqrt{8}Nd}$$
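To check the $2d\ell$ swept area on its own, here is a sketch under a deliberately simplified assumption: the other particles are frozen in place (a "Lorentz gas"), so there is no relative-speed factor and the same argument predicts $\lambda = A/(2Nd)$. The function name, box size, particle count and diameter are all arbitrary illustrative choices.

```python
import math
import random


def frozen_scatterer_free_path(n_points, d, box=100.0, trials=1_000, seed=2):
    """Mean distance a disk centre moving along +x travels before coming within d
    of one of n_points stationary centres placed uniformly in a box x box square.

    The path starts at (0, box/2). Centres already within d of the start are
    ignored, as they would overlap the moving disk in a real configuration.
    """
    rng = random.Random(seed)
    y0 = box / 2
    total = 0.0
    for _ in range(trials):
        first_hit = box  # fallback if nothing is hit (vanishingly rare here)
        for _ in range(n_points):
            b = rng.uniform(0.0, box) - y0  # impact parameter
            if abs(b) < d:  # centre lies in the strip of width 2d along the path
                x = rng.uniform(0.0, box)
                s = x - math.sqrt(d * d - b * b)  # distance travelled at contact
                if 0.0 < s < first_hit:
                    first_hit = s
        total += first_hit
    return total / trials


n_points, d, box = 2_000, 0.5, 100.0
density = n_points / box**2  # N / A
sim = frozen_scatterer_free_path(n_points, d, box)
print(sim, 1 / (2 * d * density))  # simulated vs A / (2 N d)
```

Here the prediction is $1/(2 \cdot 0.5 \cdot 0.2) = 5$, and the simulated average should land close to it. Letting the other particles move as well brings in the $\sqrt{2}$ and gives $A/(\sqrt{8}Nd)$.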
Bear in mind that in 2D, the ideal gas law would be modified; it would have area instead of volume, and you would have to use a 2D version of pressure, which would be force per unit length, not per unit area. It may be easier to just work with $\lambda = \frac{A}{\sqrt{8}Nd}$ directly, since you can just set $A = L^2$ if you know the box side length.
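A sketch of the 2D result in code, using $A = L^2$ for a square box or, alternatively, the 2D ideal gas law written as $p_2 A = N k_B T$, with $p_2$ a force per unit length. The function names and the example numbers are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K


def mean_free_path_2d(area, n_particles, d):
    """lambda = A / (sqrt(8) * N * d) for hard disks of diameter d."""
    return area / (math.sqrt(8) * n_particles * d)


def mean_free_path_2d_box(side, n_particles, d):
    """Square box of side length L, so A = L**2."""
    return mean_free_path_2d(side**2, n_particles, d)


def mean_free_path_2d_ideal(T, p2, d):
    """Uses the 2D ideal gas law p2 * A = N * k_B * T, with p2 in N/m."""
    return K_B * T / (math.sqrt(8) * d * p2)


# Hypothetical simulation: 500 disks of diameter 1.0 in a 100 x 100 box (any length unit)
print(mean_free_path_2d_box(100.0, 500, 1.0))
```

For this example the result is $\lambda = 5\sqrt{2} \approx 7.07$ in the same length unit.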
> How is momentum being transported throughout the gas? If this were the case, surely layers would be changing their horizontal speeds and so we wouldn't have this steady state.
The gas is in a steady state, and momentum is being transferred. This means that a steady horizontal force on the top plate is required, and an equal and opposite force on the bottom plate. Each layer of gas receives momentum from above, and transfers the same momentum to the layer below it, so it undergoes no acceleration.
> $F/A$ represents the change in momentum in unit time of unit area of this layer
The layer has a steady speed, so it has no change in momentum per unit time. $F/A$ is the momentum transferred from one layer to the one below it.
> However, how does this equal the $x$ direction momentum of the molecules that pass through unit area in unit time of this layer? They seem to be unrelated.
At the microscopic level, the only way one layer can impart momentum to the adjacent layer is by swapping molecules. $F/A$ is the net momentum carried across the imaginary boundary from the upper to the lower layer.
> Why don't we just find the total momentum rather than the amount by which it exceeds the momentum of that layer?
$F/A$ is the net momentum carried across the imaginary boundary. For every molecule passing downwards through the boundary, there is on average a molecule passing upwards, carrying on average slightly less $x$ momentum, characteristic of the region it came from.
As you can see (I hope), the derivation was correct - all I have done is tweak the words. There is often a tendency in textbooks to assume that if a quantity obviously has the correct dimensions, then it must be what is wanted. This can lead to a casual use of words when describing the logic. hth
I notice that you also have solid plates at the top and bottom of your diagram, which require that you use the no-slip boundary condition. This is a good approximation when the mean free path is small compared with the plate separation, but for an accurate result for arbitrary mean free path you would need to know something about the surface and how it interacts with incident molecules.
The gas as a whole is moving in the $x$ direction everywhere, but at different heights it is moving at different speeds, so we have $u_x(y)$.
Even though the gas as a whole has velocity in the $x$ direction only, individual molecules have motion in the $y$ direction (and $z$-direction, but that doesn't matter here).
So let's suppose that $u_x(y_0) = 2\ \text{m/s}$, and let's say that $u_x$ is an increasing function of $y$. Then the molecules from higher values of $y$ will sometimes drift down past $y_0$, and on average they will be moving faster than $2\ \text{m/s}$ in the $x$-direction. This is because they come from higher $y$ values, where $u_x$ is higher.
Will they be going $2.1\ \text{m/s}$ or $2.01\ \text{m/s}$ in the $x$-direction on average? In your language, is $\langle u_x \rangle$ equal to $0.1\ \text{m/s}$ or $0.01\ \text{m/s}$? That depends. If one particular molecule drifts down to $y = y_0$ from $y = y_0 + \Delta y$, then on average its $x$ velocity should be $u_x(y_0 + \Delta y) \approx u_x(y_0) + \frac{\partial u_x}{\partial y}(y_0) \Delta y$.
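To make "that depends" concrete, here is a hypothetical profile, chosen only for illustration, with $u_x = 2$ m/s at $y_0 = 0$ and a gradient of $10\ \text{s}^{-1}$ there. A molecule arriving from 1 cm above carries an excess of about 0.1 m/s, one from 1 mm above about 0.01 m/s, and the error of the linear approximation shrinks like $\Delta y^2$.

```python
def u_x(y):
    """Hypothetical flow profile in m/s, with y in m; u_x(0) = 2 m/s."""
    return 2.0 + 10.0 * y + 50.0 * y**2


def du_dy(y):
    """Exact derivative of the profile above, in 1/s."""
    return 10.0 + 100.0 * y


y0 = 0.0
for dy in (0.01, 0.001):
    exact = u_x(y0 + dy)
    linear = u_x(y0) + du_dy(y0) * dy  # first-order Taylor expansion
    print(dy, exact, linear, exact - linear)
```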
The document you were reading was simply approximating $\Delta y \sim \lambda$, which is not exact. If you do it carefully, I believe you should actually get a factor of $1/3$, not $1/2$ as stated in the document you linked.
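Here is one way of doing it "carefully", as a numerical sketch under the elementary kinetic-theory assumptions: isotropic velocities, every molecule crossing the plane last collided a distance $\lambda$ back along its path, and $u_x$ is linear over that distance. A molecule crossing at angle $\theta$ to the $y$ axis then carries an excess $x$ momentum $m\frac{\partial u_x}{\partial y}\lambda\cos\theta$, and weighting by the flux through the plane gives the prefactor $c$ in $F/A = c\,n m \bar{v}\lambda\frac{\partial u_x}{\partial y}$, with $n$ the number density and $m$ the molecular mass. The function name is my own.

```python
import math


def viscosity_prefactor(n_steps=100_000):
    """Numerically evaluate c in F/A = c * n * m * vbar * lambda * du_x/dy.

    Molecules crossing the plane in the band [theta, theta + dtheta] per unit
    area and time: n * vbar * cos(theta) * sin(theta) / 2 * dtheta.
    Excess x momentum each one carries: m * (du_x/dy) * lambda * cos(theta).
    """
    h = (math.pi / 2) / n_steps
    one_side = 0.0
    for i in range(n_steps):
        theta = (i + 0.5) * h  # midpoint rule on [0, pi/2]
        one_side += 0.5 * math.cos(theta) ** 2 * math.sin(theta) * h
    # Molecules from above bring an excess and those from below a deficit;
    # both transfer x momentum downwards, so the two halves add.
    return 2 * one_side


print(viscosity_prefactor())
```

Analytically this is $2 \cdot \frac{1}{2}\int_0^{\pi/2}\cos^2\theta\sin\theta\,d\theta = \frac{1}{3}$, which is the factor mentioned above, within this simple model.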
This is not what it means at all. It says that the excess $x$ velocity that a molecule has, on average, above $u_x(y_0)$, increases linearly in $y$. This has nothing to do with particles' motion in the $y$ direction.